Query
Examples
For all string parameters you can use % as a wildcard (see the reference below). All methods have a parameter limit, which limits the number of results, and a parameter as_df, which returns the results as a pandas.DataFrame.
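The % wildcard reads like the one used in SQL LIKE patterns. The sketch below does not call pyuniprot; it assumes that string parameters end up in a LIKE comparison and demonstrates the three typical patterns on an in-memory SQLite table. The table name `entry`, its single column, and the helper `like` are made up for illustration and say nothing about pyuniprot's real schema.

```python
import sqlite3

# Hypothetical stand-in table (assumption); pyuniprot's real schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (name TEXT)")
conn.executemany(
    "INSERT INTO entry (name) VALUES (?)",
    [("CD33_HUMAN",), ("CCD33_HUMAN",), ("TREM2_HUMAN",)],
)


def like(pattern):
    """Return all names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM entry WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [name for (name,) in rows]


exact = like("CD33_HUMAN")   # no %: the whole value has to match
starts_with = like("CD33%")  # % at the end: prefix search
contains = like("%CD33%")    # % at both ends: substring search

print(exact, starts_with, contains)
```

Only the substring pattern also finds CCD33_HUMAN, which is why a search term wrapped in % can return more rows than expected.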
Initialize query object
import pyuniprot
pyuniprot.update(taxids=[9606,10090,10116]) # human, mouse, rat update
query = pyuniprot.query()
Methods by examples
Search for ...
- human proteins with gene name ‘TP53’ (taxid=9606)
>>> query.entry(gene_name='TP53', taxid=9606)
[Cellular tumor antigen p53]
- human proteins whose recommended full name starts with ‘Myeloid cell surface’ (use % at the end)
>>> query.entry(recommended_full_name='Myeloid cell surface%', taxid=9606)
[Myeloid cell surface antigen CD33]
Find all UniProt entries whose entry name contains ‘CD33’ (% at the start and end of the search term) and return them as a pandas.DataFrame
>>> results = query.entry(name='%CD33%', taxid=9606, as_df=True)
# get the first 2 rows of results with columns 'name', 'recommended_full_name', 'taxid'
>>> results[['name', 'recommended_full_name', 'taxid']].head(2)
          name                     recommended_full_name  taxid
0   CD33_HUMAN         Myeloid cell surface antigen CD33   9606
1  CCD33_HUMAN  Coiled-coil domain-containing protein 33   9606
Find entries by a list of entry names
>>> query.entry(name=('TREM2_HUMAN', 'CD33_HUMAN'))
[Myeloid cell surface antigen CD33, Triggering receptor expressed on myeloid cells 2]
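Note that the result above does not follow the order of the requested names. A minimal sketch of restoring that order, assuming each result exposes its entry name as a `name` attribute (suggested by the `name` column of the DataFrame, but not confirmed here). `FakeEntry` and `order_by_requested` are hypothetical names used only so the example runs without a database.

```python
from collections import namedtuple

# Hypothetical stand-in for pyuniprot.manager.models.Entry (assumption).
FakeEntry = namedtuple("FakeEntry", ["name", "recommended_full_name"])


def order_by_requested(results, requested_names):
    """Return results in the order of requested_names; names without a hit are skipped."""
    by_name = {entry.name: entry for entry in results}
    return [by_name[name] for name in requested_names if name in by_name]


# Same order as in the output shown above.
results = [
    FakeEntry("CD33_HUMAN", "Myeloid cell surface antigen CD33"),
    FakeEntry("TREM2_HUMAN", "Triggering receptor expressed on myeloid cells 2"),
]

ordered = order_by_requested(results, ("TREM2_HUMAN", "CD33_HUMAN", "MISSING_HUMAN"))
print([entry.name for entry in ordered])
```

Indexing by name first keeps the reordering linear in the number of results and makes missing names easy to detect.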
If an attribute ends with an s, it is a clear hint that this is a 1:n or n:m relationship, like keywords. Several proteins can be linked to one keyword, and several keywords can be linked to one protein. The next lines of code show how to query for all proteins linked to the keyword ‘Neurodegeneration’ and return the gene names.
>>> results = query.entry(keyword='Neurodegeneration')
>>> len(results) # number of results
322
>>> [x.gene_name for x in results][:3] # show only the first 3 gene names
['CHMP1A', 'CLN3', 'COQ8A']
Every element in the list represents a pyuniprot.manager.models.Entry instance:
>>> first_protein = results[0] # fetch first result
>>> type(first_protein)
pyuniprot.manager.models.Entry
>>> first_protein
Charged multivesicular body protein 1a
# get first 3 of all other keywords to this protein
>>> first_protein.keywords[:3]
[Reference proteome:KW-1185, Coiled coil:KW-0175, Repressor:KW-0678]
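The n:m relationship between entries and keywords can be pictured with plain Python objects. This is a toy model only, not pyuniprot's actual models: the classes `Keyword` and `Entry`, the `entries` back-reference and the helper `link` are hypothetical. The gene names and keyword names are taken from the output above.

```python
from dataclasses import dataclass, field


# eq=False and repr=False avoid endless recursion through the mutual references.
@dataclass(eq=False)
class Keyword:
    name: str
    entries: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Entry:
    gene_name: str
    keywords: list = field(default_factory=list, repr=False)


def link(entry, keyword):
    """Register the relationship on both sides (n:m)."""
    entry.keywords.append(keyword)
    keyword.entries.append(entry)


neuro = Keyword("Neurodegeneration")
coiled = Keyword("Coiled coil")
chmp1a = Entry("CHMP1A")
cln3 = Entry("CLN3")

link(chmp1a, neuro)
link(chmp1a, coiled)
link(cln3, neuro)

# One protein, several keywords ...
keywords_of_chmp1a = [keyword.name for keyword in chmp1a.keywords]
# ... and one keyword, several proteins.
genes_for_neuro = [entry.gene_name for entry in neuro.entries]
print(keywords_of_chmp1a, genes_for_neuro)
```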
Properties
q.gene_forms
q.interaction_actions
q.actions
q.pathways
Query Manager Reference
class pyuniprot.manager.query.QueryManager(connection=None, echo=False)
Query interface to database.
- accession(accession=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Accession
  Parameters:
  - accession (str) – UniProt accession number
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Accession objects or pandas.DataFrame
- alternative_full_name(name=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.AlternativeFullName
  Parameters:
  - name (str) – alternative full name
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.AlternativeFullName objects or pandas.DataFrame
- alternative_short_name(name=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.AlternativeShortName
  Parameters:
  - name (str) – alternative short name
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.AlternativeShortName objects or pandas.DataFrame
- datasets
  Distinct datasets (dataset) in pyuniprot.manager.models.Entry
  Distinct datasets are SwissProt and/or TrEMBL
  Returns: all distinct dataset types
  Return type: [str,]
- db_reference(type_=None, identifier=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.DbReference
  Check the list of available databases with dbreference_types
  Parameters:
  - type_ – type (or name) of database
  - identifier – unique identifier in database
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.DbReference objects or pandas.DataFrame
- dbreference_types
  Distinct database reference types (type_) in pyuniprot.manager.models.DbReference
  Returns: list of strings for all available database cross reference types used in model DbReference
  Return type: [str,]
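Because dbreference_types lists every valid value of type_, it can be used to catch typos before db_reference is called. A minimal sketch, relying only on the two members documented above. `db_references_checked` is a hypothetical helper, and `StubQuery` with its type names 'EMBL' and 'PDB' is a made-up stand-in for pyuniprot.query() so that the example runs without a database.

```python
def db_references_checked(query, type_, **filters):
    """Call query.db_reference only if type_ is a known cross reference type."""
    available = query.dbreference_types
    if type_ not in available:
        raise ValueError(
            f"unknown database type {type_!r}; expected one of {sorted(available)}"
        )
    return query.db_reference(type_=type_, **filters)


class StubQuery:
    """Hypothetical stand-in for pyuniprot.query() (assumption, offline only)."""

    dbreference_types = ["EMBL", "PDB"]  # made-up example values

    def db_reference(self, type_=None, identifier=None, entry_name=None,
                     limit=None, as_df=False):
        # Echo the filters instead of querying a database.
        return [(type_, identifier, entry_name)]


hits = db_references_checked(StubQuery(), "PDB", entry_name="CD33_HUMAN")

try:
    db_references_checked(StubQuery(), "pdb")  # wrong case is rejected
    rejected = False
except ValueError:
    rejected = True

print(hits, rejected)
```

The same pattern would apply to feature and feature_types.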
- disease(identifier=None, ref_id=None, ref_type=None, name=None, acronym=None, description=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Disease
  Parameters:
  - identifier – disease UniProt identifier
  - ref_id – identifier of referenced database
  - ref_type – database name
  - name – disease name
  - acronym – disease acronym
  - description – disease description
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Disease objects or pandas.DataFrame
- disease_comment(comment=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.DiseaseComment
  Parameters:
  - comment – comment on the disease
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.DiseaseComment objects or pandas.DataFrame
- diseases
  Distinct diseases (name in pyuniprot.manager.models.Disease)
  Returns: all distinct disease names
  Return type: [str,]
- ec_number(ec_number=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.ECNumber
  Parameters:
  - ec_number – Enzyme Commission number
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.ECNumber objects or pandas.DataFrame
- entry(name=None, dataset=None, recommended_full_name=None, recommended_short_name=None, gene_name=None, taxid=None, accession=None, organism_host=None, feature_type=None, function_=None, ec_number=None, db_reference=None, alternative_name=None, disease_comment=None, disease_name=None, tissue_specificity=None, pmid=None, keyword=None, subcellular_location=None, tissue_in_reference=None, sequence=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Entry
  An entry is the root element in UniProt datasets. Everything is linked to an entry and can be accessed from a models.Entry object. % can be used as a wildcard for string parameters (see the examples above).
  Parameters:
  - name (str,tuple) – UniProt entry name(s)
  - dataset – dataset of the entry (see datasets)
  - recommended_full_name (str,tuple) – recommended full protein name(s)
  - recommended_short_name (str,tuple) – recommended short protein name(s)
  - tissue_in_reference (str,tuple) – tissue mentioned in reference
  - subcellular_location (str,tuple) – subcellular location(s)
  - keyword (str,tuple) – keyword
  - pmid (str,tuple) – PubMed identifier
  - tissue_specificity (str,tuple) – tissue specificities
  - disease_comment (str,tuple) – disease comments
  - alternative_name (str,tuple) – alternative protein name(s)
  - db_reference (str,tuple) – cross reference identifier
  - ec_number (str,tuple) – enzyme classification number, e.g. 1.1.1.1
  - function_ (str,tuple) – description of protein functions
  - feature_type (str,tuple) – feature types
  - organism_host (str,tuple) – organism hosts
  - accession (str,tuple) – UniProt accession number
  - disease_name (str,tuple) – disease name
  - gene_name (str,tuple) – gene name
  - taxid (str,tuple) – NCBI taxonomy identifier
  - limit (int,tuple) – maximum number of results
  - sequence (str,tuple) – amino acid sequence
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Entry objects or pandas.DataFrame
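Most filters of entry are documented as (str,tuple). Whether lists, sets or generators are accepted as well is not stated here, so a caller may prefer to normalize its input first. A minimal sketch under that assumption; `as_str_or_tuple` is a hypothetical helper and not part of pyuniprot.

```python
def as_str_or_tuple(value):
    """Normalize a filter value to None, a single str, or a tuple of str."""
    if value is None or isinstance(value, str):
        return value
    items = tuple(value)
    if not items:
        return None  # an empty collection means "no filter"
    if not all(isinstance(item, str) for item in items):
        raise TypeError("filter values must be strings")
    # A single value is passed on as a plain string.
    return items[0] if len(items) == 1 else items


single = as_str_or_tuple("TP53")
several = as_str_or_tuple(["TREM2_HUMAN", "CD33_HUMAN"])
from_set = as_str_or_tuple({"TP53"})
empty = as_str_or_tuple([])
print(single, several, from_set, empty)
```

The normalized value could then be passed on, for example as query.entry(name=as_str_or_tuple(names)).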
- feature(type_=None, identifier=None, description=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Feature
  Check available feature types with pyuniprot.query().feature_types
  Parameters:
  - type_ – type of feature
  - identifier – feature identifier
  - description – description of feature
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Feature objects or pandas.DataFrame
- feature_types
  Distinct types (type_) in pyuniprot.manager.models.Feature
  Returns: all distinct feature types
  Return type: [str,]
- function(text=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Function
  Parameters:
  - text – description of function
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Function objects or pandas.DataFrame
- keyword(name=None, identifier=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Keyword
  Parameters:
  - name (str) – keyword name
  - identifier (str) – keyword identifier
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Keyword objects or pandas.DataFrame
- keywords
  Distinct keywords (name in pyuniprot.manager.models.Keyword)
  Returns: all distinct keywords
  Return type: [str,]
- organism_host(taxid=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.OrganismHost
  Parameters:
  - taxid – NCBI taxonomy identifier
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.OrganismHost objects or pandas.DataFrame
- other_gene_name(type_=None, name=None, entry_name=None, limit=None, as_df=None)
  Method to query pyuniprot.manager.models.OtherGeneName
  Parameters:
  - type_ (str) – type of gene name, e.g. synonym
  - name (str) – other gene name
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.OtherGeneName objects or pandas.DataFrame
- pmid(pmid=None, entry_name=None, first=None, last=None, volume=None, name=None, date=None, title=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Pmid
  Parameters:
  - pmid (int) – PubMed identifier
  - entry_name (str) – name in models.Entry
  - first – first page
  - last – last page
  - volume – volume
  - name – name of journal
  - date – publication date
  - title – title of publication
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Pmid objects or pandas.DataFrame
- sequence(sequence=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.Sequence
  Parameters:
  - sequence – amino acid sequence
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.Sequence objects or pandas.DataFrame
- subcellular_location(location=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.SubcellularLocation
  Parameters:
  - location – subcellular location
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.SubcellularLocation objects or pandas.DataFrame
- subcellular_locations
  Distinct subcellular locations (location in pyuniprot.manager.models.SubcellularLocation)
  Returns: all distinct subcellular locations
  Return type: [str,]
- taxids
  Distinct NCBI taxonomy identifiers (taxid) in pyuniprot.manager.models.Entry
  Returns: NCBI taxonomy identifiers
  Return type: [int,]
- tissue_in_reference(tissue=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.TissueInReference
  Parameters:
  - tissue (str) – tissue linked to reference
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if limit=None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of models.TissueInReference objects or pandas.DataFrame
- tissue_specificity(comment=None, entry_name=None, limit=None, as_df=False)
  Method to query pyuniprot.manager.models.TissueSpecificity
  Provides information on the expression of a gene at the mRNA or protein level in cells or in tissues of multicellular organisms. By default, the information is derived from experiments at the mRNA level, unless specified ‘at protein level’.
  Parameters:
  - comment (str) – comment describing tissue specificity
  - entry_name (str) – name in models.Entry
  - limit (int,tuple) – number of results; if None, all results are returned
  - as_df (bool) – if True, results are returned as pandas.DataFrame
  Returns: list of pyuniprot.manager.models.TissueSpecificity objects or pandas.DataFrame
- tissues_in_references
  Distinct tissues (tissue in pyuniprot.manager.models.TissueInReference)
  Returns: all distinct tissues in references
  Return type: [str,]