Query¶
Examples¶
For all string parameters you can use % as a wildcard (see the reference below). All methods
accept a limit parameter, which restricts the number of results, and an as_df parameter, which
returns the results as a pandas.DataFrame.
Initialize query object
import pyuniprot
pyuniprot.update(taxids=[9606,10090,10116]) # human, mouse, rat update
query = pyuniprot.query()
Methods by examples¶
search for ...
- human proteins with gene name ‘TP53’ (taxid=9606)
>>> query.entry(gene_name='TP53', taxid=9606)
[Cellular tumor antigen p53]
- human proteins whose recommended full name starts with ‘Myeloid cell surface’ (use % at the end)
>>> query.entry(recommended_full_name='Myeloid cell surface%', taxid=9606)
[Myeloid cell surface antigen CD33]
find all UniProt entries where the entry name contains ‘CD33’ (% at the start and end of the search term) and return them as pandas.DataFrame
>>> results = query.entry(name='%CD33%', taxid=9606, as_df=True)
# get first 2 lines of results with columns 'name', 'recommended_full_name', 'taxid'
>>> results[['name', 'recommended_full_name', 'taxid']].head(2)
          name                     recommended_full_name  taxid
0   CD33_HUMAN         Myeloid cell surface antigen CD33   9606
1  CCD33_HUMAN  Coiled-coil domain-containing protein 33   9606
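The % wildcard in the examples above reads like the placeholder of the SQL LIKE operator. The sketch below reproduces the three patterns (exact, prefix, contains) against a throwaway in-memory SQLite table. It assumes that pyuniprot hands string parameters to a LIKE comparison; the table, its columns and the helper names_like are made up for illustration and are not pyuniprot's schema or API.

```python
import sqlite3

# Throwaway table; names and rows are illustrative only, not pyuniprot's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (name TEXT, recommended_full_name TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [
        ("CD33_HUMAN", "Myeloid cell surface antigen CD33"),
        ("CCD33_HUMAN", "Coiled-coil domain-containing protein 33"),
        ("TREM2_HUMAN", "Triggering receptor expressed on myeloid cells 2"),
    ],
)


def names_like(column, pattern):
    """Return entry names whose `column` matches the LIKE pattern."""
    if column not in ("name", "recommended_full_name"):
        raise ValueError("unknown column: %s" % column)
    sql = "SELECT name FROM entry WHERE %s LIKE ? ORDER BY name" % column
    return [row[0] for row in conn.execute(sql, (pattern,))]


exact = names_like("name", "CD33_HUMAN")                                  # no wildcard
prefix = names_like("recommended_full_name", "Myeloid cell surface%")    # starts with
contains = names_like("name", "%CD33%")                                   # contains
```

In SQL LIKE the underscore also matches any single character, so a pattern such as 'CD33_HUMAN' is slightly more permissive than a strict equality test. Case sensitivity of LIKE depends on the database backend.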
find entries by a list of entry names
>>> query.entry(name=('TREM2_HUMAN', 'CD33_HUMAN'))
[Myeloid cell surface antigen CD33, Triggering receptor expressed on myeloid cells 2]
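Passing a tuple instead of a string asks for any of several values. One plausible way such a parameter could be turned into a filter is sketched below: a string becomes a LIKE pattern, a tuple becomes an IN list, and any other value is compared for equality. The helper build_condition is hypothetical and the mapping is an assumption about the behaviour, not a copy of pyuniprot's code.

```python
def build_condition(column, value):
    """Translate one search parameter into (SQL fragment, bound values).

    Hypothetical helper: str -> LIKE (so % works as a wildcard),
    tuple/list -> IN (exact matches), None -> no filter.
    """
    if value is None:
        return None, []
    if isinstance(value, str):
        return "%s LIKE ?" % column, [value]
    if isinstance(value, (tuple, list)):
        placeholders = ", ".join("?" for _ in value)
        return "%s IN (%s)" % (column, placeholders), list(value)
    # Anything else (for example an int taxid) is compared for equality.
    return "%s = ?" % column, [value]


single = build_condition("name", "%CD33%")
several = build_condition("name", ("TREM2_HUMAN", "CD33_HUMAN"))
taxid = build_condition("taxid", 9606)
```

Under this reading, a % inside a tuple element would not be expanded, because IN compares for equality.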
If an attribute name ends with an s, this is a clear hint that it represents a 1:n or n:m relationship, like keywords: several proteins can be linked to one keyword, and several keywords can be linked to one protein. The next lines of code show how to query for all proteins linked to the keyword ‘Neurodegeneration’ and return their gene names.
>>> results = query.entry(keyword='Neurodegeneration')
>>> len(results) # number of results
322
>>> [x.gene_name for x in results][:3] # show only the first 3 gene names
['CHMP1A', 'CLN3', 'COQ8A']
Every element in the list represents a pyuniprot.manager.models.Entry instance:
>>> first_protein = results[0] # fetch first result
>>> type(first_protein)
pyuniprot.manager.models.Entry
>>> first_protein
Charged multivesicular body protein 1a
# get the first 3 keywords linked to this protein
>>> first_protein.keywords[:3]
[Reference proteome:KW-1185, Coiled coil:KW-0175, Repressor:KW-0678]
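The n:m relationship between proteins and keywords can be pictured with two plain Python classes that reference each other. Protein and Keyword below are simplified stand-ins invented for this sketch; the real models live in pyuniprot.manager.models and are backed by database tables. The keyword names and identifiers are taken from the output above, while the proteins and their links are placeholders.

```python
class Keyword:
    def __init__(self, name, identifier):
        self.name = name
        self.identifier = identifier
        self.entries = []  # proteins carrying this keyword

    def __repr__(self):
        return "%s:%s" % (self.name, self.identifier)


class Protein:
    def __init__(self, gene_name):
        self.gene_name = gene_name
        self.keywords = []  # keywords attached to this protein

    def add_keyword(self, keyword):
        # Register the link in both directions.
        self.keywords.append(keyword)
        keyword.entries.append(self)


coiled = Keyword("Coiled coil", "KW-0175")
repressor = Keyword("Repressor", "KW-0678")

# Placeholder proteins; the links are illustrative, not real annotations.
protein_a = Protein("GENE_A")
protein_b = Protein("GENE_B")
protein_a.add_keyword(coiled)
protein_a.add_keyword(repressor)
protein_b.add_keyword(coiled)

genes_with_coiled = [p.gene_name for p in coiled.entries]
keywords_of_a = [k.name for k in protein_a.keywords]
```

One keyword leads to several proteins (coiled.entries) and one protein leads to several keywords (protein_a.keywords), which is the pattern the plural attribute names signal.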
Properties¶
q.gene_forms
q.interaction_actions
q.actions
q.pathways
Query Manager Reference¶
-
class pyuniprot.manager.query.QueryManager(connection=None, echo=False)[source]¶
Query interface to database.
-
accession(accession=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.Accession
Parameters:
- accession (str) – UniProt Accession number
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Accession objects or pandas.DataFrame
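The limit parameter accepts an int or a tuple, but this reference does not spell out what the tuple means. The sketch below assumes it is (page, page_size) with zero-based pages; that is a guess to be checked against the source code. apply_limit is a hypothetical helper that works on a plain list rather than on a database query.

```python
def apply_limit(rows, limit=None):
    """Return a slice of `rows` according to `limit`.

    Assumed semantics (not confirmed by the reference):
    None -> all rows, int -> first `limit` rows,
    (page, page_size) -> the zero-based page of that size.
    """
    if limit is None:
        return list(rows)
    if isinstance(limit, int):
        return list(rows[:limit])
    page, page_size = limit
    start = page * page_size
    return list(rows[start:start + page_size])


rows = ["CHMP1A", "CLN3", "COQ8A", "TP53", "CD33"]
first_two = apply_limit(rows, 2)          # plain maximum
second_page = apply_limit(rows, (1, 2))   # page 1 with 2 rows per page
everything = apply_limit(rows)            # limit=None
```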
-
alternative_full_name(name=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.AlternativeFullName
Parameters:
- name (str) – alternative full name
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.AlternativeFullName objects or pandas.DataFrame
-
alternative_short_name(name=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.AlternativeShortName
Parameters:
- name (str) – alternative short name
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.AlternativeShortName objects or pandas.DataFrame
-
datasets¶
Distinct datasets (dataset) in pyuniprot.manager.models.Entry
Distinct datasets are SwissProt and/or TrEMBL
Returns: all distinct dataset types
Return type: [str,]
-
db_reference(type_=None, identifier=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.models.DbReference
Check the list of available databases with dbreference_types
Parameters:
- type_ – type (or name) of database
- identifier – unique identifier in database
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.DbReference objects or pandas.DataFrame
-
dbreference_types¶
Distinct database reference types (type_) in pyuniprot.manager.models.DbReference
Returns: List of strings for all available database cross reference types used in model DbReference
Return type: [str,]
-
disease(identifier=None, ref_id=None, ref_type=None, name=None, acronym=None, description=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.models.Disease
Parameters:
- identifier – disease UniProt identifier
- ref_id – identifier of referenced database
- ref_type – database name
- name – disease name
- acronym – disease acronym
- description – disease description
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Disease objects or pandas.DataFrame
-
disease_comment(comment=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.models.DiseaseComment
Parameters:
- comment – comment to disease
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.DiseaseComment objects or pandas.DataFrame
-
diseases¶
Distinct diseases (name in pyuniprot.manager.models.Disease)
Returns: all distinct disease names
Return type: [str,]
-
ec_number(ec_number=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.ECNumber
Parameters:
- ec_number – Enzyme Commission number
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.ECNumber objects or pandas.DataFrame
-
entry(name=None, dataset=None, recommended_full_name=None, recommended_short_name=None, gene_name=None, taxid=None, accession=None, organism_host=None, feature_type=None, function_=None, ec_number=None, db_reference=None, alternative_name=None, disease_comment=None, disease_name=None, tissue_specificity=None, pmid=None, keyword=None, subcellular_location=None, tissue_in_reference=None, sequence=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.Entry
An entry is the root element in UniProt datasets. Everything is linked to an entry and can be accessed from the models.Entry object. % can be used as a wildcard for string parameters (see the examples above).
Parameters:
- name (str,tuple) – UniProt entry name(s)
- dataset – dataset name (see datasets)
- recommended_full_name (str,tuple) – recommended full protein name(s)
- recommended_short_name (str,tuple) – recommended short protein name(s)
- tissue_in_reference (str,tuple) – tissue mentioned in reference
- subcellular_location (str,tuple) – subcellular location(s)
- keyword (str,tuple) – keyword
- pmid (str,tuple) – PubMed identifier
- tissue_specificity (str,tuple) – tissue specificities
- disease_comment (str,tuple) – disease comment(s)
- alternative_name (str,tuple) – alternative protein name(s)
- db_reference (str,tuple) – cross reference identifier
- ec_number (str,tuple) – enzyme classification number, e.g. 1.1.1.1
- function_ (str,tuple) – description of protein functions
- feature_type (str,tuple) – feature types
- organism_host (str,tuple) – organism hosts
- accession (str,tuple) – UniProt accession number
- disease_name (str,tuple) – disease name
- gene_name (str,tuple) – gene name
- taxid (str,tuple) – NCBI taxonomy identifier
- limit (int,tuple) – maximum number of results
- sequence (str,tuple) – amino acid sequence
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Entry objects or pandas.DataFrame
-
feature(type_=None, identifier=None, description=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.Feature
Check available feature types with pyuniprot.query().feature_types
Parameters:
- type_ – type of feature
- identifier – feature identifier
- description – description of feature
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Feature objects or pandas.DataFrame
-
feature_types¶
Distinct types (type_) in pyuniprot.manager.models.Feature
Returns: all distinct feature types
Return type: [str,]
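Because a feature query only returns results for a type that exists in the database, it can help to validate user input against the list of distinct types first. The helper check_choice below is a hypothetical convenience, not part of pyuniprot, and the example list is made up; in practice it would be replaced by the value of feature_types.

```python
import difflib


def check_choice(value, available):
    """Return `value` if it is in `available`, else raise with suggestions."""
    if value in available:
        return value
    close = difflib.get_close_matches(value, available, n=3)
    hint = " Did you mean: %s?" % ", ".join(close) if close else ""
    raise ValueError("%r is not an available type.%s" % (value, hint))


# Stand-in for pyuniprot.query().feature_types (values are illustrative).
available_types = ["chain", "domain", "signal peptide", "transmembrane region"]

ok = check_choice("domain", available_types)
try:
    check_choice("domian", available_types)  # typo on purpose
except ValueError as error:
    message = str(error)
```

The same guard would work for the other distinct-value properties such as dbreference_types or keywords.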
-
function(text=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.Function
Parameters:
- text – description of function
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Function objects or pandas.DataFrame
-
keyword(name=None, identifier=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.Keyword
Parameters:
- name (str) – keyword name
- identifier (str) – keyword identifier
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Keyword objects or pandas.DataFrame
-
keywords¶
Distinct keywords (name in pyuniprot.manager.models.Keyword)
Returns: all distinct keywords
Return type: [str,]
-
organism_host(taxid=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.OrganismHost
Parameters:
- taxid – NCBI taxonomy identifier
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.OrganismHost objects or pandas.DataFrame
-
other_gene_name(type_=None, name=None, entry_name=None, limit=None, as_df=None)[source]¶
Method to query pyuniprot.manager.OtherGeneName
Parameters:
- type_ (str) – type of gene name, e.g. synonym
- name (str) – other gene name
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.OtherGeneName objects or pandas.DataFrame
-
pmid(pmid=None, entry_name=None, first=None, last=None, volume=None, name=None, date=None, title=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.Pmid
Parameters:
- pmid (int) – PubMed identifier
- entry_name (str) – name in models.Entry
- first – first page
- last – last page
- volume – volume
- name – name of journal
- date – publication date
- title – title of publication
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Pmid objects or pandas.DataFrame
-
sequence(sequence=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.Sequence
Parameters:
- sequence – amino acid sequence
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.Sequence objects or pandas.DataFrame
-
subcellular_location(location=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.SubcellularLocation
Parameters:
- location – subcellular location
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.SubcellularLocation objects or pandas.DataFrame
-
subcellular_locations¶
Distinct subcellular locations (location in pyuniprot.manager.models.SubcellularLocation)
Returns: all distinct subcellular locations
Return type: [str,]
-
taxids¶
Distinct NCBI taxonomy identifiers (taxid) in pyuniprot.manager.models.Entry
Returns: NCBI taxonomy identifiers
Return type: [int,]
-
tissue_in_reference(tissue=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.TissueInReference
Parameters:
- tissue (str) – tissue linked to reference
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if limit=None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of models.TissueInReference objects or pandas.DataFrame
-
tissue_specificity(comment=None, entry_name=None, limit=None, as_df=False)[source]¶
Method to query pyuniprot.manager.TissueSpecificity
Provides information on the expression of a gene at the mRNA or protein level in cells or in tissues of multicellular organisms. By default, the information is derived from experiments at the mRNA level, unless specified ‘at protein level’.
Parameters:
- comment (str) – comment describing tissue specificity
- entry_name (str) – name in models.Entry
- limit (int,tuple) – number of results, if None, all results returned
- as_df (bool) – if True results are returned as pandas.DataFrame
Returns: list of pyuniprot.manager.models.TissueSpecificity objects or pandas.DataFrame
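The mRNA-versus-protein rule in the description of tissue_specificity can be applied to the comment text directly. The sketch assumes that the phrase ‘at protein level’ appears literally in a comment whenever protein-level evidence exists. evidence_level is a hypothetical helper, and the sample comments are invented for illustration rather than taken from UniProt.

```python
def evidence_level(comment):
    """Classify a tissue specificity comment as 'protein' or 'mRNA' level."""
    if "at protein level" in comment.lower():
        return "protein"
    # Default stated in the description: mRNA level unless noted otherwise.
    return "mRNA"


# Invented sample comments, not taken from UniProt.
samples = [
    "Expressed in brain (at protein level).",
    "Highly expressed in liver and kidney.",
]
levels = [evidence_level(text) for text in samples]
```

With real data, the strings would come from the comment attribute of the objects returned by tissue_specificity, or from the corresponding DataFrame column when as_df=True.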
-
tissues_in_references¶
Distinct tissues (tissue in pyuniprot.manager.models.TissueInReference)
Returns: all distinct tissues in references
Return type: [str,]
-