Query

Examples

For all string parameters you can use % as a wildcard (see the reference documentation below). All methods accept a limit parameter, which restricts the number of results, and an as_df parameter, which returns the results as a pandas.DataFrame.
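The reference below types limit as (int,tuple) without saying what a tuple means. The following is a minimal sketch of one plausible reading, assuming an int caps the number of results and a tuple is interpreted as (page, page_size). The helper apply_limit and the tuple layout are assumptions for illustration only, not confirmed pyuniprot behaviour.

```python
def apply_limit(rows, limit=None):
    """Hypothetical helper mimicking the assumed semantics of `limit`."""
    if limit is None:
        return list(rows)                  # no limit: return everything
    if isinstance(limit, int):
        return list(rows[:limit])          # plain int: first `limit` rows
    page, page_size = limit                # ASSUMED tuple layout: (page, page_size)
    start = page * page_size
    return list(rows[start:start + page_size])


rows = ["entry_%d" % i for i in range(10)]  # stand-in for query results
first_three = apply_limit(rows, 3)          # first 3 results
second_page = apply_limit(rows, (1, 4))     # page 1 (zero-based), 4 per page
everything = apply_limit(rows)              # limit=None
```

Under this reading, limit=(1, 4) would skip the first four results and return the next four.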

Initialize query object

import pyuniprot
pyuniprot.update(taxids=[9606,10090,10116]) # human, mouse, rat update
query = pyuniprot.query()

Methods by examples

search for ...

human proteins with gene name ‘TP53’ (taxid=9606)
>>> query.entry(gene_name='TP53', taxid=9606)
[Cellular tumor antigen p53]
human proteins whose recommended full name starts with ‘Myeloid cell surface’ (use % at the end)
>>> query.entry(recommended_full_name='Myeloid cell surface%', taxid=9606)
[Myeloid cell surface antigen CD33]
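The % wildcard behaves like the wildcard of the SQL LIKE operator. The sketch below uses the standard-library sqlite3 module with a small in-memory table to show prefix, substring and exact matching. The table layout and the helper names_like are hypothetical and only stand in for the real database; it is assumed here that the query methods translate string parameters into LIKE comparisons.

```python
import sqlite3

# In-memory stand-in for the entry table (hypothetical layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (name TEXT, recommended_full_name TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [
        ("CD33_HUMAN", "Myeloid cell surface antigen CD33"),
        ("CCD33_HUMAN", "Coiled-coil domain-containing protein 33"),
        ("P53_HUMAN", "Cellular tumor antigen p53"),
    ],
)


def names_like(column, pattern):
    """Return entry names whose `column` matches the LIKE pattern."""
    if column not in ("name", "recommended_full_name"):
        raise ValueError("unknown column: %s" % column)
    sql = "SELECT name FROM entry WHERE %s LIKE ? ORDER BY name" % column
    return [row[0] for row in conn.execute(sql, (pattern,))]


prefix_hits = names_like("recommended_full_name", "Myeloid cell surface%")
contains_hits = names_like("name", "%CD33%")
exact_hits = names_like("name", "CD33")  # no wildcard: whole value must match
```

Without a wildcard the whole value has to match, so 'CD33' alone finds nothing, while '%CD33%' finds both CD33_HUMAN and CCD33_HUMAN.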

find all UniProt entries whose entry name contains ‘CD33’ (% at the start and end of the search term) and return them as a pandas.DataFrame

>>> results = query.entry(name='%CD33%', taxid=9606, as_df=True)
# get the first 2 rows of results with columns 'name', 'recommended_full_name', 'taxid'
>>> results.loc[:1, ['name', 'recommended_full_name', 'taxid']]
          name                     recommended_full_name  taxid
0   CD33_HUMAN         Myeloid cell surface antigen CD33   9606
1  CCD33_HUMAN  Coiled-coil domain-containing protein 33   9606

find entries by a list of entry names

>>> query.entry(name=('TREM2_HUMAN', 'CD33_HUMAN'))
[Myeloid cell surface antigen CD33, Triggering receptor expressed on myeloid cells 2]
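Parameters typed as (str,tuple) accept either a single pattern or several values. The following sketch shows one way such a parameter could be turned into a SQL condition, assuming a single string is compared with LIKE (so % works) and a tuple is compared with IN (exact matches only). The helper build_filter and this mapping are assumptions for illustration, not the library's confirmed implementation.

```python
def build_filter(column, value):
    """Hypothetical helper: return (sql_fragment, params) for one search parameter."""
    if isinstance(value, (list, tuple)):
        # Several values: exact match against any of them (ASSUMED behaviour).
        placeholders = ", ".join("?" for _ in value)
        return "%s IN (%s)" % (column, placeholders), list(value)
    # Single string: pattern match, % acts as a wildcard.
    return "%s LIKE ?" % column, [value]


single = build_filter("name", "%CD33%")
several = build_filter("name", ("TREM2_HUMAN", "CD33_HUMAN"))
```

If this assumption holds, wildcards inside a tuple would be matched literally, so patterns are best passed as a single string.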

If an attribute name ends with an s, that is a clear hint that it represents a 1:n or n:m relationship, such as keywords. Several proteins can be linked to one keyword, and several keywords can be linked to one protein. The following lines show how to query for all proteins linked to the keyword ‘Neurodegeneration’ and return their gene names.

>>> results = query.entry(keyword='Neurodegeneration')
>>> len(results) # number of results
322
>>> [x.gene_name for x in results][:3] # show only the first 3 gene names
['CHMP1A', 'CLN3', 'COQ8A']

Every element in the list represents a pyuniprot.manager.models.Entry instance:

>>> first_protein = results[0] # fetch first result
>>> type(first_protein)
pyuniprot.manager.models.Entry
>>> first_protein
Charged multivesicular body protein 1a
# get first 3 of all other keywords to this protein
>>> first_protein.keywords[:3]
[Reference proteome:KW-1185, Coiled coil:KW-0175, Repressor:KW-0678]
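The n:m relationship between entries and keywords described above can be pictured as a link table that is read in both directions. The sketch below uses plain Python dictionaries and only the gene names and keywords that appear in the examples on this page; the links list is a hypothetical stand-in for the real association table, not the library's data model.

```python
from collections import defaultdict

# Hypothetical (gene name, keyword) link rows taken from the examples above.
links = [
    ("CHMP1A", "Neurodegeneration"),
    ("CHMP1A", "Coiled coil"),
    ("CHMP1A", "Repressor"),
    ("CLN3", "Neurodegeneration"),
    ("COQ8A", "Neurodegeneration"),
]

keywords_by_gene = defaultdict(list)   # one protein -> many keywords
genes_by_keyword = defaultdict(list)   # one keyword -> many proteins
for gene, keyword in links:
    keywords_by_gene[gene].append(keyword)
    genes_by_keyword[keyword].append(gene)

neuro_genes = genes_by_keyword["Neurodegeneration"]
chmp1a_keywords = keywords_by_gene["CHMP1A"]
```

Querying by keyword walks the table from keyword to proteins, while first_protein.keywords walks it from protein to keywords.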

Properties

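query.datasets
query.dbreference_types
query.diseases
query.feature_types
query.keywords
query.subcellular_locations
query.taxids
query.tissues_in_references
query.version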

Query Manager Reference

class pyuniprot.manager.query.QueryManager(connection=None, echo=False)

Query interface to database.

accession(accession=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Accession

Parameters:
  • accession (str) – UniProt Accession number
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Accession objects or pandas.DataFrame

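alternative_full_name(name=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.AlternativeFullName

Parameters:
  • name (str) – alternative full name
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns: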

list of pyuniprot.manager.models.AlternativeFullName objects or pandas.DataFrame

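alternative_short_name(name=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.AlternativeShortName

Parameters:
  • name (str) – alternative short name
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns: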

list of pyuniprot.manager.models.AlternativeShortName objects or pandas.DataFrame

datasets

Distinct datasets (dataset) in pyuniprot.manager.models.Entry

Distinct datasets are SwissProt or/and TrEMBL

Returns: all distinct dataset types
Return type: [str,]

db_reference(type_=None, identifier=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.DbReference

Check the list of available databases with dbreference_types

Parameters:
  • type_ – type (or name) of database
  • identifier – unique identifier in database
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.DbReference objects or pandas.DataFrame

dbreference_types

Distinct database reference types (type_) in pyuniprot.manager.models.DbReference

Returns: list of strings for all available database cross reference types used in model DbReference
Return type: [str,]

disease(identifier=None, ref_id=None, ref_type=None, name=None, acronym=None, description=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Disease

Parameters:
  • identifier – disease UniProt identifier
  • ref_id – identifier of referenced database
  • ref_type – database name
  • name – disease name
  • acronym – disease acronym
  • description – disease description
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Disease objects or pandas.DataFrame

disease_comment(comment=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.DiseaseComment

Parameters:
  • comment – Comment to disease
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – Number of results, if limit=`None`, all results returned
  • as_df (bool) – If True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.DiseaseComment objects or pandas.DataFrame

diseases

Distinct diseases (name in pyuniprot.manager.models.Disease)

Returns: all distinct disease names
Return type: [str,]

ec_number(ec_number=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.ECNumber

Parameters:
  • ec_number – Enzyme Commission number
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.ECNumber objects or pandas.DataFrame

entry(name=None, dataset=None, recommended_full_name=None, recommended_short_name=None, gene_name=None, taxid=None, accession=None, organism_host=None, feature_type=None, function_=None, ec_number=None, db_reference=None, alternative_name=None, disease_comment=None, disease_name=None, tissue_specificity=None, pmid=None, keyword=None, subcellular_location=None, tissue_in_reference=None, sequence=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Entry

An entry is the root element in UniProt datasets. Everything is linked to an entry and can be accessed from the models.Entry object. % can be used as a wildcard for string parameters (see the examples above).

Parameters:
  • name (str,tuple) – UniProt entry name(s)
  • dataset (str,tuple) – dataset(s), see datasets
  • recommended_full_name (str,tuple) – recommended full protein name(s)
  • recommended_short_name (str,tuple) – recommended short protein name(s)
  • tissue_in_reference (str,tuple) – tissue mentioned in reference
  • subcellular_location (str,tuple) – subcellular location(s)
  • keyword (str,tuple) – keyword
  • pmid (str,tuple) – PubMed identifier
  • tissue_specificity (str,tuple) – tissue specificities
  • disease_comment (str,tuple) – disease_comments
  • alternative_name (str,tuple) – alternative protein name(s)
  • db_reference (str,tuple) – cross reference identifier
  • ec_number (str,tuple) – enzyme classification number, e.g. 1.1.1.1
  • function_ (str,tuple) – description of protein functions
  • feature_type (str,tuple) – feature types
  • organism_host (str,tuple) – organism hosts
  • accession (str,tuple) – UniProt accession number
  • disease_name (str,tuple) – disease name
  • gene_name (str,tuple) – gene name
  • taxid (int,tuple) – NCBI taxonomy identifier
  • limit (int,tuple) – maximum number of results
  • sequence (str,tuple) – Amino acid sequence
  • as_df (bool) – if set to True result returns as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Entry objects or pandas.DataFrame

feature(type_=None, identifier=None, description=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Feature

Check the available feature types with pyuniprot.query().feature_types

Parameters:
  • type_ – type of feature
  • identifier – feature identifier
  • description – description of feature
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Feature objects or pandas.DataFrame

feature_types

Distinct types (type_) in pyuniprot.manager.models.Feature

Returns: all distinct feature types
Return type: [str,]

function(text=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Function

Parameters:
  • text – description of function
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Function objects or pandas.DataFrame

keyword(name=None, identifier=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Keyword

Parameters:
  • name (str) – keyword name
  • identifier (str) – keyword identifier
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Keyword objects or pandas.DataFrame

keywords

Distinct keywords (name in pyuniprot.manager.models.Keyword)

Returns: all distinct keywords
Return type: [str,]

organism_host(taxid=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.OrganismHost

Parameters:
  • taxid – NCBI taxonomy identifier
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.OrganismHost objects or pandas.DataFrame

other_gene_name(type_=None, name=None, entry_name=None, limit=None, as_df=None)

Method to query pyuniprot.manager.models.OtherGeneName

Parameters:
  • type_ (str) – type of gene name e.g. synonym
  • name (str) – other gene name
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – Number of results, if limit=`None`, all results returned
  • as_df (bool) – If True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.OtherGeneName objects or pandas.DataFrame

pmid(pmid=None, entry_name=None, first=None, last=None, volume=None, name=None, date=None, title=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Pmid

Parameters:
  • pmid (int) – PubMed identifier
  • entry_name (str) – name in models.Entry
  • first – first page
  • last – last page
  • volume – volume
  • name – name of journal
  • date – publication date
  • title – title of publication
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Pmid objects or pandas.DataFrame

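sequence(sequence=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.Sequence

Parameters:
  • sequence (str) – amino acid sequence
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.Sequence objects or pandas.DataFrame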

subcellular_location(location=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.SubcellularLocation

Parameters:
  • location – subcellular location
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.SubcellularLocation objects or pandas.DataFrame

subcellular_locations

Distinct subcellular locations (location in pyuniprot.manager.models.SubcellularLocation)

Returns: all distinct subcellular locations
Return type: [str,]

taxids

Distinct NCBI taxonomy identifiers (taxid) in pyuniprot.manager.models.Entry

Returns: NCBI taxonomy identifiers
Return type: [int,]

tissue_in_reference(tissue=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.TissueInReference

Parameters:
  • tissue (str) – tissue linked to reference
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if limit=`None`, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of models.TissueInReference objects or pandas.DataFrame

Return type:

[models.TissueInReference,] or [pandas.DataFrame]

tissue_specificity(comment=None, entry_name=None, limit=None, as_df=False)

Method to query pyuniprot.manager.models.TissueSpecificity

Provides information on the expression of a gene at the mRNA or protein level in cells or in tissues of multicellular organisms. By default, the information is derived from experiments at the mRNA level, unless specified ‘at protein level’.

Parameters:
  • comment (str) – Comment describing tissue specificity
  • entry_name (str) – name in models.Entry
  • limit (int,tuple) – number of results, if None, all results returned
  • as_df (bool) – if True results are returned as pandas.DataFrame
Returns:

list of pyuniprot.manager.models.TissueSpecificity objects or pandas.DataFrame

tissues_in_references

Distinct tissues (tissue in pyuniprot.manager.models.TissueInReference)

Returns: all distinct tissues in references
Return type: [str,]

version

Version of UniProt knowledgebase

Returns: dictionary with version info
Return type: dict