Query¶
Examples¶
For all string parameters you can use % as wildcard (please check the documentation below). All methods
have a parameter limit
which allows to limit the number of results and as_df
which allows to return
a pandas.DataFrame.
Initialize query object
import pyuniprot
pyuniprot.update(taxids=[9606,10090,10116]) # human, mouse, rat update
query = pyuniprot.query()
Methods by examples¶
search for …
- human proteins with gene name ‘TP53’ (taxid=9606)
>>> query.entry(gene_name='TP53', taxid=9606) [Cellular tumor antigen p53]
- human proteins with recommended full name starts with ‘Myeloid cell surface’ (use % at the end)
>>> query.entry(recommended_full_name='Myeloid cell surface%', taxid=9606) [Myeloid cell surface antigen CD33]
find all UniProt entries where the recommended full name contains ‘CD33’ (% at the start and end of search term) and return as pandas.DataFrame
>>> results = query.entry(name='%CD33%', taxid=9606, as_df=True)
# get first 2 lines of results with columns 'name','recommended_full_name', 'taxid'
>>> my_results_as_data_frame.ix[:2,('name','recommended_full_name', 'taxid')]
name recommended_full_name taxid
0 CD33_HUMAN Myeloid cell surface antigen CD33 9606
1 CCD33_HUMAN Coiled-coil domain-containing protein 33 9606
find entries by a list of gene names
>>> query.entry(name=('TREM2_HUMAN', 'CD33_HUMAN'))
[Myeloid cell surface antigen CD33, Triggering receptor expressed on myeloid cells 2]
If an attribute ends of an s it a clear hint that this is an 1:n or n:m relationship like keywords. There could be several proteins linked to a keyword, but also several keywords are linked to one protein. Next lines of code shows how to query for all proteins linked to the keyword ‘Neurodegenaration’ and returns the gene names.
>>> results = query.entry(keywords='Neurodegeneration')
>>> len(results) # number of results
322
>>> [x.gene_name for x in results][:3] # show only the first 2 gene names
['CHMP1A', 'CLN3', 'COQ8A']
Every element in the list represents a pyuniprot.manager.models.Entry instance:
>>> first_protein = results[0] # fetch first result
>>> type(first_protein)
pyuniprot.manager.models.Entry
>>> first_protein
Charged multivesicular body protein 1a
# get first 3 of all other keywords to this protein
>>> first_protein.keywords[:3]
[Reference proteome:KW-1185, Coiled coil:KW-0175, Repressor:KW-0678]
Properties¶
q.gene_forms
q.interaction_actions
q.actions
q.pathways