The curation process involved the creation of three manually curated gene sets by PENTACON curators: the Arachidonic Acid Pathway (AAP) list, the Arachidonic Acid Extended (AAE) list and the Blood Pressure regulation (BP) list. Genes in each list were assigned one of three Gene Set Qualifiers: Gold Standard, Likely, or Predicted.
Statistics for the PENTACON Curated Gene Sets (human genes):
| Arachidonic Acid Pathway | Blood Pressure | Arachidonic Acid Extended |
Gold Standard (Direct) Genes | 118 | 2 | 17 |
Likely (Indirect) Genes | 13 | 23 | 5 |
Predicted Genes | 27 | 36 | 8 |
Total Number of Genes | 158 | 61 | 30 |
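As a quick consistency check, the counts in the statistics table can be written down in machine-readable form and summed per gene set. This is only a sketch: the dictionary layout and key names are illustrative choices, not a PENTACON file format.

```python
# Counts transcribed from the statistics table above (human genes).
# The nested-dictionary layout is an illustrative choice, not a PENTACON format.
counts = {
    "Arachidonic Acid Pathway": {"Gold Standard": 118, "Likely": 13, "Predicted": 27},
    "Blood Pressure": {"Gold Standard": 2, "Likely": 23, "Predicted": 36},
    "Arachidonic Acid Extended": {"Gold Standard": 17, "Likely": 5, "Predicted": 8},
}

# Sum the three qualifier categories for each gene set.
totals = {gene_set: sum(by_qualifier.values()) for gene_set, by_qualifier in counts.items()}

for gene_set, total in totals.items():
    print(f"{gene_set}: {total}")
# Arachidonic Acid Pathway: 158
# Blood Pressure: 61
# Arachidonic Acid Extended: 30
```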
Arachidonic Acid Pathway (AAP) Genes: genes directly involved in the 'arachidonic acid pathway'. These genes are supported by evidence showing they are involved in arachidonic acid metabolism and remodeling in humans. This list includes genes from the KEGG arachidonic acid metabolism pathway, Reactome pathways (arachidonic acid metabolism, eicosanoids, COX reactions, eicosanoid ligand-binding receptors), Rat Genome Database pathways, relevant Gene Ontology (GO) annotations, investigator-suggested genes, and relevant genes found in the literature.
Blood Pressure (BP) Genes: genes related to the phenotype of blood pressure, as shown by their presence in Reactome pathways (pathways annotated with experimental evidence) identified as related to the phenotype of blood pressure, in experimentally supported GO annotations for blood pressure, or in the literature. The genes for the identified Reactome pathways were taken from "Reactome Pathways Gene Set." Genes obtained from GO were assembled from those directly annotated to "regulation of blood pressure" (GO:0008217), "regulation of blood vessel size" (GO:0050880), and manually selected children of these two terms.
Arachidonic Acid Extended (AAE) Genes: genes related to the arachidonic acid pathway. These genes were identified by a GO query for terms related to arachidonic acid metabolism; the search was restricted to experimental annotations. Genes that curators found in the literature to be associated with arachidonic acid metabolism in a regulatory manner have also been added to the AAE gene set.
Genes were classified by PENTACON curators as Gold Standard, Likely, or Predicted based on the level of evidence supporting their involvement in the relevant pathway as follows:
Gold Standard (Direct):
The gene set qualifier "Gold Standard" is assigned when experimental evidence demonstrates involvement of the gene in the arachidonic acid metabolism and arachidonic acid remodeling pathways, regulation of these pathways, or blood pressure regulation in humans. For AAP and AAE genes, experimental evidence means that an enzyme has been assayed with substrates that are in these pathways, a receptor binds ligands in these pathways, or the protein interacts with another protein in the pathway. For BP genes, experimental evidence means one or a combination of the following: (1) mutations in the gene were found in people displaying the phenotype of interest (in this case altered BP), with associated familial evidence showing that individuals without the mutation do not display the phenotype; (2) drugs produce a change in phenotype in humans by targeting a gene in the list; (3) Mendelian disease. Genes assigned the "Gold Standard" gene set qualifier can be used for computational analyses.
Likely (Indirect):
The gene set qualifier "Likely" is assigned when genes ‘likely’ participate in the arachidonic acid metabolism and remodeling pathway, regulation of this pathway, or the blood pressure regulation pathway. Genes are assigned "Likely" for AAP and AAE when there is experimental evidence for the predicted/expected activity with a relevant probe substrate, but no definitive experimental evidence for participation in the arachidonic acid metabolism and remodeling pathway. For example, an enzyme expected to be involved in AA remodeling for which activity was demonstrated using palmitic, oleic, or linoleic acid, but not arachidonic acid, would be assigned the gene set qualifier "Likely". The gene set qualifier "Likely" is assigned for BP genes when the experimental evidence is not a direct measure of BP. Examples: (1) in vitro/ex vivo evidence only, such as vasoconstriction; (2) genetic association studies together with a mouse KO study showing physiological evidence that the gene is involved in regulation of BP; (3) a phenotype observed in an organism (e.g., a change in BP in mice) together with an in vitro study in human. These genes can be included in a computational analysis at the programmer's discretion.
Predicted:
The gene set qualifier "Predicted" is assigned when genes have been inferred to be involved in the arachidonic acid metabolism and remodeling pathways (AAP) and their regulation (AAE) based on (1) evidence from other organisms or (2) homology. For the BP gene set, genes have been inferred based on (1) evidence from other organisms; (2) only in vitro evidence for human genes demonstrating that an expressed gene has a given function (e.g., PC5, when co-expressed with a prorenin expression vector, results in cells secreting renin); or (3) only a genetic association study (no familial evidence). "Predicted" is assigned to genes for which participation in the relevant pathway is purely predicted and/or whose gene products have not been characterized. These genes should not be used in a computational analysis.
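The usage guidance in the three qualifier descriptions above (Gold Standard genes can be used, Likely genes at the programmer's discretion, Predicted genes not at all) can be sketched as a simple filter. The `GeneEntry` record, its field names, and the gene symbols below are hypothetical and chosen for illustration; they are not the layout of any PENTACON download file.

```python
from dataclasses import dataclass


# Hypothetical record layout; real PENTACON files may use different fields.
@dataclass(frozen=True)
class GeneEntry:
    symbol: str
    gene_set: str    # "AAP", "AAE", or "BP"
    qualifier: str   # "Gold Standard", "Likely", or "Predicted"


def select_for_analysis(entries, include_likely=False):
    """Keep Gold Standard genes, optionally Likely genes, and never Predicted genes."""
    allowed = {"Gold Standard"}
    if include_likely:
        allowed.add("Likely")
    return [entry for entry in entries if entry.qualifier in allowed]


# Placeholder entries for illustration only, not real PENTACON assignments.
entries = [
    GeneEntry("GENE_A", "AAP", "Gold Standard"),
    GeneEntry("GENE_B", "AAP", "Likely"),
    GeneEntry("GENE_C", "BP", "Predicted"),
]

strict = select_for_analysis(entries)                        # Gold Standard only
relaxed = select_for_analysis(entries, include_likely=True)  # Gold Standard + Likely
```

Making the inclusion of Likely genes an explicit flag keeps that discretionary choice visible in the analysis code rather than buried in a hard-coded list.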
At the top of the Gene Summary page, the gene symbol, aliases and gene name are provided in addition to UniProt IDs, NCBI Gene IDs and associated EC numbers.
Pentacon Gene Set
The gene set and gene set qualifier are displayed along with the relevant information source and evidence type. Information sources include GO (Gene Ontology), Reactome, PTHR (Panther), KEGG, and Rat Genome Database (RGD) pathways. When Reactome is the information source, the stable identifier is used, for example REACT_147707.2. The information source is given first, followed by that source's internal identifier. Genes identified from GO blood pressure annotations have GO:BP. 'Literature' denotes that genes were added based on evidence present in literature reviews; this information source should be accompanied by entries in the PubMed ID column.
The evidence code C is used to denote review articles. The evidence code E is used for articles that present experimental evidence including, but not limited to, tissue distribution and enzyme characterization. The evidence code P is used for publications that (1) predict presence based on evidence in mice/rabbits, (2) use bioinformatics tools to identify human genes, or (3) contain non-traceable author statements. Bioinformatics approaches include using conserved sequence motifs to identify candidate genes, and using a known human gene to identify sequences with significant identity (and finding the corresponding cDNA in an EST database).
Below the Gene Set section, full curated entries for the gene of interest are provided, including the following topics: function, isoforms, natural variants, mutagenesis sites, tissues and cell types, localization and diseases. These annotations are associated with the original sources of these data in the form of links to the relevant resources, such as UniProt or PubMed.
All of the annotations extracted from other resources were converted by PENTACON curators to standardized ontological terms such as DOID (disease), BRENDA tissue ontology (tissues, cell types), and GO cellular component (cellular localization), where applicable, in order to facilitate data searches and computational analyses.
Orthologs
PENTACON curators also determined predicted orthologs in model organisms of interest for each of the genes, beginning with the AAP gene set. Links to orthologous proteins in species of interest (mouse, rat, zebrafish and budding yeast) are provided.
Pentacon Links
Links to other resources developed by PENTACON are provided, including the corresponding interactive AAP figure developed by PENTACON curators and NSAIDnet, a tool for predicting functional networks which was developed by computational biologists as part of the PENTACON project. The Gold Standard gene sets underlying NSAIDnet were produced by PENTACON curators, allowing users to explore functional networks of genes based on NSAID-related gene sets, such as the core AAP gene set.
External Links
Additional external links are provided to GIANT, a tissue-specific protein interaction network analysis tool, and P-POD, a protein ortholog database.
Kinetics data
The last section of the Gene Summary page contains kinetics data obtained from various resources or curated by PENTACON curators from the literature. These values include, but are not limited to, measurements of KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on and K-off. Chemical terms related to the value have been parsed from UniProt, BRENDA, or BindingDB, or added by PENTACON curators directly from the literature, and converted to the appropriate ChEBI term and ID. For chemical terms with no ChEBI ID available, a new ChEBI ID has been requested and a temporary placeholder ID has been entered. These placeholders appear as a ChEBI term referred to as ChEBI:NONE (for example, ChEBI:NONE_148513): the term has not yet been assigned an ID by ChEBI, and the number shown is an internal PENTACON ID. An additional ChEBI term (ChEBI Term B) is available when two substrates are identified for the reaction, or when another compound (such as an inhibitor) is used in the presence of a substrate. The primary source (usually the PMID) and the secondary source (originating database), if applicable, are also provided. This information can also be downloaded by selecting Kinetics Data from the drop-down menu under the Download button at the top right of the Gene Summary page.
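When processing downloaded kinetics data, it may be useful to separate temporary placeholder IDs from assigned ChEBI IDs. The sketch below assumes placeholders are written as `ChEBI:NONE_<number>` (this page also shows the form `ChEBI:NEW_<number>` in the download field descriptions, so both are accepted) and that assigned IDs are written as `CHEBI:<number>`. The function name and return format are illustrative.

```python
import re

# Assumption: placeholders look like "ChEBI:NONE_148513" or "ChEBI:NEW_148513";
# assigned identifiers look like "CHEBI:15843". Matching is case-insensitive.
PLACEHOLDER = re.compile(r"^ChEBI:(?:NONE|NEW)_(\d+)$", re.IGNORECASE)
ASSIGNED = re.compile(r"^ChEBI:(\d+)$", re.IGNORECASE)


def classify_chebi_id(value):
    """Return ("placeholder", internal_id), ("assigned", chebi_number), or ("unknown", None)."""
    value = value.strip()
    match = PLACEHOLDER.match(value)
    if match:
        # The number in a placeholder is an internal PENTACON ID, not a ChEBI ID.
        return ("placeholder", int(match.group(1)))
    match = ASSIGNED.match(value)
    if match:
        return ("assigned", int(match.group(1)))
    return ("unknown", None)
```

Keeping the two cases apart avoids looking up an internal PENTACON number in ChEBI as though it were a ChEBI accession.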
Download File Information
More detailed references are provided for each annotation in the download file which can be obtained for individual genes using the blue download button on the top right side of the gene summary page.
Curation was performed by PENTACON curators based on information available in UniProt, BRENDA, BindingDB, and the published literature. The curation process involved verifying a specific list of genes for their involvement in certain pathways based on experimental evidence or valid reviews. Data pertaining to kinetics and tissue/cell type expression of these genes/proteins were parsed from UniProt, BRENDA and BindingDB. In early data releases, OntoMaton spreadsheets were pre-populated with the resulting data types, which were reviewed for accuracy by curation of the associated papers. More recently, data were imported into a gene curation tool, Curatus, and reviewed for accuracy by curation of the associated papers within this tool. In some cases data were added directly from the published literature rather than from the databases.
For a portion of the data, the pre-populated data types were converted to the requested ontology terms and corresponding IDs in the Google spreadsheet using OntoMaton. For recently added data, parsed data were imported directly into the gene curation tool, Curatus, where requested ontology terms could be added or converted.
Curation Topics and Ontologies
Curation Topic | Originating Source Data Type | Ontology Term |
Any conditional (drug) | UniProt: various fields; BRENDA: various fields; BindingDB: various fields | ChEBI |
Any conditional (environmental) | UniProt: None | PATO |
Cell-type specificity/dependency | UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity | Cell Ontology; BRENDA Tissue Ontology |
Cell line | UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity | Cell Line Ontology; BRENDA Tissue Ontology |
Cellular location | UniProt: Comment/subcellular location | GO Cellular Component |
Pathway | UniProt: Comment/pathway; Comment/function | GO Biological Process |
Human Disease | UniProt*: Comment/Involvement in disease; BRENDA: Various fields** | Human Disease Ontology; SNOMED |
Tissue specificity/dependency | UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity | BRENDA; Uberon |
Developmental stage | UniProt: Comment/developmental stage | BRENDA Tissue Ontology |
Isoforms | UniProt: Comment/alternative products/isoform | None |
Variants/Genetic Background | UniProt: Feature/sequence variant | None |
Kinetics | UniProt: Comment/biophysicochemical properties/KM or Vmax; BRENDA: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off; BindingDB: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off | None |
*For disease information derived from UniProt, notes are included in the "Secondary Source Notes" column of the file that indicate whether a disease is caused by a mutation in a gene, or else is associated with variations in a gene.
**Note that disease associations for BRENDA data are curated only for cell lines or cells that are derived from tissue in the diseased state.
Additional Download File Fields
Evidence Code
Evidence Code Ontology (ECO) ID: ECO:0000311 = imported information; ECO:0000006 = experimental (this code has been partially replaced by ECO:0000269 = manually curated information for which there is published experimental evidence); ECO:0000033 = traceable author statement; ECO:0000035 = no biological data found (this code has been partially replaced by ECO:0000303 = manually curated information that is based on statements in scientific articles for which there is no experimental support).
Primary Source
Typically the PubMed ID. Occasionally a PubMed ID may be preceded by "submission:"; this indicates that a variant was directly submitted to UniProt rather than curated from a PubMed ID by UniProt curators.
Secondary Source
Typically an external resource, e.g. UniProtKB, BRENDA. If Secondary Source = Pentacon, then the curated information and ontology terms were added by Pentacon curators and associated with relevant PubMed IDs.
PENTACON Notes
Pentacon curator notes; may contain relevant citation information for the associated papers, and also additional notes that are preceded by the phrase "Pentacon Notes". Additional rows of data added by Pentacon curators and not directly parsed from another resource are indicated as "Added by Pentacon".
Secondary Source Version Date
Version date of the information downloaded from the external resource (e.g. UniProtKB, BRENDA, BindingDB), or the date released by PENTACON, for each annotation row.
Secondary Source Version
Version number of the information downloaded from the external resource (e.g. UniProtKB, BRENDA, BindingDB), or the version released by PENTACON, for each annotation row.
PENTACON Annotation No
Unique annotation ID assigned by Pentacon.
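The Evidence Code field described above can be handled in analysis code with a small lookup table. The sketch below takes its labels and the two "partially replaced by" pairs from that field description; the dictionary and function names are illustrative, and treating an older code as equivalent to its replacement is an assumption made here for convenience.

```python
# ECO IDs and labels as listed in the Evidence Code field description.
ECO_LABELS = {
    "ECO:0000311": "imported information",
    "ECO:0000006": "experimental",
    "ECO:0000269": "manually curated information with published experimental evidence",
    "ECO:0000033": "traceable author statement",
    "ECO:0000035": "no biological data found",
    "ECO:0000303": "manually curated information from article statements without experimental support",
}

# Older codes that have been partially replaced by newer ones.
# Assumption: for filtering purposes the old and new code are treated alike.
PARTIALLY_REPLACED_BY = {
    "ECO:0000006": "ECO:0000269",
    "ECO:0000035": "ECO:0000303",
}


def normalize_eco(code):
    """Map a partially replaced ECO ID to its replacement; leave other IDs unchanged."""
    return PARTIALLY_REPLACED_BY.get(code, code)


def is_experimental(code):
    """True when the annotation row is backed by published experimental evidence."""
    return normalize_eco(code) == "ECO:0000269"
```

Normalizing first means a filter for experimentally supported rows catches annotations from both earlier and later data releases.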
Kinetics Specific Fields
Data Value
Text parsed according to the UniProt, BRENDA, or BindingDB data type listed in the curation topics above, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.
Value
Text parsed from the UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.
Unit
Units (e.g. pmol/min/mg) parsed from the UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, units added by Pentacon curators.
ChEBI Chemical Term A
Chemical term parsed from the UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, and converted to a ChEBI term. In cases where Secondary Source = PENTACON, chemical terms were added by Pentacon curators.
ChEBI Chemical ID A
For annotation lines containing a ChEBI term referred to as ChEBI:NEW (for example: ChEBI:NEW_148513), this term has not yet been assigned an ID by ChEBI; the number shown is an internal PENTACON ID.
ChEBI Chemical Term B
Additional chemical entry converted to a ChEBI term when two substrates are identified for the reaction, or when another compound (such as an inhibitor) is used in the presence of a substrate.
ChEBI Chemical ID B
For annotation lines containing a ChEBI term referred to as ChEBI:NEW (for example: ChEBI:NEW_148513), this term has not yet been assigned an ID by ChEBI; the number shown is an internal PENTACON ID.
Secondary Source IDs
IDs that are assigned by the database from which the data were originally parsed. For example, secondary source IDs may be UniProt IDs (e.g. variants have VAR_IDs, isoforms have VSP_IDs), EC numbers (assigned by BRENDA), or BindingDB IDs. The source of the ID is noted in the Secondary Source column (described above). When the Secondary Source is PENTACON, there is no Secondary Source ID.
Secondary Source Notes
Notes parsed from secondary resources (e.g. UniProt, BRENDA, BindingDB).
The orthologs and analogs page provides a list of orthologs manually selected by PENTACON curators for AAP genes in a subset of key model organisms, including: mouse (Mus musculus), rat (Rattus norvegicus), zebrafish (Danio rerio), and yeast (Saccharomyces cerevisiae).
PENTACON curators identified consensus orthologs/analogs using P-POD version 4 and IMP. Orthologs were identified using P-POD's OrthoMCL analysis when possible or, when the human gene was not assigned to an OrthoMCL family, using P-POD's MultiParanoid analysis. Functional analogs were obtained from IMP using a cutoff of p < 0.05. In some cases, functional analogs were identified directly from the literature; in these cases the supporting PMID and evidence code "9" are noted. PENTACON curators reviewed the ortholog and analog calls and evidence and, using the following evidence codes, identified the consensus ortholog/analog:
Evidence code descriptions
1 | P-POD identifies a single ortholog, IMP identifies a single analog, and they agree. |
1P | P-POD identifies a single ortholog, IMP identifies a single analog, and they agree; P-POD ortholog is found in MultiParanoid family. |
2 | Call based on orthology (P-POD) only. P-POD identified 1 or more orthologs, but IMP did not identify an analog. |
2P | Call based on orthology (P-POD) only. P-POD identified 1 or more orthologs, but IMP did not identify an analog; P-POD ortholog is found in MultiParanoid family. |
3 | Call based on analogy (IMP) only. IMP identified 1 or more analogs, but P-POD did not identify an ortholog. |
4 | Multiple or ambiguous orthologs resolved by referring to analogs. |
5 | Multiple or ambiguous orthologs resolved by referring to both IMP and P-POD. |
6 | P-POD and IMP disagree. Selection made by curator judgment. |
7 | IMP identifies no analog, and P-POD identifies no ortholog. |
8 | P-POD and IMP identify the same proteins, and curator selected a subset based on additional evidence. |
9 | Manual annotation of functional analogs based on published literature. |
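The clear-cut rows of the evidence code table (codes 1, 1P, 2, 2P, 3 and 7) can be expressed as a small decision function. This is a sketch only: the function names and input formats are hypothetical, and codes 4, 5, 6, 8 and 9 depend on curator review, so the function returns `None` for any case that is not mechanical.

```python
def significant_analogs(imp_pvalues, cutoff=0.05):
    """Keep IMP analog candidates with p below the cutoff (p < 0.05, as stated above)."""
    return {gene for gene, p in imp_pvalues.items() if p < cutoff}


def suggest_evidence_code(ppod_orthologs, imp_analogs, from_multiparanoid=False):
    """Suggest an evidence code for unambiguous cases; None means curator review is needed.

    ppod_orthologs: gene identifiers reported as orthologs by P-POD
    imp_analogs: gene identifiers reported as functional analogs by IMP
    from_multiparanoid: True when the P-POD call came from a MultiParanoid family
    """
    ppod, imp = set(ppod_orthologs), set(imp_analogs)
    suffix = "P" if from_multiparanoid else ""
    if not ppod and not imp:
        return "7"            # no ortholog and no analog
    if ppod and not imp:
        return "2" + suffix   # orthology (P-POD) only
    if imp and not ppod:
        return "3"            # analogy (IMP) only
    if len(ppod) == 1 and ppod == imp:
        return "1" + suffix   # single ortholog and single analog that agree
    return None               # codes 4, 5, 6, 8 rest on curator judgment
```

For example, a gene with one P-POD ortholog from a MultiParanoid family and the same single IMP analog would be suggested as `1P`, while any disagreement between the two sources is left to a curator.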
The P-POD and IMP web sites can be found here: