PENTACON Curation Resource Help Document

PENTACON overview

PENTACON curators compiled lists of human genes involved in the Arachidonic Acid Pathway (AAP) and related networks based on the expertise of PENTACON researchers, as well as curated information from external resources including KEGG, Reactome, and GO. The genes were ranked as "Gold Standard", "Likely" or "Predicted" to be involved in the Arachidonic Acid or related pathways according to available experimental evidence. Specific gene curation was then performed by PENTACON curators based on information available in UniProt, BRENDA, BindingDB and the literature. The manually curated information, including chemicals, tissue specificity and disease involvement, was captured using standardized ontology terms. All of the curated information is available from the PENTACON Curated Data Resource, as well as from the PENTACON Data page. More information on how to navigate this resource and details about the curation are provided below. For additional questions not addressed in this help document, please contact pentacon@genomics.princeton.edu.

Curated Gene Sets
The Curated Gene Sets may be accessed via the top left menu option in the green Pentacon banner. The Curated Gene Sets table lists the gene sets that were manually curated by PENTACON curators: (1) core genes of the Arachidonic Acid Pathway (AAP); (2) genes related to the AAP, referred to as Arachidonic Acid Extended (AAE); and (3) genes involved in Blood Pressure (BP) regulation. The table gives the gene names, NCBI IDs and UniProt IDs for each curated gene set, along with some summary data. Each gene set is further subdivided into three Gene Set Qualifier levels, Gold Standard (GS), Likely (L) and Predicted (P), based on calls made by PENTACON curators and experts in the field. The gene sets can be sorted by these classifications using the Gene Set drop-down menu in the right column of the table. The curated gene sets and qualifiers are described in greater detail below.
The Curated Gene Sets table also provides a summary of the information curated for each gene, including function, involvement in disease, tissue/cell type and cellular localization, kinetics data, and other topics. The “Other” category includes isoform, polymorphism, mutant or variant information. When one of these data types, kinetics information included, has been captured for a gene, a corresponding clickable icon is present in the appropriate column (see the legend at the top right of the page for an icon key). If no icon is present, that information is not available for the gene. Only minimal data for each gene are provided in this summary table view; users may click on the gene names to access more details on the individual Gene Summary Pages. The Curated Gene Sets table can also be filtered on the fly by entering search terms in the text boxes under the column headers. Additionally, all of the curated information, including detailed information on sources of curated data, may be downloaded for the entire gene sets or for filtered subsets using the Download buttons at the top of the page or from the PENTACON Data download page.

Curation Process

The curation process involved the creation of three manually curated gene sets by PENTACON curators: the Arachidonic Acid Pathway (AAP) list, the Arachidonic Acid Extended (AAE) list and the Blood Pressure regulation (BP) list. Genes from the three lists were subdivided into three Gene Set Qualifiers: Gold Standard, Likely and Predicted categories.
Statistics for the PENTACON Curated Gene Sets (human genes):
Gene Set Qualifier | Arachidonic Acid Pathway | Blood Pressure | Arachidonic Acid Extended
Gold Standard (Direct) Genes | 118 | 2 | 17
Likely (Indirect) Genes | 13 | 23 | 5
Predicted Genes | 27 | 36 | 8
Total Number of Genes | 158 | 61 | 30

Gene Set Descriptions

Arachidonic Acid Pathway (AAP) Genes: genes directly involved in the 'arachidonic acid pathway'. These genes are supported by evidence showing they are involved in arachidonic acid metabolism and remodeling in humans. This list includes genes from the KEGG arachidonic acid metabolism pathway, Reactome pathways (arachidonic acid metabolism, eicosanoids, COX reactions, eicosanoid ligand-binding receptors), Rat Genome Database Pathways, relevant Gene Ontology (GO) annotations, investigator-suggested genes and relevant genes found in the literature.
Blood Pressure (BP) Genes: genes related to the phenotype of blood pressure, as shown by their presence in Reactome pathways (pathways annotated with experimental evidence) identified as related to the phenotype of blood pressure, experimentally supported GO annotations for blood pressure, and the literature. The genes for the identified Reactome pathways were taken from "Reactome Pathways Gene Set." Genes obtained from GO were assembled from those directly annotated to "regulation of blood pressure" (GO:0008217), "regulation of blood vessel size" (GO:0050880), and manually selected children of these two terms.
Arachidonic Acid Extended (AAE) Genes: genes related to the arachidonic acid pathway. These genes were identified by a GO query for terms related to arachidonic acid metabolism; the search was restricted to experimental annotations. Genes that curators found in the literature to be associated with arachidonic acid metabolism in a regulatory manner have also been added to the AAE gene set.

Gene Set Qualifiers

Genes were classified by PENTACON curators as Gold Standard, Likely, or Predicted based on the level of evidence supporting their involvement in the relevant pathway as follows:
Gold Standard (Direct):
The gene set qualifier "Gold Standard" is assigned when experimental evidence demonstrates involvement of the gene in the arachidonic acid metabolism and arachidonic acid remodeling pathways, regulation of these pathways, or involvement in blood pressure regulation in humans. For AAP and AAE genes, experimental evidence means that an enzyme has been assayed with substrates that are in these pathways, a receptor binds ligands in these pathways, or the protein interacts with another protein in the pathway. For BP genes, experimental evidence means one or a combination of the following: (1) mutations in the gene were found in people displaying the phenotype of interest (in this case altered BP), with associated familial evidence showing that individuals without the mutation do not display the phenotype; (2) drugs produce a change in phenotype in humans by targeting a gene in the list; (3) Mendelian disease. Genes assigned the "Gold Standard" gene set qualifier can be used for computational analyses.
Likely (Indirect):
The gene set qualifier "Likely" is assigned when genes likely participate in the arachidonic acid metabolism and remodeling pathway, regulation of this pathway, or the blood pressure regulation pathway. Genes are assigned "Likely" for AAP and AAE when there is experimental evidence for the predicted/expected activity with a relevant probe substrate, but no definitive experimental evidence for participation in the arachidonic acid metabolism and remodeling pathway. For example, an enzyme expected to be involved in AA remodeling for which activity was demonstrated using palmitic, oleic, or linoleic acid, but not arachidonic acid, would be assigned the gene set qualifier "Likely". The gene set qualifier "Likely" is assigned for BP genes when the experimental evidence is not a direct measure of BP. Examples: (1) in vitro/ex vivo evidence only, such as vasoconstriction; (2) genetic association studies and a mouse knockout (KO) study showing physiological evidence that the gene is involved in regulation of BP; (3) a phenotype observed in an organism (e.g. a change in BP in mice) together with in vitro evidence from a human study. These genes can be included in a computational analysis at the programmer's discretion.
Predicted:
The gene set qualifier "Predicted" is assigned when genes have been inferred to be involved in the arachidonic acid metabolism and remodeling pathways (AAP) and their regulation (AAE) based on (1) evidence from other organisms or (2) homology. For the BP gene set, genes have been inferred based on (1) evidence from other organisms; (2) only in vitro evidence for human genes demonstrating that an expressed gene has a given function (e.g. PC5, when co-expressed with a prorenin expression vector, results in cells secreting renin); (3) only a genetic association study (no familial evidence). "Predicted" is assigned to genes for which participation in the relevant pathway is purely predicted and/or gene products have not been characterized. These genes should not be used in a computational analysis.
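The usage guidance attached to the three qualifiers can be expressed as a small filter. The following is a minimal sketch only: the record layout and the field names `symbol`, `gene_set` and `qualifier` are hypothetical and do not describe the actual download file; the abbreviations GS, L and P are those used in the Curated Gene Sets table.

```python
from enum import Enum


class Qualifier(Enum):
    """Gene Set Qualifier abbreviations used in the Curated Gene Sets table."""
    GOLD_STANDARD = "GS"
    LIKELY = "L"
    PREDICTED = "P"


def select_for_analysis(genes, include_likely=False):
    """Return the genes suitable for a computational analysis.

    Gold Standard genes are always kept, Likely genes are kept only when
    the caller opts in, and Predicted genes are always excluded.
    """
    allowed = {Qualifier.GOLD_STANDARD}
    if include_likely:
        allowed.add(Qualifier.LIKELY)
    return [g for g in genes if Qualifier(g["qualifier"]) in allowed]


# Hypothetical records; the field names are assumptions for illustration.
genes = [
    {"symbol": "GENE_A", "gene_set": "AAP", "qualifier": "GS"},
    {"symbol": "GENE_B", "gene_set": "AAP", "qualifier": "L"},
    {"symbol": "GENE_C", "gene_set": "BP", "qualifier": "P"},
]

strict = [g["symbol"] for g in select_for_analysis(genes)]
relaxed = [g["symbol"] for g in select_for_analysis(genes, include_likely=True)]
```

Keeping the Likely genes behind an explicit flag mirrors the text: their inclusion is a deliberate choice by the analyst, not a default.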

Gene Summary Pages

At the top of the Gene Summary page, the gene symbol, aliases and gene name are provided in addition to UniProt IDs, NCBI Gene IDs and associated EC numbers.
Pentacon Gene Set
The gene set and gene set qualifier are displayed along with the relevant information source and evidence type. Information sources include GO (Gene Ontology), Reactome, PTHR (Panther), KEGG and Rat Genome Database (RGD) pathways. The information source is given first, followed by that source's internal identifier. When Reactome is the information source, the stable identifier is used (for example, REACT_147707.2). Genes identified from GO blood pressure annotations have GO:BP. 'Literature' denotes that genes were added based on evidence present in literature reviews; this information source should be accompanied by entries in the PubMed ID column.
The evidence code C is used to denote review articles. The evidence code E is used for articles that present experimental evidence including, but not limited to, tissue distribution and enzyme characterization. The evidence code P is used for publications that (1) predict presence based on evidence in mice or rabbits, (2) use bioinformatics tools to identify human genes, or (3) contain non-traceable author statements. Bioinformatics approaches include using conserved sequence motifs to identify candidate genes, or using a known human gene to identify sequences with significant identity (and finding the cDNA in an EST database).
Below the Gene Set section, full curated entries for the gene of interest are provided, including the following topics: function, isoforms, natural variants, mutagenesis sites, tissues and cell types, localization and diseases. These annotations are associated with the original sources of these data in the form of links to the relevant resources, such as UniProt or PubMed.
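As an illustration of the Pentacon Gene Set fields described above, the sketch below maps the literature evidence codes C, E and P to short labels and splits a Reactome stable identifier into its numeric ID and version. The identifier pattern is inferred solely from the single example REACT_147707.2 and is an assumption; the function name is hypothetical.

```python
import re

# Literature evidence codes as described in this help document.
LITERATURE_EVIDENCE = {
    "C": "review article",
    "E": "experimental evidence",
    "P": "prediction, bioinformatics inference, or non-traceable author statement",
}

# Assumed pattern, inferred from the one example given (REACT_147707.2).
REACTOME_ID = re.compile(r"^REACT_(\d+)\.(\d+)$")


def split_reactome_id(identifier):
    """Return (numeric_id, version) for a Reactome stable identifier.

    Returns None when the text does not match the assumed pattern.
    """
    match = REACTOME_ID.match(identifier.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For example, `split_reactome_id("REACT_147707.2")` yields the pair `(147707, 2)`, while an identifier from another source, such as a GO ID, yields `None`.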
All of the annotations extracted from other resources were converted by PENTACON curators to standardized ontological terms such as DOID (disease), BRENDA tissue ontology (tissues, cell types), and GO cellular component (cellular localization), where applicable, in order to facilitate data searches and computational analyses.
Orthologs
PENTACON curators also determined predicted orthologs in model organisms of interest for each of the genes, beginning with the AAP gene set. Links to orthologous proteins in species of interest (mouse, rat, zebrafish and budding yeast) are provided.
Pentacon Links
Links to other resources developed by PENTACON are provided, including the corresponding interactive AAP figure developed by PENTACON curators and NSAIDnet, a tool for predicting functional networks which was developed by computational biologists as part of the PENTACON project. The Gold Standard gene sets underlying NSAIDnet were produced by PENTACON curators, allowing users to explore functional networks of genes based on NSAID-related gene sets, such as the core AAP gene set.
External Links
Additional external links are provided to GIANT, a tissue-specific protein interaction network analysis tool, and P-POD, a protein ortholog database.
Kinetics data
The last section of the Gene Summary page contains kinetics data obtained from various resources or curated by PENTACON curators from the literature. These values include, but are not limited to, measurements of KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on and K-off. Chemical terms related to the value have been parsed from UniProt, BRENDA, or BindingDB, or added by PENTACON curators directly from the literature, and converted to the appropriate ChEBI term and ID. For chemical terms with no ChEBI ID available, a new ChEBI ID has been requested and a temporary placeholder ID has been entered. These placeholders contain a ChEBI term referred to as ChEBI:NONE (for example: ChEBI:NONE_148513); such a term has not yet been assigned an ID by ChEBI, and the number shown is an internal PENTACON ID. An additional ChEBI term is available (ChEBI Term B) when two substrates are identified for the reaction, or when another compound (such as an inhibitor) is used in the presence of a substrate. The primary source (usually the PMID) and the secondary source (originating database), if applicable, are also provided. This information can also be downloaded by selecting Kinetics Data from the drop-down menu under the Download button at the top right of the Gene Summary page.

Download File Information

More detailed references are provided for each annotation in the download file, which can be obtained for individual genes using the blue download button at the top right of the Gene Summary page. Curation was performed by PENTACON curators based on information available in UniProt, BRENDA, BindingDB, and the published literature. The curation process involved verifying a specific list of genes for their involvement in certain pathways based on experimental evidence or valid reviews. Data pertaining to kinetics and tissue/cell type expression of these genes/proteins were parsed from UniProt, BRENDA and BindingDB. In early data releases, the parsed data types were pre-populated into Google spreadsheets, converted to the requested ontology terms and corresponding IDs using OntoMaton, and reviewed for accuracy by curation of the associated papers. More recently, parsed data were imported directly into a gene curation tool, Curatus, where the requested ontology terms could be added or converted and the data were reviewed for accuracy by curation of the associated papers. In some cases data were added directly from the published literature rather than from the databases.
Curation Topics and Ontologies
Curation Topic | Originating Source Data Type | Ontology Term
Any conditional (drug) | UniProt: various fields; BRENDA: various fields; BindingDB: various fields | ChEBI
Any conditional (environmental) | UniProt: None | PATO
Cell-type specificity/dependency | UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity | Cell Ontology; BRENDA Tissue Ontology
Cell line | UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity | Cell Line Ontology; BRENDA Tissue Ontology
Cellular location | UniProt: Comment/subcellular location | GO Cellular Component
Pathway | UniProt: Comment/pathway; Comment/function | GO Biological Process
Human Disease | UniProt*: Comment/Involvement in disease; BRENDA: Various fields** | Human Disease Ontology; SNOMED
Tissue specificity/dependency | UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity | BRENDA; Uberon
Developmental stage | UniProt: Comment/developmental stage | BRENDA Tissue Ontology
Isoforms | UniProt: Comment/alternative products/isoform | None
Variants/Genetic Background | UniProt: Feature/sequence variant | None
Kinetics | UniProt: Comment/biophysicochemical properties/KM or Vmax; BRENDA: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off; BindingDB: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off | None
*For disease information derived from UniProt, notes are included in the "Secondary Source Notes" column of the file that indicate whether a disease is caused by a mutation in a gene, or else is associated with variations in a gene.
**Note that disease associations for BRENDA data are curated only for cell lines or cells that are derived from tissue in the diseased state.
Additional Download File Fields
Evidence Code
Evidence Code Ontology ID. ECO:0000311 = imported information; ECO:0000006 = experimental (this code has been partially replaced by ECO:0000269 = manually curated information for which there is published experimental evidence); ECO:0000033 = traceable author statement; ECO:0000035 = no biological data found (this code has been partially replaced by ECO:0000303 = manually curated information that is based on statements in scientific articles for which there is no experimental support).
Primary Source
Typically the PubMed ID. Occasionally a PubMed ID may be preceded by "submission:"; this indicates that a variant was directly submitted to UniProt rather than curated from a PubMed ID by UniProt curators.
Secondary Source
Typically an external resource, e.g. UniProtKB, BRENDA. If Secondary Source = Pentacon, then the curated information and ontology terms were added by Pentacon curators and associated with relevant PubMed IDs.
PENTACON Notes
Pentacon curator notes; may contain relevant citation information for the associated papers, and also additional notes that are preceded by the phrase "Pentacon Notes". Additional rows of data added by Pentacon curators and not directly parsed from another resource are indicated as "Added by Pentacon".
Secondary Source Version Date
Version date of the information downloaded from the external resource (e.g. UniProtKB, BRENDA, BindingDB), or the date of release by PENTACON, for each annotation row.
Secondary Source Version
Version number of the information downloaded from the external resource (e.g. UniProtKB, BRENDA, BindingDB), or the version released by PENTACON, for each annotation row.
PENTACON Annotation No
Unique annotation ID assigned by Pentacon.
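The Evidence Code values listed above lend themselves to a simple lookup. The sketch below is an illustration only: it uses the ECO IDs and descriptions given in this help document, and the function name `describe_evidence` is hypothetical. It does not query the Evidence and Conclusion Ontology itself.

```python
# ECO IDs and labels exactly as listed in this help document.
ECO_LABELS = {
    "ECO:0000311": "imported information",
    "ECO:0000006": "experimental",
    "ECO:0000269": "manually curated, published experimental evidence",
    "ECO:0000033": "traceable author statement",
    "ECO:0000035": "no biological data found",
    "ECO:0000303": "manually curated, author statement without experimental support",
}

# Older codes mapped to the codes that have partially replaced them.
PARTIALLY_REPLACED_BY = {
    "ECO:0000006": "ECO:0000269",
    "ECO:0000035": "ECO:0000303",
}


def describe_evidence(eco_id):
    """Return a readable description for an Evidence Code value."""
    eco_id = eco_id.strip()
    label = ECO_LABELS.get(eco_id, "unknown evidence code")
    newer = PARTIALLY_REPLACED_BY.get(eco_id)
    if newer is not None:
        return f"{eco_id}: {label} (partially replaced by {newer})"
    return f"{eco_id}: {label}"
```

Because the older codes have only been partially replaced, a download file may contain both the older and the newer ID, so the lookup keeps all six rather than collapsing them.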
Kinetics Specific Fields
Data Value
Text parsed according to UniProt, BRENDA, or BindingDB data type listed in curation topics above, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.
Value
Text parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.
Unit
Units (e.g. pmol/min/mg) parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, units added by Pentacon curators.
ChEBI Chemical Term A
Chemical term parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, and converted to ChEBI term. In cases where Secondary Source = PENTACON, chemical terms were added by Pentacon curators.
ChEBI Chemical ID A
For annotation lines containing a ChEBI term referred to as ChEBI:NEW (for example: ChEBI:NEW_148513), this term has not yet been assigned an ID by ChEBI; the number shown is an internal PENTACON ID.
ChEBI Chemical Term B
Additional chemical entry converted to ChEBI term when two substrates are identified for the reaction, or when another compound (such as an inhibitor) is used in the presence of a substrate.
ChEBI Chemical ID B
For annotation lines containing a ChEBI term referred to as ChEBI:NEW (for example: ChEBI:NEW_148513), this term has not yet been assigned an ID by ChEBI; the number shown is an internal PENTACON ID.
Secondary Source IDs
IDs that are assigned by the database from which the data was originally parsed. For example, secondary source IDs may be UniProt IDs (e.g. variants have VAR_IDs, isoforms have VSP_IDs), EC numbers (assigned by BRENDA), or BindingDB IDs. The source of the ID is noted in the Secondary Source column (see below). When the Secondary Source is PENTACON, there is no Secondary Source ID.
Secondary Source Notes
Notes parsed from secondary resources (e.g. UniProt, BRENDA, BindingDB)
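When processing the ChEBI Chemical ID A and ChEBI Chemical ID B fields, it can be useful to separate placeholder IDs from IDs assigned by ChEBI. The sketch below is a hedged illustration: it accepts both placeholder prefixes that appear in this help document (ChEBI:NEW and ChEBI:NONE), and the function name is hypothetical.

```python
import re

# Placeholder forms mentioned in this help document, e.g. ChEBI:NEW_148513
# or ChEBI:NONE_148513. The number is an internal PENTACON ID.
PLACEHOLDER = re.compile(r"^ChEBI:(?:NEW|NONE)_(\d+)$")


def pentacon_placeholder_number(chebi_id):
    """Return the internal PENTACON number for a placeholder ChEBI ID.

    Returns None when the value is not a placeholder, which here is taken
    to mean that it may be an ID assigned by ChEBI.
    """
    match = PLACEHOLDER.match(chebi_id.strip())
    if match is None:
        return None
    return int(match.group(1))
```

Rows for which this function returns a number cannot be resolved against ChEBI and are best handled separately in any analysis that joins on ChEBI IDs.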

Ortholog List

The orthologs and analogs page provides a list of orthologs manually selected by PENTACON curators for AAP genes in a subset of key model organisms, including: mouse (Mus musculus), rat (Rattus norvegicus), zebrafish (Danio rerio), and yeast (Saccharomyces cerevisiae).
PENTACON curators identified consensus orthologs/analogs using P-POD version 4 and IMP. Orthologs were identified using P-POD's OrthoMCL analysis when possible or, when the human gene was not assigned to an OrthoMCL family, using P-POD's MultiParanoid analysis. Functional analogs were obtained from IMP using a cutoff of p < 0.05. In some cases, functional analogs were identified directly from the literature; in these cases the supporting PMID and evidence code "9" are noted. PENTACON curators reviewed the ortholog and analog calls and evidence and, using the following evidence codes, identified the consensus ortholog/analog:

Evidence code descriptions

1: P-POD identifies a single ortholog, IMP identifies a single analog, and they agree.
1P: P-POD identifies a single ortholog, IMP identifies a single analog, and they agree; P-POD ortholog is found in MultiParanoid family.
2: Call based on orthology (P-POD) only. P-POD identified 1 or more orthologs, but IMP did not identify an analog.
2P: Call based on orthology (P-POD) only. P-POD identified 1 or more orthologs, but IMP did not identify an analog; P-POD ortholog is found in MultiParanoid family.
3: Call based on analogy (IMP) only. IMP identified 1 or more analogs, but P-POD did not identify an ortholog.
4: Multiple or ambiguous orthologs resolved by referring to analogs.
5: Multiple or ambiguous orthologs resolved by referring to both IMP and P-POD.
6: P-POD and IMP disagree. Selection made by curator judgment.
7: IMP identifies no analog, and P-POD identifies no ortholog.
8: P-POD and IMP identify the same proteins, and curator selected a subset based on additional evidence.
9: Manual annotation of functional analogs based on published literature.
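The evidence codes above combine a number with an optional "P" suffix that marks a MultiParanoid family. A minimal sketch of how such codes might be interpreted follows; the short descriptions are paraphrases of the list above, and the function name is hypothetical.

```python
# Codes listed in this help document; only codes 1 and 2 have a "P" variant.
EVIDENCE_CODES = {
    "1": "P-POD ortholog and IMP analog agree",
    "1P": "P-POD ortholog and IMP analog agree (MultiParanoid family)",
    "2": "P-POD orthology only",
    "2P": "P-POD orthology only (MultiParanoid family)",
    "3": "IMP analogy only",
    "4": "ambiguous orthologs resolved using analogs",
    "5": "ambiguous orthologs resolved using IMP and P-POD",
    "6": "P-POD and IMP disagree; curator judgment",
    "7": "no ortholog and no analog identified",
    "8": "same proteins from both; curator selected a subset",
    "9": "manual annotation from published literature",
}


def parse_evidence_code(code):
    """Return (base_number, is_multiparanoid, description) for a code.

    Raises ValueError for codes that are not listed in this document.
    """
    key = str(code).strip().upper()
    if key not in EVIDENCE_CODES:
        raise ValueError(f"unknown ortholog evidence code: {code!r}")
    return int(key[0]), key.endswith("P"), EVIDENCE_CODES[key]
```

Validating against the explicit list, instead of accepting any digit followed by "P", means that undocumented combinations such as "3P" are rejected.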
The P-POD and IMP web sites can be found here: