Pentacon Curated Data Resource (CDR)

PENTACON Curation Resource Help Document

PENTACON overview

PENTACON curators compiled lists of human genes involved in the Arachidonic Acid Pathway (AAP) and related networks based on the expertise of PENTACON researchers, as well as curated information from external resources including KEGG, Reactome, and GO. The genes were ranked as "Gold Standard", "Likely" or "Predicted" to be involved in the Arachidonic Acid or related pathways according to available experimental evidence. Specific gene curation was then performed by PENTACON curators based on information available in UniProt, BRENDA, BindingDB and the literature. The manually curated information, including chemicals, tissue specificity and disease involvement, was captured using standardized ontology terms. All of the curated information is available from the PENTACON Curated Data Resource, as well as from the PENTACON Data page. More information on how to navigate this resource and details about the curation are provided below. For additional questions not addressed in this help document, please contact pentacon@genomics.princeton.edu.

- Curated Gene Sets
- Ortholog List
- Evidence Code Descriptions
- Links and Resources

Curated Gene Sets:

The Curated Gene Sets may be accessed via the top left menu option in the green Pentacon banner. The Curated Gene Sets table provides the gene sets that were manually curated by PENTACON curators. The gene sets include the following: (1) core genes of the Arachidonic Acid Pathway (AAP); (2) genes related to the AAP, which are referred to as Arachidonic Acid Extended (AAE); and (3) genes involved in Blood Pressure (BP) regulation. The Gene Sets table gives the gene names, NCBI IDs and UniProt IDs for the different curated gene sets along with some summary data. These gene sets are also subdivided into three separate Gene Set Qualifier levels, Gold Standard (GS), Likely (L) and Predicted (P), based on calls made by PENTACON curators and experts in the field. The gene sets can be sorted based on these classifications using the Gene Set drop-down menu in the right column of the table. The curated gene sets and qualifiers are described in greater detail below.

The Curated Gene Sets table also provides a summary version of the information curated for each gene, including: function, involvement in disease, tissue/cell type and cellular localization, kinetics data, and other topics. The “Other” category includes isoform, polymorphism, mutant or variant information. If a given data type, including kinetics information, has been captured for a gene, a corresponding clickable icon is present in the appropriate column (see the legend at the top right of the page for an icon key). If no icon is present, that information is not available for the gene. Only minimal data for each gene are provided in this summary table view; users may click on the gene names to access more details on the individual Gene Summary Pages. The Curated Gene Sets table can also be filtered on the fly by entering search terms in the text boxes under the column headers. Additionally, all of the curated information, including detailed information on sources of curated data, may be downloaded for the entire gene sets or for filtered subsets using the Download buttons at the top of the page or from the PENTACON Data download page.

Curation Process

The curation process involved the creation of three manually curated gene sets by PENTACON curators: the Arachidonic Acid Pathway (AAP) list, the Arachidonic Acid Extended (AAE) list and the Blood Pressure regulation (BP) list. Genes from the three lists were subdivided into three Gene Set Qualifier categories: Gold Standard, Likely and Predicted.

Statistics for the PENTACON Curated Gene Sets (human genes):

Gene Set Qualifier	Arachidonic Acid Pathway (AAP)	Blood Pressure (BP)	Arachidonic Acid Extended (AAE)
Gold Standard (Direct) Genes	118	2	17
Likely (Indirect) Genes	13	23	5
Predicted Genes	27	36	8
Total Number of Genes	158	61	30
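
The totals in the table are the column sums of the three qualifier rows. As a minimal Python sketch (the dictionary layout and function name are illustrative and not part of the PENTACON resource), the arithmetic can be checked like this:

```python
# Counts transcribed from the statistics table above (human genes).
# The nested-dict layout is only an illustration, not a PENTACON file format.
counts = {
    "Arachidonic Acid Pathway": {"Gold Standard": 118, "Likely": 13, "Predicted": 27},
    "Blood Pressure": {"Gold Standard": 2, "Likely": 23, "Predicted": 36},
    "Arachidonic Acid Extended": {"Gold Standard": 17, "Likely": 5, "Predicted": 8},
}


def totals(table):
    """Sum the qualifier counts for each gene set."""
    return {gene_set: sum(by_qualifier.values())
            for gene_set, by_qualifier in table.items()}


if __name__ == "__main__":
    for gene_set, total in totals(counts).items():
        print(f"{gene_set}: {total}")
```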

Gene Set Descriptions

Arachidonic Acid Pathway (AAP) Genes: genes directly involved in the 'arachidonic acid pathway'. These genes are supported by evidence showing they are involved in arachidonic acid metabolism and remodeling in humans. This list includes genes from the KEGG arachidonic acid metabolism pathway, Reactome pathways (arachidonic acid metabolism, eicosanoids, COX reactions, eicosanoid ligand-binding receptors), Rat Genome Database Pathways, relevant Gene Ontology (GO) annotations, investigator-suggested genes and relevant genes found in the literature.

Blood Pressure (BP) Genes: genes related to the phenotype of blood pressure as shown by their presence in Reactome pathways (pathways annotated with experimental evidence) identified as related to the phenotype of blood pressure, experimentally supported GO annotations for blood pressure, and the literature. The genes for the identified Reactome pathways were taken from "Reactome Pathways Gene Set." Genes obtained from GO were assembled from those directly annotated to "regulation of blood pressure" (GO:0008217), "regulation of blood vessel size" (GO:0050880), and manually selected children of these two terms.

Arachidonic Acid Extended (AAE) Genes: genes related to the arachidonic acid pathway. These genes were identified by a GO query for terms related to arachidonic acid metabolism; the search was restricted to experimental annotations. Genes that curators found in the literature to be associated with arachidonic acid metabolism in a regulatory manner have also been added to the AAE gene set.

Gene Set Qualifiers

Genes were classified by PENTACON curators as Gold Standard, Likely, or Predicted based on the level of evidence supporting their involvement in the relevant pathway as follows:

Gold Standard (Direct):

The gene set qualifier "Gold Standard" is assigned when experimental evidence demonstrates involvement of the gene in the arachidonic acid metabolism and arachidonic acid remodeling pathways, regulation of these pathways, or involvement in blood pressure regulation in humans. For AAP and AAE genes, experimental evidence means that an enzyme has been assayed with substrates that are in these pathways, a receptor binds ligands in these pathways, or the protein interacts with another protein in the pathway. For BP genes, experimental evidence means one or a combination of the following: (1) mutations in the gene were found in people displaying the phenotype of interest, in this case altered BP, with associated familial evidence showing that individuals without the mutation do not display the phenotype; (2) drugs produce a change in phenotype in humans by targeting a gene in the list; (3) Mendelian disease. Genes assigned the "Gold Standard" gene set qualifier can be used for computational analyses.

Likely (Indirect):

The gene set qualifier "Likely" is assigned when genes likely participate in the arachidonic acid metabolism and remodeling pathway, regulation of this pathway, or the blood pressure regulation pathway. Genes are assigned "Likely" for AAP and AAE when there is experimental evidence for the predicted/expected activity with a relevant probe substrate, but no definitive experimental evidence for participation in the arachidonic acid metabolism and remodeling pathway. For example, an enzyme expected to be involved in AA remodeling for which activity was demonstrated using palmitic, oleic, or linoleic acid, but not arachidonic acid, would be assigned the gene set qualifier "Likely". The gene set qualifier "Likely" is assigned for BP genes when the experimental evidence is not a direct measure of BP. Examples: (1) in vitro/ex vivo evidence only, such as vasoconstriction; (2) genetic association studies together with a mouse knockout (KO) study showing physiological evidence that the gene is involved in regulation of BP; (3) a phenotype observed in an organism (e.g. change in BP in mice) together with an in vitro study in human. These genes can be included in a computational analysis at the programmer's discretion.

Predicted:

The gene set qualifier "Predicted" is assigned when genes have been inferred to be involved in the arachidonic acid metabolism and remodeling pathways (AAP) and their regulation (AAE) based on (1) evidence from other organisms or (2) homology. For the BP gene set, genes have been inferred based on (1) evidence from other organisms; (2) only in vitro evidence for human genes demonstrating that an expressed gene has a given function (e.g. PC5, when co-expressed with a prorenin expression vector, results in cells secreting renin); or (3) only a genetic association study (no familial evidence). "Predicted" is assigned to genes for which participation in the relevant pathway is purely predicted and/or whose gene products have not been characterized. These genes should not be used in a computational analysis.
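
The guidance on which qualifiers to use in a computational analysis (Gold Standard: yes; Likely: at the analyst's discretion; Predicted: no) can be sketched as a simple filter. This is a minimal Python illustration only: the function name, the `include_likely` flag and the placeholder gene symbols are hypothetical, and the abbreviations GS/L/P follow the Gene Set Qualifier levels named earlier in this document.

```python
from enum import Enum


class Qualifier(Enum):
    GOLD_STANDARD = "GS"
    LIKELY = "L"
    PREDICTED = "P"


def select_for_analysis(genes, include_likely=False):
    """Return gene symbols suitable for a computational analysis.

    genes: iterable of (symbol, Qualifier) pairs.
    Gold Standard genes are always kept, Likely genes only when the
    analyst opts in, and Predicted genes are never kept.
    """
    allowed = {Qualifier.GOLD_STANDARD}
    if include_likely:
        allowed.add(Qualifier.LIKELY)
    return [symbol for symbol, qualifier in genes if qualifier in allowed]


# Placeholder symbols, not real PENTACON entries.
example = [
    ("GENE_A", Qualifier.GOLD_STANDARD),
    ("GENE_B", Qualifier.LIKELY),
    ("GENE_C", Qualifier.PREDICTED),
]

if __name__ == "__main__":
    print(select_for_analysis(example))
    print(select_for_analysis(example, include_likely=True))
```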

Gene Summary Pages

At the top of the Gene Summary page, the gene symbol, aliases and gene name are provided in addition to UniProt IDs, NCBI Gene IDs and associated EC numbers.

Pentacon Gene Set

The gene set and gene set qualifier are displayed along with the relevant information source and evidence type. Information sources include GO (Gene Ontology), Reactome, PTHR (Panther), KEGG and Rat Genome Database (RGD) pathways. The information source is followed by its internal identifier; when Reactome is the information source, the stable identifier is used (for example, REACT_147707.2). Genes identified from GO blood pressure annotations are marked GO:BP. 'Literature' denotes that genes were added based on evidence present in literature reviews; this information source should be accompanied by entries in the PubMed ID column.

The evidence codes are as follows:
- C denotes review articles.
- E is used for articles that present experimental evidence including, but not limited to, tissue distribution and enzyme characterization.
- P is used for publications that (1) predict presence based on evidence in mice/rabbits, (2) use bioinformatics tools to identify human genes, or (3) contain non-traceable author statements. Bioinformatics approaches include using conserved sequence motifs to identify candidate genes, or using a known human gene to identify sequences with significant identity (and finding cDNA in an EST database).

Below the Gene Set section, full curated entries for the gene of interest are provided, including the following topics: function, isoforms, natural variants, mutagenesis sites, tissues and cell types, localization and diseases. These annotations are associated with the original sources of these data in the form of links to the relevant resources, such as UniProt or PubMed.

All of the annotations extracted from other resources were converted by PENTACON curators to standardized ontological terms such as DOID (disease), BRENDA tissue ontology (tissues, cell types), and GO cellular component (cellular localization), where applicable, in order to facilitate data searches and computational analyses.

Orthologs

PENTACON curators also determined predicted orthologs in model organisms of interest for each of the genes, beginning with the AAP gene set. Links to orthologous proteins in species of interest (mouse, rat, zebrafish and budding yeast) are provided.

Pentacon Links

Links to other resources developed by PENTACON are provided, including the corresponding interactive AAP figure developed by Pentacon curators and NSAIDnet, a tool for predicting functional networks which was developed by computational biologists as part of the PENTACON project. The Gold Standard gene sets underlying NSAIDnet were produced by PENTACON curators, allowing users to explore functional networks of genes based on NSAID-related gene sets, such as the core AAP gene set.

External Links

Additional external links are provided to GIANT, a tissue-specific protein interaction network analysis tool, and P-POD, a protein ortholog database.

Kinetics data

The last section of the gene summary page contains kinetics data obtained from various resources or curated by PENTACON curators from the literature. These values include, but are not limited to, measurements of KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on and K-off. Chemical terms related to the value have been parsed from UniProt, BRENDA, or BindingDB, or added by Pentacon curators directly from the literature, and converted to the appropriate ChEBI term and ID. For chemical terms with no ChEBI ID available, a new ChEBI ID has been requested and a temporary placeholder ID has been entered. These placeholders contain a ChEBI term referred to as ChEBI:NONE (for example: ChEBI:NONE_148513); such a term has not yet been assigned an ID by ChEBI, and the number shown is an internal PENTACON ID. An additional ChEBI term is available (ChEBI Term B) when two substrates are identified for the reaction, or when another compound (such as an inhibitor) is used in the presence of a substrate. The primary source (usually the PMID) and the secondary source (originating database), if applicable, are also provided. This information can also be downloaded by selecting Kinetics Data from the drop-down menu under the Download button at the top right of the Gene Summary page.
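
When processing downloaded kinetics data, it can help to separate placeholder ChEBI IDs from IDs actually assigned by ChEBI. The following Python sketch is an illustration under stated assumptions: this help document shows the placeholder prefix as both ChEBI:NONE and ChEBI:NEW, so the pattern accepts either; assigned IDs are assumed to have the form `CHEBI:<digits>`; and the function name is hypothetical.

```python
import re

# Assumption: placeholders look like "ChEBI:NONE_148513" or "ChEBI:NEW_148513",
# where the trailing number is an internal PENTACON ID.
PLACEHOLDER_RE = re.compile(r"^ChEBI:(?:NONE|NEW)_(\d+)$", re.IGNORECASE)
# Assumption: IDs assigned by ChEBI look like "CHEBI:15843".
ASSIGNED_RE = re.compile(r"^ChEBI:(\d+)$", re.IGNORECASE)


def classify_chebi_id(value):
    """Return (kind, number), where kind is 'placeholder', 'assigned' or 'unrecognized'."""
    text = value.strip()
    match = PLACEHOLDER_RE.match(text)
    if match:
        return ("placeholder", int(match.group(1)))
    match = ASSIGNED_RE.match(text)
    if match:
        return ("assigned", int(match.group(1)))
    return ("unrecognized", None)


if __name__ == "__main__":
    for raw in ["ChEBI:NONE_148513", "CHEBI:15843", "n/a"]:
        print(raw, "->", classify_chebi_id(raw))
```

Rows classified as placeholders carry an internal PENTACON number rather than a ChEBI accession, so they should not be looked up in ChEBI directly.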

Download File Information

More detailed references are provided for each annotation in the download file, which can be obtained for individual genes using the blue download button on the top right side of the gene summary page. Curation was performed by Pentacon curators based on information available in UniProt, BRENDA, BindingDB, and the published literature. The curation process involved verifying a specific list of genes for their involvement in certain pathways based on experimental evidence or valid reviews. Data pertaining to kinetics and tissue/cell type expression of these genes/proteins were parsed from UniProt, BRENDA and BindingDB. In early data releases, OntoMaton spreadsheets were pre-populated with the resulting data types, which were reviewed for accuracy by curation of the associated papers. More recently, parsed data were imported into a gene curation tool, Curatus, and reviewed for accuracy by curation of the associated papers within this tool. In some cases data were added directly from the published literature rather than from the databases. For a portion of the data, the pre-populated data types were converted to the requested ontology terms and corresponding IDs in the Google spreadsheet using OntoMaton. For recently added data, the requested ontology terms were added or converted directly within Curatus.

Curation Topics and Ontologies

Curation Topic	Originating Source Data Type	Ontology Term
Any conditional (drug)	UniProt: various fields; BRENDA: various fields; BindingDB: various fields	ChEBI
Any conditional (environmental)	UniProt: None	PATO
Cell-type specificity/dependency	UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity	Cell Ontology; BRENDA Tissue Ontology
Cell line	UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity	Cell Line Ontology; BRENDA Tissue Ontology
Cellular location	UniProt: Comment/subcellular location	GO Cellular Component
Pathway	UniProt: Comment/pathway; Comment/function	GO Biological Process
Human Disease	UniProt: Comment/Involvement in disease*; BRENDA: Various fields**	Human Disease Ontology; SNOMED
Tissue specificity/dependency	UniProt: Comment/tissue specificity; BRENDA: Comment/tissue specificity	BRENDA Tissue Ontology; Uberon
Developmental stage	UniProt: Comment/developmental stage	BRENDA Tissue Ontology
Isoforms	UniProt: Comment/alternative products/isoform	None
Variants/Genetic Background	UniProt: Feature/sequence variant	None
Kinetics	UniProt: Comment/biophysicochemical properties/KM or Vmax; BRENDA: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off; BindingDB: Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off	None

*For disease information derived from UniProt, notes are included in the "Secondary Source Notes" column of the file that indicate whether a disease is caused by a mutation in a gene, or else is associated with variations in a gene.

**Note that disease associations for BRENDA data are curated only for cell lines or cells that are derived from tissue in the diseased state.

Additional Download File Fields

Evidence Code

Evidence Code Ontology (ECO) ID. The following codes are used:
- ECO:0000311 = imported information
- ECO:0000006 = experimental evidence; this code has been partially replaced by ECO:0000269
- ECO:0000269 = manually curated information for which there is published experimental evidence
- ECO:0000033 = traceable author statement
- ECO:0000035 = no biological data found; this code has been partially replaced by ECO:0000303
- ECO:0000303 = manually curated information that is based on statements in scientific articles for which there is no experimental support
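
For scripts that read the Evidence Code column, the ECO IDs listed above can be held in a small lookup table. This is a minimal Python sketch: the labels are copied from this help document, while the variable and function names are illustrative assumptions rather than part of any PENTACON tool.

```python
# ECO IDs and labels as listed in this help document.
ECO_LABELS = {
    "ECO:0000311": "imported information",
    "ECO:0000006": "experimental evidence",
    "ECO:0000269": "manually curated information for which there is "
                   "published experimental evidence",
    "ECO:0000033": "traceable author statement",
    "ECO:0000035": "no biological data found",
    "ECO:0000303": "manually curated information that is based on statements in "
                   "scientific articles for which there is no experimental support",
}

# Older codes mapped to the newer codes that have partially replaced them.
PARTIALLY_REPLACED_BY = {
    "ECO:0000006": "ECO:0000269",
    "ECO:0000035": "ECO:0000303",
}


def describe(eco_id):
    """Return a readable description of an ECO ID from the download file."""
    label = ECO_LABELS.get(eco_id)
    if label is None:
        return f"{eco_id}: not listed in the help document"
    newer = PARTIALLY_REPLACED_BY.get(eco_id)
    suffix = f" (partially replaced by {newer})" if newer else ""
    return f"{eco_id}: {label}{suffix}"


def is_experimental(eco_id):
    """True for the old and new codes that denote experimental evidence."""
    return eco_id in {"ECO:0000006", "ECO:0000269"}


if __name__ == "__main__":
    print(describe("ECO:0000006"))
```

Because the older codes have only been partially replaced, a filter on experimental evidence needs to accept both ECO:0000006 and ECO:0000269, which is what `is_experimental` does.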

Primary Source

Typically the PubMed ID. Occasionally a PubMed ID may be preceded by "submission:"; this indicates that a variant was directly submitted to UniProt rather than curated from a PubMed ID by UniProt curators.
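
A script reading the Primary Source column may want to flag the "submission:" entries separately. The Python sketch below assumes the prefix appears exactly as "submission:" at the start of the field (the exact file layout is not specified here); the function name and the sample values are hypothetical.

```python
SUBMISSION_PREFIX = "submission:"


def parse_primary_source(value):
    """Split a Primary Source field into a direct-submission flag and the identifier."""
    text = value.strip()
    if text.lower().startswith(SUBMISSION_PREFIX):
        identifier = text[len(SUBMISSION_PREFIX):].strip()
        return {"direct_submission": True, "identifier": identifier}
    return {"direct_submission": False, "identifier": text}


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    print(parse_primary_source("submission:12345678"))
    print(parse_primary_source("12345678"))
```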

Secondary Source

Typically an external resource, e.g. UniProtKB, BRENDA. If Secondary Source = Pentacon, then the curated information and ontology terms were added by Pentacon curators and associated with relevant PubMed IDs.

PENTACON Notes

Pentacon curator notes; may contain relevant citation information for the associated papers, and also additional notes that are preceded by the phrase "Pentacon Notes". Additional rows of data added by Pentacon curators and not directly parsed from another resource are indicated as "Added by Pentacon".

Secondary Source Version Date

Version date of the information downloaded from the external resource (e.g. UniProtKB, BRENDA, BindingDB), or the date of release by PENTACON, for each annotation row.

Secondary Source Version

Version number of the information downloaded from the external resource (e.g. UniProtKB, BRENDA, BindingDB), or the version released by PENTACON, for each annotation row.

PENTACON Annotation No

Unique annotation ID assigned by Pentacon.

Kinetics Specific Fields

Data Value

Text parsed according to UniProt, BRENDA, or BindingDB data type listed in curation topics above, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.

Value

Text parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, text added by Pentacon curators.

Unit

Units (e.g. pmol/min/mg) parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, or, in cases where the Secondary Source = PENTACON, units added by Pentacon curators.

ChEBI Chemical Term A

Chemical term parsed from UniProt, BRENDA, or BindingDB data type Comment/biophysicochemical properties/KM, K0.5, Vmax, EC50, IC50, Ki, Kd, K-on, or K-off, and converted to a ChEBI term. In cases where the Secondary Source = PENTACON, chemical terms were added by Pentacon curators.

ChEBI Chemical ID A

For annotation lines containing a ChEBI term referred to as ChEBI:NEW (for example: ChEBI:NEW_148513), this term has not yet been assigned an ID by ChEBI; the number shown is an internal PENTACON ID.

ChEBI Chemical Term B

Additional chemical entry converted to ChEBI term when two substrates are identified for the reaction, or when another compound (such as an inhibitor) is used in the presence of a substrate.

ChEBI Chemical ID B

For annotation lines containing a ChEBI term referred to as ChEBI:NEW (for example: ChEBI:NEW_148513), this term has not yet been assigned an ID by ChEBI; the number shown is an internal PENTACON ID.

Secondary Source IDs

IDs that are assigned by the database from which the data were originally parsed. For example, secondary source IDs may be UniProt IDs (e.g. variants have VAR_IDs, isoforms have VSP_IDs), EC numbers (assigned by BRENDA), or BindingDB IDs. The source of the ID is noted in the Secondary Source column (see above). When the Secondary Source is PENTACON, there is no Secondary Source ID.
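
The kinds of secondary source ID described here can be told apart by their shape. The Python sketch below is only an illustration: it assumes that UniProt variant and isoform IDs consist of the prefix `VAR_` or `VSP_` followed by digits and that EC numbers have four dot-separated fields; the BindingDB ID format is not stated in this document, so anything else falls into a generic "other" bucket. All names are hypothetical.

```python
import re

# Assumed ID shapes; only the VAR_/VSP_ prefixes and the use of EC numbers
# come from this help document.
PATTERNS = [
    ("UniProt variant", re.compile(r"^VAR_\d+$")),
    ("UniProt isoform", re.compile(r"^VSP_\d+$")),
    ("EC number", re.compile(r"^\d+(?:\.(?:\d+|-)){3}$")),
]


def classify_secondary_source_id(value):
    """Guess the kind of a Secondary Source ID from its format."""
    text = value.strip()
    if not text:
        # Expected when the Secondary Source is PENTACON.
        return "none"
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return "other"


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    for raw in ["VAR_012345", "VSP_012345", "1.14.99.1", ""]:
        print(repr(raw), "->", classify_secondary_source_id(raw))
```

The Secondary Source column remains the authoritative indication of where an ID came from; a format check like this is only a sanity test.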

Secondary Source Notes

Notes parsed from secondary resources (e.g. UniProt, BRENDA, BindingDB).

Ortholog List

The orthologs and analogs page provides a list of orthologs manually selected by PENTACON curators for AAP genes in a subset of key model organisms, including: mouse (Mus musculus), rat (Rattus norvegicus), zebrafish (Danio rerio), and yeast (Saccharomyces cerevisiae).

PENTACON curators identified consensus orthologs/analogs using P-POD version 4 and IMP. Orthologs were identified using P-POD's OrthoMCL analysis when possible or, when the human gene was not assigned to an OrthoMCL family, using P-POD's MultiParanoid analysis. Functional analogs were obtained from IMP using a cutoff of p < 0.05. In some cases, functional analogs were identified directly from the literature; in these cases the supporting PMID and evidence code "9" are noted. PENTACON curators reviewed the ortholog and analog calls and evidence and, using the following evidence codes, identified the consensus ortholog/analog:

Evidence code descriptions

1	P-POD identifies a single ortholog, IMP identifies a single analog, and they agree.
1P	P-POD identifies a single ortholog, IMP identifies a single analog, and they agree; P-POD ortholog is found in MultiParanoid family.
2	Call based on orthology (P-POD) only. P-POD identified 1 or more orthologs, but IMP did not identify an analog.
2P	Call based on orthology (P-POD) only. P-POD identified 1 or more orthologs, but IMP did not identify an analog; P-POD ortholog is found in MultiParanoid family.
3	Call based on analogy (IMP) only. IMP identified 1 or more analogs, but P-POD did not identify an ortholog.
4	Multiple or ambiguous orthologs resolved by referring to analogs.
5	Multiple or ambiguous orthologs resolved by referring to both IMP and P-POD.
6	P-POD and IMP disagree. Selection made by curator judgment.
7	IMP identifies no analog, and P-POD identifies no ortholog.
8	P-POD and IMP identify the same proteins, and curator selected a subset based on additional evidence.
9	Manual annotation of functional analogs based on published literature.
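
Some of these codes follow mechanically from how many orthologs (P-POD) and analogs (IMP) were found, while others record a curator's decision. The Python sketch below covers only the mechanical cases (codes 1, 1P, 2, 2P, 3 and 7) and returns `None` whenever curator review would be needed (codes 4, 5, 6, 8 and 9). It is an illustration of the table above, not the curators' actual procedure; the function and parameter names are hypothetical.

```python
def automatic_code(n_orthologs, n_analogs, agree=False, multiparanoid=False):
    """Return the evidence code implied by the counts alone, or None.

    n_orthologs:   number of orthologs identified by P-POD
    n_analogs:     number of analogs identified by IMP
    agree:         True if the single ortholog and single analog are the same protein
    multiparanoid: True if the P-POD ortholog comes from a MultiParanoid family
    """
    suffix = "P" if multiparanoid else ""
    if n_orthologs == 0 and n_analogs == 0:
        return "7"
    if n_orthologs >= 1 and n_analogs == 0:
        return "2" + suffix
    if n_orthologs == 0 and n_analogs >= 1:
        return "3"
    if n_orthologs == 1 and n_analogs == 1 and agree:
        return "1" + suffix
    # Multiple, ambiguous or conflicting calls: a curator decides.
    return None


if __name__ == "__main__":
    print(automatic_code(1, 1, agree=True))
    print(automatic_code(2, 0, multiparanoid=True))
    print(automatic_code(3, 2))
```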

Links and Resources

The P-POD and IMP web sites can be found here:
