Tutorial: Extracting participating molecules using the Graph Database

The participating molecules use case
Retrieving objects based on their identifier
Identifiers for proteins or chemicals
Breaking down complexes and sets to get their participants
Retrieving pathways, subpathways and superpathways
Retrieving the reactions for a given pathway
Retrieving the participants of a Reaction
Joining the pieces: Participating molecules for a pathway

The Reactome Graph Database is a graph-oriented database, a type of NoSQL database that uses graph theory to store, map and query the relationships in our data content. Each node represents an entity (such as a pathway, reaction or protein) and each edge represents a relationship between two nodes. Every node in our graph database is defined by a unique identifier, a set of outgoing and/or incoming edges, and a set of properties expressed as key/value pairs. Each edge is defined by a unique identifier, a start node, an end node and a set of properties.

Cypher is Neo4j’s open graph query language. Its syntax provides a familiar way to match patterns of nodes and relationships in a graph. To learn more about Cypher, please refer to the Neo4j documentation.

Graph databases are well suited to analyzing interconnections, which is why there has been a lot of interest in using them to mine data from biological pathways and reactions. One of the main advantages of the Reactome Graph Database is its responsiveness when querying highly connected data. Furthermore, as queries traverse more and more relationships, the performance of a graph database tends to hold up better than that of a relational database, where each additional level typically requires another join. Software developers working with data also look for flexibility and scalability, and our Graph Database helps in this regard too: as needs grow, new nodes and relationships can be added to the existing graph.

The participating molecules use case

This tutorial explains how to query Reactome using Cypher. It's assumed that the reader has a basic knowledge of Cypher as well as an understanding of our data model and how data is stored in our schema.

Even though it is not possible to cover all possible queries of the Reactome Graph Database in a single tutorial, the sections in this document build up a query, which will retrieve the resource and identifier of each participating molecule of a given Pathway. There are several intermediate stages to be explained before reaching that point:

How to retrieve objects like proteins, reactions, pathways, etc.
How to get the identifier of proteins or chemicals
How to deconstruct complexes or sets to get their participants
How to retrieve the subpathways for a given pathway
How to retrieve the reactions of a pathway
How to retrieve the participants of a reaction

The list above contains the basic building blocks from which the final query, the one that retrieves the participating molecules for a given pathway, is constructed.

Retrieving objects based on their identifier

To retrieve the Pathway "Antigen processing-Cross presentation" with identifier R-HSA-1236975, the query is as follows:

//Selecting a Pathway by its stable identifier
MATCH (pathway:Pathway{stId:"R-HSA-1236975"})
RETURN pathway

The result of the query is:

╒═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│pathway                                                                                                              │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│{speciesName: Homo sapiens, oldStId: REACT_111119, isInDisease: false, releaseDate: 2011-09-20, displayName: Antigen │
│ processing-Cross presentation, stIdVersion: R-HSA-1236975.1, dbId: 1236975, releaseStatus: UPDATED, name: [Antigen  │
│ processing-Cross presentation], stId: R-HSA-1236975, hasDiagram: false, isInferred: false}                          │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
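
When only the numeric database identifier is at hand, the same node can be selected through its dbId property, which is visible in the result above. The following is a small sketch of that variation; it also returns a few chosen properties rather than the whole node:

```cypher
//Selecting the same Pathway by its dbId and returning only some of its properties
MATCH (pathway:Pathway{dbId:1236975})
RETURN pathway.stId AS stId, pathway.displayName AS DisplayName, pathway.speciesName AS Species
```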

In the same way, let's focus on an EntityWithAccessionedSequence (EWAS), which corresponds to a protein in Reactome. For this example we use one form of PTEN in the cytosol, with identifier R-HSA-199420:

//Selecting an EWAS by its stable identifier
MATCH (ewas:EntityWithAccessionedSequence{stId:"R-HSA-199420"})
RETURN ewas

The result is one node of the database:

╒════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│ewas                                                                                                                        │
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│{speciesName: Homo sapiens, startCoordinate: 1, isInDisease: false, displayName: PTEN [cytosol], dbId: 199420, name: [PTEN, │
│ Phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase PTEN, PTEN_HUMAN, MMAC1, TEP1], referenceType: ReferenceGeneProduct,│
│ endCoordinate: 403, stId: R-HSA-199420}                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
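
If the class of an object is not known in advance, one option is to leave the label out of the pattern and ask for the labels of the matching node with Cypher's built-in labels() function. This is only a sketch: without a label Neo4j may have to scan every node, so it is likely to be slower than the labelled queries above.

```cypher
//Finding out which labels (classes) a node has when only its stable identifier is known
MATCH (n{stId:"R-HSA-199420"})
RETURN n.displayName AS DisplayName, labels(n) AS Labels
```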

Identifiers for proteins or chemicals

Continuing with the same form of PTEN, another use case could be to retrieve only a couple of fields for the target node. The following query retrieves the EWAS display name and its identifier (from the reference entity):

//Following the reference entity link in order to get the identifier
MATCH (ewas:EntityWithAccessionedSequence{stId:"R-HSA-199420"}),
      (ewas)-[:referenceEntity]->(re:ReferenceEntity)
RETURN ewas.displayName AS EWAS, re.identifier AS Identifier

Please note that the identifier is not stored directly in the EWAS node; it is a property of another node, a ReferenceEntity, which the EWAS points to. The previous query reaches that node from the EWAS by following the referenceEntity edge. The result of this query is shown below:

╒══════════════╤══════════╕
│EWAS          │Identifier│
╞══════════════╪══════════╡
│PTEN [cytosol]│P60484    │
└──────────────┴──────────┘

TIP: The MATCH part of the previous query could be written in one line as follows:

//Equivalent to the previous one
MATCH (ewas:EntityWithAccessionedSequence{stId:"R-HSA-199420"})-[:referenceEntity]->(re:ReferenceEntity)
RETURN ewas.displayName AS EWAS, re.identifier AS Identifier

It is also possible to retrieve the reference database in addition to the previously retrieved fields. Please note that in this case the reference database is a node that the ReferenceEntity points to through an edge called referenceDatabase:

//Following the reference entity and database links in order to get the identifier and the database of reference
MATCH (ewas:EntityWithAccessionedSequence{stId:"R-HSA-199420"}),
      (ewas)-[:referenceEntity]->(re:ReferenceEntity)-[:referenceDatabase]->(rd:ReferenceDatabase)
RETURN ewas.displayName AS EWAS, re.identifier AS Identifier, rd.displayName AS Database

╒══════════════╤══════════╤════════╕
│EWAS          │Identifier│Database│
╞══════════════╪══════════╪════════╡
│PTEN [cytosol]│P60484    │UniProt │
└──────────────┴──────────┴────────┘
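
The same edges can also be followed in the opposite direction. As a sketch, the query below starts from the UniProt identifier obtained above and lists the EWAS nodes that point to it. It assumes that matching on the identifier property alone is specific enough; if several reference databases share an identifier, the referenceDatabase edge would have to be added to the pattern.

```cypher
//From a reference identifier back to the EWAS nodes that refer to it
MATCH (re:ReferenceEntity{identifier:"P60484"})<-[:referenceEntity]-(ewas:EntityWithAccessionedSequence)
RETURN ewas.stId AS EWAS_stId, ewas.displayName AS EWAS
```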

Breaking down complexes and sets to get their participants

The components of a complex, which are also physical entities, are stored in the "hasComponent" slot. Let's use the complex "Ag-substrate:E3:E2:Ub" with identifier R-HSA-983126 as an example in this case:

//First level components for the complex with stable identifier R-HSA-983126
//(":Complex" is a label; without the colon, Complex would be read as a variable name)
MATCH (:Complex{stId:"R-HSA-983126"})-[:hasComponent]->(pe:PhysicalEntity)
RETURN pe.stId AS component_stId, pe.displayName AS component

The result of the query is:

╒══════════════╤═══════════════════════════════════════════════╕
│component_stId│component                                      │
╞══════════════╪═══════════════════════════════════════════════╡
│R-NUL-983035  │antigenic substrate [cytosol]                  │
├──────────────┼───────────────────────────────────────────────┤
│R-HSA-976075  │E3 ligases in proteasomal degradation [cytosol]│
├──────────────┼───────────────────────────────────────────────┤
│R-HSA-976165  │Ubiquitin:E2 conjugating enzymes [cytosol]     │
└──────────────┴───────────────────────────────────────────────┘
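
To check which of these components are themselves Complexes or Sets, one possibility is to return the labels of each component alongside its name. This is a minimal sketch using Cypher's built-in labels() function; the column name component_labels is chosen here only for illustration:

```cypher
//First level components together with their labels (classes)
MATCH (:Complex{stId:"R-HSA-983126"})-[:hasComponent]->(pe:PhysicalEntity)
RETURN pe.stId AS component_stId, pe.displayName AS component, labels(pe) AS component_labels
```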

In this example, "E3 ligases in proteasomal degradation" is a Set, and components can in turn be Complexes or Sets with participants of their own. To deconstruct the initial complex further, some minor changes have to be applied to the Cypher query. Sets can be DefinedSets, OpenSets or CandidateSets, and the physical entities that belong to them are found by traversing the "hasMember" or "hasCandidate" slots. The following query breaks down the initial complex into ALL its participants:

//All distinct components for the complex with stable identifier R-HSA-983126
MATCH (:Complex{stId:"R-HSA-983126"})-[:hasComponent|hasMember|hasCandidate*]->(pe:PhysicalEntity)
RETURN DISTINCT pe.stId AS component_stId, pe.displayName AS component

This query returns 284 entities for v63:

╒════════════════╤════════════════════════╕
│ component_stId │ component              │
╞════════════════╪════════════════════════╡
│ R-HSA-141412   │ CDC20 [cytosol]        │
├────────────────┼────────────────────────┤
│ R-HSA-174242   │ ANAPC7 [cytosol]       │
├────────────────┼────────────────────────┤
│ R-HSA-174211   │ ANAPC5 [cytosol]       │
├────────────────┼────────────────────────┤
│ R-HSA-174052   │ CDC26 [cytosol]        │
├────────────────┼────────────────────────┤
│ R-HSA-174244   │ UBE2C [cytosol]        │
├────────────────┼────────────────────────┤
│ R-HSA-174126   │ ANAPC11 [cytosol]      │
├────────────────┼────────────────────────┤
│ R-HSA-174156   │ CDC16 [cytosol]        │
├────────────────┼────────────────────────┤
│ R-HSA-174189   │ ANAPC1 [cytosol]       │
├────────────────┼────────────────────────┤
│ R-HSA-174100   │ UBE2E1 [cytosol]       │
├────────────────┼────────────────────────┤
│ R-HSA-174229   │ ANAPC2 [cytosol]       │
├────────────────┼────────────────────────┤
│ R-HSA-174168   │ ANAPC4 [cytosol]       │
├────────────────┼────────────────────────┤
│ R-HSA-174073   │ CDC27 [cytosol]        │
├────────────────┼────────────────────────┤
│ R-HSA-174137   │ CDC23 [cytosol]        │
├────────────────┼────────────────────────┤
│ R-HSA-174142   │ ANAPC10 [cytosol]      │
├────────────────┼────────────────────────┤
│ R-HSA-174236   │ UBE2D1 [cytosol]       │
├────────────────┼────────────────────────┤
│ R-HSA-976009   │ CBLB [cytosol]         │
├────────────────┼────────────────────────┤
│ R-HSA-976042   │ MKRN1 [cytosol]        │
├────────────────┼────────────────────────┤
│ R-HSA-939214   │ UBB(1-76) [cytosol]    │
├────────────────┼────────────────────────┤
│ R-HSA-939213   │ UBB(77-152) [cytosol]  │
├────────────────┼────────────────────────┤
│ R-HSA-939239   │ UBC(533-608) [cytosol] │
├────────────────┼────────────────────────┤
│...             │...                     │
└────────────────┴────────────────────────┘
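
The result above mixes intermediate Complexes and Sets with the molecules they contain. If only the proteins are of interest, a possible variation is to end the traversal on EntityWithAccessionedSequence nodes and follow the referenceEntity edge described earlier. This is a sketch built from the pieces already shown, not a definitive query:

```cypher
//Only the proteins (EWAS) inside the complex, with their reference identifiers
MATCH (:Complex{stId:"R-HSA-983126"})-[:hasComponent|hasMember|hasCandidate*]->(ewas:EntityWithAccessionedSequence),
      (ewas)-[:referenceEntity]->(re:ReferenceEntity)
RETURN DISTINCT ewas.stId AS component_stId, ewas.displayName AS component, re.identifier AS Identifier
```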

Retrieving pathways, subpathways and superpathways

In this example, we focus on the pathway "Class I MHC mediated antigen processing & presentation" with identifier R-HSA-983169. To find out its subpathways, the slot to query is "hasEvent":

//Direct subpathways for the pathway with stable identifier R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})-[:hasEvent]->(sp:Pathway)
RETURN p.stId AS Pathway, sp.stId AS SubPathway, sp.displayName AS DisplayName

The result for v63 returns 3 subpathways:

╒════════════╤═════════════╤══════════════════════════════════════════════════════════════════════════╕
│Pathway     │SubPathway   │DisplayName                                                               │
╞════════════╪═════════════╪══════════════════════════════════════════════════════════════════════════╡
│R-HSA-983169│R-HSA-983168 │Antigen processing: Ubiquitination & Proteasome degradation               │
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-1236975│Antigen processing-Cross presentation                                     │
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-983170 │Antigen Presentation: Folding, assembly and peptide loading of class I MHC│
└────────────┴─────────────┴──────────────────────────────────────────────────────────────────────────┘

It is important to note that subpathways might contain other subpathways, so to get ALL the subpathways of R-HSA-983169, the query is as follows:

//ALL subpathways for the pathway with stable identifier R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})-[:hasEvent*]->(sp:Pathway)
RETURN p.stId AS Pathway, sp.stId AS SubPathway, sp.displayName AS DisplayName

In this case the number of subpathways is increased to 7:

╒════════════╤═════════════╤══════════════════════════════════════════════════════════════════════════╕
│Pathway     │SubPathway   │DisplayName                                                               │
╞════════════╪═════════════╪══════════════════════════════════════════════════════════════════════════╡
│R-HSA-983169│R-HSA-983170 │Antigen Presentation: Folding, assembly and peptide loading of class I MHC│
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-1236975│Antigen processing-Cross presentation                                     │
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-1236978│Cross-presentation of soluble exogenous antigens (endosomes)              │
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-1236977│Endosomal/Vacuolar pathway                                                │
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-1236974│ER-Phagosome pathway                                                      │
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-1236973│Cross-presentation of particulate exogenous antigens (phagosomes)         │
├────────────┼─────────────┼──────────────────────────────────────────────────────────────────────────┤
│R-HSA-983169│R-HSA-983168 │Antigen processing: Ubiquitination & Proteasome degradation               │
└────────────┴─────────────┴──────────────────────────────────────────────────────────────────────────┘
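
The recursive query does not say how deep each subpathway sits in the hierarchy. One way to recover that, sketched below, is to bind the matched path to a variable and use the built-in length() function, which counts the edges in the path. min() is used in case a subpathway can be reached through more than one route:

```cypher
//ALL subpathways for R-HSA-983169 together with their depth below it
MATCH path = (p:Pathway{stId:"R-HSA-983169"})-[:hasEvent*]->(sp:Pathway)
RETURN sp.stId AS SubPathway, sp.displayName AS DisplayName, min(length(path)) AS Depth
ORDER BY Depth, DisplayName
```

Direct subpathways have a Depth of 1, their own subpathways a Depth of 2, and so on.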

Following the same approach, retrieving the superpathway is as easy as changing the direction of the edge in the query:

//Direct superpathway for the pathway with stable identifier R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})<-[:hasEvent]-(sp:Pathway)
RETURN p.stId AS Pathway, sp.stId AS SuperPathway, sp.displayName AS DisplayName

This retrieves the only pathway that directly contains R-HSA-983169:

╒════════════╤═════════════╤══════════════════════╕
│Pathway     │SuperPathway │DisplayName           │
╞════════════╪═════════════╪══════════════════════╡
│R-HSA-983169│R-HSA-1280218│Adaptive Immune System│
└────────────┴─────────────┴──────────────────────┘

Like subpathways, superpathways might themselves have superpathways, so following the "hasEvent" slot recursively will show ALL the superpathways up to the root:

//ALL superpathways for the pathway with stable identifier R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})<-[:hasEvent*]-(sp:Pathway)
RETURN p.stId AS Pathway, sp.stId AS SuperPathway, sp.displayName AS DisplayName

There are 2 superpathways for R-HSA-983169 in v63:

╒════════════╤═════════════╤══════════════════════╕
│Pathway     │SuperPathway │DisplayName           │
╞════════════╪═════════════╪══════════════════════╡
│R-HSA-983169│R-HSA-1280218│Adaptive Immune System│
├────────────┼─────────────┼──────────────────────┤
│R-HSA-983169│R-HSA-168256 │Immune System         │
└────────────┴─────────────┴──────────────────────┘
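
If only the root of the hierarchy is needed, the recursive query can be narrowed to those superpathways that are not contained in any other event. The sketch below expresses this with a pattern predicate in the WHERE clause; it assumes that a root pathway is simply one without an incoming "hasEvent" edge:

```cypher
//Root superpathway(s) for R-HSA-983169: ancestors with no incoming hasEvent edge
MATCH (p:Pathway{stId:"R-HSA-983169"})<-[:hasEvent*]-(sp:Pathway)
WHERE NOT (sp)<-[:hasEvent]-()
RETURN DISTINCT p.stId AS Pathway, sp.stId AS RootPathway, sp.displayName AS DisplayName
```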

Retrieving the reactions for a given pathway

Continuing with the pathway "Class I MHC mediated antigen processing & presentation" with identifier R-HSA-983169, to get ALL the reactions contained either directly in it or as part of any of its subpathways, the query has to recursively traverse the "hasEvent" slot:

//All reactions for the pathway with stable identifier R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})-[:hasEvent*]->(rle:ReactionLikeEvent)
RETURN p.stId AS Pathway, rle.stId AS Reaction, rle.displayName AS ReactionName

As shown in the table below, this pathway contains 51 reactions for v63:

╒══════════════╤═══════════════╤═══════════════════════════════════════════════════════════════════════════════════╕
│ Pathway      │ Reaction      │ ReactionName                                                                      │
╞══════════════╪═══════════════╪═══════════════════════════════════════════════════════════════════════════════════╡
│ R-HSA-983169 │ R-HSA-983148  │ Interaction of Erp57 with MHC class I HC                                          │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-8951499 │ Loading of antigenic peptides on to class I MHC                                   │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983146  │ Interaction of beta-2-microglobulin (B2M) chain with  class I HC                  │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983145  │ Binding of newly synthesized MHC class I heavy chain (HC) with calnexin           │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983144  │ Transport of Antigen peptide in to ER                                             │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-203979  │ Coat Assembly                                                                     │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983142  │ Formation of peptide loading complex (PLC)                                        │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983427  │ Expression of peptide bound class I MHC on cell surface                           │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983138  │ Transport of MHC heterotrimer to ER exit site                                     │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983426  │ Capturing cargo and formation of prebudding complex                               │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983425  │ Recruitment of Sec31p:Sec13p to prebudding complex and formation of COPII vesicle │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983424  │ Budding of COPII coated vesicle                                                   │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983422  │ Disassembly of COPII coated vesicle                                               │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983421  │ Journey of cargo through Golgi complex                                            │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-983169 │ R-HSA-983161  │ Dissociation of the Antigenic peptide:MHC:B2M peptide loading complex             │
├──────────────┼───────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│...           │...            │...                                                                                │
└──────────────┴───────────────┴───────────────────────────────────────────────────────────────────────────────────┘
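
When only the number of reactions matters, the same pattern can be combined with the count() aggregation. In this sketch DISTINCT is an added precaution, so that a reaction reachable through more than one path is counted once:

```cypher
//Number of distinct reactions for the pathway with stable identifier R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})-[:hasEvent*]->(rle:ReactionLikeEvent)
RETURN p.stId AS Pathway, count(DISTINCT rle) AS Reactions
```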

Retrieving the participants of a Reaction

In this case let's use the reaction "IKKB phosphorylates SNAP23" with identifier R-HSA-8863895. Reactions have inputs, outputs, catalysts and regulations, so to know the participants of a reaction, all these slots have to be taken into account. Please note that the physical entity acting as catalyst is stored in the "physicalEntity" slot of the class "CatalystActivity" and the one belonging to the regulation is stored in the "regulator" slot of the "Regulation" class. So the query is as follows:

//First level participating molecules for reaction R-HSA-8863895
MATCH (r:ReactionLikeEvent{stId:"R-HSA-8863895"})-[:input|output|catalystActivity|physicalEntity|regulatedBy|regulator*]->(pe:PhysicalEntity)
RETURN DISTINCT r.stId AS Reaction, pe.stId AS Participant, pe.displayName AS DisplayName

The result is 6 physical entities, two of which are complexes:

╒═════════════╤═════════════╤═══════════════════════════════════════════════════╕
│Reaction     │Participant  │DisplayName                                        │
╞═════════════╪═════════════╪═══════════════════════════════════════════════════╡
│R-HSA-8863895│R-HSA-168113 │CHUK:IKBKB:IKBKG [cytosol]                         │
├─────────────┼─────────────┼───────────────────────────────────────────────────┤
│R-HSA-8863895│R-ALL-113592 │ATP [cytosol]                                      │
├─────────────┼─────────────┼───────────────────────────────────────────────────┤
│R-HSA-8863895│R-HSA-8863966│SNAP23 [phagocytic vesicle membrane]               │
├─────────────┼─────────────┼───────────────────────────────────────────────────┤
│R-HSA-8863895│R-HSA-8863923│p-S95-SNAP23 [phagocytic vesicle membrane]         │
├─────────────┼─────────────┼───────────────────────────────────────────────────┤
│R-HSA-8863895│R-ALL-29370  │ADP [cytosol]                                      │
├─────────────┼─────────────┼───────────────────────────────────────────────────┤
│R-HSA-8863895│R-HSA-937033 │oligo-MyD88:Mal:BTK:activated TLR [plasma membrane]│
└─────────────┴─────────────┴───────────────────────────────────────────────────┘
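
The table above does not show which role each participant plays. For inputs and outputs, which are linked directly to the reaction, a possible sketch is to bind the edge to a variable and return its type with the built-in type() function. Catalysts and regulators are deliberately left out here because they sit one step further away, behind the CatalystActivity and Regulation nodes:

```cypher
//Inputs and outputs of reaction R-HSA-8863895, labelled with their role
MATCH (r:ReactionLikeEvent{stId:"R-HSA-8863895"})-[rel:input|output]->(pe:PhysicalEntity)
RETURN DISTINCT type(rel) AS Role, pe.stId AS Participant, pe.displayName AS DisplayName
```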

As shown above, to break down complexes and sets into their participants, the "hasComponent", "hasMember" and "hasCandidate" slots have to be taken into account. Adding them into the previous query will retrieve ALL the participants of the reaction:

//ALL participating molecules for reaction R-HSA-8863895
MATCH (r:ReactionLikeEvent{stId:"R-HSA-8863895"})-[:input|output|catalystActivity|physicalEntity|regulatedBy|regulator|hasComponent|hasMember|hasCandidate*]->(pe:PhysicalEntity)
RETURN DISTINCT r.stId AS Reaction, pe.stId AS Participant, pe.displayName AS DisplayName

With the modifications, the result of the query goes up to 43 physical entities in v63:

╒═══════════════╤═══════════════╤══════════════════════════════════════════════════════════════════════════════╕
│ Reaction      │ Participant   │ DisplayName                                                                  │
╞═══════════════╪═══════════════╪══════════════════════════════════════════════════════════════════════════════╡
│ R-HSA-8863895 │ R-HSA-168113  │ CHUK:IKBKB:IKBKG [cytosol]                                                   │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-168114  │ IKBKB [cytosol]                                                              │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-168104  │ CHUK [cytosol]                                                               │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-168108  │ IKBKG [cytosol]                                                              │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-ALL-113592  │ ATP [cytosol]                                                                │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-8863966 │ SNAP23 [phagocytic vesicle membrane]                                         │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-8863923 │ p-S95-SNAP23 [phagocytic vesicle membrane]                                   │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-ALL-29370   │ ADP [cytosol]                                                                │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-937033  │ oligo-MyD88:Mal:BTK:activated TLR [plasma membrane]                          │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-937013  │ MyD88 oligomer [plasma membrane]                                             │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-937017  │ MYD88 [plasma membrane]                                                      │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-2201325 │ activated TLR2/4:p-4Y-MAL:PI(4,5)P2:BTK [plasma membrane]                    │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-5365824 │ p-4Y-TIRAP:PI(4,5)P2 [plasma membrane]                                       │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-ALL-179856  │ PI(4,5)P2 [plasma membrane]                                                  │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-2201321 │ p-4Y-TIRAP [plasma membrane]                                                 │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-181230  │ Activated TLR1:2 or TLR 2:6 heterodimers or TLR4 homodimer [plasma membrane] │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-181410  │ TLR6:TLR2:ligand:CD14:CD36 [plasma membrane]                                 │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-2559461 │ TLR6/2 ligand:CD14:CD36 [plasma membrane]                                    │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│ R-HSA-8863895 │ R-HSA-166033  │ GPIN-CD14(20-345) [plasma membrane]                                          │
├───────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────┤
│...            │...            │...                                                                           │
└───────────────┴───────────────┴──────────────────────────────────────────────────────────────────────────────┘

Joining the pieces: Participating molecules for a pathway

The aim of this tutorial was to describe a number of queries to the Reactome Graph Database that, joined together, retrieve the resource and identifier of each participating molecule for a given Pathway. This final example demonstrates how to combine the individual queries described previously into a single query.

Starting from the pathway "Class I MHC mediated antigen processing & presentation" with identifier R-HSA-983169, first we need to find all the reactions contained in it. For each reaction we want to find its participants and, in the case of complexes and sets, break them down into single physical entities. Finally, for each physical entity we are interested in its identifier and resource:

//ALL participating molecules for pathway R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})-[:hasEvent*]->(rle:ReactionLikeEvent),
      (rle)-[:input|output|catalystActivity|physicalEntity|regulatedBy|regulator|hasComponent|hasMember|hasCandidate*]->(pe:PhysicalEntity),
      (pe)-[:referenceEntity]->(re:ReferenceEntity)-[:referenceDatabase]->(rd:ReferenceDatabase)
RETURN DISTINCT re.identifier AS Identifier, rd.displayName AS Database

For version 63, the pathway has 463 participating molecules as shown below:

╒════════════╤══════════╕
│ Identifier │ Database │
╞════════════╪══════════╡
│ P11021     │ UniProt  │
├────────────┼──────────┤
│ P27824     │ UniProt  │
├────────────┼──────────┤
│ P30501     │ UniProt  │
├────────────┼──────────┤
│ P30486     │ UniProt  │
├────────────┼──────────┤
│ P01893     │ UniProt  │
├────────────┼──────────┤
│ P30447     │ UniProt  │
├────────────┼──────────┤
│ P30685     │ UniProt  │
├────────────┼──────────┤
│ P18465     │ UniProt  │
├────────────┼──────────┤
│ P18464     │ UniProt  │
├────────────┼──────────┤
│ P30460     │ UniProt  │
├────────────┼──────────┤
│ P30490     │ UniProt  │
├────────────┼──────────┤
│ P30495     │ UniProt  │
├────────────┼──────────┤
│ P30493     │ UniProt  │
├────────────┼──────────┤
│ P13747     │ UniProt  │
├────────────┼──────────┤
│ P30488     │ UniProt  │
├────────────┼──────────┤
│ P30511     │ UniProt  │
├────────────┼──────────┤
│ P30483     │ UniProt  │
├────────────┼──────────┤
│ P30462     │ UniProt  │
├────────────┼──────────┤
│ P04439     │ UniProt  │
├────────────┼──────────┤
│ ...        │ ...      │
└────────────┴──────────┘
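
As a possible follow-up, the same pattern can summarise the participating molecules per resource instead of listing them one by one. The sketch below only changes the RETURN clause of the final query and groups the distinct identifiers by reference database:

```cypher
//Number of distinct participating molecules per reference database for pathway R-HSA-983169
MATCH (p:Pathway{stId:"R-HSA-983169"})-[:hasEvent*]->(rle:ReactionLikeEvent),
      (rle)-[:input|output|catalystActivity|physicalEntity|regulatedBy|regulator|hasComponent|hasMember|hasCandidate*]->(pe:PhysicalEntity),
      (pe)-[:referenceEntity]->(re:ReferenceEntity)-[:referenceDatabase]->(rd:ReferenceDatabase)
RETURN rd.displayName AS Database, count(DISTINCT re.identifier) AS Molecules
ORDER BY Molecules DESC
```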

This concludes the introductory tutorial on how to build a Cypher query to retrieve all the participating molecules for a given pathway. For questions or suggestions, please get in touch with us.