Quantitative, multi-dataset Pathway Analysis (ReactomeGSA)

ReactomeGSA is a new pathway analysis tool integrated into the Reactome ecosystem. Its main feature is that it performs quantitative pathway analyses (so-called gene set analyses). Because the differential expression analysis is performed directly at the pathway level, its statistical power increases.

ReactomeGSA can analyse multiple datasets simultaneously, resulting in a comparative pathway analysis. This makes it possible to quickly assess whether the same effect was observed in independent experiments or studies.

ReactomeGSA currently supports quantitative proteomics, transcriptomics, and microarray data. Datasets from all of these methods can be combined in a single analysis, which allows ReactomeGSA to perform multi-omics pathway analyses.

Using ReactomeGSA

We currently offer three ways to access ReactomeGSA:

  • Reactome’s web-based pathway browser (see below)
  • From R using our ReactomeGSA Bioconductor R package (see here)
  • Programmatically, using the ReactomeGSA API at https://gsa.reactome.org

ReactomeGSA in the Pathway Browser

ReactomeGSA is integrated into Reactome’s pathway browser under the “Analyse gene expression” tab.

In this first screen, you need to select the algorithm to use for the differential pathway analysis. At the time of writing, ReactomeGSA offers three algorithms. PADOG and Camera perform a differential expression analysis between two groups of samples. ssGSEA is a so-called gene set variation approach that returns pathway-level quantitative data for each sample.

The detailed parameters for each algorithm can be adapted by clicking the blue icon on the left of the algorithm’s box.

Adding datasets

After clicking “Next”, you are presented with the list of datasets, which is initially empty. Click the “+ Add dataset” button to add a new dataset.

First, you have to select the type of dataset you want to load.

To upload your own data, select one of the options under “Select a file from a local folder”. The file must be a delimited text file (CSV or TSV) in which the first column contains the gene or protein identifiers and each subsequent column contains one sample. The first row contains the sample names, and each subsequent row contains one gene or protein.
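The layout described above can be checked before uploading. The following is a minimal sketch, not part of ReactomeGSA: the function name `read_expression_matrix` is hypothetical, and it assumes that every cell after the identifier column is numeric and that the delimiter is known in advance.

```python
import csv
import io


def read_expression_matrix(text, delimiter="\t"):
    """Parse a delimited expression matrix into (sample_names, rows).

    rows maps each gene / protein identifier to a list of floats,
    one value per sample, in the order of sample_names.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    # The first header cell labels the identifier column; the rest are samples.
    if header is None or len(header) < 2:
        raise ValueError("expected an identifier column and at least one sample")
    sample_names = header[1:]

    rows = {}
    for line_number, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        identifier, values = fields[0], fields[1:]
        if len(values) != len(sample_names):
            raise ValueError(
                f"line {line_number}: expected {len(sample_names)} values"
            )
        # Assumption: all measurements are numeric.
        rows[identifier] = [float(value) for value in values]
    return sample_names, rows


example = "gene\tsample_1\tsample_2\nCD19\t10\t12.5\nMS4A1\t3\t0\n"
samples, matrix = read_expression_matrix(example)
print(samples)
print(matrix["CD19"])
```

A row with too few or too many values raises a `ValueError` that names the offending line, which is usually the quickest way to find a malformed export.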

To test the tool, it is possible to quickly load example data. Currently, ReactomeGSA provides three example datasets: two (matched) datasets on melanoma-associated B cells (proteomics and transcriptomics measurements) and one scRNA-seq dataset on B cells.

Finally, it is possible to load datasets directly from ExpressionAtlas. To do so, first navigate to ExpressionAtlas (in a separate tab or window) at https://www.ebi.ac.uk/gxa. Once you have identified a dataset of interest, you can get the dataset’s id by opening the dataset and copying the portion of the URL after “/experiments/” and before the next “/”.

For example, if you opened https://www.ebi.ac.uk/gxa/experiments/E-MTAB-6592/Results the identifier of this dataset would be “E-MTAB-6592”.

For single-cell experiments, you additionally have to set the parameter “k” to define which clusters should be used. The effect of “k” on the results can be visualised on the first page of the respective single-cell experiment in ExpressionAtlas (see https://www.ebi.ac.uk/gxa/sc/experiments/E-CURD-46/results/tsne as an example). In this case, the dataset identifier would be “E-CURD-46” and possible values for “k” would be, for example, 17, 25, or 32.
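In both example URLs, the dataset identifier is the path segment that directly follows “experiments”. A minimal sketch of extracting it with Python’s standard library is shown here; the helper name `expression_atlas_id` is hypothetical and is not part of ReactomeGSA or ExpressionAtlas.

```python
from urllib.parse import urlparse


def expression_atlas_id(url):
    """Return the path segment that follows "experiments" in the URL."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if "experiments" not in segments:
        raise ValueError("not an ExpressionAtlas experiment URL")
    position = segments.index("experiments")
    if position + 1 >= len(segments):
        raise ValueError("URL does not contain a dataset identifier")
    return segments[position + 1]


print(expression_atlas_id("https://www.ebi.ac.uk/gxa/experiments/E-MTAB-6592/Results"))
# prints E-MTAB-6592
print(expression_atlas_id("https://www.ebi.ac.uk/gxa/sc/experiments/E-CURD-46/results/tsne"))
# prints E-CURD-46
```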

Annotating experimental metadata

Once a dataset is added, you need to annotate the experimental metadata. This is necessary in order to define the groups for the differential expression analysis in the next step.

You can change the dataset’s name in the “Dataset name” box at the top. This name will be used for all results. To improve readability, we suggest using names that are as short as possible.

If you load data from ExpressionAtlas or choose one of the example datasets (as shown in the screenshot), the sample annotation table will already be pre-filled with certain metadata. If you uploaded your own dataset, the table will only show the orange sample labels on the left.

To add an annotation, click the “plus” symbol on the right. This will add a new empty column to the table. First, add a heading to define the name of the property (for example, “treatment”). Next, add the values for every sample that you want to include in your comparisons. Samples without any values will simply be ignored.
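The rule that samples without a value are ignored can be illustrated with a small sketch. The dictionary layout, the sample names, and the function name `drop_unannotated` are assumptions made for this example; treating whitespace-only cells as empty is also an assumption, not documented ReactomeGSA behaviour.

```python
def drop_unannotated(column):
    """Keep only the samples that have a non-empty value in this column."""
    return {
        sample: value.strip()
        for sample, value in column.items()
        # Assumption: None, "" and whitespace-only cells all count as empty.
        if value and value.strip()
    }


# One annotation column ("treatment") keyed by sample name.
treatment = {
    "sample_1": "control",
    "sample_2": "treated",
    "sample_3": "",
    "sample_4": None,
}
included = drop_unannotated(treatment)
print(included)
# prints {'sample_1': 'control', 'sample_2': 'treated'}
```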

Defining the experimental design

In the final step of adding a dataset, you have to define which groups to compare. The “comparison factor” drop-down menu contains all parameters that were annotated in the sample table before (if they contain at least two different values). “1st group” and “2nd group” define which groups of samples to compare against each other.

Depending on which “comparison factor” you select, the available values for the “1st group” and “2nd group” will change automatically.
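The behaviour of the two menus can be sketched as follows. This is an illustration of the rule described above, not ReactomeGSA’s implementation; the table layout and the function names `comparison_factors` and `samples_in_group` are hypothetical.

```python
def comparison_factors(sample_table):
    """Map each usable property to its sorted list of distinct values.

    A property is usable only if it has at least two different
    non-empty values across the samples.
    """
    factors = {}
    for name, column in sample_table.items():
        values = sorted({value for value in column.values() if value})
        if len(values) >= 2:
            factors[name] = values
    return factors


def samples_in_group(sample_table, factor, group):
    """Return the samples whose value for `factor` equals `group`."""
    return sorted(
        sample for sample, value in sample_table[factor].items() if value == group
    )


# Hypothetical sample table: property name -> {sample name -> value}.
sample_table = {
    "treatment": {"s1": "control", "s2": "control", "s3": "treated", "s4": ""},
    "tissue": {"s1": "skin", "s2": "skin", "s3": "skin", "s4": "skin"},
}

factors = comparison_factors(sample_table)
print(factors)  # "tissue" is missing because it has only one value

first_group = samples_in_group(sample_table, "treatment", "control")
second_group = samples_in_group(sample_table, "treatment", "treated")
print(first_group, second_group)
```

In this example, sample `s4` has no treatment value, so it belongs to neither group and would be left out of the comparison.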

Additionally, some gene set analysis methods allow you to define so-called “covariates”. These are parameters that might bias your results (e.g. the sequencing facility used) and that you would like to correct for. Simply select the ones relevant to your experiment.

Once you click “Continue”, you will be returned to the list of datasets, where your annotated dataset now appears. You can add any number of datasets to a single request.

Starting the analysis

Once you click “Continue” from the dataset list view, the final analysis options are shown.

“Create REACTOME visualizations” is selected by default. If it is de-selected, the result cannot be visualised in Reactome’s pathway browser. De-selecting this option is generally only relevant to users of the ReactomeGSA R package.

If you select “Create reports”, ReactomeGSA will automatically create a Microsoft Excel and a PDF report of your results. Additionally, it will create a short R script that allows you to load your data directly into an R session.

If you provide your email address, you will be notified automatically as soon as the analysis is complete. The email will contain direct links to the generated reports (if you chose to create them) and a link to the visualisation in the pathway browser.

Launch the analysis by clicking the “GO” button.

If you provided an email address, you can also close your browser. The analysis will continue on our servers and you will be notified as soon as it is done. Some analyses with many large datasets (for example, comparing five TCGA datasets in a single analysis) may require up to an hour to complete.

Citing ReactomeGSA

If you use ReactomeGSA in your research, please cite the following publication:

ReactomeGSA - Efficient Multi-Omics Comparative Pathway Analysis

Johannes Griss, Guilherme Viteri, Konstantinos Sidiropoulos, Vy Nguyen, Antonio Fabregat, Henning Hermjakob

bioRxiv 2020.04.16.044958; doi: https://doi.org/10.1101/2020.04.16.044958
