Vignette 3: Evaluating methods under two scenarios: offtarget recovery and pathway activity
Welcome to our third vignette! Here we assume you already know the basics of NetworkCommons. If you don’t, please check our first vignette, where we contextualise a simple network, and our second vignette, where we apply different methods to a problem and visualise the different solution networks.
In this vignette, we will start exploring how we can evaluate the different methods used for network contextualisation. As already hinted at the end of the second vignette, deciding which method performs better is not trivial. In NetworkCommons, we have included four strategies to do so, whose details can be found in this section.
Here, we will showcase two of them: offtarget recovery and pathway enrichment analysis. In the first one, we aim to recover as many drug offtargets as possible, while in the latter setting, the components of a perturbed pathway should be overrepresented in the solution networks.
[4]:
import networkcommons as nc
import pandas as pd
import decoupler as dc
1. Loading preprocessed transcriptomics data
Like in the previous vignettes, we will use a specific contrast from the PANACEA dataset to extract the transcription factors that are dysregulated in this scenario. We will skip the explanation of the data processing, but it can be found in the PANACEA data section. Here, we will use H1793 cells treated with neratinib (an EGFR inhibitor) vs DMSO as control.
[5]:
# downstream layer
dc_estimates = nc.data.omics.panacea_tables(cell_line='H1793', drug='NERATINIB', type='TF_scores')
dc_estimates.set_index('items', inplace=True)
measurements = nc.utils.targetlayer_formatter(dc_estimates, act_col='act')
# upstream layer
source_df = pd.DataFrame({'source': ['EGFR'],
                          'sign': [-1]}, columns=['source', 'sign'])
source_df.set_index('source', inplace=True)
sources = source_df['sign'].to_dict()
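To make the two layers more concrete, here is a minimal sketch of the kind of objects the methods receive, assuming both are plain dictionaries that map a node name to a sign. The helper `top_signed_targets`, its `n_elements` parameter and the toy scores are hypothetical illustrations, not the implementation of `nc.utils.targetlayer_formatter`.

```python
def top_signed_targets(activities, n_elements=25):
    """Keep the n_elements entries with the largest |score| and map each to +1 or -1.

    Hypothetical helper for illustration only; the library's formatter may differ.
    """
    ranked = sorted(activities.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {name: (1 if score > 0 else -1) for name, score in ranked[:n_elements]}


# made-up TF activity scores, not PANACEA values
toy_scores = {'TF_A': 3.2, 'TF_B': -2.7, 'TF_C': 0.4, 'TF_D': -0.1}

# downstream layer: the most dysregulated TFs with their direction of change
toy_measurements = top_signed_targets(toy_scores, n_elements=2)

# upstream layer: the perturbed node, with -1 standing for inhibition
toy_sources = {'EGFR': -1}
```

In this toy case only the two TFs with the strongest absolute scores are kept, so weakly changing TFs do not become targets for the network methods.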
2. Network inference
Now that we have our sources and targets, we will import the prior knowledge network (PKN) from OmniPath:
[6]:
network = nc.data.network.get_omnipath()
graph = nc.utils.network_from_df(network)
Now, we will contextualise different subnetworks using different methods.
[ ]:
# topological methods
shortest_path_network, shortest_paths_list = nc.methods.run_shortest_paths(graph, sources, measurements)
shortest_sc_network, shortest_sc_list = nc.methods.run_sign_consistency(shortest_path_network, shortest_paths_list, sources, measurements)
all_paths_network, all_paths_list = nc.methods.run_all_paths(graph, sources, measurements, depth_cutoff=3)
allpaths_sc_network, allpaths_sc_list = nc.methods.run_sign_consistency(all_paths_network, all_paths_list, sources, measurements)
# diffusion-like methods
ppr_network = nc.methods.add_pagerank_scores(graph, sources, measurements, personalize_for='source')
ppr_network = nc.methods.add_pagerank_scores(ppr_network, sources, measurements, personalize_for='target')
ppr_network = nc.methods.compute_ppr_overlap(ppr_network, percentage=1)
shortest_ppr_network, shortest_ppr_list = nc.methods.run_shortest_paths(ppr_network, sources, measurements)
shortest_sc_ppr_network, shortest_sc_ppr_list = nc.methods.run_sign_consistency(shortest_ppr_network, shortest_ppr_list, sources, measurements)
# ILP-based
corneto_network = nc.methods.run_corneto_carnival(graph, sources, measurements, betaWeight=0.01, solver='GUROBI')
We will now store the networks in a dictionary to ease handling.
[8]:
# we include all the networks in a dictionary with custom labels
networks = {
    'shortest_path': shortest_path_network,
    'shortest_path_sc': shortest_sc_network,
    'all_paths': all_paths_network,
    'all_paths_sc': allpaths_sc_network,
    'shortest_ppr_network': shortest_ppr_network,
    'shortest_ppr_sc_network': shortest_sc_ppr_network,
    'corneto': corneto_network
}
3. Evaluation strategy using offtarget recovery
Now we are ready to assess the performance of the methods! A first evaluation strategy is based on the recovery of offtargets in the solution networks. Here, we assume that the measurements we use as downstream values result not only from the perturbation of the primary drug target, but also from the perturbation of one or more offtarget nodes. Therefore, a network that accurately contextualises these effects will have a higher share of offtargets among its nodes. You can find more details about this setting in the Evaluation strategies details page.
In this specific case, we want to evaluate whether the networks contain the offtarget MAPKAPK2, as contained in the PANACEA gold standard.
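As a rough illustration of what is being counted, the sketch below computes the number and percentage of gold-standard offtargets found among the nodes of one network. The function name `recovered_offtargets` and the toy node list are hypothetical, and expressing the percentage relative to the number of gold-standard offtargets is an assumption; `nc.eval.get_recovered_offtargets` may define it differently.

```python
def recovered_offtargets(network_nodes, offtargets):
    """Count how many known offtargets appear among the nodes of a network.

    Assumption: the percentage is taken over the gold-standard offtargets.
    """
    nodes = set(network_nodes)
    gold = set(offtargets)
    hits = sorted(gold & nodes)
    perc = 100.0 * len(hits) / len(gold) if gold else 0.0
    return {'n_offtargets': len(hits), 'perc_offtargets': perc, 'hits': hits}


# toy solution network that does not contain the offtarget
toy_nodes = ['EGFR', 'GRB2', 'SOS1', 'MAPK1']
toy_result = recovered_offtargets(toy_nodes, ['MAPKAPK2'])
```

With a single offtarget in the gold standard, the percentage can only be 0 or 100, which is worth keeping in mind when reading the results.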
[9]:
panacea_gold_standard = nc.data.omics.panacea_gold_standard()
neratinib_offtargets = panacea_gold_standard[(panacea_gold_standard['cmpd'] == 'NERATINIB') & (~panacea_gold_standard['target'].isin(sources.keys()))].target.tolist()
offtarget_res = nc.eval.get_metric_from_networks(networks, nc.eval.get_recovered_offtargets, offtargets=neratinib_offtargets)
offtarget_res
[9]:
| | n_offtargets | perc_offtargets | network | type | method |
|---|---|---|---|---|---|
| 0 | 0 | 0.0 | shortest_path | real | shortest_path |
| 1 | 0 | 0.0 | shortest_path_sc | real | shortest_path_sc |
| 2 | 0 | 0.0 | all_paths | real | all_paths |
| 3 | 0 | 0.0 | all_paths_sc | real | all_paths_sc |
| 4 | 0 | 0.0 | shortest_ppr_network | real | shortest_ppr_network |
| 5 | 0 | 0.0 | shortest_ppr_sc_network | real | shortest_ppr_sc_network |
| 6 | 0 | 0.0 | corneto | real | corneto |
Unfortunately, none of the networks recovered this offtarget. There are several possible reasons, but the main limitation is that only offtargets reachable from the source node can be considered: anything not reachable from the source cannot be part of a solution network. In addition, having only one offtarget greatly reduces the statistical power to draw conclusions (is this one offtarget missing just by chance, or is there a mechanistic explanation behind it?). While this approach can be useful in some situations, it is important to check in each case whether the results might be biased.
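The reachability limitation can be checked before running any method. The following is a minimal sketch using a breadth-first search over a directed edge list; the function name `reachable_from` and the toy edges are made up for illustration and do not come from OmniPath.

```python
from collections import deque


def reachable_from(edges, sources):
    """Return every node reachable from the sources by following directed edges."""
    adjacency = {}
    for upstream, downstream in edges:
        adjacency.setdefault(upstream, set()).add(downstream)

    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# toy directed edges: MAPKAPK2 has an upstream regulator, but not one downstream of EGFR
toy_edges = [('EGFR', 'GRB2'), ('GRB2', 'SOS1'), ('MAPK14', 'MAPKAPK2')]
toy_reachable = reachable_from(toy_edges, ['EGFR'])

# offtargets that no path-based method could recover from this source
unreachable = [t for t in ['MAPKAPK2'] if t not in toy_reachable]
```

If the PKN is held as a networkx directed graph, `nx.descendants(graph, 'EGFR')` should give the same set of downstream nodes (without the source itself).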
This was just an example using a single drug and cell line combination. To get a less biased metric, we can repeat the same analysis across the whole PANACEA dataset. Here, we provide a one-liner that performs systematic network inference with the methods implemented in NetworkCommons over the cell line and drug combinations available in PANACEA. The function nc.eval.get_offtarget_panacea_evaluation() wraps the PANACEA data loaders, the current methods, and the evaluation functions into an end-to-end framework that returns offtarget recovery results across many different biological conditions.
Warning
Please beware that this function is very computationally intensive. A strong computing environment, such as HPC, is recommended for this.
[ ]:
offtarget_results = nc.eval.get_offtarget_panacea_evaluation(cell='H1793')
4. Evaluation strategy using pathway enrichment analysis
For the second evaluation setting, we follow a very intuitive approach: if a network properly contextualises a given perturbation, a large share of the members of the perturbed pathway will appear in the solution network. Therefore, we perform overrepresentation analysis (ORA) between the nodes of these networks and a gene set from BioCarta containing the members of the EGF canonical pathway, in addition to an alternative pathway involving SMRT inhibition by the tyrosine kinase signalling pathway. If the network contextualises the perturbation properly, the affected pathway(s) will have a higher ORA score (they will be ranked higher than other pathways). You can find more details about this strategy in the Evaluation strategies details page.
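To illustrate the idea behind overrepresentation, here is a minimal sketch that computes a one-sided hypergeometric p-value for the overlap between a network and one pathway. This is a generic textbook formulation, not necessarily the exact statistic computed by `nc.eval.run_ora`; the function name `ora_pvalue`, the toy gene lists and the background size are assumptions, and both gene lists are assumed to be drawn from the same background.

```python
from math import comb


def ora_pvalue(network_genes, pathway_genes, background_size):
    """Return the overlap size and P(overlap >= observed) under random sampling.

    Assumes both gene lists are subsets of a background of background_size genes.
    """
    network = set(network_genes)
    pathway = set(pathway_genes)
    overlap = len(network & pathway)
    n_drawn = len(network)
    n_pathway = len(pathway)

    # upper tail of the hypergeometric distribution
    tail = sum(
        comb(n_pathway, i) * comb(background_size - n_pathway, n_drawn - i)
        for i in range(overlap, min(n_pathway, n_drawn) + 1)
    )
    return overlap, tail / comb(background_size, n_drawn)


# toy example: 3 of the 4 network nodes belong to a 5-gene pathway, background of 20 genes
toy_pathway = ['EGFR', 'GRB2', 'SOS1', 'HRAS', 'RAF1']
toy_network = ['EGFR', 'GRB2', 'SOS1', 'TP53']
toy_overlap, toy_p = ora_pvalue(toy_network, toy_pathway, background_size=20)
```

A smaller p-value means the pathway is more strongly overrepresented among the network nodes, which is what would place it higher in a ranking across pathways.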
First, we need the pathway signatures from BioCarta, which are available in the MSigDB database.
[ ]:
signatures = dc.get_resource('MSigDB', organism='human')
# copy the filtered rows so the in-place edits below do not act on a slice of `signatures`
biocarta_elements = signatures[signatures['collection'] == 'biocarta_pathways'].copy()
biocarta_elements.rename(columns={'geneset': 'source', 'genesymbol': 'target'}, inplace=True)
biocarta_elements.drop_duplicates(inplace=True)
[11]:
ora_results = nc.eval.get_metric_from_networks(networks, nc.eval.run_ora, net=biocarta_elements)
[12]:
biocarta_elements
[12]:
| | target | collection | source |
|---|---|---|---|
| 260 | ICOSLG | biocarta_pathways | BIOCARTA_CTLA4_PATHWAY |
| 387 | FOSL2 | biocarta_pathways | BIOCARTA_RANKL_PATHWAY |
| 938 | PLAU | biocarta_pathways | BIOCARTA_FIBRINOLYSIS_PATHWAY |
| 1091 | PLAU | biocarta_pathways | BIOCARTA_PLATELETAPP_PATHWAY |
| 1684 | BTG1 | biocarta_pathways | BIOCARTA_BTG2_PATHWAY |
| ... | ... | ... | ... |
| 2392873 | IFNA13 | biocarta_pathways | BIOCARTA_INFLAM_PATHWAY |
| 2397261 | GSTA2 | biocarta_pathways | BIOCARTA_ARENRF2_PATHWAY |
| 2401665 | SAG | biocarta_pathways | BIOCARTA_RHODOPSIN_PATHWAY |
| 2402175 | GRK1 | biocarta_pathways | BIOCARTA_RHODOPSIN_PATHWAY |
| 2403146 | SLC25A22 | biocarta_pathways | BIOCARTA_RHODOPSIN_PATHWAY |
4803 rows × 3 columns
[16]:
# we manually input the expected perturbed pathways
elements = ['BIOCARTA_EGF_PATHWAY', 'BIOCARTA_EGFR_SMRTE_PATHWAY']
# and we add 10 random pathways to complete the list (some may be missing from the
# heatmap if no ORA score was computed for them); pathway names are deduplicated
# first so that each pathway is sampled at most once
elements = elements + biocarta_elements[~biocarta_elements['source'].isin(elements)].source.drop_duplicates().sample(n=10).tolist()
[18]:
p = nc.visual.create_heatmap(ora_results, elements)
Almost all methods correctly identified the perturbed pathway. Shortest paths performed best, while ILP-based corneto performed worst under this assumption.
And that was it for now! We hope you found this tutorial helpful. We focused here on strategies to evaluate the networks, showcasing two of them. However, this is just the beginning! We aim to expand this collection of tools and methods by incorporating more data sources, methods, evaluation strategies and visualizations.
If you have any questions or feedback, or you would like to contribute, please reach out! In addition, check our Contribution guidelines for Data, Methods and Evaluation strategies.