Vignette B: Pertpy bridging to NetworkCommons

In this vignette, we showcase how the resources from pertpy can be used with NetworkCommons. Here, we will use part of the vignette Use-case: Deconvoluting drug responses in cancer cell lines, available in the pertpy documentation. This dataset contains single-cell RNA-seq perturbational profiles from 172 cancer cell lines treated with 13 drugs. Due to computational power constrains, we will only showcase this with one cell line and one dataset. However, this approach can be easily expanded to several cell lines and datasets.

[ ]:

import pertpy as pt
import scanpy as sc
import numpy as np
import pandas as pd
import networkcommons as nc

1. Fetch metadata with pertpy

This section was taken from the pertpy documentation, Use-case: Deconvoluting drug responses in cancer cell lines. Please refer to this for further details.

[ ]:

adata = pt.dt.mcfarland_2020()
adata

[4]:

adata.obs["perturbation"].unique().tolist()

[4]:

['Navitoclax',
 'BRD3379',
 'AZD5591',
 'Taselisib',
 'Everolimus',
 'Idasanutlin',
 'Bortezomib',
 'sgLACZ',
 'sgGPX4-1',
 'control',
 'Trametinib',
 'sgOR2J2',
 'Afatinib',
 'Dabrafenib',
 'sgGPX4-2',
 'Gemcitabine',
 'JQ1',
 'Prexasertib']

[ ]:

sc.pp.filter_genes(adata, min_cells=30)

[ ]:

adata.layers["raw_counts"] = adata.X.copy()

[ ]:

# Metadata annotation
cl_metadata = pt.md.CellLine()
cl_metadata.annotate(
    adata,
    query_id="DepMap_ID",
    reference_id="ModelID",
    fetch=["CellLineName", "Age", "OncotreePrimaryDisease", "SangerModelID", "OncotreeLineage"],
)

moa_metadata = pt.md.Moa()
moa_metadata.annotate(
    adata,
    query_id="perturbation",
)

# Add control annotations
adata.obs["moa"] = ["Control" if pert == "control" else moa for moa, pert in zip(adata.obs["moa"], adata.obs["perturbation"])]
adata.obs["target"] = ["Control" if pert == "control" else target for target, pert in zip(adata.obs["target"], adata.obs["perturbation"])]

[ ]:

sc.pl.umap(adata, color=["moa"])

[ ]:

cl_metadata.annotate_from_gdsc(
    adata,
    query_id="SangerModelID",
    reference_id="sanger_model_id",
    query_perturbation='perturbation',
    gdsc_dataset="gdsc_1",
)
adata.obs["ln_ic50_GDSC1"] = adata.obs["ln_ic50"].copy()

cl_metadata.annotate_from_gdsc(
    adata,
    query_id="SangerModelID",
    reference_id="sanger_model_id",
    query_perturbation='perturbation',
    gdsc_dataset="gdsc_2",
)
adata.obs["ln_ic50_GDSC2"] = adata.obs["ln_ic50"].copy()

del adata.obs["ln_ic50"]

[ ]:

adata_dabrafenib = adata[adata.obs["perturbation"].isin(["control", "Dabrafenib"])]

[3]:

adata_dabrafenib = sc.read_h5ad("adata_dabrafenib.h5ad")

[18]:

subset_cells = np.random.choice(adata_dabrafenib.obs["SangerModelID"].unique().tolist(), size=10, replace=False)

[19]:

subset_cells

[19]:

array(['SIDM00759', 'SIDM00582', 'SIDM00963', 'SIDM01150', 'SIDM00143',
       'SIDM00756', 'SIDM01060', 'SIDM00139', 'SIDM01167', 'SIDM01026'],
      dtype='<U32')

[ ]:

logfc_df = pd.DataFrame(columns=adata_dabrafenib.var_names)

for cell_line in subset_cells:

    subset = adata_dabrafenib[adata_dabrafenib.obs["SangerModelID"] == cell_line]
    if subset.n_obs < 20: #Threshold from the McFarland paper
        continue
    if "Dabrafenib" not in subset.obs["perturbation"].unique():
        continue

    edgr = pt.tl.PyDESeq2(subset, design="~perturbation", layer="raw_counts")
    edgr.fit()

    res_df = edgr.test_contrasts(edgr.contrast("perturbation", "control", "Dabrafenib"))
    res_df = res_df[["variable", "log_fc"]]
    res_df = res_df.set_index("variable")
    res_df = res_df.reindex(adata_dabrafenib.var_names)

    logfc_df.loc[cell_line] = res_df["log_fc"]

logfc_df.to_csv(f"output/logfc_df_dabrafenib.csv")

[70]:

logfc_df.head()

[70]:

	RP11-34P13.7	AL627309.1	AP006222.2	RP4-669L17.10	RP11-206L10.3	RP11-206L10.2	RP11-206L10.9	FAM87B	LINC00115	FAM41C	...	MT-ND6	MT-CYB	AC145212.1	MGC39584	AC011043.1	AL592183.1	AC011841.1	AL354822.1	PNRC2-1	SRSF10-1
SIDM01150	-0.029041	-0.154688	0.007570	0.135863	0.080412	0.347878	0.346763	0.000000	-0.148709	-0.343955	...	0.366750	-0.100324	0.011169	0.132036	-0.025371	0.145605	0.013151	0.025349	-0.126633	-0.186471
SIDM00143	0.000000	0.294966	0.046804	-0.218813	0.497196	0.000000	-0.218851	0.000000	0.439098	-0.302195	...	0.400622	0.121451	-0.291270	0.000000	0.483729	0.086145	0.328663	0.168367	-0.006578	0.451708
SIDM01060	0.043288	0.000000	-0.162589	0.065901	0.027507	-0.033283	0.000590	0.000000	-0.502486	-0.351066	...	0.224724	0.005616	-0.017290	0.000000	-0.198974	-0.399949	0.000000	0.131691	0.000000	-0.250337
SIDM01167	0.000000	0.158182	0.484159	0.047778	0.047758	0.000000	-0.186445	-0.102564	0.546340	0.195960	...	0.089615	0.063865	0.063088	0.000000	-0.326881	0.000977	0.000000	0.114601	0.000000	-0.475517

4 rows × 21805 columns

2. TF activity estimation with decoupler-py

Now that we have the gene changes in response to the perturbation, we can perform TF activity estimation with decoupler and CollecTRI. This scores will be the input for the network contextualization methods from NetworkCommons.

[40]:

import decoupler as dc

[42]:

net = dc.get_collectri()

INFO:root:Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/about?format=text`

[44]:

tf_acts, pvals = dc.run_ulm(logfc_df, net)

[45]:

tf_acts

[45]:

	ABL1	AEBP1	AHR	AHRR	AIP	AIRE	AP1	APEX1	AR	ARID1A	...	ZNF382	ZNF384	ZNF395	ZNF410	ZNF436	ZNF699	ZNF76	ZNF804A	ZNF91	ZXDC
SIDM01150	0.080080	-1.631165	1.822143	-0.486594	-0.783931	-0.844352	1.723601	-1.519310	-0.344363	1.623128	...	-4.238011	1.965072	-2.004019	0.069187	2.652415	-0.182054	-1.793975	-0.143599	1.938493	-0.277062
SIDM00143	-1.332259	1.592115	1.183483	0.799785	1.025860	0.468413	4.497127	-1.430149	2.600641	2.156102	...	-0.789468	3.996778	-1.028295	-0.517214	0.598231	1.087119	0.778563	-0.599900	1.412452	-2.276532
SIDM01060	-0.652209	-2.475549	-1.580986	1.089401	-1.944108	0.399068	-5.066813	-2.765601	-0.934723	-0.981179	...	1.715413	-0.158227	1.480932	0.409427	-0.044302	-0.413981	-0.048804	-2.692391	1.406787	-0.253700
SIDM01167	-1.334910	0.017429	-2.367893	0.813763	2.198922	-0.660442	-2.465387	-1.764444	-2.983476	1.955569	...	0.219701	-0.641184	-0.172258	-0.653510	1.087562	2.267288	-1.447545	-0.400557	1.555596	-0.400194

4 rows × 735 columns

[50]:

measurements = tf_acts.loc["SIDM01060"].sort_values(ascending=False, key=abs)[0:25].to_dict()

3. Network inference with NetworkCommons

Now that we have scores for upstream and downstream layers, we can perform network inference with NetworkCommons. For the sake of simplicity and just for demonstration purposes, we will only use the shortest path approach.

[59]:

network = nc.data.network.get_omnipath()
graph = nc.utils.network_from_df(network)

[60]:

source = {'BRAF': -1}

[68]:

shortest_path_network, shortest_path_list = nc.methods.run_shortest_paths(graph, source, measurements)
shortest_sc_network, shortest_sc_list = nc.methods.run_sign_consistency(shortest_path_network, shortest_path_list, source, measurements)

[69]:

visualizer = nc.visual.NetworkXVisualizer(shortest_sc_network)
visualizer.visualize_network(source, measurements, network_type='sign_consistent')

[69]: