Vignette B: Pertpy bridging to NetworkCommons

In this vignette, we showcase how the resources from pertpy can be used with NetworkCommons. Here, we will use part of the vignette Use-case: Deconvoluting drug responses in cancer cell lines, available in the pertpy documentation. This dataset contains single-cell RNA-seq perturbational profiles from 172 cancer cell lines treated with 13 drugs. Due to computational power constrains, we will only showcase this with one cell line and one dataset. However, this can be expanded

[1]:
import pertpy as pt
import scanpy as sc
import numpy as np
import pandas as pd
[2]:
import networkcommons as nc

1. Processing with pertpy

This section was taken from the pertpy documentation, Use-case: Deconvoluting drug responses in cancer cell lines. Please refer to this for further details.

[ ]:
adata = pt.dt.mcfarland_2020()
adata
[6]:
adata.write_h5ad(
    "adata.h5ad"
)
[3]:
adata = sc.read_h5ad("adata.h5ad")
[4]:
adata.obs["perturbation"].unique().tolist()
[4]:
['Navitoclax',
 'BRD3379',
 'AZD5591',
 'Taselisib',
 'Everolimus',
 'Idasanutlin',
 'Bortezomib',
 'sgLACZ',
 'sgGPX4-1',
 'control',
 'Trametinib',
 'sgOR2J2',
 'Afatinib',
 'Dabrafenib',
 'sgGPX4-2',
 'Gemcitabine',
 'JQ1',
 'Prexasertib']
[ ]:
sc.pp.filter_genes(adata, min_cells=30)
[ ]:
adata.layers["raw_counts"] = adata.X.copy()
[ ]:
# Metadata annotation
cl_metadata = pt.md.CellLine()
cl_metadata.annotate(
    adata,
    query_id="DepMap_ID",
    reference_id="ModelID",
    fetch=["CellLineName", "Age", "OncotreePrimaryDisease", "SangerModelID", "OncotreeLineage"],
)

moa_metadata = pt.md.Moa()
moa_metadata.annotate(
    adata,
    query_id="perturbation",
)

# Add control annotations
adata.obs["moa"] = ["Control" if pert == "control" else moa for moa, pert in zip(adata.obs["moa"], adata.obs["perturbation"])]
adata.obs["target"] = ["Control" if pert == "control" else target for target, pert in zip(adata.obs["target"], adata.obs["perturbation"])]
[ ]:
sc.pl.umap(adata, color=["moa"])
[ ]:
cl_metadata.annotate_from_gdsc(
    adata,
    query_id="SangerModelID",
    reference_id="sanger_model_id",
    query_perturbation='perturbation',
    gdsc_dataset="gdsc_1",
)
adata.obs["ln_ic50_GDSC1"] = adata.obs["ln_ic50"].copy()

cl_metadata.annotate_from_gdsc(
    adata,
    query_id="SangerModelID",
    reference_id="sanger_model_id",
    query_perturbation='perturbation',
    gdsc_dataset="gdsc_2",
)
adata.obs["ln_ic50_GDSC2"] = adata.obs["ln_ic50"].copy()

del adata.obs["ln_ic50"]
[ ]:
adata_dabrafenib = adata[adata.obs["perturbation"].isin(["control", "Dabrafenib"])]
[3]:
adata_dabrafenib = sc.read_h5ad("adata_dabrafenib.h5ad")
[18]:
subset_cells = np.random.choice(adata_dabrafenib.obs["SangerModelID"].unique().tolist(), size=10, replace=False)

[19]:
subset_cells
[19]:
array(['SIDM00759', 'SIDM00582', 'SIDM00963', 'SIDM01150', 'SIDM00143',
       'SIDM00756', 'SIDM01060', 'SIDM00139', 'SIDM01167', 'SIDM01026'],
      dtype='<U32')
[ ]:
logfc_df = pd.DataFrame(columns=adata_dabrafenib.var_names)

for cell_line in subset_cells:

    subset = adata_dabrafenib[adata_dabrafenib.obs["SangerModelID"] == cell_line]
    if subset.n_obs < 20: #Threshold from the McFarland paper
        continue
    if "Dabrafenib" not in subset.obs["perturbation"].unique():
        continue

    edgr = pt.tl.PyDESeq2(subset, design="~perturbation", layer="raw_counts")
    edgr.fit()

    res_df = edgr.test_contrasts(edgr.contrast("perturbation", "control", "Dabrafenib"))
    res_df = res_df[["variable", "log_fc"]]
    res_df = res_df.set_index("variable")
    res_df = res_df.reindex(adata_dabrafenib.var_names)

    logfc_df.loc[cell_line] = res_df["log_fc"]

logfc_df.to_csv(f"output/logfc_df_dabrafenib.csv")
[70]:
logfc_df.head()
[70]:
RP11-34P13.7 AL627309.1 AP006222.2 RP4-669L17.10 RP11-206L10.3 RP11-206L10.2 RP11-206L10.9 FAM87B LINC00115 FAM41C ... MT-ND6 MT-CYB AC145212.1 MGC39584 AC011043.1 AL592183.1 AC011841.1 AL354822.1 PNRC2-1 SRSF10-1
SIDM01150 -0.029041 -0.154688 0.007570 0.135863 0.080412 0.347878 0.346763 0.000000 -0.148709 -0.343955 ... 0.366750 -0.100324 0.011169 0.132036 -0.025371 0.145605 0.013151 0.025349 -0.126633 -0.186471
SIDM00143 0.000000 0.294966 0.046804 -0.218813 0.497196 0.000000 -0.218851 0.000000 0.439098 -0.302195 ... 0.400622 0.121451 -0.291270 0.000000 0.483729 0.086145 0.328663 0.168367 -0.006578 0.451708
SIDM01060 0.043288 0.000000 -0.162589 0.065901 0.027507 -0.033283 0.000590 0.000000 -0.502486 -0.351066 ... 0.224724 0.005616 -0.017290 0.000000 -0.198974 -0.399949 0.000000 0.131691 0.000000 -0.250337
SIDM01167 0.000000 0.158182 0.484159 0.047778 0.047758 0.000000 -0.186445 -0.102564 0.546340 0.195960 ... 0.089615 0.063865 0.063088 0.000000 -0.326881 0.000977 0.000000 0.114601 0.000000 -0.475517

4 rows × 21805 columns

2. TF activity estimation with decoupler-py

Now that we have the gene changes in response to the perturbation, we can perform TF activity estimation with decoupler and CollecTRI. This scores will be the input for the network contextualization methods from NetworkCommons.

[40]:
import decoupler as dc
[42]:
net = dc.get_collectri()
INFO:root:Downloading data from `https://omnipathdb.org/queries/enzsub?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/interactions?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/complexes?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/annotations?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/queries/intercell?format=json`
INFO:root:Downloading data from `https://omnipathdb.org/about?format=text`
[44]:
tf_acts, pvals = dc.run_ulm(logfc_df, net)
[45]:
tf_acts
[45]:
ABL1 AEBP1 AHR AHRR AIP AIRE AP1 APEX1 AR ARID1A ... ZNF382 ZNF384 ZNF395 ZNF410 ZNF436 ZNF699 ZNF76 ZNF804A ZNF91 ZXDC
SIDM01150 0.080080 -1.631165 1.822143 -0.486594 -0.783931 -0.844352 1.723601 -1.519310 -0.344363 1.623128 ... -4.238011 1.965072 -2.004019 0.069187 2.652415 -0.182054 -1.793975 -0.143599 1.938493 -0.277062
SIDM00143 -1.332259 1.592115 1.183483 0.799785 1.025860 0.468413 4.497127 -1.430149 2.600641 2.156102 ... -0.789468 3.996778 -1.028295 -0.517214 0.598231 1.087119 0.778563 -0.599900 1.412452 -2.276532
SIDM01060 -0.652209 -2.475549 -1.580986 1.089401 -1.944108 0.399068 -5.066813 -2.765601 -0.934723 -0.981179 ... 1.715413 -0.158227 1.480932 0.409427 -0.044302 -0.413981 -0.048804 -2.692391 1.406787 -0.253700
SIDM01167 -1.334910 0.017429 -2.367893 0.813763 2.198922 -0.660442 -2.465387 -1.764444 -2.983476 1.955569 ... 0.219701 -0.641184 -0.172258 -0.653510 1.087562 2.267288 -1.447545 -0.400557 1.555596 -0.400194

4 rows × 735 columns

[50]:
measurements = tf_acts.loc["SIDM01060"].sort_values(ascending=False, key=abs)[0:25].to_dict()

3. Network inference with NetworkCommons

Now that we have scores for upstream and downstream layers, we can perform network inference with NetworkCommons. For the sake of simplicity and just for demonstration purposes, we will only use the shortest path approach.

[59]:
network = nc.data.network.get_omnipath()
graph = nc.utils.network_from_df(network)
[60]:
source = {'BRAF': -1}
[68]:
shortest_path_network, shortest_path_list = nc.methods.run_shortest_paths(graph, source, measurements)
shortest_sc_network, shortest_sc_list = nc.methods.run_sign_consistency(shortest_path_network, shortest_path_list, source, measurements)
[69]:
visualizer = nc.visual.NetworkXVisualizer(shortest_sc_network)
visualizer.visualize_network(source, measurements, network_type='sign_consistent')
[69]:
../_images/vignettes_B_pertpy_33_0.svg