Phenotype calculation modules

Phenotype score calculation

Log ratio of \(y\) vs \(x\):

\[\Delta = \log_2(\frac {\begin{bmatrix}{N_{y}}\end{bmatrix}_{(a,b)}} {\begin{bmatrix}{N_{x}}\end{bmatrix}_{(a,b)}} )\]
  • \(y \rightarrow\) test condition (e.g. treated samples)

  • \(x \rightarrow\) reference condition (e.g. \(t_{0}\) samples, or untreated samples)

  • \(a \rightarrow\) number of library elements with sgRNAs targeting \(T\)

  • \(b \rightarrow\) number of biological replicates, \(R\) (e.g. 2 or 3)

  • \(N_{x}\) | \(N_{y} \rightarrow\) read counts normalized for sequencing depth in condition \(x\) or \(y\)
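As a minimal sketch of the log ratio defined above, the following computes \(\Delta\) element-wise for two small matrices of normalized counts with \(a\) rows (library elements) and \(b\) columns (replicates). The function name log2_ratio and the example counts are illustrative assumptions, not part of ScreenPro2, and the sketch assumes all counts are strictly positive.

```python
import math


def log2_ratio(n_x, n_y):
    """Element-wise log2(N_y / N_x) for two a-by-b nested lists of normalized counts."""
    return [
        [math.log2(y / x) for x, y in zip(row_x, row_y)]
        for row_x, row_y in zip(n_x, n_y)
    ]


# Hypothetical counts: 2 library elements (rows) x 2 replicates (columns).
n_x = [[100.0, 200.0], [50.0, 80.0]]   # reference condition
n_y = [[200.0, 100.0], [50.0, 320.0]]  # test condition

delta = log2_ratio(n_x, n_y)
print(delta)  # [[1.0, -1.0], [0.0, 2.0]]
```

A positive entry means the element is enriched in the test condition relative to the reference, and a negative entry means it is depleted.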


The formula below applies to the V3 library, which has a single library element per gene (i.e. dual sgRNAs in one construct targeting the same gene).

Phenotype score for each \(T\) comparing \(y\) vs \(x\):

\[\text{PhenoScore}(T,x,y) = \left( \overline{\Delta_{(x,y)}} - \text{median}( {\overline{\Delta_{(x_{ctrl},y_{ctrl})}}} ) \right) \times \frac{ 1 }{d_{growth}}\]
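  • \(\overline{\Delta_{(x,y)}} \rightarrow\) log ratio averaged across replicates

  • \(\overline{\Delta_{(x_{ctrl},y_{ctrl})}} \rightarrow\) the same averaged log ratio, computed for the negative control elements

  • \(T \rightarrow\) target (e.g. a gene), scored from the library elements with sgRNAs targeting \(T\)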

  • \(d_{growth} \rightarrow\) growth factor to normalize the phenotype score.
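The phenotype score formula above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the ScreenPro2 implementation: the function name pheno_score, its argument layout (one list of per-replicate log ratios for the target, and one such list per negative control element), and the example numbers are all hypothetical.

```python
from statistics import mean, median


def pheno_score(delta_target, delta_ctrl, d_growth=1.0):
    """Phenotype score for one target from per-replicate log ratios.

    delta_target: log ratios of the target's library element, one per replicate.
    delta_ctrl: one list of per-replicate log ratios per negative control element.
    d_growth: growth factor used to normalize the score.
    """
    target_mean = mean(delta_target)  # average across replicates
    # Median of the replicate-averaged log ratios of the negative controls.
    ctrl_center = median([mean(row) for row in delta_ctrl])
    return (target_mean - ctrl_center) / d_growth


score = pheno_score(
    delta_target=[1.0, 2.0],
    delta_ctrl=[[0.0, 0.5], [-0.5, 0.5], [0.5, 0.5]],
    d_growth=2.5,
)
print(score)
```

Here the target averages 1.5, the control averages are 0.25, 0.0 and 0.5 (median 0.25), so the score is (1.5 - 0.25) / 2.5.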

Phenotype statistics calculation

Statistical test comparing \(y\) vs \(x\) for each target, \(T\):

\[\text{p-value}(T,x,y) = \text{t-test} \left( \begin{bmatrix}{N_{x}}\end{bmatrix}_{(a,b)}, \begin{bmatrix}{N_{y}}\end{bmatrix}_{(a,b)} \right)\]

(See the Wikipedia page: Dependent t-test for paired samples.)

(The implementation uses ttest_rel, a function in scipy.stats.)

This tests the null hypothesis that two related or repeated samples have identical average (expected) values.
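To make the paired test concrete, the sketch below computes the paired t statistic by hand using only the standard library. It is not the ScreenPro2 code path; the function name paired_t_statistic and the sample values are assumptions for illustration, and the differences are taken as \(y - x\).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """t statistic for paired samples, computed on the differences y - x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    diffs = [yi - xi for xi, yi in zip(x, y)]
    # Mean difference divided by its standard error (sample stdev / sqrt(n)).
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical paired measurements for one target.
x = [10.0, 12.0, 14.0]  # reference condition
y = [11.0, 14.0, 17.0]  # test condition

t_stat = paired_t_statistic(x, y)  # differences are 1, 2, 3
print(t_stat)
```

scipy.stats.ttest_rel(a, b) computes the same kind of statistic on the differences a - b and additionally returns a p-value (two-sided by default) from a t distribution with n - 1 degrees of freedom, so swapping the argument order flips the sign of the statistic but not the two-sided p-value.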

Combined score calculation

\[\text{combined score} = \left( \dfrac{T_{\text{phenotype score}}}{\sigma{\text{(negative controls)}}} \right) \times -\log_{10}(\text{pvalue})\]
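A minimal sketch of the combined score follows. The function name combined_score and the inputs are hypothetical, and the use of the population standard deviation for \(\sigma\) is an assumption, since the formula does not say whether a sample or population estimate is meant.

```python
import math
from statistics import pstdev


def combined_score(pheno_score, ctrl_scores, p_value):
    """Phenotype score scaled by the spread of negative controls, times -log10(p)."""
    sigma = pstdev(ctrl_scores)  # assumption: population standard deviation
    return (pheno_score / sigma) * -math.log10(p_value)


# Hypothetical target score, negative control scores, and p-value.
value = combined_score(pheno_score=1.0, ctrl_scores=[-0.5, 0.5], p_value=0.01)
print(value)
```

With these inputs \(\sigma = 0.5\) and \(-\log_{10}(0.01) = 2\), so the combined score is close to 4. The sign of the combined score follows the sign of the phenotype score, because the \(-\log_{10}\) term is non-negative for any p-value in (0, 1].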

phenoscore module

This module contains functions for calculating relative phenotypes from CRISPR screens datasets.

screenpro.phenoscore.runPhenoScore(adata, cond_ref, cond_test, score_level, var_names='target', collapse_var=False, test='ttest', growth_rate=1, n_reps='auto', keep_top_n=None, num_pseudogenes='auto', pseudogene_size='auto', count_layer=None, count_filter_type='mean', count_filter_threshold=40, ctrl_label='negative_control')

Calculate phenotype score and p-values when comparing cond_test vs cond_ref.

Parameters
  • adata (AnnData) – AnnData object

  • cond_ref (str) – condition reference

  • cond_test (str) – condition test

  • score_level (str) – score level

  • var_names (str) – variable names to use as index in the result dataframe

  • collapse_var (str or bool) – variable to use for the getBestTargetByTSS function, default is False

  • test (str) – test to use for calculating p-value (‘MW’: Mann-Whitney U rank; ‘ttest’ : t-test)

  • growth_rate (int) – growth rate

  • n_reps (int or str) – number of replicates, default is ‘auto’

  • keep_top_n (int) – number of top guides to keep per target

  • num_pseudogenes (int or str) – number of pseudogenes to generate, default is ‘auto’

  • pseudogene_size (int or str) – number of sgRNA elements in each pseudogene, default is ‘auto’

  • count_layer (str) – count layer to use for calculating score, default is None (use default count layer in adata.X)

  • count_filter_type (str) – filter type for counts, default is ‘mean’

  • count_filter_threshold (int) – filter threshold for counts, default is 40

  • ctrl_label (str) – control label, default is ‘negative_control’

Returns

result name (str) and result dataframe (pd.DataFrame)

Return type

tuple of (str, pd.DataFrame)

Other related modules and functions

phenostat module: internal module for statistical analysis of phenoscore data.

screenpro.phenoscore.phenostat.matrixStat(x, y, test, level, transform='log10')

Get p-values comparing y vs x matrices.

Parameters
  • x (np.array) – array of values

  • y (np.array) – array of values

  • test (str) – test to use for calculating p-value

  • level (str) – level at which to calculate p-value

  • transform (str) – transformation to apply to values before running test

Returns

array of p-values

Return type

np.array

screenpro.phenoscore.phenostat.multipleTestsCorrection(p_values, method='fdr_bh')

Calculate adjusted p-values using multiple testing correction.

Parameters
  • p_values (np.array) – array of p-values

  • method (str) – method to use for multiple testing correction

Returns

array of adjusted p-values

Return type

np.array
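The default method name, fdr_bh, is the label statsmodels uses for the Benjamini-Hochberg procedure. Assuming that is what the default refers to here, the adjustment can be sketched in plain Python as follows. The function name benjamini_hochberg and the example p-values are illustrative, not part of ScreenPro2.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)
print(adjusted)
```

For these four p-values the raw scaled values are 0.04, 0.06, about 0.0533 and 0.5 (in sorted order), and the monotonicity step lowers the second one to about 0.0533.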

delta module

screenpro.phenoscore.delta.calculateDelta(x, y, x_ctrl, y_ctrl, growth_rate)

Calculate phenotype score normalized by negative control and growth rate.

Parameters
  • x (np.array) – array of values

  • y (np.array) – array of values

  • x_ctrl (np.array) – array of values

  • y_ctrl (np.array) – array of values

  • growth_rate (int) – growth rate

Returns

array of scores

Return type

np.array

screenpro.phenoscore.delta.compareByReplicates(adata, df_cond_ref, df_cond_test, var_names='target', test='ttest', ctrl_label='negative_control', growth_rate=1, filter_type='mean', filter_threshold=40)

Calculate phenotype score and p-values comparing cond_test vs cond_ref.

In this function, the phenotype calculation is done by comparing multiple replicates of cond_test vs cond_ref.

Parameters
  • adata (AnnData) – AnnData object

  • df_cond_ref (pd.DataFrame) – dataframe of condition reference

  • df_cond_test (pd.DataFrame) – dataframe of condition test

  • var_names (str) – variable names to use as index in the result dataframe

  • test (str) – test to use for calculating p-value (‘MW’: Mann-Whitney U rank; ‘ttest’ : t-test)

  • ctrl_label (str) – control label, default is ‘negative_control’

  • growth_rate (int) – growth rate

  • filter_type (str) – filter type to apply to low counts (‘mean’, ‘both’, ‘either’)

  • filter_threshold (int) – filter threshold for low counts (default is 40)

Returns

result dataframe

Return type

pd.DataFrame

screenpro.phenoscore.delta.compareByTargetGroup(adata, df_cond_ref, df_cond_test, keep_top_n, var_names='target', test='ttest', ctrl_label='negative_control', growth_rate=1, filter_type='mean', filter_threshold=40)

Calculate phenotype score and p-values comparing cond_test vs cond_ref.

In this function, the phenotype calculation is done by comparing groups of guide elements (e.g. sgRNAs) that target the same gene or groups of pseudogene (i.e. subsampled groups of non-targeting control elements) between cond_test vs cond_ref.

Parameters
  • adata (AnnData) – AnnData object

  • df_cond_ref (pd.DataFrame) – dataframe of condition reference

  • df_cond_test (pd.DataFrame) – dataframe of condition test

  • keep_top_n (int) – number of top guide elements to keep

  • var_names (str) – variable names to use as index in the result dataframe

  • test (str) – test to use for calculating p-value (‘MW’: Mann-Whitney U rank; ‘ttest’ : t-test)

  • ctrl_label (str) – control label, default is ‘negative_control’

  • growth_rate (int) – growth rate

  • filter_type (str) – filter type to apply to low counts (‘mean’, ‘both’, ‘either’)

  • filter_threshold (int) – filter threshold for low counts (default is 40)

Returns

result dataframe

Return type

pd.DataFrame

screenpro.phenoscore.delta.generatePseudoGeneAnnData(adata, num_pseudogenes='auto', pseudogene_size='auto', ctrl_label='negative_control')

Generate pseudogenes from negative control elements in the library.

Parameters
  • adata (AnnData) – AnnData object

  • num_pseudogenes (int or str) – number of pseudogenes to generate, default is ‘auto’

  • pseudogene_size (int or str) – number of sgRNA elements in each pseudogene, default is ‘auto’

  • ctrl_label (str) – control label, default is ‘negative_control’

Returns

AnnData object with pseudogenes

Return type

AnnData
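The idea of building pseudogenes by subsampling negative control elements can be illustrated with a small standard-library sketch. It does not reproduce how ScreenPro2 builds its AnnData object or resolves the ‘auto’ defaults. The function name make_pseudogenes, the ID format, and the sampling scheme (without replacement within a pseudogene, with reuse allowed across pseudogenes) are all assumptions.

```python
import random


def make_pseudogenes(control_ids, num_pseudogenes, pseudogene_size, seed=0):
    """Group negative control element IDs into randomly subsampled pseudogenes."""
    rng = random.Random(seed)  # fixed seed so the grouping is reproducible
    return {
        # Each pseudogene draws distinct elements; elements may recur across groups.
        f"pseudo_{i}": rng.sample(control_ids, pseudogene_size)
        for i in range(num_pseudogenes)
    }


# Hypothetical negative control element IDs.
controls = [f"non-targeting_{i:03d}" for i in range(20)]
pseudogenes = make_pseudogenes(controls, num_pseudogenes=5, pseudogene_size=4)

for name, members in pseudogenes.items():
    print(name, members)
```

Scoring such groups in the same way as real targets gives an empirical picture of how scores behave when no true effect is present.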

screenpro.phenoscore.delta.getBestTargetByTSS(score_df, target_col, pvalue_col)

Collapse the gene-transcript indices into a single score per gene, chosen by best p-value.
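The collapsing step can be sketched without pandas by keeping, for each target, the row with the smallest p-value. This assumes that "best" means smallest. The function name best_row_per_target, the row layout, and the example values are hypothetical and do not mirror the score_df structure used by ScreenPro2.

```python
def best_row_per_target(rows, target_col, pvalue_col):
    """Keep one row per target: the one with the smallest p-value."""
    best = {}
    for row in rows:
        key = row[target_col]
        if key not in best or row[pvalue_col] < best[key][pvalue_col]:
            best[key] = row
    return list(best.values())


# Hypothetical scores for two transcripts of GENE_A and one of GENE_B.
rows = [
    {"target": "GENE_A", "transcript": "P1", "score": -0.8, "pvalue": 0.03},
    {"target": "GENE_A", "transcript": "P2", "score": -1.2, "pvalue": 0.001},
    {"target": "GENE_B", "transcript": "P1", "score": 0.1, "pvalue": 0.6},
]

collapsed = best_row_per_target(rows, "target", "pvalue")
for row in collapsed:
    print(row)
```

In this example GENE_A is represented by its P2 row, because 0.001 is the smaller of its two p-values.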

screenpro.phenoscore.delta.getPhenotypeData(adata, score_tag, cond_ref, cond_test, growth_rate_reps=None, ctrl_label='negative_control')

Calculate the phenotype score for each pair of replicates.

Parameters
  • adata (AnnData) – AnnData object

  • score_tag (str) – score tag. e.g. ‘delta’, ‘gamma’, ‘tau’, ‘rho’.

  • cond_ref (str) – condition reference

  • cond_test (str) – condition test

  • growth_rate_reps (dict) – growth rate for each replicate. Key is replicate number, value is growth rate.

  • ctrl_label (str) – control label, default is ‘negative_control’

deseq module: adapts PyDESeq2 for use in the ScreenPro2 package.