Phenotype calculation modules
Phenotype score calculation
Log ratio of \(y\) vs \(x\):
\(y \rightarrow\) condition \(y\) (e.g. treated samples)
\(x \rightarrow\) condition \(x\) (e.g. \(t_{0}\) samples, or untreated samples)
\(a \rightarrow\) number of library elements with sgRNAs targeting \(T\)
\(b \rightarrow\) number of biological replicates, \(R\) (e.g. 2 or 3)
\(N_{x}\) | \(N_{y} \rightarrow\) read counts normalized for sequencing depth in condition \(x\) or \(y\)
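The definitions above can be illustrated with a minimal sketch. It assumes a log2 ratio with a pseudocount of 1, neither of which is stated in the text here, and the function name `log_ratio` is hypothetical rather than part of the package.

```python
import numpy as np

def log_ratio(n_x, n_y, pseudocount=1.0):
    """Log2 ratio of y vs x for depth-normalized read counts.

    n_x, n_y: arrays of shape (a, b), i.e. a library elements by b replicates.
    The log base (2) and the pseudocount (1) are assumptions of this sketch.
    """
    n_x = np.asarray(n_x, dtype=float)
    n_y = np.asarray(n_y, dtype=float)
    return np.log2((n_y + pseudocount) / (n_x + pseudocount))

# Two library elements (rows), two replicates (columns)
n_x = np.array([[99.0, 199.0], [49.0, 49.0]])
n_y = np.array([[199.0, 399.0], [49.0, 24.0]])
delta = log_ratio(n_x, n_y)
# delta is [[1, 1], [0, -1]]: the first element doubles in y, the second is flat or halves
```

The pseudocount keeps the ratio finite when an element has zero reads in one condition.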
The formula here is for the V3 library, which has a single library element per gene (i.e. dual sgRNAs in one construct targeting the same gene).
Phenotype score for each \(T\) comparing \(y\) vs \(x\):
\(\overline{\Delta(x,y)} \rightarrow\) log ratio averaged across replicates
\(T \rightarrow\) target (e.g. a gene), represented by the library elements with sgRNAs targeting it
\(d_{growth} \rightarrow\) growth factor to normalize the phenotype score.
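As a rough illustration of the two terms defined above, the sketch below averages log ratios across replicates and scales the result by the growth factor. It deliberately covers only these two steps; any further normalization in the documented formula (for example against negative controls) is not shown, and the function name `phenotype_score` is hypothetical.

```python
import numpy as np

def phenotype_score(delta, d_growth=1.0):
    """Average log ratios across replicates, then divide by the growth factor.

    delta: array of shape (targets, replicates) holding log ratios of y vs x.
    """
    delta = np.asarray(delta, dtype=float)
    mean_delta = delta.mean(axis=1)  # log ratio averaged across replicates
    return mean_delta / d_growth

delta = np.array([[1.0, 3.0], [-2.0, -4.0]])
scores = phenotype_score(delta, d_growth=2.0)
# scores is [1.0, -1.5]
```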
Phenotype statistics calculation
Statistical test comparing \(y\) vs \(x\) for each target, \(T\):
(see the Wikipedia page: Dependent t-test for paired samples)
(see the implemented tool: ttest_rel, a function in scipy.stats)
This is a test for the null hypothesis that two related or repeated samples have identical average (expected) values.
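A minimal example of the paired test with `scipy.stats.ttest_rel` follows. The values are made up for illustration and stand for one target measured in three matched replicates.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

# Hypothetical values for one target in three paired replicates
x = np.array([10.0, 12.0, 11.0])   # reference condition
y = np.array([15.0, 18.0, 15.5])   # test condition

result = ttest_rel(y, x)
t_stat, p_value = result.statistic, result.pvalue

# The paired test is equivalent to a one-sample t-test on the differences
same = ttest_1samp(y - x, 0.0)
```

Because the samples are paired, replicate-to-replicate variation that affects both conditions equally cancels out in the differences.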
Combined score calculation
phenoscore module
This module contains functions for calculating relative phenotypes from CRISPR screen datasets.
- screenpro.phenoscore.runPhenoScore(adata, cond_ref, cond_test, score_level, var_names='target', collapse_var=False, test='ttest', growth_rate=1, n_reps='auto', keep_top_n=None, num_pseudogenes='auto', pseudogene_size='auto', count_layer=None, count_filter_type='mean', count_filter_threshold=40, ctrl_label='negative_control')[source]
Calculate phenotype score and p-values when comparing cond_test vs cond_ref.
- Parameters
adata (AnnData) – AnnData object
cond_ref (str) – condition reference
cond_test (str) – condition test
score_level (str) – score level
var_names (str) – variable names to use as index in the result dataframe
collapse_var (str) – variable to use for getBestTargetByTSS function, default is False
test (str) – test to use for calculating p-value (‘MW’: Mann-Whitney U rank; ‘ttest’ : t-test)
growth_rate (int) – growth rate
n_reps (int) – number of replicates
keep_top_n (int) – number of top guides to keep per target
num_pseudogenes (int) – number of pseudogenes to generate
pseudogene_size (int) – number of sgRNA elements in each pseudogene
count_layer (str) – count layer to use for calculating score, default is None (use default count layer in adata.X)
count_filter_type (str) – filter type for counts, default is ‘mean’
count_filter_threshold (int) – filter threshold for counts, default is 40
ctrl_label (str) – control label, default is ‘negative_control’
- Returns
result name and result dataframe
- Return type
str, pd.DataFrame
Other related modules and functions
phenostat module: internal module for statistical analysis of phenoscore data.
- screenpro.phenoscore.phenostat.matrixStat(x, y, test, level, transform='log10')[source]
Get p-values comparing y vs x matrices.
- Parameters
x (np.array) – array of values
y (np.array) – array of values
test (str) – test to use for calculating p-value
level (str) – level at which to calculate p-value
transform (str) – transformation to apply to values before running test
- Returns
array of p-values
- Return type
np.array
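The sketch below shows one way row-wise p-values could be computed for two matrices, with an optional log10 transform and a choice between the paired t-test and the Mann-Whitney U test. It is not the package's implementation: the function name `rowwise_pvalues`, the pseudocount of 1, and the fixed row-wise orientation are assumptions, and the `level` parameter is not modeled.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_rel

def rowwise_pvalues(x, y, test="ttest", transform="log10"):
    """Return one p-value per row comparing y vs x (rows = elements)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if transform == "log10":
        # Pseudocount of 1 is an assumption to avoid log10(0)
        x, y = np.log10(x + 1), np.log10(y + 1)
    elif transform is not None:
        raise ValueError(f"unknown transform: {transform}")

    if test == "ttest":
        return ttest_rel(y, x, axis=1).pvalue
    if test == "MW":
        return np.array([
            mannwhitneyu(yi, xi, alternative="two-sided").pvalue
            for xi, yi in zip(x, y)
        ])
    raise ValueError(f"unknown test: {test}")

x = np.array([[100, 110, 90, 105], [50, 52, 48, 51]])
y = np.array([[400, 420, 380, 390], [50, 49, 51, 52]])
p_ttest = rowwise_pvalues(x, y, test="ttest")
p_mw = rowwise_pvalues(x, y, test="MW")
```

In this toy input the first row changes strongly between conditions and the second does not, so the first p-value is much smaller under both tests.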
- screenpro.phenoscore.phenostat.multipleTestsCorrection(p_values, method='fdr_bh')[source]
Calculate adjusted p-values using multiple testing correction.
- Parameters
p_values (np.array) – array of p-values
method (str) – method to use for multiple testing correction
- Returns
array of adjusted p-values
- Return type
np.array
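The default method name `fdr_bh` matches the label statsmodels uses for the Benjamini-Hochberg procedure. Assuming that is the procedure meant here, the adjustment can be sketched in plain NumPy as follows; `benjamini_hochberg` is a hypothetical helper, not a function of the package.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

p_values = np.array([0.01, 0.04, 0.03, 0.005])
adjusted = benjamini_hochberg(p_values)
# adjusted is [0.02, 0.04, 0.04, 0.02]
```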
delta module
- screenpro.phenoscore.delta.calculateDelta(x, y, x_ctrl, y_ctrl, growth_rate)[source]
Calculate phenotype score normalized by negative control and growth rate.
- Parameters
x (np.array) – array of values
y (np.array) – array of values
x_ctrl (np.array) – array of values
y_ctrl (np.array) – array of values
growth_rate (int) – growth rate
- Returns
array of scores
- Return type
np.array
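To make the normalization concrete, here is a hedged sketch of a score that is centered on the negative controls and scaled by growth rate. It assumes log2 ratios with a pseudocount of 1 and centering on the per-replicate median of the control log ratios; the actual function may differ in any of these details, and `delta_sketch` is a hypothetical name.

```python
import numpy as np

def delta_sketch(x, y, x_ctrl, y_ctrl, growth_rate=1):
    """Log2 ratio of y vs x, minus the control median, divided by growth rate."""
    def log2_ratio(a, b):
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        return np.log2((b + 1) / (a + 1))  # pseudocount of 1 is an assumption

    # Median control log ratio, computed separately for each replicate (column)
    ctrl_median = np.median(log2_ratio(x_ctrl, y_ctrl), axis=0)
    return (log2_ratio(x, y) - ctrl_median) / growth_rate

x = np.array([[99, 99]])
y = np.array([[399, 199]])
x_ctrl = np.array([[99, 99], [99, 99], [99, 99]])
y_ctrl = np.array([[199, 199], [199, 199], [99, 99]])
scores = delta_sketch(x, y, x_ctrl, y_ctrl, growth_rate=2)
# Raw log ratios are [2, 1]; the control median is 1 per replicate,
# so scores is [[0.5, 0.0]] after dividing by growth_rate=2
```

Centering on the controls means an element that behaves like a typical negative control scores close to zero.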
- screenpro.phenoscore.delta.compareByReplicates(adata, df_cond_ref, df_cond_test, var_names='target', test='ttest', ctrl_label='negative_control', growth_rate=1, filter_type='mean', filter_threshold=40)[source]
Calculate phenotype score and p-values comparing cond_test vs cond_ref.
In this function, the phenotype calculation is done by comparing multiple replicates of cond_test vs cond_ref.
- Parameters
adata (AnnData) – AnnData object
df_cond_ref (pd.DataFrame) – dataframe of condition reference
df_cond_test (pd.DataFrame) – dataframe of condition test
var_names (str) – variable names to use as index in the result dataframe
test (str) – test to use for calculating p-value (‘MW’: Mann-Whitney U rank; ‘ttest’ : t-test)
ctrl_label (str) – control label, default is ‘negative_control’
growth_rate (int) – growth rate
filter_type (str) – filter type to apply to low counts (‘mean’, ‘both’, ‘either’)
filter_threshold (int) – filter threshold for low counts (default is 40)
- Returns
result dataframe
- Return type
pd.DataFrame
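The three `filter_type` options can be read in more than one way. The sketch below encodes one plausible reading and should be treated as an assumption: `mean` drops an element when its mean count over all samples is below the threshold, `both` drops it only when both conditions are low, and `either` drops it when at least one condition is low. The helper `low_count_mask` is hypothetical.

```python
import numpy as np

def low_count_mask(ref, test, filter_type="mean", threshold=40):
    """Boolean mask over elements; True means the element passes the filter."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    ref_mean = ref.mean(axis=1)
    test_mean = test.mean(axis=1)
    if filter_type == "mean":
        low = np.concatenate([ref, test], axis=1).mean(axis=1) < threshold
    elif filter_type == "both":
        low = (ref_mean < threshold) & (test_mean < threshold)
    elif filter_type == "either":
        low = (ref_mean < threshold) | (test_mean < threshold)
    else:
        raise ValueError(f"unknown filter_type: {filter_type}")
    return ~low

# Rows are elements, columns are replicates
ref = np.array([[100, 120], [10, 20], [30, 30], [35, 35]])
test = np.array([[90, 110], [5, 15], [80, 100], [38, 50]])
keep_mean = low_count_mask(ref, test, "mean")      # [True, False, True, False]
keep_both = low_count_mask(ref, test, "both")      # [True, False, True, True]
keep_either = low_count_mask(ref, test, "either")  # [True, False, False, False]
```

Under this reading, `either` is the strictest option and `both` the most permissive.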
- screenpro.phenoscore.delta.compareByTargetGroup(adata, df_cond_ref, df_cond_test, keep_top_n, var_names='target', test='ttest', ctrl_label='negative_control', growth_rate=1, filter_type='mean', filter_threshold=40)[source]
Calculate phenotype score and p-values comparing cond_test vs cond_ref.
In this function, the phenotype calculation is done by comparing groups of guide elements (e.g. sgRNAs) that target the same gene, or groups of pseudogenes (i.e. subsampled groups of non-targeting control elements), between cond_test and cond_ref.
- Parameters
adata (AnnData) – AnnData object
df_cond_ref (pd.DataFrame) – dataframe of condition reference
df_cond_test (pd.DataFrame) – dataframe of condition test
keep_top_n (int) – number of top guide elements to keep
var_names (str) – variable names to use as index in the result dataframe
test (str) – test to use for calculating p-value (‘MW’: Mann-Whitney U rank; ‘ttest’ : t-test)
ctrl_label (str) – control label, default is ‘negative_control’
growth_rate (int) – growth rate
filter_type (str) – filter type to apply to low counts (‘mean’, ‘both’, ‘either’)
filter_threshold (int) – filter threshold for low counts (default is 40)
- Returns
result dataframe
- Return type
pd.DataFrame
- screenpro.phenoscore.delta.generatePseudoGeneAnnData(adata, num_pseudogenes='auto', pseudogene_size='auto', ctrl_label='negative_control')[source]
Generate pseudogenes from negative control elements in the library.
- Parameters
adata (AnnData) – AnnData object
num_pseudogenes (int) – number of pseudogenes to generate
pseudogene_size (int) – number of sgRNA elements in each pseudogene
ctrl_label (str) – control label, default is ‘negative_control’
- Returns
AnnData object with pseudogenes
- Return type
AnnData
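The idea of building pseudogenes by subsampling negative control elements can be sketched without AnnData. Everything here is illustrative: the function name `sample_pseudogenes`, the `pseudo_<i>` labels, the seed argument, and sampling without replacement within each pseudogene are assumptions, and how the `'auto'` defaults are resolved is not covered.

```python
import numpy as np

def sample_pseudogenes(control_ids, num_pseudogenes, pseudogene_size, seed=0):
    """Group randomly drawn control elements into named pseudogenes."""
    rng = np.random.default_rng(seed)
    control_ids = np.asarray(control_ids)
    return {
        # Each pseudogene draws distinct control elements;
        # the same element may appear in several pseudogenes
        f"pseudo_{i}": rng.choice(control_ids, size=pseudogene_size, replace=False)
        for i in range(num_pseudogenes)
    }

control_ids = [f"non-targeting_{i}" for i in range(20)]
pseudogenes = sample_pseudogenes(control_ids, num_pseudogenes=5, pseudogene_size=4)
```

Scoring these pseudogenes the same way as real targets gives an empirical picture of the scores expected from elements with no true effect.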
- screenpro.phenoscore.delta.getBestTargetByTSS(score_df, target_col, pvalue_col)[source]
Collapse the gene-transcript indices into a single score per gene by keeping the entry with the best (lowest) p-value.
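Collapsing by best p-value can be sketched with a pandas group-by. The column names and the helper `best_by_pvalue` are hypothetical, and the layout of the real score dataframe may differ.

```python
import pandas as pd

def best_by_pvalue(score_df, target_col, pvalue_col):
    """Keep, for each target, the row with the smallest p-value."""
    best_rows = score_df.groupby(target_col)[pvalue_col].idxmin()
    return score_df.loc[best_rows].set_index(target_col)

score_df = pd.DataFrame({
    "target": ["A", "A", "B"],
    "transcript": ["P1", "P2", "P1"],
    "score": [-0.2, -1.1, 0.1],
    "pvalue": [0.2, 0.01, 0.5],
})
collapsed = best_by_pvalue(score_df, "target", "pvalue")
# Gene A is represented by transcript P2 (p = 0.01), gene B by P1
```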
- screenpro.phenoscore.delta.getPhenotypeData(adata, score_tag, cond_ref, cond_test, growth_rate_reps=None, ctrl_label='negative_control')[source]
Calculate the phenotype score for each pair of replicates.
- Parameters
adata (AnnData) – AnnData object
score_tag (str) – score tag. e.g. ‘delta’, ‘gamma’, ‘tau’, ‘rho’.
cond_ref (str) – condition reference
cond_test (str) – condition test
growth_rate_reps (dict) – growth rate for each replicate. Key is replicate number, value is growth rate.
ctrl_label (str) – control label, default is ‘negative_control’
deseq module: adapts PyDESeq2 for use in the ScreenPro2 package.