Define percent strong #21

shntnu · 2020-11-09T15:34:57Z

(Stubs for now, so we can add this documentation to code later)

Percent strong is reported in two ways. We should distinguish between them (they are similar but not the same):

1. The fraction of replicate pairs that are more similar to each other than 95% of non-replicate pairs.
2. The fraction of replicate sets that are more coherent than 95% of non-replicate sets of the same size. The coherence of a set is defined as the median correlation among elements of the set.

The second version can be a bit confusing so here is an example:

1. Consider an experiment that has 100 perturbations, each performed in 5 replicates.
2. For each perturbation, we compute the pairwise correlations among its 5 replicates (the replicate set) and report the median value. This median replicate correlation is the coherence of the perturbation's replicates.
3. We now compute the coherence of non-replicate sets of the same size. We sample 5 random wells in the experiment such that no two are replicates of the same perturbation, compute the pairwise correlations among the 5 wells (the non-replicate set), and report the median value. We do this several times to build a null distribution.
4. We now report the fraction of perturbations that have a coherence greater than the 95th percentile of the null. This is the percent strong.
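The steps above can be sketched roughly as follows. This is a minimal illustration, not the cytominer-eval implementation: the function names `median_pairwise_corr` and `percent_strong_sets` are hypothetical, Pearson correlation is an assumption (the similarity measure is not specified here), and the sketch assumes every perturbation has the same number of replicates.

```python
import numpy as np


def median_pairwise_corr(profiles):
    """Coherence of a set: median of the pairwise Pearson correlations among rows."""
    corr = np.corrcoef(profiles)
    # Keep each pair once by taking the strict upper triangle.
    upper = corr[np.triu_indices_from(corr, k=1)]
    return float(np.median(upper))


def percent_strong_sets(profiles, labels, n_null=1000, quantile=0.95, seed=0):
    """Fraction of replicate sets more coherent than `quantile` of the null sets.

    profiles: (n_wells, n_features) array; labels: perturbation label per well.
    Assumes (hypothetically) equal replicate counts for all perturbations.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    perturbations = np.unique(labels)
    set_size = int((labels == perturbations[0]).sum())

    # Coherence of every replicate set.
    coherence = np.array(
        [median_pairwise_corr(profiles[labels == p]) for p in perturbations]
    )

    # Null: sets of the same size in which no two wells share a perturbation.
    null = np.empty(n_null)
    for i in range(n_null):
        chosen = rng.choice(perturbations, size=set_size, replace=False)
        wells = [rng.choice(np.flatnonzero(labels == p)) for p in chosen]
        null[i] = median_pairwise_corr(profiles[wells])

    threshold = float(np.quantile(null, quantile))
    return float((coherence > threshold).mean()), threshold
```

Drawing distinct perturbations first and then one well from each is one simple way to guarantee that a null set never contains two replicates of the same perturbation.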

gwaybio · 2020-11-19T20:47:33Z

@niranjchandrasekaran - at checkin today I think I may have answered your question about "percent strong" incorrectly. We are not calculating medians in percent_strong.py - all we do currently is determine the percentage of "group_replicates" that are higher than a quantile (95% default) of "not group_replicates".
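For comparison, the pairwise calculation described in this comment might look like the sketch below. It is not the code in percent_strong.py: the function name `percent_strong_pairs` and the column names `similarity` and `is_replicate` are hypothetical stand-ins for whatever the melted similarity table actually uses.

```python
import pandas as pd


def percent_strong_pairs(
    similarity_melted_df,
    quantile=0.95,
    similarity_col="similarity",    # hypothetical column name
    replicate_col="is_replicate",   # hypothetical column name
):
    """Fraction of replicate pairs whose similarity exceeds a quantile of non-replicate pairs."""
    is_rep = similarity_melted_df[replicate_col].astype(bool)
    replicate_sim = similarity_melted_df.loc[is_rep, similarity_col]
    non_replicate_sim = similarity_melted_df.loc[~is_rep, similarity_col]

    # No medians are involved: each replicate pair is compared to the threshold directly.
    threshold = non_replicate_sim.quantile(quantile)
    return float((replicate_sim > threshold).mean())
```

Note that the unit being counted is a pair, not a perturbation, which is why this number can differ from the set-based version.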

niranjchandrasekaran · 2020-11-19T21:02:15Z

@gwaygenomics, does that mean the current implementation of percent_strong computes the first type of percent_strong in #21 (comment) and not the second type?

shntnu · 2020-11-19T21:06:52Z

> @gwaygenomics, does that mean the current implementation of percent_strong computes the first type of percent_strong in #21 (comment) and not the second type?

I think that is correct.

The good news is that it should require relatively little extra code to implement the second type, given that this matrix is being computed

cytominer-eval/cytominer_eval/operations/percent_strong.py, lines 35 to 37 in 6f9d350:

    similarity_melted_df = assign_replicates(
        similarity_melted_df=similarity_melted_df, replicate_groups=replicate_groups
    )

gwaybio · 2021-01-14T15:15:31Z

note: we've also renamed percent_strong to percent_matching

gwaybio · 2021-05-13T15:35:51Z

Potentially useful resource: https://github.com/shntnu/grit-benchmark/blob/rtests/1.calculate-metrics/cell-health/taxonomy.md

shntnu · 2021-08-20T20:18:52Z

I'll copy here text by @gwaygenomics from the paper https://github.com/broadinstitute/lincs-profiling-complementarity because it's the clearest description of the method I've come across!

Constructing an appropriate null distribution to calculate reproducibility metrics
In order to calculate percent replicating and percent matching metrics, we constructed matched null distributions. We designed the null distributions to control for different replicate counts between different compounds and MOAs, taking into account different replicate counts per assay. We also constructed different null distributions within each treatment dose independently to control for dose differences.

Specifically, for percent replicating, for a given perturbation x with n replicates of dose p, we randomly sampled n non-replicate profiles from all 1,327 common perturbations treated with dose p. We performed this sampling procedure 1,000 times per replicate cardinality class (e.g., compounds with 3 replicates, 4 replicates, 5 replicates, etc.) with two additional restrictions: (1) the random sample did not include replicates for perturbation x, and (2) no two compounds of the same non-x perturbation were included in the same null group. For example, in cases where a compound treatment at a specific dose had five replicates, we sampled 1,000 groups of five randomly sampled non-replicate profiles of the same dose. For percent replicating, we used level 4 profiles considering compound and dose information as replicates. We considered a replicating profile one in which the ground truth median pairwise replicate correlation was higher than 95% of the null distribution. We therefore calculate the percent replicating metric as the total number of replicating profiles over all common compounds.
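The sampling restrictions in this paragraph could be sketched as below. This is an illustration under stated assumptions, not the code used in the paper: the function name `sample_null_group` and the metadata column names `perturbation` and `dose` are hypothetical, and the metadata table is assumed to have one row per profile.

```python
import numpy as np
import pandas as pd


def sample_null_group(meta, perturbation, dose, rng):
    """Sample one dose-matched null group for `perturbation`.

    meta: DataFrame with hypothetical columns "perturbation" and "dose".
    Returns index labels of `meta`, one per sampled profile.
    """
    same_dose = meta[meta["dose"] == dose]
    # Match the replicate count n of the perturbation at this dose.
    n = int((same_dose["perturbation"] == perturbation).sum())

    # Restriction 1: exclude replicates of the perturbation itself.
    candidates = same_dose[same_dose["perturbation"] != perturbation]

    # Restriction 2: at most one profile per other perturbation.
    # Raises ValueError if fewer than n other perturbations exist at this dose.
    others = rng.choice(candidates["perturbation"].unique(), size=n, replace=False)
    picks = []
    for other in others:
        rows = candidates.index[(candidates["perturbation"] == other).to_numpy()]
        picks.append(rng.choice(rows.to_numpy()))
    return np.array(picks)
```

Calling this 1,000 times with the same arguments, for example `[sample_null_group(meta, "A", 1.0, rng) for _ in range(1000)]`, would give the groups from which a null distribution of median pairwise correlations can be computed.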

For percent matching, we performed a similar procedure. The only differences were that we (1) used level 5 consensus signatures and (2) considered MOA and dose information as replicates. We subsequently constructed dose and MOA replicate count-specific null distributions to compare against. We considered a matched MOA one in which the ground truth MOA median pairwise correlation was higher than 95% of the null distribution. We therefore calculate the percent matching metric as the total number of matched MOAs over all common MOAs.

We used these null distributions to calculate a non-parametric p value. First, for each compound, we calculated its median pairwise replicate correlation. We next calculated the median pairwise correlations of each randomly sampled group matched to the specific dose and replicate count. Lastly, we calculated a compound specific p value by dividing how many times the real median pairwise correlation of replicates was higher than all 1,000 randomly sampled groups of median pairwise correlations.
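A hedged sketch of the p value step: the paragraph above does not spell out the denominator or the direction of the comparison, so the code assumes the conventional empirical p value, namely the fraction of null groups whose median pairwise correlation is at least as high as the observed one. The function name `empirical_p_value` is hypothetical.

```python
import numpy as np


def empirical_p_value(observed_median, null_medians):
    """Fraction of null medians that are >= the observed median (assumed convention)."""
    null_medians = np.asarray(null_medians, dtype=float)
    return float(np.mean(null_medians >= observed_median))
```

Under this assumption, a compound whose replicate median exceeds 95% of its matched null has a p value of at most 0.05, which lines up with the percent replicating threshold described earlier in the quoted text.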

shntnu · 2021-10-22T11:22:57Z

Quick note because this came up when reviewing @jccaicedo 's paper:

As of Oct 2021, the definitions in #21 (comment) might be inconsistent with the terminology used in the package.

gwaybio · 2021-10-22T14:06:38Z

I am fairly sure they will be inconsistent - although I do think the differences will be very minor. We did not use this package in that paper, and I wrote the package implementation a couple months before

shntnu · 2021-10-22T14:13:21Z

Makes sense 👍
(I added this note because we were citing this issue in the LUAD gdoc (in comments, not actually citing it) and I didn't want people to get confused)

shntnu added the documentation Improvements or additions to documentation label Nov 9, 2020

gwaybio mentioned this issue Nov 19, 2020

Add reporting option to percent_strong #24

Open

shntnu closed this as completed Mar 22, 2021

gwaybio reopened this May 13, 2021
