Add option to summarize replicate z-scores from controls as medians #38

gwaybio · 2021-02-11T22:47:39Z

Currently, the only option is to summarize replicate z-scores by their mean. I need to add an option to summarize them by their median.

gwaybio · 2021-02-12T14:30:51Z

@shntnu noted in https://github.com/broadinstitute/neuronal-cell-painting/issues/6#issuecomment-767719260 that we should give more thought to using negative control correlation z-scores to transform replicate correlations because of skewed distributions.

It is also possible that Amoolya brought this up yesterday, and suggested using median instead of mean.

I thought that she was referring to step 4 in https://raw.githubusercontent.com/broadinstitute/grit-benchmark/main/media/grit_calculation.png. This is what I added as an option in #39.

@shntnu - is your interpretation different?

To me, z-scoring is suitable for grit - even in the presence of skewed pairwise correlations. If we think about z-scoring as a way to normalize the correlations and interpret how many perturbations will exist above/below the distribution, then we are burned. However, if we think about using the z-score to find where the mean/median replicate correlation is in respect to the controls (which we do) then we are ok. Biologically, negative controls will have variance and, in a CRISPR experiment, potentially different off-target effects. Comparing replicate correlations to this potentially skewed distribution will help us know how different, on average, the replicates are from all controls. This underscores the importance of quality controls, which is true for all experiments.
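A minimal sketch of the idea described above, assuming grit for one perturbation is obtained by z-scoring its replicate correlations against the mean and standard deviation of its correlations to negative controls, then summarizing with either the mean or the median. The function name `summarize_replicate_zscores`, its parameters, the use of the sample standard deviation, and the toy numbers are all hypothetical and are not taken from the package.

```python
from statistics import mean, median, stdev


def summarize_replicate_zscores(replicate_corrs, control_corrs, summary="mean"):
    """Z-score replicate correlations against control correlations, then summarize.

    Hypothetical helper for illustration only; not the package's API.
    """
    if summary not in ("mean", "median"):
        raise ValueError("summary must be 'mean' or 'median'")

    # Location and scale come from the negative control distribution only.
    control_mean = mean(control_corrs)
    control_sd = stdev(control_corrs)  # sample standard deviation (an assumption)

    # Express each replicate correlation in units of control standard deviations.
    z_scores = [(r - control_mean) / control_sd for r in replicate_corrs]

    return mean(z_scores) if summary == "mean" else median(z_scores)


# Toy numbers, made up for illustration only.
control_corrs = [-0.2, -0.1, 0.0, 0.1, 0.2]
replicate_corrs = [0.1, 0.6, 0.7]  # one weak replicate pulls the mean down

grit_mean = summarize_replicate_zscores(replicate_corrs, control_corrs, "mean")
grit_median = summarize_replicate_zscores(replicate_corrs, control_corrs, "median")
```

In this toy case the single weak replicate lowers the mean-based summary more than the median-based one, which is the kind of difference the new option is meant to expose.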

gwaybio · 2021-02-12T14:32:22Z

A nice, succinct summary of the uses and misuses of z-scores: https://influentialpoints.com/Training/z_scores_use_and_misuse.htm

AnneCarpenter · 2021-02-12T14:51:27Z

I will need to rely on other experts here - if you don't think you have sufficient ones chiming in, please ask for help in finding them! Sounds like Amoolya may be all you need; if you can have her read drafts sooner rather than later, it will help, I'm sure.

Remember how you say that Grit ranges between -1 and + 1? If that is a mathematical relationship and not just by chance, then I suspect that range might only hold true for experiments where the distribution is relatively normal and not skewed too much. Just speculation though.

gwaybio · 2021-02-12T15:01:28Z

> I will need to rely on other experts here

This doesn't give me confidence! (Just FYI) Do you mean you'd like a consensus opinion in order to move forward? If so, what does a consensus opinion look like?

> Remember how you say that Grit ranges between -1 and + 1? If that is a mathematical relationship and not just by chance, then I suspect that range might only hold true for experiments where the distribution is relatively normal and not skewed too much. Just speculation though.

We can assume that random data follows a normal distribution. We cannot assume that every profile without signal will appear random. That is the cost of doing business :)

AnneCarpenter · 2021-02-12T15:05:55Z

I just meant please do not count on ME to judge whether decisions on this are sound. If you, or other experts you trust, feel confident about it then carry on :)

gwaybio · 2021-02-12T15:07:00Z

Gotcha! Thanks for clarifying. I choose @shntnu to trust :)

gwaybio · 2021-02-12T17:09:29Z

Some more info: in broadinstitute/grit-benchmark#22 I calculated grit using both mean and median summaries, and plotted the results (pasted below). The y axis is grit calculated with median, and the x axis is grit calculated with mean. The difference is very minor. We observe slightly elevated grit scores using median, potentially because poorly targeting guides reduce the mean score.

Grit calculated with mean vs. median has a Spearman rank correlation ranging from 0.9798 to 0.9835 (see this notebook).
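For readers who want to reproduce this kind of comparison on their own scores, here is a rough sketch of a Spearman rank correlation between mean-based and median-based grit. It uses only the standard library and assumes no tied values; in practice `scipy.stats.spearmanr` handles ties and is the more usual choice. The helper names and the toy scores are hypothetical and are not the values from the notebook.

```python
def rank(values):
    """Return 1-based ranks; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(x, y):
    """Spearman rank correlation for tie-free data via the rank-difference formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Toy grit scores per perturbation, made up for illustration only.
grit_by_mean = [0.5, 1.2, 2.0, 3.1, 4.0]
grit_by_median = [0.6, 2.2, 2.1, 3.3, 4.2]  # two perturbations swap rank

rho = spearman(grit_by_mean, grit_by_median)
```

A value close to 1 means the two summaries rank perturbations almost identically, even if the absolute scores shift.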

shntnu · 2021-03-22T11:53:20Z

> However, if we think about using the z-score to find where the mean/median replicate correlation is in respect to the controls (which we do) then we are ok.

I agree with this and with the explanation in #38 (comment).

I'd note that there are other, completely different ways of reporting the comparison of the two distributions (1. correlations to replicates and 2. correlations to negative controls), e.g. Average Precision (replicates = class 1, negative controls = class 2), but @gwaygenomics's choice of average z-score of class 1, using class 2 as a reference, is a defensible choice.
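To make the Average Precision alternative concrete, here is a small sketch under one possible reading of it: pool the correlations to replicates (class 1) and to negative controls (class 2), rank them from highest to lowest, and average the precision at each position where a replicate appears. The function name and the toy numbers are hypothetical; this is not code from the project.

```python
def average_precision(replicate_corrs, control_corrs):
    """Average precision with replicates as the positive class.

    Hypothetical helper for illustration only.
    """
    if not replicate_corrs:
        raise ValueError("need at least one replicate correlation")

    # Label each correlation, then rank from most to least similar.
    labeled = [(c, True) for c in replicate_corrs]
    labeled += [(c, False) for c in control_corrs]
    labeled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precisions = []
    for position, (_, is_replicate) in enumerate(labeled, start=1):
        if is_replicate:
            hits += 1
            # Precision among the top `position` items at this replicate hit.
            precisions.append(hits / position)

    return sum(precisions) / len(precisions)


# Toy numbers, made up for illustration only.
ap = average_precision(replicate_corrs=[0.9, 0.4], control_corrs=[0.6, 0.1, -0.2])
```

Unlike the z-score summary, this measure depends only on the ordering of the correlations, so it is insensitive to the shape of the control distribution.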

gwaybio added the enhancement New feature or request label Feb 11, 2021

gwaybio mentioned this issue Feb 11, 2021

Adding median to summarize replicate z-scores in grit #39

Merged

gwaybio closed this as completed in #39 Feb 12, 2021

gwaybio reopened this Feb 12, 2021
