Missing labels for some wells in COMPOUND_EMPTY plates in source_1 #75

Open
hanslovsky opened this issue Aug 8, 2023 · 2 comments

@hanslovsky

I join the metadata from the load_data_with_illum.parquet files with the data in well.csv.gz so that I can download the images for a plate together with the associated perturbations. For plate UL001661 in source_1, some of the COMPOUND_EMPTY wells in load_data_with_illum.parquet have no label in well.csv.gz. Note: jc.MetadataFiles.{get_well,get_plate} are convenience functions that read the metadata files at commit 4b24577. This is not the most recent commit on main; I will double-check against the most recent commit, too.

In [72]: import pandas as pd

In [73]: import jump_conversion as jc

In [74]: load_data = pd.read_parquet(Path.home() / 'data/jump.zarr/.cache/cpg0016-jump/source_1/workspace/load_data_csv/Batch1_20221004/UL001661/load_data_with_illum.parquet').assign(Metadata_Plate='UL001661')

In [75]: well = jc.MetadataFiles.get_well()

In [76]: plate = jc.MetadataFiles.get_plate()

In [77]: with_jcp = load_data.merge(well, how='left', on=['Metadata_Plate', 'Metadata_Well'])

In [78]: with_jcp[with_jcp.Metadata_JCP2022.isnull()]
Out[78]:
     Metadata_Source_x   Metadata_Batch Metadata_Plate Metadata_Well Metadata_Site      FileName_IllumAGP  ...                                   PathName_OrigDNA                                    PathName_OrigER                                  PathName_OrigMito                                   PathName_OrigRNA Metadata_Source_y Metadata_JCP2022
184           source_1  Batch1_20221004       UL001661           B02             1  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
185           source_1  Batch1_20221004       UL001661           B02             2  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
186           source_1  Batch1_20221004       UL001661           B02             3  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
187           source_1  Batch1_20221004       UL001661           B02             4  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
192           source_1  Batch1_20221004       UL001661           B04             1  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
...                ...              ...            ...           ...           ...                    ...  ...                                                ...                                                ...                                                ...                                                ...               ...              ...
3815          source_1  Batch1_20221004       UL001661           U35             4  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
4760          source_1  Batch1_20221004       UL001661           Z42             1  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
4761          source_1  Batch1_20221004       UL001661           Z42             2  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
4762          source_1  Batch1_20221004       UL001661           Z42             3  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
4763          source_1  Batch1_20221004       UL001661           Z42             4  UL001661_IllumAGP.npy  ...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...  s3://cellpainting-gallery/cpg0016-jump/source_...               NaN              NaN
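
For anyone who wants to reproduce this check without the private `jump_conversion` helpers, here is a minimal sketch of the same left join on toy data. The frames and the `JCP2022_EXAMPLE` id are made up for illustration; only the column names come from the session above. Passing `indicator=True` to `merge` adds a `_merge` column, so unmatched rows can be selected directly instead of testing `Metadata_JCP2022` for nulls.

```python
import pandas as pd

# Toy stand-in for load_data_with_illum.parquet (one row per well and site).
load_data = pd.DataFrame({
    "Metadata_Plate": ["UL001661"] * 4,
    "Metadata_Well": ["A01", "B02", "B02", "B04"],
    "Metadata_Site": [1, 1, 2, 1],
})

# Toy stand-in for well.csv.gz; the JCP id is a made-up placeholder.
well = pd.DataFrame({
    "Metadata_Plate": ["UL001661"],
    "Metadata_Well": ["A01"],
    "Metadata_JCP2022": ["JCP2022_EXAMPLE"],
})

# Left join; "_merge" is "left_only" for rows with no match in `well`.
merged = load_data.merge(
    well,
    how="left",
    on=["Metadata_Plate", "Metadata_Well"],
    indicator=True,
)

# Collapse the per-site rows to one row per unlabeled well.
missing_wells = (
    merged.loc[merged["_merge"] == "left_only", ["Metadata_Plate", "Metadata_Well"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(missing_wells)
```

Dropping duplicates reduces the per-site rows to a per-well list, which is easier to compare against a plate map.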

At first, I thought these might be the blank images described in #61 (comment), but plate UL001661 is not listed in that comment. I downloaded one DNA channel image from the wells that I identified:

s3://cellpainting-gallery/cpg0016-jump/source_1/images/Batch1_20221004/images/UL001661__2022-10-05T05_07_32-Measurement1/Images/r02c02f01p01-ch4sk1fk1fl1.tiff

The image is not blank, but it is very noisy, with strong artifacts and a visible well edge:

dna
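
To tie an image file back to a well, the file name can be parsed. The sketch below assumes the pattern `r<row>c<col>f<field>p<plane>-ch<channel>`, inferred from the single file name quoted above, and assumes that row 2 corresponds to row letter B (consistent with well B02 appearing in the table). `parse_image_name` and `well_name` are hypothetical helpers, not part of any JUMP tooling.

```python
import re
import string

# Assumed naming pattern, inferred from the one file name quoted above.
PATTERN = re.compile(
    r"r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p(?P<plane>\d+)-ch(?P<channel>\d+)"
)


def parse_image_name(name: str) -> dict:
    """Extract row, column, field, plane and channel as integers."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return {key: int(value) for key, value in match.groupdict().items()}


def well_name(row: int, col: int) -> str:
    """Map 1-based row/column indices to a name like 'B02'.

    Assumption: rows 1..26 map to A..Z; rows past Z are not handled here.
    """
    if not 1 <= row <= 26:
        raise ValueError(f"row {row} is outside A..Z")
    return f"{string.ascii_uppercase[row - 1]}{col:02d}"


parsed = parse_image_name("r02c02f01p01-ch4sk1fk1fl1.tiff")
print(parsed, well_name(parsed["row"], parsed["col"]))
```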

Did these wells fail QA, so that they should be excluded and are therefore missing from the metadata? Can I extrapolate that to any other well that is not listed in well.csv.gz?

Thank you!

@hanslovsky
Author

I will label those as JCP2022_NAN for my own record keeping so I can easily exclude them.
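
A minimal sketch of that labeling step, on a toy frame with the same column names as above (the `JCP2022_EXAMPLE` id is made up; `JCP2022_NAN` is my own sentinel, not an official identifier):

```python
import pandas as pd

# Toy result of the left join: two wells have no label.
with_jcp = pd.DataFrame({
    "Metadata_Well": ["A01", "B02", "B04"],
    "Metadata_JCP2022": ["JCP2022_EXAMPLE", None, None],
})

SENTINEL = "JCP2022_NAN"  # personal placeholder for unlabeled wells

# Replace missing labels with the sentinel.
labeled = with_jcp.assign(
    Metadata_JCP2022=with_jcp["Metadata_JCP2022"].fillna(SENTINEL)
)

# Later, drop the unlabeled wells with a plain comparison.
keep = labeled[labeled["Metadata_JCP2022"] != SENTINEL]
print(keep)
```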

@niranjchandrasekaran
Contributor

Thanks @hanslovsky for flagging this. QC issues could be the reason.

@shntnu were these wells left out of well.csv.gz because of QC issues?
