Dropping last message or two before new parquet writer is created #1712

Open
jeremyplichtafc opened this issue Nov 20, 2020 · 2 comments
jeremyplichtafc commented Nov 20, 2020

We are using the AvroMessageParser and AvroParquetFileReaderWriterFactory and have noticed that a very small number of messages are being dropped. On further investigation, the sequence number of each dropped message is the one immediately before (or sometimes two before) the starting offset in the name of one of the files written to S3.

Ex:
If one of the files on S3 is named 1_1_00000000002329440769.gz.parquet (which I take to mean that the first piece of data in that file came from partition 1 at offset 2329440769), then the dropped data was at offset 2329440768.
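
To make the arithmetic in this example concrete, here is a minimal sketch that pulls the starting offset out of a file name of this shape and derives the offset that went missing. The helper name `parse_file_name` and the field labels are hypothetical and not part of the project. The layout is assumed to be three underscore-separated numbers followed by extensions, with the last number being the first offset in the file, as read in the issue.

```python
import re
from typing import NamedTuple


class ParsedName(NamedTuple):
    first_field: int   # meaning not stated in the issue; kept as an opaque number
    partition: int     # read as the partition, following the issue's interpretation
    first_offset: int  # offset of the first message in the file


# Assumed layout: <number>_<number>_<zero-padded offset><extensions>
_NAME_RE = re.compile(r"^(\d+)_(\d+)_(\d+)(\..+)?$")


def parse_file_name(name: str) -> ParsedName:
    """Split an output file name into its three numeric fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name layout: {name!r}")
    return ParsedName(int(match.group(1)), int(match.group(2)), int(match.group(3)))


parsed = parse_file_name("1_1_00000000002329440769.gz.parquet")
# The dropped message sits one offset before the start of this file.
suspect_offset = parsed.first_offset - 1
print(parsed.partition, parsed.first_offset, suspect_offset)
```

For the "sometimes 2 before" case, the same subtraction with 2 gives the second candidate offset.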

The previous file, where I would have expected that message to land, is well under our max file size parameter, so I think it is being finalized and written because it reached the max file age.

I will try to investigate more and see if I can write a unit test and figure out what is going on. If it turns out this is somehow related to our setup/config I'll add more detail here.
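
One way such a check could look is sketched below. It assumes the offsets read back from each Parquet file are already available as plain lists of integers; how they are extracted is outside this sketch. The function name `find_missing_offsets` and the first file name in the sample data are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


def find_missing_offsets(offsets_by_file: Dict[str, List[int]]) -> List[int]:
    """Return every offset that falls between the smallest and largest
    observed offsets but does not appear in any file."""
    seen = sorted({offset for offsets in offsets_by_file.values() for offset in offsets})
    missing: List[int] = []
    # Walk consecutive pairs; any jump larger than 1 is a gap.
    for previous, current in zip(seen, seen[1:]):
        missing.extend(range(previous + 1, current))
    return missing


# Illustrative data only: the first file name and both offset ranges are made up,
# arranged so that the offset from the issue (2329440768) is the one missing.
sample = {
    "1_1_00000000002329440700.gz.parquet": list(range(2329440700, 2329440768)),
    "1_1_00000000002329440769.gz.parquet": list(range(2329440769, 2329440800)),
}
missing = find_missing_offsets(sample)
print(missing)
```

A gap that lines up with a file boundary, as in this sample, would match the pattern described above. Gaps elsewhere would point to a different cause.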

We are running off a fairly recent version that we built from master: 359c8b8

Thanks,
Jeremy

HenryCaiHaiying commented Nov 20, 2020 via email

@jeremyplichtafc

Thanks for the tips on how to troubleshoot. I'll let you know what I find, and if there is an apparent fix I'll send a PR your way.
