Racing condition with parquet writer #1792

Open
richiesgr opened this issue Jan 3, 2021 · 3 comments
Comments

@richiesgr

Hi,
After a lot of debugging of Secor while writing Parquet from Avro messages, I have come to the conclusion that a race condition can occur.

  • I have a topic with 300 partitions
  • 10 pods, 7 cores per pod
  • 7 threads per pod
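
As a rough sizing check for the setup above, here is a minimal sketch. It assumes the partitions are spread evenly across all consumer threads, which the issue does not state, and `PartitionLoad` and `maxPartitionsPerThread` are hypothetical names, not part of Secor.

```java
/** Hypothetical helper for back-of-the-envelope sizing; not part of Secor. */
final class PartitionLoad {
    /**
     * Largest number of partitions any single thread would own,
     * assuming an even split across pods * threadsPerPod threads.
     */
    static int maxPartitionsPerThread(int partitions, int pods, int threadsPerPod) {
        int totalThreads = pods * threadsPerPod;
        // Integer ceiling division.
        return (partitions + totalThreads - 1) / totalThreads;
    }

    public static void main(String[] args) {
        // 300 partitions over 10 pods * 7 threads = 70 threads.
        System.out.println(maxPartitionsPerThread(300, 10, 7)); // prints 5
    }
}
```

Under that assumption each thread handles up to 5 partitions, so every thread keeps several writers open at once.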

Result: there are not enough Parquet writers to handle every partition, and an IndexOutOfBoundsException is thrown.
The exception occurs inside the Parquet writer, not in any Secor code. As you know, the Parquet writer is not thread-safe. Is it possible that one Parquet writer is used by more than one thread?

My investigation suggests that this assumption is correct: the only workaround I have found so far is to set a much higher value for secor.consumer.threads, to make sure that a Parquet writer is not reused by mistake.

Can you confirm?
Thanks

@HenryCaiHaiying
Contributor

HenryCaiHaiying commented Jan 4, 2021 via email

@richiesgr
Author

richiesgr commented Jan 4, 2021

Hi,
My question was whether my assumption is valid.
Yes, we can debug with many tools, but as I read the code, there is one Parquet writer per file, because the writers are stored in a hashset<path,writer>. Is there any chance of a race condition where two threads try to use the same Parquet writer?
If so, it is more of a design problem than a code bug.
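
To make the question concrete, here is a minimal sketch of the pattern described above: one writer per path kept in a map, with every call serialized on that writer. All class names (`UnsafeWriter`, `WriterRegistry`, `WriterRegistryDemo`) and the example path are hypothetical stand-ins, not Secor's or Parquet's actual API. The stand-in writer is backed by an `ArrayList`, whose unsynchronized concurrent `add` calls can lose updates or throw `ArrayIndexOutOfBoundsException`. That is only an analogy and not a claim about what happens inside the Parquet writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a writer that is not thread-safe. */
final class UnsafeWriter {
    private final List<String> rows = new ArrayList<>(); // ArrayList is not thread-safe

    void write(String row) { rows.add(row); }

    int size() { return rows.size(); }
}

/** Hypothetical registry: one writer per path, each call serialized on that writer. */
final class WriterRegistry {
    private final Map<String, UnsafeWriter> writers = new ConcurrentHashMap<>();

    void write(String path, String row) {
        // computeIfAbsent atomically creates at most one writer per path.
        UnsafeWriter writer = writers.computeIfAbsent(path, p -> new UnsafeWriter());
        synchronized (writer) { // only one thread at a time touches this writer
            writer.write(row);
        }
    }

    int rowCount(String path) {
        UnsafeWriter writer = writers.get(path);
        if (writer == null) {
            return 0;
        }
        synchronized (writer) {
            return writer.size();
        }
    }
}

final class WriterRegistryDemo {
    /** Several threads write to the same path; returns the final row count. */
    static int run(int threads, int rowsPerThread) {
        String path = "/tmp/example/part-0.parquet"; // illustrative path only
        WriterRegistry registry = new WriterRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < rowsPerThread; i++) {
                    registry.write(path, "row");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return registry.rowCount(path);
    }

    public static void main(String[] args) {
        System.out.println(run(7, 10_000)); // prints 70000
    }
}
```

Locking per writer is only one option. Confining each writer to a single thread, so that a path is only ever written by the thread that owns its partition, avoids the lock entirely but depends on how partitions are assigned to threads.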

@HenryCaiHaiying
Contributor

HenryCaiHaiying commented Jan 4, 2021 via email
