What's the fastest way to add a new column that (probably) already has the same partitions? #7391

Open
Liquidmasl opened this issue Sep 6, 2024 · 1 comment
Labels
question ❓ Questions about Modin Triage 🩹 Issues that need triage

There are a bunch of ways to add a column to a DataFrame.

Which is the fastest with Modin?

Say I get a new column by applying a function to an existing one:

new_c = df['column'].apply(lambda x: abs(x))

The resulting Series should have the same partitions as the DataFrame, right?
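
For reference, here is a minimal sketch of the example above. It is written against the plain pandas API, which Modin mirrors, so swapping the import for `import modin.pandas as pd` should be enough to try it with Modin. The sample values are made up for illustration. It also shows the vectorized `Series.abs()` as an alternative to the per-element lambda. Note that this only confirms the result shares the DataFrame's index (which is what column assignment aligns on); it says nothing about how Modin lays out partitions internally.

```python
import pandas as pd  # assumption: for Modin, swap in `import modin.pandas as pd`

# Made-up sample data, only for illustration
df = pd.DataFrame({"column": [-3, 1, -2, 4]})

# Element-wise Python call, as in the post
new_c_apply = df["column"].apply(lambda x: abs(x))

# Vectorized equivalent: no Python-level lambda call per element
new_c_abs = df["column"].abs()

# Both approaches give the same values, and the result keeps the frame's index
same_values = new_c_apply.equals(new_c_abs)
same_index = new_c_abs.index.equals(df.index)
print(same_values, same_index)
```

Avoiding the lambda is usually the first thing worth trying before comparing the different ways of attaching the column.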

We could use merge or concat, or just do

df['new_col'] = new_c

which is the most readable IMO.

There are probably a few other ways too.

But which is the fastest?
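
One way to get a concrete answer for a given setup is to time the candidates directly. The following is a rough sketch, not a definitive benchmark: the helper names (`make_frame`, `via_setitem`, `via_assign`, `via_concat`) and the frame size are made up for illustration, and it uses plain pandas so it runs anywhere. For Modin, swap the import for `import modin.pandas as pd`. Depending on the engine, Modin may execute work asynchronously, so the timings may need a step that forces the result to materialize before they can be trusted; that is an assumption to verify on your own setup. `merge` is left out because it joins on keys rather than attaching a column by position.

```python
import timeit

import pandas as pd  # assumption: for Modin, swap in `import modin.pandas as pd`


def make_frame(n=100_000):
    # Hypothetical test frame: one integer column with negative and positive values
    return pd.DataFrame({"column": range(-n // 2, n // 2)})


def via_setitem(df):
    # df['new_col'] = ... on a copy, so the input frame is left untouched
    out = df.copy()
    out["new_col"] = out["column"].abs()
    return out


def via_assign(df):
    # DataFrame.assign returns a new frame with the extra column
    return df.assign(new_col=df["column"].abs())


def via_concat(df):
    # Column-wise concat of the frame and the renamed Series
    return pd.concat([df, df["column"].abs().rename("new_col")], axis=1)


candidates = {"setitem": via_setitem, "assign": via_assign, "concat": via_concat}

base = make_frame()
timings = {
    name: min(timeit.repeat(lambda f=fn: f(base), number=3, repeat=3))
    for name, fn in candidates.items()
}

# Print the candidates from fastest to slowest on this machine
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f}s for 3 runs")
```

The ranking can differ between pandas and Modin, between engines, and with the frame's size and shape, so it is worth running on data that resembles the real workload.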

Thank you!

Liquidmasl added the question ❓ Questions about Modin and Triage 🩹 Issues that need triage labels Sep 6, 2024

Liquidmasl commented Sep 6, 2024

And also:

How do I add multiple columns at once?

With concat? Will it play nicely with partitions?

I ask because

df[['col1','col2']] = <some np array with 2 columns and the correct number of rows>

just defaults to pandas, apparently because inserting with an unhashable key is not supported?

I don't want to make a new Modin DataFrame out of the np array for concatenation, because I don't want to cause trouble with partitions that don't line up.
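
To make the two options concrete, here is a minimal sketch of adding several columns from a 2-D NumPy array. The data and the names `arr`, `names`, and `extra` are made up for illustration, and the code uses plain pandas; for Modin, swap the import for `import modin.pandas as pd`. Option 1 passes one 1-D slice per plain string key through `DataFrame.assign`, which sidesteps the list key. Option 2 wraps the array in a frame that explicitly reuses the original index before concatenating. Whether either path avoids a fallback to pandas in Modin, or keeps the partitions lined up, is not something this sketch establishes.

```python
import numpy as np
import pandas as pd  # assumption: for Modin, swap in `import modin.pandas as pd`

# Made-up sample data, only for illustration
df = pd.DataFrame({"column": [-3, 1, -2, 4]})
values = df["column"].to_numpy()
arr = np.column_stack([values * 2, values ** 2])  # shape: (len(df), 2)
names = ["col1", "col2"]

# Option 1: a single assign call, one 1-D column slice per string key
out_assign = df.assign(**{name: arr[:, i] for i, name in enumerate(names)})

# Option 2: wrap the array with the same index, then concat column-wise
extra = pd.DataFrame(arr, columns=names, index=df.index)
out_concat = pd.concat([df, extra], axis=1)

print(out_assign.equals(out_concat))
```

Both options produce the same frame here, so the choice comes down to which one the engine handles without falling back, which is best checked by watching for Modin's default-to-pandas warnings on a real run.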
