Make to_ascii_lowercase optional #63
Comments
I agree it should definitely be optional.
Well, |
Maybe you can pass an optional lambda as an argument, because the tokenizer cannot do this due to its |
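For illustration, the optional-lambda idea could look roughly like the sketch below. This is not the crate's API: `tokenize_with` and its `normalize` parameter are hypothetical names, and whitespace splitting stands in for the real tokenization.

```rust
// Hypothetical tokenizer entry point (not the crate's API): `normalize` is an
// optional caller-supplied closure applied to the document before splitting.
fn tokenize_with<F>(doc: &str, normalize: Option<F>) -> Vec<String>
where
    F: Fn(&str) -> String,
{
    // Apply the caller's normalization if one was given, else keep the text as-is.
    let text = match normalize {
        Some(f) => f(doc),
        None => doc.to_string(),
    };
    // Whitespace splitting is a placeholder for the real tokenization logic.
    text.split_whitespace().map(str::to_string).collect()
}

fn main() {
    // Caller opts in to lowercasing.
    let lower = tokenize_with("Hello World", Some(|s: &str| s.to_ascii_lowercase()));
    // Caller opts out; the type annotation is needed because `None` carries no closure type.
    let raw = tokenize_with("Hello World", None::<fn(&str) -> String>);
    println!("{:?} {:?}", lower, raw);
}
```

With this shape the library does no case folding on its own, and the caller decides per call whether any normalization runs.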
PR would be welcome. Initially it was
What's the use case for an internal state in tokenizers?
I think |
Hi, thanks for the cool crate!
Could you remove to_ascii_lowercase or make it optional? I think such pre-processing should be done on the library client side, since it is simple (.map(|doc| doc.to_ascii_lowercase())) and is not required for the heavy tokenization, fitting, and transform logic. I would prefer to call it myself when needed.
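A minimal sketch of the client-side approach described above, assuming a tokenizer that leaves case untouched. `tokenize` and `lowercase_docs` are hypothetical helpers written for this example, not functions from the crate.

```rust
// Hypothetical stand-in for a tokenizer that does NOT lowercase its input.
fn tokenize(doc: &str) -> Vec<&str> {
    doc.split_whitespace().collect()
}

// Client-side pre-processing: the caller lowercases only when it wants to.
fn lowercase_docs(docs: &[&str]) -> Vec<String> {
    docs.iter().map(|doc| doc.to_ascii_lowercase()).collect()
}

fn main() {
    let docs = ["Hello World", "Rust IS Fun"];

    // Lowercase first, then hand the documents to the tokenizer.
    let lowered = lowercase_docs(&docs);
    let tokens: Vec<Vec<&str>> = lowered.iter().map(|doc| tokenize(doc)).collect();

    println!("{:?}", tokens);
}
```

Callers who need case-sensitive tokens simply skip the `lowercase_docs` step and pass the original documents.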