
Compl could work with non single-character length regexp #98

Open
EmileRolley opened this issue May 24, 2021 · 8 comments

Comments

@EmileRolley

Hello, would there be any objection to adding the possibility to use Compl with a regexp that is longer than a single character?

@hhugo
Contributor

hhugo commented Feb 25, 2023

@EmileRolley, can you explain the semantics of such a construct?

@davesnx

davesnx commented Oct 20, 2023

I need this too. I'm not familiar with how sedlex is implemented, but I haven't found a good way to express Compl over a sequence of characters.

The exact code I have is a lexer for a printf-like syntax, but instead of %d being the interpolation token, I designed $(), with the value inside the parentheses.

The regexps are defined like this:

let letter = [%sedlex.regexp? 'a' .. 'z' | 'A' .. 'Z']

let case_ident =
  [%sedlex.regexp?
    ('a' .. 'z' | '_' | '\''), Star (letter | '0' .. '9' | '_')]

let ident = [%sedlex.regexp? (letter | '_'), Star (letter | '0' .. '9' | '_')]
let variable = [%sedlex.regexp? Star (ident, '.'), case_ident]
let interpolation = [%sedlex.regexp? "$(", variable, ")"]
let rest = [%sedlex.regexp? Plus (Compl '$')]

I would like to define rest as [%sedlex.regexp? Plus (Compl "$(")]

Currently, as soon as there is a $ that is not followed by a (, rest can't match it, and I have no way to handle it.
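As I read this comment, rest should match all text up to, but not including, the next "$(", so a lone $ stays part of the text. A minimal sketch of that intended behaviour in plain OCaml (stdlib only, not sedlex); rest_length is a hypothetical helper, not part of sedlex or of the code above:

```ocaml
(* Hypothetical helper: length of the text chunk starting at [pos],
   i.e. everything up to (but not including) the next "$(" or the end
   of the input. A '$' not followed by '(' stays in the chunk. *)
let rest_length (s : string) (pos : int) : int =
  let n = String.length s in
  let rec go i =
    if i >= n then i
    else if s.[i] = '$' && i + 1 < n && s.[i + 1] = '(' then i
    else go (i + 1)
  in
  go pos - pos

let () =
  let input = "Total: $5 for $(user.name)" in
  let len = rest_length input 0 in
  (* Prints "Total: $5 for " (the quotes come from %S). *)
  Printf.printf "%S\n" (String.sub input 0 len)
```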

@hhugo
Contributor

hhugo commented Oct 20, 2023

Can you write what you want with Compl '$' | ('$', Compl '(')?

@davesnx

davesnx commented Oct 21, 2023

Totally, but I'm not entirely sure you can make this structure work, since combining Compl '$' with '$' generates a non-matchable rule, right?
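To check how the suggested alternation behaves, here is a hand-written simulation of Plus (Compl '$' | ('$', Compl '(')) in plain OCaml. workaround_length is a hypothetical name, and the sketch assumes ordinary longest-match semantics for Plus; it does not call sedlex:

```ocaml
(* Simulates Plus (Compl '$' | ('$', Compl '(')).
   Each repetition is either one character other than '$', or a '$'
   followed by one character other than '('. The two alternatives start
   with different characters, so consuming units greedily gives the same
   result as a longest match. Returns the number of characters matched
   from [pos] (0 means no match). *)
let workaround_length (s : string) (pos : int) : int =
  let n = String.length s in
  let rec go i =
    if i >= n then i
    else if s.[i] <> '$' then go (i + 1) (* Compl '$' *)
    else if i + 1 < n && s.[i + 1] <> '(' then go (i + 2) (* '$', Compl '(' *)
    else i (* "$(" or a trailing '$': stop *)
  in
  go pos - pos

let () =
  List.iter
    (fun input ->
      let len = workaround_length input 0 in
      Printf.printf "%S -> matched %S\n" input (String.sub input 0 len))
    [ "cost $5 $(x)"; "price: 3$"; "a$$(x)" ]
```

Under this sketch the rule is matchable: on "cost $5 $(x)" it matches "cost $5 " and stops before the interpolation. Two edge cases differ from "everything up to $(": a trailing $ is never matched ("price: 3$" matches only "price: 3"), and in "a$$(x)" the first $ pairs with the second, so the whole string, including $(x), is consumed as text.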

@EmileRolley
Author

> Can you write what you want with Compl '$' | ('$', Compl '(')?

Yes, that's what I ended up doing, but it would be convenient to be able to simply write Compl "mystring" instead of Compl 'm' | ('m', Compl 'y') | ("my", Compl 's') | ... | ("mystrin", Compl 'g').
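The expansion described here is mechanical, so it can be generated. A small sketch in plain OCaml that builds the text of that alternation for a given string, following the same pattern (for each position i, the first i characters, then the complement of character i). expand_compl is a hypothetical helper; it only produces the regexp as a string and does not call sedlex:

```ocaml
(* Hypothetical helper: builds, as text, the alternation that stands in
   for Compl "<word>": for each position i, the first i characters of
   [word] followed by the complement of character i. *)
let expand_compl (word : string) : string =
  List.init (String.length word) (fun i ->
      let compl = Printf.sprintf "Compl %C" word.[i] in
      if i = 0 then compl
      else
        (* A one-character prefix is written as a char literal, longer
           prefixes as string literals, matching the comment's style. *)
        let prefix =
          if i = 1 then Printf.sprintf "%C" word.[0]
          else Printf.sprintf "%S" (String.sub word 0 i)
        in
        Printf.sprintf "(%s, %s)" prefix compl)
  |> String.concat " | "

let () = print_endline (expand_compl "mystring")
```

For "$(" this yields exactly the Compl '$' | ('$', Compl '(') form suggested earlier in the thread, and for "mystring" it yields the alternation from Compl 'm' through ("mystrin", Compl 'g'). The result still has to be pasted into a [%sedlex.regexp? ...] by hand, since, as far as I know, sedlex regexps are fixed at compile time by the ppx.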

@pmetzger
Member

pmetzger commented Dec 4, 2023

This doesn't seem to be the way you are supposed to write a tokenizer in general, or really what regular expressions are for as such...

@EmileRolley
Author

> This doesn't seem to be the way you are supposed to write a tokenizer in general, or really what regular expressions are for as such...

What is the right way, then?

@pmetzger
Member

pmetzger commented Dec 6, 2023

I don't know what your whole language is, but the reasonable thing to do is to write a grammar for it and to tokenize lexemes that you see rather than trying to not tokenize lexemes you don't see.
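A minimal sketch of that approach, assuming the $(...) syntax from earlier in the thread: every character ends up in some token, and a $ that does not start $( becomes its own text token instead of being carved out with a complement. The token type and tokenize function are hypothetical and written in plain OCaml rather than sedlex; in a sedlex lexer the same idea could be expressed as separate rules for interpolation, for a lone '$', and for Plus (Compl '$'), relying on longest match to prefer interpolation:

```ocaml
(* Hypothetical token type: literal text, or the inside of "$(...)". *)
type token = Text of string | Interp of string

(* Scan left to right; every character is assigned to some token.
   Note: the contents of "$(...)" are not validated against [variable]. *)
let tokenize (s : string) : token list =
  let n = String.length s in
  let rec go pos acc =
    if pos >= n then List.rev acc
    else if s.[pos] = '$' && pos + 1 < n && s.[pos + 1] = '(' then begin
      (* Interpolation: take everything up to the next ')'. *)
      match String.index_from_opt s (pos + 2) ')' with
      | Some close ->
          let inner = String.sub s (pos + 2) (close - pos - 2) in
          go (close + 1) (Interp inner :: acc)
      | None -> failwith "unterminated $("
    end
    else if s.[pos] = '$' then
      (* A lone '$' is a lexeme of its own. *)
      go (pos + 1) (Text "$" :: acc)
    else begin
      (* Longest run without '$', like Plus (Compl '$'). *)
      let stop =
        match String.index_from_opt s pos '$' with Some i -> i | None -> n
      in
      go stop (Text (String.sub s pos (stop - pos)) :: acc)
    end
  in
  go 0 []
```

For "cost $5 $(user.name)!" this produces Text "cost ", Text "$", Text "5 ", Interp "user.name", Text "!"; adjacent Text tokens can then be merged by the parser.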
