Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider using 32-bit offsets for start and end in yp_location_t #1566

Open
froydnj opened this issue Sep 20, 2023 · 3 comments
Open

consider using 32-bit offsets for start and end in yp_location_t #1566

froydnj opened this issue Sep 20, 2023 · 3 comments
Labels
enhancement New feature or request
Milestone

Comments

@froydnj
Copy link
Contributor

froydnj commented Sep 20, 2023

YARP provides precise location information for any number of things in its AST, which is great! But each of those locations on a 64-bit platform costs 16 bytes, which means every YARP node (yp_node_t) on those platforms is at least 24 bytes, which is rather large.

For some concrete numbers, nearly half (~47%) of the memory size as measured by YARP::Debug.memsize on a subset of Stripe's codebase is consumed by locations. YARP::Debug.memsize doesn't measure this as-is; the count(s) of locations were obtained from this branch (and note whatever the branch counts needs to be multiplied by 16 to obtain the size):

main...froydnj:froydnj-location-counting

so I might have done something wrong.

Using 32-bit offsets (well, probably 31-bit offsets so there's still some way to mark locations as "invalid" or "unavailable") would immediately shrink the memory required by ~25%; I haven't put together a proof-of-concept branch with that change, but my belief is that there would be a performance gain as well, though probably not of the same magnitude. One can also imagine more complicated schemes where nodes hold a 32-bit key into some side table, which would shrink things still more, but one step at a time.

This change would limit the kinds of Ruby source files that YARP could consume. But other Ruby-consuming tools/implementations (Sorbet and--I think--TruffleRuby cc @eregon ) already maintain 32-bit offsets internally. I believe Clang and Go also only store 32-bit offsets for their locations. If Clang and Go can get away with it, I think YARP can too.

@eregon
Copy link
Member

eregon commented Sep 20, 2023

Yes, when serialized it's already 32-bit ints and actually in Java it's a 32-bit signed integer, so only 31 bits.
Related: #872 (comment) and #1407 (comment)
Also #1532 for stats on the savings to not have location fields, although that is with varint "compression" so not directly related to the footprint for C structs.

I think it'd be good to store locaitons in C as uint32_t start_offset; uint32_t length, but that is a massive change with huge conflict potential, so I think we'd need @kddnewton to agree to not merge anything from when it's being worked on until it's merged (or abandoned).
It's quite a big change because yarp.c in many places use the fact the start is a char* and not just an offset.

@kddnewton kddnewton added the enhancement New feature or request label Sep 21, 2023
@kddnewton
Copy link
Collaborator

I assume we're already limited in very non-obvious ways, so I'm not particularly worried about switching to 32-bits. I think this is a good idea.

I disagree with @eregon here I don't think we can halt merging entirely in service of getting this over the finish line, so I'd like to explore other ways to do this incrementally.

I think one way to start would be to modify a single location field on a single node. You would do this by creating a new kind of field called OffsetsField (as opposed to LocationField). Presumably this could be done in the various yp_*_node_create functions where it would subtract from the beginning of the parser. That way we could test out the implications of a single node's fields while still being able to keep the codebase healthy.

@kddnewton kddnewton added this to the Ruby 3.4.0 milestone Dec 6, 2023
@kddnewton kddnewton modified the milestones: CRuby unblocked, Long-term Feb 14, 2024
@kddnewton
Copy link
Collaborator

Officially got confirmation that we can make this happen! Yay! https://bugs.ruby-lang.org/issues/20488#note-1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants