
Huge GC pressure when serving files with Cohttp_lwt_unix.Server.respond_file #207

Open
mfp opened this issue Nov 26, 2014 · 16 comments

@mfp
mfp commented Nov 26, 2014

respond_file reads the file in 16 KB chunks using Lwt_io.read. This allocates lots of 16 KB strings that go directly to the major heap and become garbage right away, and is bound to trigger major collection fests. See ocsigen/ocsigenserver#49 for an example of what can happen in a similar case...
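Roughly, the pattern in question looks like this (a simplified sketch for illustration, not the actual cohttp code):

```ocaml
(* Simplified sketch: each chunk is a fresh 16 KB string.  Strings of that
   size are allocated directly in the major heap and become garbage as soon
   as they have been written out, hence the GC pressure. *)
let read_file_as_stream (ic : Lwt_io.input_channel) : string Lwt_stream.t =
  let chunk_size = 16384 in
  Lwt_stream.from (fun () ->
    Lwt.bind (Lwt_io.read ~count:chunk_size ic) (fun chunk ->
      if String.length chunk = 0 then Lwt.return_none
      else Lwt.return (Some chunk)))
```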

@rgrinberg
Member

We should probably just be using send_file in this case, at least where it's available. Since you're knowledgeable on this topic, what would you recommend?

@mfp
Author

mfp commented Nov 26, 2014

I'm not especially knowledgeable on this, but for sure serving files is what sendfile(2) was made for. It would only help for plain (non-SSL) connections, which is an important limitation.

There are a few complications though:

  1. portability: both sendfile(2) and the accompanying TCP_CORK option are Linux-only
  2. it doesn't seem to fit Cohttp's current (stream-based) model

Regarding (2), after a quick look (this is the first time I've looked at cohttp's code, so take it
with a grain of salt) I believe it might be possible to change both Cohttp_lwt_body.t (so that it
can be a reference to a file reifiable as a regular stream) and the output channels used elsewhere
in order to perform a sort of stream fusion, replacing the read-file-into-stream and
write-stream-to-socket operations with a single sendfile loop.

This sounds quite involved, so for the time being simply allocating a string buffer and reusing it
(with Lwt_io.read_into) would help a lot. I assume that nobody expects the strings returned by the stream to be immutable (this could be documented in the corresponding interface).
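A minimal sketch of the reuse idea, assuming the stream is changed to yield the shared buffer plus a length (names are illustrative, not a proposed API):

```ocaml
(* Sketch: one buffer allocated up front and refilled in place with
   Lwt_io.read_into; the stream yields the buffer plus the number of valid
   bytes, so consumers must use the data before forcing the next element. *)
let read_file_as_chunks (ic : Lwt_io.input_channel) : (Bytes.t * int) Lwt_stream.t =
  let buf = Bytes.create 16384 in
  Lwt_stream.from (fun () ->
    Lwt.bind (Lwt_io.read_into ic buf 0 (Bytes.length buf)) (fun n ->
      if n = 0 then Lwt.return_none
      else Lwt.return (Some (buf, n))))
```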

@pqwy

pqwy commented Nov 26, 2014

Another alternative would be to use Cstruct.t. sendfile(2) avoids copying the file into OCaml memory altogether, sure, but just using cstructs mostly bypasses the GC. They are backed by a foreign-allocated Bigarray and are represented on the heap as a 3-slot record, so they are unlikely to get undue promotion out of the young generation.
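For illustration (a minimal sketch, nothing cohttp-specific):

```ocaml
(* The 16 KB payload lives in a Bigarray outside the OCaml heap, while the
   Cstruct.t values themselves are small {buffer; off; len} records, so
   slicing is cheap and copy-free. *)
let () =
  let buf = Cstruct.create 16384 in
  let half = Cstruct.sub buf 0 8192 in  (* zero-copy view into the same Bigarray *)
  assert (Cstruct.len half = 8192)
```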

@rgrinberg
Member

We don't even need cstruct if we adopt core_kernel for every backend, since Bigsubstring is analogous AFAIK.

@pqwy

pqwy commented Nov 26, 2014

Yeah. Anything that wraps Bigarray with offset/pointer so the bulk of it is allocated off the GC heap and it's not expensive to cut.

We use Cstruct.t in TLS pervasively for these reasons.

@dbuenzli

I'd really prefer cstruct over a pervasive core_kernel dependency; we want portability and fast build times on constrained platforms.

@hcarty
Contributor

hcarty commented Nov 26, 2014

I second the preference for cstruct over core_kernel.

@pqwy

pqwy commented Nov 26, 2014

@dbuenzli, @hcarty:
That's also true. I can't see myself switching TLS and all the related libs to core just for the buffer type anytime soon.

The ecosystem could really use a common bigarray wrapper with zero-copy slicing and a few handy fixed-size extractor funs that everybody agrees on. Like cstruct, sans syntax extensions, but one that everyone is using...

@avsm
Member

avsm commented Nov 26, 2014

Cstruct now only optionally depends on camlp4 since 1.5.0, and has an adapter to Async.

@mfp
Author

mfp commented Nov 26, 2014

On Wed, Nov 26, 2014 at 09:54:52AM -0800, David Kaloper wrote:

Another alternative would be to use Cstruct.t. sendfile(2) avoids
copying the file into OCaml memory altogether, sure, but just using cstructs
mostly bypasses the GC. They are backed by a foreign-allocated Bigarray
and are represented on the heap as a 3-slot record, so they are unlikely to
get undue promotion out of the young generation.

I agree that cstructs (or even plain old 1-dimensional bigarrays, maybe with
minimal wrapping for offset/length) should be more prevalent in this kind of
thing (IO buffers and such). The main advantage is that they're not going to
be moved around by the GC, so no copying to a temp buffer is needed when
performing IO after releasing the runtime.

They are not a panacea however. For instance, non-mmapped bigarrays simply use
malloc to reserve the memory and free() it on GC of the bigarray, which
cannot be controlled precisely at a reasonable cost -- you have to do
Gc.full_major (). This does not generate extra GC load, but can easily cause
memory fragmentation (and high RSS peaks), since unlike OCaml's major heap
there's no compaction. So you're going to need some care (and maybe a buffer
pool) and have to avoid allocating too much anyway --- not unlike with plain
old strings, although for different reasons.

Going back to the present issue, if Cohttp_lwt_body.t is changed from
Stream of string Lwt_stream.t to Stream of cstruct_like Lwt_stream.t,
you definitely want to reuse the cstruct_like buffer. Also, for compatibility
with Lwt_io, which cohttp_lwt* builds upon, I think simple 1-dimensional
bigarrays (= Lwt_bytes.t) would be best, as in time Lwt_io channels could
become able to dump them directly to the underlying fd (if applicable).
However, neither Lwt_io.read_into_bytes nor Lwt_io.write_from_bytes exist at the moment, so... in fact a single plain old string (+ length) reused throughout the life of the stream is going to be close to optimal in Lwt_io's current state.

Mauricio Fernández

@pqwy

pqwy commented Nov 26, 2014

Cstructs are a minimal wrapping of bigarrays with offset/length!

As for Lwt_io.read_into_bytes and its counterpart, they are called Lwt_bytes.read / write, and are what Lwt_cstruct.read / write build upon. It should be relatively straightforward to switch Cohttp_lwt_body.t and friends to cstruct internally, with the damage being api breakage or duplication.
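For example, a rough sketch of a copy loop over raw file descriptors with a single reused cstruct (illustrative only):

```ocaml
(* Rough sketch: copying between Lwt_unix.file_descr values using a single
   reused Cstruct.t via Lwt_cstruct.read/write; partial writes are retried
   by shifting the view over the same buffer. *)
let copy_fd ~(src : Lwt_unix.file_descr) ~(dst : Lwt_unix.file_descr) =
  let buf = Cstruct.create 16384 in
  let rec loop () =
    Lwt.bind (Lwt_cstruct.read src buf) (fun n ->
      if n = 0 then Lwt.return_unit
      else
        let rec write_all cs =
          if Cstruct.len cs = 0 then loop ()
          else
            Lwt.bind (Lwt_cstruct.write dst cs) (fun written ->
              write_all (Cstruct.shift cs written))
        in
        write_all (Cstruct.sub buf 0 n))
  in
  loop ()
```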

Re: imprecise deallocation and fragmentation.. I have no idea how it would play out. I think those questions are best answered by measurements.

@mfp
Author

mfp commented Nov 26, 2014

On Wed, Nov 26, 2014 at 11:55:11AM -0800, David Kaloper wrote:

Cstructs are a minimal wrapping of bigarrays with offset/length!

I'm not very lucid today; what I meant was really non-opaque wrapping, for
maximal compatibility/interoperability.

As for Lwt_io.read_into_bytes and its counterpart, they are called Lwt_bytes.read / write, and are what Lwt_cstruct.read / write build upon. It should be relatively straightforward to switch Cohttp_lwt_body.t and friends to cstruct internally, with the damage being api breakage or duplication.

Lwt_bytes.read/write operate on Lwt_unix.file_descr, not Lwt_io channels
(as needed to read from/write to TLS channels via Lwt_ssl).

We want to avoid allocation in respond_file's stream, but at some point
you're going to want to write it to a Lwt_io channel if you're living in the
Lwt ecosystem.

This is why, when serving a file with respond_file, you're going to run into
this issue as well: mirage/ocaml-conduit#27

Refer also to ocsigen/lwt#98 (as you can see I went
on a little bug reporting spree).

Re: imprecise deallocation and fragmentation.. I have no idea how it would play out.
I think those questions are best answered by measurements.

Allocating lots of bigarrays and hoping the GC will release them is just
asking for trouble. There's a constant in the bigarray lib used to compute the
major slice size which will cause up to 1 GB worth of bigarray data to be
allocated before a full major GC cycle is completed. The heap is going to
become gruyère cheese if you don't try to reuse the bigarrays (and triggering
GC runs manually would generate as much work as plain old strings being
allocated in the major heap, but that at least adjusts the GC speed
automatically and prevents fragmentation...).

Mauricio Fernández

@Chris00
Contributor

Chris00 commented Jan 26, 2015

Lwt_bytes.read/write operate on Lwt_unix.file_descr, not Lwt_io channels (as needed to read from/write to TLS channels via Lwt_ssl).

It looks like a first move would be to request Lwt to have “allocation free” Lwt_io.read and Lwt_io.write (working on channels) — it would be useful for other purposes too. Async already has that. Then one can ask Ssl to directly support Bytes.t.
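Purely as an illustration of what I mean, the requested additions could look something like this (hypothetical signatures; they do not exist in Lwt today and the names are made up):

```ocaml
(* Hypothetical signatures only -- illustrating the kind of allocation-free,
   channel-level operations being requested. *)
module type LWT_IO_NO_ALLOC = sig
  val read_into_bigarray :
    Lwt_io.input_channel -> Lwt_bytes.t -> int -> int -> int Lwt.t
  val write_from_bigarray :
    Lwt_io.output_channel -> Lwt_bytes.t -> int -> int -> int Lwt.t
end
```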

@Chris00
Contributor

Chris00 commented Jan 26, 2015

BTW, while we are at it, IO.read should report EOF and error conditions differently — IMHO it is very important to distinguish the two, as one may have to rely on EOF to make sure the complete input was read...

@avsm
Member

avsm commented Jan 26, 2015

I've opened up #207 to track the IO.read comment -- it's helpful to keep these bug reports separate where possible.

@mfp
Author

mfp commented Aug 17, 2015

I'm thinking of implementing a trivial buffer pool to prevent allocation of large buffers (both Bytes.t and Lwt_bytes.t/Cstruct.t), starting from conduit and working upwards to cohttp and ocsigenserver. I did it in ocsigenserver to great effect (up to a ~30% speed up).

This is the API I have in mind (as you can see in the .ml, the implementation is trivial).
Reusing buffers entails the risk of accidentally using one after it's been explicitly released, but the places where they are used are clear enough and the speed gains so great it makes it worthwhile IMO (for instance, the buffer passed to Lwt_io.make can be released in the ~close function).
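A minimal sketch of the kind of pool I mean (illustrative only; the names and details differ from the actual code linked above):

```ocaml
(* Trivial buffer pool sketch: fixed-size buffers are kept on a free list;
   acquire hands out an idle buffer or allocates a new one, release returns
   it for reuse (dropping it if the pool is already full). *)
module Pool : sig
  type 'a t
  val create : alloc:(unit -> 'a) -> max_idle:int -> 'a t
  val acquire : 'a t -> 'a
  val release : 'a t -> 'a -> unit
end = struct
  type 'a t = { alloc : unit -> 'a; max_idle : int; mutable free : 'a list }

  let create ~alloc ~max_idle = { alloc; max_idle; free = [] }

  let acquire t =
    match t.free with
    | b :: rest -> t.free <- rest; b
    | [] -> t.alloc ()

  let release t b =
    if List.length t.free < t.max_idle then t.free <- b :: t.free
end

(* e.g. a pool of 16 KB Lwt_bytes buffers shared by the IO layer *)
let io_buffers = Pool.create ~alloc:(fun () -> Lwt_bytes.create 16384) ~max_idle:64
```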

Initially, I was going to just add the buffer pool to conduit, and then reuse it in cohttp and ocsigenserver (at some point, the API could be enriched and generalized, and split off into a separate package). One doubt I have is whether it'd be acceptable to expose conduit's buffer pool (I mean the pool "value" itself, not the implementation) so that cohttp and ocsigenserver can reuse it, or whether it'd be deemed better to have each one use its own internally (without exposing it), while reusing the implementation, of course.

In preliminary benchmarks of the ocsigenserver cohttp(_rebased) branch, I've seen it run at between ~50% of the speed of stock ocsigenserver 2.6 (for very small responses) and about the same speed, with the slowdown shrinking as response size grows (about ~5% for 16 KB files); that's against stock 2.6 without any of the allocation prevention work mentioned earlier. I think the higher per-connection overhead can easily be explained by cohttp using 16 KB buffers (vs. 4 KB in stock ocsigenserver).

Also, regarding the precise problem that motivated this issue, it's similar to this one relative to ocsigenserver, so I'd expect about the same speed increase (>2X faster serving of large files) once addressed. This would build on the buffer pool work for extra effectiveness.
