FAQ

I want to synchronously commit after each consumed message. It's very slow. How do I make it fast?

librdkafka uses the same broker connection for both commits and fetching messages, so a commit may be held up behind a long-poll blocking Fetch request. The long-poll behavior of fetch requests is configured with the fetch.wait.max.ms property, which defaults to 100ms. You can decrease that value to reduce offset commit latency at the expense of a larger number of Fetch requests (depending on your traffic pattern). Also see https://github.com/edenhill/librdkafka/issues/1787
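
For illustration, here's a minimal sketch of a consumer that commits synchronously after each message with a reduced fetch.wait.max.ms. It assumes the Dictionary-based configuration of the 0.11.x .NET client; the broker address, group id and topic name are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Confluent.Kafka;
using Confluent.Kafka.Serialization;

class CommitLatencyExample
{
    static void Main()
    {
        var config = new Dictionary<string, object>
        {
            { "bootstrap.servers", "localhost:9092" },  // placeholder broker address
            { "group.id", "my-consumer-group" },        // placeholder group id
            { "enable.auto.commit", false },
            // Lower the fetch long-poll time (default 100ms) so a commit
            // is not queued for long behind a blocking Fetch request.
            { "fetch.wait.max.ms", 10 }
        };

        using (var consumer = new Consumer<Null, string>(
            config, null, new StringDeserializer(Encoding.UTF8)))
        {
            consumer.Subscribe("my-topic");             // placeholder topic
            while (true)
            {
                if (consumer.Consume(out Message<Null, string> msg, TimeSpan.FromSeconds(1)))
                {
                    // Process msg.Value here, then commit synchronously.
                    consumer.CommitAsync(msg).Wait();
                }
            }
        }
    }
}
```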

Can I use the .NET Client to check if my Kafka cluster is down?

This question is not as straightforward as it sounds. What does 'cluster down' mean? That all brokers are down? Or just the leaders for the partitions we care about? Does it include the replicas? If all brokers appear down, is that perhaps just a configuration error on the client, or a temporary networking problem?

If we propagate broker state information via the client, should we then make partition leader information available, and maybe consumer coordinator information? Should we provide everything you need to re-implement the capability the client already provides?

We take the approach that as a user you shouldn't care. You configure the message.timeout.ms and message.send.max.retries settings and let the client take care of the rest. At the end of the day, it typically boils down to one question: how much time will I allow for a message to be sent before it is deemed to have failed?
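
For illustration, the corresponding producer settings might look like this (a sketch assuming the Dictionary-based 0.11.x configuration; the values and broker address are placeholders, not recommendations):

```csharp
using System.Collections.Generic;

class DeliveryTimeoutExample
{
    static void Main()
    {
        // Passed to the Producer constructor.
        var producerConfig = new Dictionary<string, object>
        {
            { "bootstrap.servers", "localhost:9092" },  // placeholder broker address
            // Fail the message back to the application if it cannot be
            // delivered within 30s, whatever the underlying cause
            // (broker down, leader election, network partition, ...).
            { "message.timeout.ms", 30000 },
            { "message.send.max.retries", 5 }
        };
    }
}
```

If delivery does not succeed within that window, the delivery report carries a non-success Error and the application decides what to do next; it never needs to probe cluster health itself.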

How can I ensure delivery of messages in the order specified?

Set max.in.flight to 1. With only one request outstanding per broker connection, a retried batch cannot overtake a later one, so per-partition ordering is preserved; the trade-off is reduced throughput.
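
A minimal sketch of this setting in a producer configuration (0.11.x Dictionary style; the broker address is a placeholder):

```csharp
using System.Collections.Generic;

class OrderedDeliveryExample
{
    static void Main()
    {
        // Passed to the Producer constructor.
        var producerConfig = new Dictionary<string, object>
        {
            { "bootstrap.servers", "localhost:9092" },  // placeholder broker address
            // Alias for max.in.flight.requests.per.connection: limits
            // each broker connection to one outstanding request.
            { "max.in.flight", 1 }
        };
    }
}
```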

How many connections should we expect to see from the .NET client into the kafka brokers?

What factors determine the connection count? (#brokers, #topics, #partitions, #client consumer instances, other?)

Refer to: https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-broker-tcp-connections

The number of open connections is determined by the number of brokers: the client reads and writes data directly from/to the broker that is the leader for the partition of interest, so a client will commonly require connections to all brokers.

The worst case number of connections held open by librdkafka is cnt(bootstrap.servers) + cnt(brokers in Metadata response); the minimum is cnt(brokers in Metadata response). Currently, librdkafka holds connections open to all brokers whether or not they are needed. In the future, we plan to enhance librdkafka so that unused connections are closed.
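
As a worked example (numbers are illustrative): a client configured with 3 addresses in bootstrap.servers against a 12-broker cluster could hold up to 3 + 12 = 15 connections open in the worst case, and no fewer than 12.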

What are the trade-offs regarding the number of .NET client consumer instances?

Currently, we have N topics. We are creating a consumer instance in the application for each topic. Is that acceptable?

Should we use a single consumer for all topics?

It's more efficient to use fewer clients:

  • Each client maintains open connections to all brokers and internally creates 1 + (# connections) threads. Involuntary context switches may introduce significant client-side overhead when a large number of threads run on a single machine.
  • Each broker connection has a small server-side cost. As a rough indication of the magnitude: in a recent benchmark on a 12 broker cluster, end-to-end latencies were roughly halved when the number of (producer) connections was reduced from ~25000 to ~200, all else being equal.
  • There is a small fixed cost per broker request (client and server side) and using a single client allows multiple topics to be combined in broker requests.
  • There is additional client-side memory overhead when using separate clients.

On the other hand, the API isn't designed for frequent updates to the subscription set. If you want to change the set of subscribed topics dynamically, you'll probably be better off with multiple consumers.
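
For illustration, a single consumer subscribed to several topics (a sketch using the 0.11.x API; broker, group and topic names are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Confluent.Kafka;
using Confluent.Kafka.Serialization;

class MultiTopicConsumerExample
{
    static void Main()
    {
        var config = new Dictionary<string, object>
        {
            { "bootstrap.servers", "localhost:9092" },  // placeholder broker address
            { "group.id", "my-consumer-group" }         // placeholder group id
        };

        using (var consumer = new Consumer<Null, string>(
            config, null, new StringDeserializer(Encoding.UTF8)))
        {
            // One client, one subscription covering all topics of interest:
            // fetches for partitions on the same broker share requests.
            consumer.Subscribe(new List<string> { "topic-a", "topic-b", "topic-c" });

            consumer.OnMessage += (_, msg)
                => Console.WriteLine($"{msg.Topic} [{msg.Partition}] @{msg.Offset}: {msg.Value}");

            while (true)
            {
                consumer.Poll(TimeSpan.FromMilliseconds(100));
            }
        }
    }
}
```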

With Avro, is there a performance difference between the specific and generic approaches? Which is preferred with the .NET client?

Working with the specific classes is much simpler, and you should prefer this approach where possible. Use GenericRecord in scenarios where you need to work dynamically with data of arbitrary types.

I have not benchmarked this, but suspect the specific case to be a bit more efficient.
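
For illustration, this is what the generic approach looks like with the Apache Avro C# library (the schema is illustrative only; with the specific approach, a generated User class with typed properties would take the place of GenericRecord):

```csharp
using Avro;
using Avro.Generic;

class GenericRecordExample
{
    static void Main()
    {
        // Parse a (hypothetical) record schema at runtime.
        var schema = (RecordSchema)Schema.Parse(@"
            { ""type"": ""record"", ""name"": ""User"",
              ""fields"": [ { ""name"": ""name"", ""type"": ""string"" } ] }");

        // Populate fields by name: flexible, but with no compile-time
        // type checking and some per-field lookup overhead compared to
        // the typed properties of a generated specific class.
        var record = new GenericRecord(schema);
        record.Add("name", "alice");
    }
}
```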

Where can I find a list of all the configuration properties?

Configuration parameters

What are some good resources for getting started with Kafka?

If you're new to Apache Kafka, the introduction and design sections of the Apache documentation are an excellent place to start.

The Confluent blog has a lot of good technical information - Jay Kreps's guide to building a streaming platform covers many of the core concepts again, but with a focus on Kafka's role at a company-wide scale. Also noteworthy are Ben Stopford's microservices blog posts for his unique take on the relationship between applications and data.

What is Confluent Platform?

Confluent Platform is a distribution of Apache Kafka. A good comparison of Apache Kafka, Confluent OSS and Confluent Enterprise can be found here.

