
FAQ

How many connections should we expect to see from the .NET client into the kafka brokers?

What factors determine the connection count? (#brokers, #topics, #partitions, #client consumer instances, other?)

Refer to: https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-broker-tcp-connections

The number of open connections is determined by the number of brokers in the cluster. The client reads from and writes to the broker that is the leader for each partition of interest, and a client will commonly require connections to all brokers.

The worst-case number of connections held open by librdkafka is cnt(bootstrap.servers) + cnt(brokers in Metadata response); the minimum is cnt(brokers in Metadata response). Currently, librdkafka holds connections open to all brokers whether or not they are needed. In the future, we plan to enhance librdkafka so that unused connections are not maintained.
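By way of illustration, the `bootstrap.servers` list only seeds broker discovery; after the first Metadata response the client connects to the brokers it learns about there, so the connection count tracks the cluster size rather than the length of that list. The following is a minimal sketch assuming the Confluent.Kafka 1.x `ConsumerConfig`/`ConsumerBuilder` API, with placeholder broker addresses and topic name:

```csharp
using System;
using Confluent.Kafka;

class ConnectionExample
{
    static void Main()
    {
        // bootstrap.servers only seeds broker discovery; connections are
        // then made to the brokers reported in the Metadata response.
        // (broker addresses and topic name are placeholders)
        var config = new ConsumerConfig
        {
            BootstrapServers = "broker1:9092,broker2:9092",
            GroupId = "example-group"
        };

        using (var consumer = new ConsumerBuilder<Ignore, string>(config).Build())
        {
            consumer.Subscribe("example-topic");
            var cr = consumer.Consume(TimeSpan.FromSeconds(5));
            Console.WriteLine(cr?.Message.Value ?? "no message");
            consumer.Close();
        }
    }
}
```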

What are the trade-offs regarding the number of .NET client consumer instances?

Currently, we have N topics. We are creating a consumer instance in the application for each topic. Is that acceptable?

Should we use a single consumer for all topics?

It's more efficient to use fewer clients:

  • Each client maintains open connections to all brokers and internally creates 1 + (number of broker connections) threads. Involuntary context switches may start to introduce significant client-side overhead when a single machine runs a large number of threads.
  • Each broker connection has a small server-side cost. As a rough indication of the magnitude: in a recent benchmark on a 12-broker cluster, end-to-end latencies roughly halved as the number of (producer) connections was reduced from ~25000 to ~200, all else equal.
  • There is a small fixed cost per broker request (client and server side) and using a single client allows multiple topics to be combined in broker requests.
  • There is additional client-side memory overhead in using separate clients.

On the other hand, the API isn't designed for frequent updates to the subscription set. If you want to change the set of subscribed topics dynamically, you'll probably be better off with multiple consumers.
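For illustration, a single consumer can cover many topics with one subscription, which lets fetch requests to a given broker batch partitions from several topics together. This is a minimal sketch assuming the Confluent.Kafka 1.x API; the broker address and topic names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using Confluent.Kafka;

class MultiTopicConsumer
{
    static void Main()
    {
        var config = new ConsumerConfig
        {
            BootstrapServers = "broker1:9092",   // placeholder address
            GroupId = "multi-topic-group"
        };

        using (var consumer = new ConsumerBuilder<Ignore, string>(config).Build())
        {
            // One client, one subscription covering all topics of interest.
            consumer.Subscribe(new List<string> { "orders", "payments", "shipments" });

            while (true)
            {
                var cr = consumer.Consume(TimeSpan.FromSeconds(1));
                if (cr == null) continue;
                Console.WriteLine($"{cr.Topic} [{cr.Partition}] @ {cr.Offset}: {cr.Message.Value}");
            }
        }
    }
}
```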

With Avro, is there a performance difference between the specific vs generic approach? What is preferred from the .NET client?

Working with the specific classes is much simpler and you should prefer this where possible. Use GenericRecord in scenarios where you need to dynamically work with data of any type.

I have not benchmarked this, but suspect the specific case to be a bit more efficient.
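To illustrate the difference in API shape (not performance), here is a hedged sketch: the `User` class is a stand-in for one that avrogen would generate from the schema, and the generic path uses `GenericRecord` from the Apache Avro C# library.

```csharp
using System;
using Avro;
using Avro.Generic;

// Stand-in for the class avrogen would generate from the schema;
// the real generated class additionally implements ISpecificRecord.
class User
{
    public string Name { get; set; }
    public int FavoriteNumber { get; set; }
}

class AvroComparison
{
    static void Main()
    {
        // Specific: compile-time typed field access.
        var specificUser = new User { Name = "alice", FavoriteNumber = 42 };
        Console.WriteLine(specificUser.Name);

        // Generic: fields are set by name against a runtime-parsed schema --
        // more flexible, but no static typing and a per-field name lookup.
        var schema = (RecordSchema)Schema.Parse(@"{
            ""type"": ""record"", ""name"": ""User"",
            ""fields"": [
                { ""name"": ""Name"", ""type"": ""string"" },
                { ""name"": ""FavoriteNumber"", ""type"": ""int"" }
            ]}");
        var genericUser = new GenericRecord(schema);
        genericUser.Add("Name", "alice");
        genericUser.Add("FavoriteNumber", 42);
        genericUser.TryGetValue("Name", out var name);
        Console.WriteLine(name);
    }
}
```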

Where can I find a list of all the configuration properties?

Configuration parameters

What are some good resources for getting started with Kafka?

If you're new to Apache Kafka, the introduction and design sections of the Apache documentation are an excellent place to start.

The Confluent blog has a lot of good technical information. Jay Kreps's guide to building a streaming platform covers many of the core concepts again, but with a focus on Kafka's role at a company-wide scale. Also noteworthy are Ben Stopford's microservices blog posts for his unique take on the relationship between applications and data.

What is Confluent Platform?

Confluent Platform is a distribution of Apache Kafka. A good comparison of Apache Kafka, Confluent OSS and Confluent Enterprise can be found here.
