Posted to commits@kafka.apache.org by ju...@apache.org on 2013/03/24 23:54:39 UTC

svn commit: r1460481 - /kafka/site/faq.html

Author: junrao
Date: Sun Mar 24 22:54:39 2013
New Revision: 1460481

URL: http://svn.apache.org/r1460481
Log:
add more FAQ

Modified:
    kafka/site/faq.html

Modified: kafka/site/faq.html
URL: http://svn.apache.org/viewvc/kafka/site/faq.html?rev=1460481&r1=1460480&r2=1460481&view=diff
==============================================================================
--- kafka/site/faq.html (original)
+++ kafka/site/faq.html Sun Mar 24 22:54:39 2013
@@ -2,12 +2,24 @@
 
 <h2>Frequently asked questions</h2>
 <ol>	
+<li> <h3> Why do I get QueueFullException in my producer when running in async mode? </h3>
+This typically happens when the producer is sending messages faster than the broker can handle. If the producer can't afford to block, you will have to add enough brokers so that they can jointly handle the load. If the producer can block, set queue.enqueueTimeout.ms to -1 in the producer config. That way, when the queue is full, the producer will block instead of dropping messages.
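+For example, a minimal sketch of the relevant producer config (queue.enqueueTimeout.ms is the property described above; the producer.type setting is shown only to indicate async mode):
+<pre>
+# producer config (sketch)
+producer.type=async
+# -1 makes the producer block when the queue is full instead of dropping messages
+queue.enqueueTimeout.ms=-1
+</pre>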
+
 <li> <h3> Why does my consumer get InvalidMessageSizeException? </h3>
 This typically means that the "fetch size" of the consumer is too small. Each time the consumer pulls data from the broker, it reads bytes up to a configured limit. If that limit is smaller than the largest single message stored in Kafka, the consumer can't decode the message properly and will throw an InvalidMessageSizeException. To fix this, increase the limit by setting the "fetch.size" property in config/consumer.properties. The default fetch.size is 300,000 bytes.
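+For example, a consumer config sketch (the value is illustrative; choose one larger than your largest message):
+<pre>
+# config/consumer.properties (sketch)
+# must be at least the size of the largest single message (default 300000)
+fetch.size=1048576
+</pre>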
 
 <li> <h3> On EC2, why can't my high-level consumers connect to the brokers? </h3>
 When a broker starts up, it registers its host IP in ZK. The high-level consumer later uses the registered IP to establish the socket connection to the broker. By default, the registered IP is given by InetAddress.getLocalHost.getHostAddress. Typically, this returns the real IP of the host. However, on EC2, the returned IP is an internal one that can't be connected to from outside. The solution is to explicitly set the host IP to be registered in ZK via the "hostname" property in server.properties.
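+For example (the address below is a placeholder; use the broker's publicly reachable hostname or IP):
+<pre>
+# server.properties (sketch)
+# register a publicly reachable address in ZK instead of the auto-detected internal one
+hostname=ec2-xx-xx-xx-xx.compute-1.amazonaws.com
+</pre>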
 
+<li> <h3> Why do some of the consumers in a consumer group never receive any messages? </h3>
+Currently, a topic partition is the smallest unit over which we distribute messages among consumers in the same consumer group. So, if the number of consumers is larger than the total number of partitions in a Kafka cluster (across all brokers), some consumers will never get any data. For example, with 10 consumers in a group and only 8 partitions in total, at least 2 consumers will stay idle. The solution is to increase the number of partitions on the broker.
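+A sketch of the relevant broker setting (the value is illustrative):
+<pre>
+# server.properties (sketch)
+# number of partitions per topic on this broker
+num.partitions=8
+</pre>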
+
+<li> <h3> How do I choose the number of partitions for a topic? </h3>
+Having more partitions increases I/O parallelism for writes and thus leads to higher producer throughput. It also increases the degree of parallelism for consumers (see the previous question). On the other hand, more partitions add some overhead: (a) there will be more segment files and thus more open file handles on the broker; (b) there are more offsets to be checkpointed by consumers, which can increase the load on ZooKeeper. So, one needs to balance these tradeoffs.
+
+<li> <h3> Why are there many rebalances in my consumer log? </h3>
+A typical cause of frequent rebalances is GC pauses on the consumer side. When that happens, you will see ZooKeeper session expirations in the consumer log (grep for Expired). Occasional rebalances are fine, but too many rebalances can slow down consumption, and you will need to tune the Java GC settings.
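+For example, to check for session expirations (the log path is a placeholder):
+<pre>
+grep Expired /path/to/consumer.log
+</pre>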
+
 <li> <h3> My consumer seems to have stopped, why? </h3>
 First, try to figure out if the consumer has really stopped or is just slow, using our tool <code>ConsumerOffsetChecker</code>.
 <pre>