Posted to commits@samza.apache.org by Apache Wiki <wi...@apache.org> on 2014/04/22 17:38:22 UTC

[Samza Wiki] Update of "FAQ" by ChrisRiccomini

The "FAQ" page has been changed by ChrisRiccomini:
https://wiki.apache.org/samza/FAQ?action=diff&rev1=3&rev2=4

  = FAQ =
  
+ <<TableOfContents(3)>>
+ 
- ==== Why am I seeing "java.io.IOException: Broken pipe" exceptions in my logs? ====
+ == Why am I seeing "java.io.IOException: Broken pipe" exceptions in my logs? ==
  
  If you are using a Kafka system, this exception can be caused by a number of issues.
  
@@ -13, +15 @@

  
  For more details, have a look at Kafka's [[http://kafka.apache.org/08/configuration.html|producer configuration page]].
  
- ==== I see "Unable to load realm info from SCDynamicStore" in my logs. Why? ====
+ == I see "Unable to load realm info from SCDynamicStore" in my logs. Why? ==
  
  This is a Java 6 bug that has no real impact on Samza. It does appear in the logs in certain circumstances. More details are available on [[https://issues.apache.org/jira/browse/HADOOP-7489|HADOOP-7489]].
  
+ == Why is my job processing old messages? ==
+ 
+ There are several ways in which this can occur:
+ 
+  1. Your job has no checkpoint, and is configured to start reading from the beginning of a topic.
+  2. Your job has a checkpoint, but is configured to disregard it and start reading from the beginning of a topic.
+  3. The Kafka consumer for your job gets an OffsetNotFound exception, and begins reading from the beginning of a topic.
+ 
+ Three configurations control this behavior:
+ 
+ {{{
+  systems.<system>.streams.<your stream>.samza.reset.offset=true
+  systems.<system>.streams.<your stream>.samza.offset.default=oldest
+  systems.<system>.consumer.auto.offset.reset=smallest
+ }}}
+ 
+ The samza.reset.offset configuration tells Samza whether to ignore checkpoints, which store the last offset your job read before its container stopped. When a container starts up again, it normally picks up where it left off in the stream. Setting samza.reset.offset to true tells the container to disregard the checkpoint instead of resuming from it.
+ 
+ The samza.offset.default setting tells the container what to do when there's no checkpoint available (or when it's been ignored because of samza.reset.offset). If set to "oldest", Samza will start reading from the oldest message in the topic. If set to "upcoming", Samza will skip the existing messages and read only messages written to the topic after the job starts.
+ 
+ The consumer.auto.offset.reset configuration tells the Kafka consumer what to do when it has fallen off the edge of the topic: it has an offset that's either too old or too new for the topic it's trying to read from. If set to smallest, the Kafka consumer will start reading from the oldest message. If set to largest, it'll start reading from the newest message in the topic.
+ 
+ Note that the first two configurations also have system-level settings (i.e. systems.<your system>.samza.reset.offset and systems.<your system>.samza.offset.default).
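+ 
+ As an illustration, a job that should resume from its checkpoint and never fall back to old messages might use something like the settings below. This is only a sketch: the system name "kafka" and the stream name "my-topic" are placeholders, so substitute your own.
+ 
+ {{{
+  # "kafka" and "my-topic" are placeholder names, not defaults
+  systems.kafka.streams.my-topic.samza.reset.offset=false
+  systems.kafka.streams.my-topic.samza.offset.default=upcoming
+  systems.kafka.consumer.auto.offset.reset=largest
+ }}}
+ 
+ With these values, the checkpoint is honored when one exists, a job without a checkpoint reads only messages written after it starts, and a consumer whose offset is out of range jumps to the newest message rather than the oldest.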
+