Posted to dev@kafka.apache.org by Louis Verret <lo...@grenoble-inp.org> on 2017/09/06 11:38:04 UTC

Problem: undesired data in a temporary topic

Hello, 

I am writing to you because we are facing a technical issue in the development of our data streaming application, which uses Kafka.

Our application consists of two Java mains: the Producer and the Consumer.
The Producer periodically reads data lines from a file and sends each of them as a message into a Kafka topic.
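
For context, the Producer essentially looks like the following sketch (the broker address, topic name and file path are placeholders, not our real configuration):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Producer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            List<String> lines = Files.readAllLines(Paths.get("data.txt"));
            for (String line : lines) {
                // each file line becomes one message in the input topic
                producer.send(new ProducerRecord<>("input-topic", line));
            }
        }
    }
}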
The Consumer uses a Kafka stream processor (i.e. a Java application that implements the Kafka Streams Processor interface). This Java application therefore overrides the two methods process() and punctuate() of the Processor interface. The process() method stores the lines of each Kafka message in a KeyValueStore instance, and the offset is committed so that a message is not read twice. The punctuate() method reads the KeyValueStore instance and processes the data.
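
For illustration, our Processor is roughly shaped like the sketch below (the class name, store name, message types and punctuation interval are simplified placeholders, not our exact code):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class LineProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, String> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        this.store = (KeyValueStore<String, String>) context.getStateStore("name_of_the_store");
        // call punctuate() every 10 seconds (example interval)
        context.schedule(10000L);
    }

    @Override
    public void process(String key, String value) {
        // buffer the message line in the state store
        store.put(key, value);
        // commit so the message is not read again
        context.commit();
    }

    @Override
    public void punctuate(long timestamp) {
        // read the buffered lines and process them
        try (KeyValueIterator<String, String> it = store.all()) {
            while (it.hasNext()) {
                // ... process it.next() ...
                it.next();
            }
        }
    }

    @Override
    public void close() {}
}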

The problem is the following: between two runs of the Producer and Consumer mains (executed in parallel), data lines remain as residue in a temporary topic called test-name_of_the_store-changelog. When the Consumer and the Launcher are executed a second time, we believe that this residual data is reloaded from the test-name_of_the_store-changelog topic into the current state store. As a result, the first data read by the Launcher during the second execution is undesired data (the last data of the first execution). When we delete the test-name_of_the_store-changelog topic from the command line, the problem is fixed. Since command-line invocations differ between OS distributions, we would like to delete this topic from the application's Java code right at the beginning of the run. Is there a way to do that using the Kafka Streams API?
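
What we are currently considering is deleting the topic ourselves at startup with the AdminClient (available since Kafka 0.11), along the lines of the sketch below; the broker address is a placeholder and the topic name is the one from our setup. We are not sure this is the intended approach:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ChangelogCleanup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // delete the state store's changelog topic before the Streams app starts
            admin.deleteTopics(Collections.singletonList("test-name_of_the_store-changelog"))
                 .all()
                 .get(); // block until the deletion is acknowledged
        }
    }
}

We have also looked at KafkaStreams#cleanUp(), but as far as we understand it only wipes the local state directory, and at the kafka-streams-application-reset tool, which resets the internal topics but is again a command-line tool.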

Best regards, 

Louis