Posted to users@kafka.apache.org by Milind Vaidya <ka...@gmail.com> on 2017/04/13 17:30:50 UTC

Failure scenarios for a java kafka producer reading from stdin

Hi

Background :

I have the following setup:

Apache server >> Apache Kafka Producer >> Apache Kafka Cluster >> Apache
Storm

In the normal scenario, the front-end boxes run the Apache server, which
populates the log files. The requirement is to read every log line and send
it to the Kafka cluster.

The Java producer reads the logs from stdin and transfers them to the cluster.
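For reference, here is a minimal sketch of what such a stdin-to-Kafka producer can look like. The topic name ("error_logs"), broker list, and the "acks=all" / unlimited-retries settings are my assumptions for a loss-averse configuration, not the actual code in use:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal sketch: forward every stdin line to a Kafka topic.
// Topic name and broker list below are placeholders.
public class StdinLogProducer {

    // "acks=all" plus high retries trades latency for durability,
    // which matches the zero-loss goal described above.
    static Properties baseConfig(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) throws Exception {
        String topic = "error_logs"; // placeholder topic name
        try (Producer<String, String> producer =
                     new KafkaProducer<>(baseConfig("localhost:9092"));
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Send asynchronously; the callback reports (but cannot retry
                // beyond the configured retries) failed sends.
                producer.send(new ProducerRecord<>(topic, line),
                        (metadata, exception) -> {
                            if (exception != null) {
                                System.err.println("send failed: "
                                        + exception.getMessage());
                            }
                        });
            }
            // EOF on stdin (e.g. Apache closed the pipe):
            // flush pending batches before the try-with-resources close().
            producer.flush();
        }
    }
}
```

The try-with-resources block guarantees producer.close() runs on normal EOF, but not if the process is killed, which is exactly the failure mode asked about in the questions below.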

Zero-loss criterion: the contents of the error log files should match the
data received by the Kafka cluster per hour, and eventually per day.

The error_log files are rotated every hour.

I have already tried a couple of ways to connect the log files to the producer:

1. Custom startup script to start, stop, and check the status of the server:
      tail -n0 -F /var/log/httpd/error_log /var/log/httpd/ssl_error_log |
java producer
2. Hooking directly into Apache via an httpd.conf setting:
      ErrorLog "| /usr/bin/tee -a /var/log/httpd/error_log | java producer"


In case 1, loss of logs was observed. The loss was reduced significantly in
case 2, because Apache restarts the piped process if it crashes, and also
restarts it when the server itself restarts. However, loss is still seen
across restarts of the Apache server.

Questions :

1. What is the appropriate way to interface Apache httpd with Kafka?
2. Is there a way to gracefully shut down the Kafka producer so that the
pending buffers are flushed before the process dies?
3. Are there any documented failure scenarios for the Kafka producer?
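Regarding question 2, one pattern I am considering is a JVM shutdown hook that flushes and closes the producer when the process receives SIGTERM or SIGINT. The class name, broker list, and timeout below are placeholders for illustration; a SIGKILL or an OS-level crash cannot be intercepted this way, so some loss would remain possible in those cases:

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;

// Sketch of a graceful-shutdown pattern: a JVM shutdown hook that
// flushes buffered records and closes the producer before exit.
public class GracefulShutdownExample {

    static Properties config(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("acks", "all");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Producer<String, String> producer =
                new KafkaProducer<>(config("localhost:9092"));

        // Runs when the JVM exits normally or on SIGTERM/SIGINT
        // (e.g. when Apache tears down the piped-log process on restart).
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            producer.flush();                       // push out buffered records
            producer.close(30, TimeUnit.SECONDS);   // wait for in-flight requests
        }));

        // ... normal stdin read-and-send loop here ...
    }
}
```

This does not answer whether Apache gives the piped process enough time to drain before sending a hard kill, which may be the remaining source of loss across server restarts.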