Posted to dev@kafka.apache.org by "Gwen Shapira (JIRA)" <ji...@apache.org> on 2014/10/15 01:37:34 UTC

[jira] [Created] (KAFKA-1705) Add MR layer to Kafka

Gwen Shapira created KAFKA-1705:
-----------------------------------

             Summary: Add MR layer to Kafka
                 Key: KAFKA-1705
                 URL: https://issues.apache.org/jira/browse/KAFKA-1705
             Project: Kafka
          Issue Type: Improvement
            Reporter: Gwen Shapira
            Assignee: Gwen Shapira


Many NoSQL-type storage systems (HBase, Mongo,
Cassandra) and file formats (Avro, Parquet) provide a MapReduce
integration layer - usually an InputFormat, OutputFormat and a utility
class. Sometimes there's also an abstract Job and Mapper that do more
setup, which can make things even more convenient.

This is different from the existing Hadoop contrib project or Camus in that an MR layer would provide components for use in MR jobs, not an entire job that ingests data from Kafka to HDFS.

The benefits I see for a MapReduce layer are:
* Developers can create their own jobs, processing the data as it is
ingested - rather than having to process it in two steps.
* There are reusable components for developers looking to integrate with
Kafka, rather than having everyone implement their own solution.
* Hadoop developers expect projects to have this layer.
* Spark reuses Hadoop's InputFormat and OutputFormat - so we get Spark
integration for free.
* There's a layer to plug the delegation token code into and make it
invisible to MapReduce developers. Without this, everyone who writes
MR jobs will need to think about how to implement authentication.
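To make the idea concrete, here is a rough sketch of what the driver side of a job might look like with such a layer in place. Everything Kafka-specific here is hypothetical: KafkaInputFormat and the "kafka.*" configuration keys are made-up names for the components this issue proposes, not existing APIs.

```java
// Illustrative sketch only. KafkaInputFormat and the "kafka.*" config
// keys are hypothetical names for the components this issue proposes;
// they do not exist yet.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class KafkaTopicCount {

  // An ordinary Mapper: broker discovery, offset management, and
  // (eventually) authentication would live inside the InputFormat,
  // invisible to this code.
  public static class MessageMapper
      extends Mapper<BytesWritable, BytesWritable, Text, LongWritable> {
    @Override
    protected void map(BytesWritable key, BytesWritable value, Context ctx)
        throws java.io.IOException, InterruptedException {
      ctx.write(new Text("messages"), new LongWritable(1L));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("kafka.brokers", "broker1:9092");  // assumed config key
    conf.set("kafka.input.topic", "events");    // assumed config key

    Job job = Job.getInstance(conf, "kafka-mr-sketch");
    job.setJarByClass(KafkaTopicCount.class);
    job.setInputFormatClass(KafkaInputFormat.class); // proposed class
    job.setMapperClass(MessageMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The point of the sketch is what is absent: the Mapper sees only key/value pairs, so a developer writes ordinary MR code while partition assignment and offsets stay behind the InputFormat - and, because Spark reuses Hadoop's InputFormat interface, the same class would work there too.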



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)