Posted to users@kafka.apache.org by R P <ha...@outlook.com> on 2016/02/11 02:09:11 UTC

What is the best way to write Kafka data into HDFS?

Hello All,
  New Kafka user here. What is the best way to write Kafka data into HDFS?
I have looked into the following options and found that Flume is the quickest and easiest to set up.

1. Flume
2. KaBoom
3. Kafka Hadoop Loader
4. Camus -> Gobblin

However, Flume can cause a small-files problem when your data is partitioned and some partitions generate data only sporadically.
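
To make the small-files concern concrete, here is a toy model. It is an illustration only, not Flume's actual rolling logic: it assumes a writer that starts a new file per partition for every time window in which that partition receives at least one event. The function name `files_per_partition`, the 60-second window, and the event data are all hypothetical.

```python
from collections import defaultdict

def files_per_partition(events, roll_interval_s):
    """Count files per partition under a purely time-based roll (toy model).

    events: iterable of (partition, timestamp_seconds) pairs.
    Assumption: a partition gets one file per time window that contains
    at least one of its events.
    """
    windows = defaultdict(set)
    for partition, ts in events:
        windows[partition].add(int(ts // roll_interval_s))
    return {p: len(w) for p, w in windows.items()}

# Hypothetical traffic over 10 minutes:
busy = [("busy", t) for t in range(600)]             # one event per second
sparse = [("sparse", t) for t in range(0, 600, 60)]  # one event per minute

counts = files_per_partition(busy + sparse, roll_interval_s=60)
print(counts)
```

In this model both partitions produce the same number of files, but each file for the sparse partition holds a single event, which is the kind of output that accumulates as many tiny HDFS files.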

What are some best practices and options to write data from Kafka to HDFS?

Thanks,
R P




RE: What is the best way to write Kafka data into HDFS?

Posted by "Kudumula, Surender" <su...@hpe.com>.
Maybe you can try Apache NiFi; it's quick as well. Give it a try, and good luck.




-----Original Message-----
From: R P [mailto:hadooper@outlook.com] 
Sent: 11 February 2016 16:09
To: users@kafka.apache.org
Subject: Re: What is the best way to write Kafka data into HDFS?


Hello Steve, thanks for the suggestion. It looks like this Git repo has not been updated in more than 10 months.
Is this project still supported?
Where can I find current usage and performance metrics?

Thanks,
R P

Re: What is the best way to write Kafka data into HDFS?

Posted by R P <ha...@outlook.com>.
Hello Steve, thanks for the suggestion. It looks like this Git repo has not been updated in more than 10 months.
Is this project still supported?
Where can I find current usage and performance metrics?

Thanks,
R P
________________________________________
From: steve.morin@gmail.com <st...@gmail.com> on behalf of Steve Morin <st...@stevemorin.com>
Sent: Wednesday, February 10, 2016 6:36 PM
To: users@kafka.apache.org
Subject: Re: What is the best way to write Kafka data into HDFS?

R P, happy to walk you through https://github.com/DemandCube/Scribengin if
you're interested.


Re: What is the best way to write Kafka data into HDFS?

Posted by Steve Morin <st...@stevemorin.com>.
R P, happy to walk you through https://github.com/DemandCube/Scribengin if
you're interested.

-- 
*Steve Morin | Managing Partner - CTO*

*Nvent*

O 800-407-1156 ext 803 <800-407-1156;803> | M 347-453-5579

smorin@nventdata.com  <sm...@nventdata.com>

*Enabling the Data Driven Enterprise*
*(Ask us how we can setup scalable open source realtime billion+ event/data
collection/analytics infrastructure in weeks)*

Service Areas: Management & Strategy Consulting | Data Engineering | Data
Science & Visualization

BigData Technologies: Hadoop & Ecosystem | NoSql| Hbase | Cassandra | Storm
| Spark | Kafka | Mesos | Docker | & More

Industries: IoT | Advertising | Retail | Manufacturing | TV & Cable |
Energy | Oil & Gas | Insurance | Finance | Telecom

Re: What is the best way to write Kafka data into HDFS?

Posted by Adam Kunicki <ad...@streamsets.com>.
If you're looking for a lightweight solution with a friendly GUI (and fully
open source), check out streamsets.com <http://streamsets.com>.
It supports writing messages to a parameterized directory hierarchy (e.g.
partitioned Hive tables) and handles late records if your template happens
to involve date/time variables.
The number of messages per file and the maximum file size are also fully
configurable.
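
As a generic illustration of rolling files on a message count or a size limit, here is a minimal in-memory sketch. It is not StreamSets' implementation or configuration; the class name `RollingBatcher` and its parameters `max_records` and `max_bytes` are hypothetical, and a real writer would flush to HDFS instead of keeping lists in memory.

```python
class RollingBatcher:
    """Group records into "files", closing the current one when either
    the record-count limit or the byte-size limit is reached.
    Hypothetical sketch: files are just lists of byte strings."""

    def __init__(self, max_records, max_bytes):
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.files = []          # closed files, oldest first
        self._current = []       # records in the open file
        self._current_bytes = 0  # total size of the open file

    def append(self, record: bytes):
        self._current.append(record)
        self._current_bytes += len(record)
        if (len(self._current) >= self.max_records
                or self._current_bytes >= self.max_bytes):
            self.roll()

    def roll(self):
        """Close the open file, if it holds any records."""
        if self._current:
            self.files.append(self._current)
            self._current = []
            self._current_bytes = 0

# Usage: roll every 3 records (the size limit is never hit here).
batcher = RollingBatcher(max_records=3, max_bytes=1000)
for _ in range(7):
    batcher.append(b"x")
batcher.roll()  # close the final, partially filled file
print([len(f) for f in batcher.files])
```

Larger limits mean fewer, bigger files at the cost of data staying longer in the open file before it is closed.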

Full Disclosure: I'm an engineer actively working on the project.

-Adam

-- 
Adam Kunicki
StreamSets | Field Engineer
mobile: 415.890.DATA (3282) | linkedin <http://www.adamkunicki.com>

Re: What is the best way to write Kafka data into HDFS?

Posted by R P <ha...@outlook.com>.
Hey Jay, 
  It's awesome to get a reply from one of the key Kafka contributors :). Thanks for suggesting Kafka Connect.

How does Kafka Connect deal with small HDFS files? (I assume setting a large flush.size allows the user to maintain a minimum HDFS file size.)
Does Kafka Connect keep the file handle open until the file is committed? (Flume keeps file handles open, resulting in too many open files.)
Can I write a custom serializer for Kafka Connect?

Thanks,
R P

________________________________________
From: Jay Kreps <ja...@confluent.io>
Sent: Thursday, February 11, 2016 11:45 AM
To: users@kafka.apache.org
Subject: Re: What is the best way to write Kafka data into HDFS?

Check out Kafka Connect:

http://www.confluent.io/blog/how-to-build-a-scalable-etl-pipeline-with-kafka-connect

-Jay



Re: What is the best way to write Kafka data into HDFS?

Posted by Jay Kreps <ja...@confluent.io>.
Check out Kafka Connect:

http://www.confluent.io/blog/how-to-build-a-scalable-etl-pipeline-with-kafka-connect

-Jay

