Posted to users@kafka.apache.org by Cas Apanowicz <ca...@it-horizon.com> on 2017/01/10 00:56:11 UTC

Kafka as a data ingest

Hi,

I have a general understanding of Kafka's main functionality as a streaming tool.
However, I'm trying to figure out whether I can use Kafka to read a Hadoop file.
Can you please advise?
Thanks

Cas


Re: Kafka as a data ingest

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Will,

The HDFS connector we ship today is for Kafka -> HDFS, so it isn't
reading/processing data in HDFS.

I was discussing both directions because the question was unclear. However,
there's no reason you couldn't create a connector that processes files in
splits to parallelize an HDFS -> Kafka path, even if it was only for a
single file.
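
To make the "splits" idea concrete, here is a minimal sketch in plain Python.
It only shows the bookkeeping: cutting one file into byte ranges and spreading
those ranges over a bounded number of tasks. The names Split, compute_splits
and assign_to_tasks are made up for illustration and are not part of Kafka
Connect or Hadoop. A real reader would also have to handle records that
straddle a split boundary, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Split:
    """A contiguous byte range [start, start + length) of one file."""
    path: str
    start: int
    length: int


def compute_splits(path: str, file_size: int, split_size: int) -> List[Split]:
    """Cut a file of file_size bytes into ranges of at most split_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        Split(path, start, min(split_size, file_size - start))
        for start in range(0, file_size, split_size)
    ]


def assign_to_tasks(splits: List[Split], max_tasks: int) -> List[List[Split]]:
    """Distribute splits round-robin over at most max_tasks groups."""
    if max_tasks <= 0:
        raise ValueError("max_tasks must be positive")
    num_tasks = min(max_tasks, len(splits))
    groups: List[List[Split]] = [[] for _ in range(num_tasks)]
    for index, split in enumerate(splits):
        groups[index % num_tasks].append(split)
    return groups
```

Each group would then be handed to one task, so even a single large file can
be read by several tasks at once.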

-Ewen

On Tue, Jan 10, 2017 at 5:09 AM, Will Du <wi...@gmail.com> wrote:

> Big files are quite common in HDFS. Does a Connect task process a single
> file in parallel, the way MapReduce does with file splits? I do not think
> so. In that case, a Kafka Connect implementation has no advantage for
> reading a single big file unless you also use MapReduce.

Re: Kafka as a data ingest

Posted by Will Du <wi...@gmail.com>.
Big files are quite common in HDFS. Does a Connect task process a single file in parallel, the way MapReduce does with file splits? I do not think so. In that case, a Kafka Connect implementation has no advantage for reading a single big file unless you also use MapReduce.


Re: Kafka as a data ingest

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
> However, I'm trying to figure out if I can use Kafka to read Hadoop file.

The question is a bit unclear as to whether you mean "use Kafka to send
data to a Hadoop file" or "use Kafka to read a Hadoop file into a Kafka
topic". But in both cases, Kafka Connect provides a good option.

The more common use case is sending data that you have in Kafka into HDFS.
In that case,
http://docs.confluent.io/3.1.1/connect/connect-hdfs/docs/hdfs_connector.html
is a good option.

If you want the less common case of sending data from HDFS files into a
stream of Kafka records, I'm not aware of a connector for doing that yet
but it is definitely possible. Kafka Connect takes care of a lot of the
details for you so all you have to do is read the file and emit Connect's
SourceRecords containing the data from the file. Most other details are
handled for you.
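
As a rough illustration of "read the file and emit records", here is a minimal
plain-Python sketch of the shape such a source task could take. It does not
use the Kafka Connect API (which is Java): the FileSourceTask class, its poll
method and the dict layout are stand-ins for a source task and its
SourceRecords, and it reads any seekable byte stream rather than a real HDFS
file. The point is that each record carries where it came from and where to
resume, so reading can continue from the last recorded position.

```python
import io
from typing import BinaryIO, Dict, List


class FileSourceTask:
    """Reads newline-delimited records from a byte stream, tracking position.

    Illustrative stand-in for a source task; not the Kafka Connect API.
    """

    def __init__(self, name: str, stream: BinaryIO, topic: str,
                 start_offset: int = 0):
        self.name = name
        self.stream = stream
        self.topic = topic
        # Resume where a previous run stopped.
        self.stream.seek(start_offset)

    def poll(self, max_records: int = 100) -> List[Dict]:
        """Return up to max_records records; an empty list means no more data."""
        records: List[Dict] = []
        while len(records) < max_records:
            line = self.stream.readline()
            if not line:
                break
            records.append({
                # Which input this record came from.
                "source_partition": {"filename": self.name},
                # Byte position just after this line, i.e. where to resume.
                "source_offset": {"position": self.stream.tell()},
                "topic": self.topic,
                "value": line.rstrip(b"\n"),
            })
        return records


if __name__ == "__main__":
    task = FileSourceTask("example.txt", io.BytesIO(b"a\nb\nc\n"), topic="lines")
    for record in task.poll(max_records=2):
        print(record)
```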

-Ewen


Re: Kafka as a data ingest

Posted by Sharninder <sh...@gmail.com>.
If you want to know whether "Kafka" itself can read Hadoop files, then no.
But you can write your own producer that reads from HDFS in whatever way
suits you and pushes to Kafka. We use Kafka as the main queue of our
ingestion pipeline: we read from various sources and push everything to Kafka.
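
A minimal sketch of such a producer loop, in plain Python. The push_lines
function and its send argument are hypothetical names used only for
illustration. In practice, lines would come from a file handle opened through
an HDFS client, and send would wrap a Kafka client's send call (for example,
kafka-python's KafkaProducer.send takes a topic followed by a bytes value).
Neither client is shown here.

```python
from typing import Callable, Iterable


def push_lines(lines: Iterable[bytes],
               send: Callable[[str, bytes], object],
               topic: str) -> int:
    """Send each non-empty line as one message; return the number sent."""
    sent = 0
    for line in lines:
        value = line.rstrip(b"\r\n")
        if not value:
            continue  # skip blank lines
        send(topic, value)
        sent += 1
    return sent
```

Keeping the reader and the sender as plain arguments means the same loop works
for any source that yields lines, which matches the "read from various sources
and push everything" setup described above.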


-- 
--
Sharninder

Re: Kafka as a data ingest

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
Can you explain in more detail? Do you want files that are created in HDFS to be broken into records and put into Kafka?
