Posted to dev@kafka.apache.org by ao...@wikimedia.org on 2013/08/07 18:33:38 UTC

Re: Kafka/Hadoop consumers and producers

Hi all,

Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop.  We don't currently use Avro and I'm not sure if we are going to.  I came across this post.

If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?

Thanks!
-Andrew

Re: Kafka/Hadoop consumers and producers

Posted by Oleg Ruchovets <or...@gmail.com>.
I am also interested in Hadoop+Kafka capabilities. I am using Kafka 0.7,
so my question is: what is the best way to consume content from Kafka and
write it to HDFS? At this time I need only the consuming functionality.

thanks
Oleg.


On Wed, Aug 7, 2013 at 7:33 PM, <ao...@wikimedia.org> wrote:

> Hi all,
>
> Over at the Wikimedia Foundation, we're trying to figure out the best way
> to do our ETL from Kafka into Hadoop.  We don't currently use Avro and I'm
> not sure if we are going to.  I came across this post.
>
> If the plan is to remove the hadoop-consumer from Kafka contrib, do you
> think we should not consider it as one of our viable options?
>
> Thanks!
> -Andrew

Re: Kafka/Hadoop consumers and producers

Posted by Andrew Otto <ot...@wikimedia.org>.
For the last 6 months, we've been using this:

https://github.com/wikimedia-incubator/kafka-hadoop-consumer

In combination with this wrapper script:
https://github.com/wikimedia/kraken/blob/master/bin/kafka-hadoop-consume

It's not great, but it works!



On Aug 9, 2013, at 2:06 PM, Felix GV <fe...@mate1inc.com> wrote:

> I think the answer is that there is currently no strong community-backed
> solution to consume non-Avro data from Kafka to HDFS.
> 
> A lot of people do it, but I think most people adapted and expanded the
> contrib code to fit their needs.
> 
> --
> Felix
> 
> 
> On Fri, Aug 9, 2013 at 1:27 PM, Oleg Ruchovets <or...@gmail.com> wrote:
> 
>> Yes , I am definitely interested with such capabilities. We also using
>> kafka 0.7.
>>   Guys I already asked , but nobody answer: what community using to
>> consume from kafka to hdfs?
>> My assumption was that if Camus support only Avro it will not be suitable
>> for all , but people transfer from kafka to hadoop somehow. So the question
>> is what is the alternatives to Camus to transfer messages from kafka to
>> hdfs?
>> Thanks
>> Oleg.
>> 
>> 
>> On Fri, Aug 9, 2013 at 6:21 AM, Andrew Psaltis <psaltis.andrew@gmail.com
>>> wrote:
>> 
>>> Felix,
>>> The Camus route is the direction I have headed for a lot of the reasons
>>> that you described. The only wrinkle is we are still on Kafka 0.7.3 so I
>> am
>>> in the process of back porting this patch:
>>> 
>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8 that
>>> is described here:
>>> https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that
>>> we can handle reading and writing non-avro'ized (if that is a word) data.
>>> 
>>> I hope to have that done sometime in the morning and would be happy to
>>> share it if others can benefit from it.
>>> 
>>> Thanks,
>>> Andrew
>>> 
>>> 
>>> On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
>>> 
>>>> The contrib code is simple and probably wouldn't require too much work
>> to
>>>> fix, but it's a lot less robust than Camus, so you would ideally need
>> to do
>>>> some work to make it solid against all edge cases, failure scenarios and
>>>> performance bottlenecks...
>>>> 
>>>> I would definitely recommend investing in Camus instead, since it
>> already
>>>> covers a lot of the challenges I'm mentioning above, and also has more
>>>> community support behind it at the moment (as far as I can tell,
>> anyway),
>>>> so it is more likely to keep getting improvements than the contrib code.
>>>> 
>>>> --
>>>> Felix
>>>> 
>>>> 
>>>> On Thu, Aug 8, 2013 at 9:28 AM, <ps...@gmail.com> wrote:
>>>> 
>>>>> We also have a need today to ETL from Kafka into Hadoop and we do not
>>>>> currently nor have any plans to use Avro.
>>>>> 
>>>>> So is the official direction based on this discussion to ditch the
>> Kafka
>>>>> contrib code and direct people to use Camus without Avro as Ken
>> described
>>>>> or are both solutions going to survive?
>>>>> 
>>>>> I can put time into the contrib code and/or work on documenting the
>>>>> tutorial on how to make Camus work without Avro.
>>>>> 
>>>>> Which is the preferred route, for the long term?
>>>>> 
>>>>> Thanks,
>>>>> Andrew
>>>>> 
>>>>> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
>>>>>> Hi Andrew,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Camus can be made to work without avro. You will need to implement a
>>>>> message decoder and a data writer. We need to add a better
>> tutorial
>>>>> on how to do this, but it isn't that difficult. If you decide to go
>> down
>>>>> this path, you can always ask questions on this list. I try to make
>> sure
>>>>> each email gets answered. But it can take me a day or two.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Ken
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> Hi all,
>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>> Over at the Wikimedia Foundation, we're trying to figure out the
>>>>> best way to do our ETL from Kafka into Hadoop.  We don't currently use
>> Avro
>>>>> and I'm not sure if we are going to.  I came across this post.
>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>> If the plan is to remove the hadoop-consumer from Kafka contrib, do
>>>>> you think we should not consider it as one of our viable options?
>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>> Thanks!
>>>>>> 
>>>>>>> -Andrew
>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>> --
>>>>>> 
>>>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "Camus - Kafka ETL for Hadoop" group.
>>>>>> 
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>> send an email to camus_etl+...@googlegroups.com.
>>>>> 
>>>>>> 
>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 


Re: Kafka/Hadoop consumers and producers

Posted by Felix GV <fe...@mate1inc.com>.
I think the answer is that there is currently no strong community-backed
solution to consume non-Avro data from Kafka to HDFS.

A lot of people do it, but I think most people adapted and expanded the
contrib code to fit their needs.

--
Felix


On Fri, Aug 9, 2013 at 1:27 PM, Oleg Ruchovets <or...@gmail.com> wrote:

> Yes , I am definitely interested with such capabilities. We also using
> kafka 0.7.
>    Guys I already asked , but nobody answer: what community using to
> consume from kafka to hdfs?
> My assumption was that if Camus support only Avro it will not be suitable
> for all , but people transfer from kafka to hadoop somehow. So the question
> is what is the alternatives to Camus to transfer messages from kafka to
> hdfs?
> Thanks
> Oleg.
>
>
> On Fri, Aug 9, 2013 at 6:21 AM, Andrew Psaltis <psaltis.andrew@gmail.com
> >wrote:
>
> > Felix,
> > The Camus route is the direction I have headed for a lot of the reasons
> > that you described. The only wrinkle is we are still on Kafka 0.7.3 so I
> am
> > in the process of back porting this patch:
> >
> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8 that
> > is described here:
> > https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that
> > we can handle reading and writing non-avro'ized (if that is a word) data.
> >
> > I hope to have that done sometime in the morning and would be happy to
> > share it if others can benefit from it.
> >
> > Thanks,
> > Andrew
> >
> >
> > On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
> >
> >> The contrib code is simple and probably wouldn't require too much work
> to
> >> fix, but it's a lot less robust than Camus, so you would ideally need
> to do
> >> some work to make it solid against all edge cases, failure scenarios and
> >> performance bottlenecks...
> >>
> >> I would definitely recommend investing in Camus instead, since it
> already
> >> covers a lot of the challenges I'm mentioning above, and also has more
> >> community support behind it at the moment (as far as I can tell,
> anyway),
> >> so it is more likely to keep getting improvements than the contrib code.
> >>
> >> --
> >> Felix
> >>
> >>
> >> On Thu, Aug 8, 2013 at 9:28 AM, <ps...@gmail.com> wrote:
> >>
> >>> We also have a need today to ETL from Kafka into Hadoop and we do not
> >>> currently nor have any plans to use Avro.
> >>>
> >>> So is the official direction based on this discussion to ditch the
> Kafka
> >>> contrib code and direct people to use Camus without Avro as Ken
> described
> >>> or are both solutions going to survive?
> >>>
> >>> I can put time into the contrib code and/or work on documenting the
> >>> tutorial on how to make Camus work without Avro.
> >>>
> >>> Which is the preferred route, for the long term?
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> >>> > Hi Andrew,
> >>> >
> >>> >
> >>> >
> >>> > Camus can be made to work without avro. You will need to implement a
> >>> message decoder and a data writer. We need to add a better
> tutorial
> >>> on how to do this, but it isn't that difficult. If you decide to go
> down
> >>> this path, you can always ask questions on this list. I try to make
> sure
> >>> each email gets answered. But it can take me a day or two.
> >>> >
> >>> >
> >>> >
> >>> > -Ken
> >>> >
> >>> >
> >>> >
> >>> > On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
> >>> >
> >>> >
> >>> >
> >>> > > Hi all,
> >>> >
> >>> > >
> >>> >
> >>> > > Over at the Wikimedia Foundation, we're trying to figure out the
> >>> best way to do our ETL from Kafka into Hadoop.  We don't currently use
> Avro
> >>> and I'm not sure if we are going to.  I came across this post.
> >>> >
> >>> > >
> >>> >
> >>> > > If the plan is to remove the hadoop-consumer from Kafka contrib, do
> >>> you think we should not consider it as one of our viable options?
> >>> >
> >>> > >
> >>> >
> >>> > > Thanks!
> >>> >
> >>> > > -Andrew
> >>> >
> >>> > >
> >>> >
> >>> > > --
> >>> >
> >>> > > You received this message because you are subscribed to the Google
> >>> Groups "Camus - Kafka ETL for Hadoop" group.
> >>> >
> >>> > > To unsubscribe from this group and stop receiving emails from it,
> >>> send an email to camus_etl+...@googlegroups.com.
> >>>
> >>> >
> >>> > > For more options, visit https://groups.google.com/groups/opt_out.
> >>> >
> >>> > >
> >>> >
> >>> > >
> >>>
> >>>
> >>
>

Re: Kafka/Hadoop consumers and producers

Posted by Oleg Ruchovets <or...@gmail.com>.
Yes, I am definitely interested in such capabilities. We are also using
Kafka 0.7.
   Guys, I already asked, but nobody answered: what is the community using
to consume from Kafka into HDFS?
My assumption was that if Camus supports only Avro, it will not be suitable
for everyone; but people transfer data from Kafka to Hadoop somehow. So the
question is: what are the alternatives to Camus for transferring messages
from Kafka to HDFS?
Thanks
Oleg.


On Fri, Aug 9, 2013 at 6:21 AM, Andrew Psaltis <ps...@gmail.com> wrote:

> Felix,
> The Camus route is the direction I have headed for a lot of the reasons
> that you described. The only wrinkle is we are still on Kafka 0.7.3 so I am
> in the process of back porting this patch:
> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8 that
> is described here:
> https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that
> we can handle reading and writing non-avro'ized (if that is a word) data.
>
> I hope to have that done sometime in the morning and would be happy to
> share it if others can benefit from it.
>
> Thanks,
> Andrew
>
>
> On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
>
>> The contrib code is simple and probably wouldn't require too much work to
>> fix, but it's a lot less robust than Camus, so you would ideally need to do
>> some work to make it solid against all edge cases, failure scenarios and
>> performance bottlenecks...
>>
>> I would definitely recommend investing in Camus instead, since it already
>> covers a lot of the challenges I'm mentioning above, and also has more
>> community support behind it at the moment (as far as I can tell, anyway),
>> so it is more likely to keep getting improvements than the contrib code.
>>
>> --
>> Felix
>>
>>
>> On Thu, Aug 8, 2013 at 9:28 AM, <ps...@gmail.com> wrote:
>>
>>> We also have a need today to ETL from Kafka into Hadoop and we do not
>>> currently nor have any plans to use Avro.
>>>
>>> So is the official direction based on this discussion to ditch the Kafka
>>> contrib code and direct people to use Camus without Avro as Ken described
>>> or are both solutions going to survive?
>>>
>>> I can put time into the contrib code and/or work on documenting the
>>> tutorial on how to make Camus work without Avro.
>>>
>>> Which is the preferred route, for the long term?
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
>>> > Hi Andrew,
>>> >
>>> >
>>> >
>>> > Camus can be made to work without avro. You will need to implement a
>>> message decoder and a data writer. We need to add a better tutorial
>>> on how to do this, but it isn't that difficult. If you decide to go down
>>> this path, you can always ask questions on this list. I try to make sure
>>> each email gets answered. But it can take me a day or two.
>>> >
>>> >
>>> >
>>> > -Ken
>>> >
>>> >
>>> >
>>> > On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
>>> >
>>> >
>>> >
>>> > > Hi all,
>>> >
>>> > >
>>> >
>>> > > Over at the Wikimedia Foundation, we're trying to figure out the
>>> best way to do our ETL from Kafka into Hadoop.  We don't currently use Avro
>>> and I'm not sure if we are going to.  I came across this post.
>>> >
>>> > >
>>> >
>>> > > If the plan is to remove the hadoop-consumer from Kafka contrib, do
>>> you think we should not consider it as one of our viable options?
>>> >
>>> > >
>>> >
>>> > > Thanks!
>>> >
>>> > > -Andrew
>>> >
>>> > >
>>> >
>>> > > --
>>> >
>>> > > You received this message because you are subscribed to the Google
>>> Groups "Camus - Kafka ETL for Hadoop" group.
>>> >
>>> > > To unsubscribe from this group and stop receiving emails from it,
>>> send an email to camus_etl+...@googlegroups.com.
>>>
>>> >
>>> > > For more options, visit https://groups.google.com/groups/opt_out.
>>> >
>>> > >
>>> >
>>> > >
>>>
>>>
>>

Re: Kafka/Hadoop consumers and producers

Posted by Andrew Psaltis <ps...@gmail.com>.
Felix,
The Camus route is the direction I have headed for a lot of the reasons
that you described. The only wrinkle is that we are still on Kafka 0.7.3, so
I am in the process of back-porting this patch:
https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8
which is described here:
https://groups.google.com/forum/#!topic/camus_etl/VcETxkYhzg8 -- so that we
can handle reading and writing non-avro'ized (if that is a word) data.

I hope to have that done sometime in the morning and would be happy to 
share it if others can benefit from it.

Thanks,
Andrew


On Thursday, August 8, 2013 7:18:27 PM UTC-6, Felix GV wrote:
>
> The contrib code is simple and probably wouldn't require too much work to 
> fix, but it's a lot less robust than Camus, so you would ideally need to do 
> some work to make it solid against all edge cases, failure scenarios and 
> performance bottlenecks...
>
> I would definitely recommend investing in Camus instead, since it already 
> covers a lot of the challenges I'm mentioning above, and also has more 
> community support behind it at the moment (as far as I can tell, anyway), 
> so it is more likely to keep getting improvements than the contrib code.
>
> --
> Felix
>
>
> On Thu, Aug 8, 2013 at 9:28 AM, <psaltis...@gmail.com> wrote:
>
>> We also have a need today to ETL from Kafka into Hadoop and we do not 
>> currently nor have any plans to use Avro.
>>
>> So is the official direction based on this discussion to ditch the Kafka 
>> contrib code and direct people to use Camus without Avro as Ken described 
>> or are both solutions going to survive?
>>
>> I can put time into the contrib code and/or work on documenting the 
>> tutorial on how to make Camus work without Avro.
>>
>> Which is the preferred route, for the long term?
>>
>> Thanks,
>> Andrew
>>
>> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
>> > Hi Andrew,
>> >
>> >
>> >
>> > Camus can be made to work without avro. You will need to implement a 
>> message decoder and a data writer. We need to add a better tutorial 
>> on how to do this, but it isn't that difficult. If you decide to go down 
>> this path, you can always ask questions on this list. I try to make sure 
>> each email gets answered. But it can take me a day or two.
>> >
>> >
>> >
>> > -Ken
>> >
>> >
>> >
>> > On Aug 7, 2013, at 9:33 AM, ao...@wikimedia.org wrote:
>> >
>> >
>> >
>> > > Hi all,
>> >
>> > >
>> >
>> > > Over at the Wikimedia Foundation, we're trying to figure out the best 
>> way to do our ETL from Kafka into Hadoop.  We don't currently use Avro and 
>> I'm not sure if we are going to.  I came across this post.
>> >
>> > >
>> >
>> > > If the plan is to remove the hadoop-consumer from Kafka contrib, do 
>> you think we should not consider it as one of our viable options?
>> >
>> > >
>> >
>> > > Thanks!
>> >
>> > > -Andrew
>> >
>> > >
>> >
>> > > --
>> >
>> > > You received this message because you are subscribed to the Google 
>> Groups "Camus - Kafka ETL for Hadoop" group.
>> >
>> > > To unsubscribe from this group and stop receiving emails from it, 
>> send an email to camus_etl+...@googlegroups.com.
>> >
>> > > For more options, visit https://groups.google.com/groups/opt_out.
>> >
>> > >
>> >
>> > >
>>
>>
>

Re: Kafka/Hadoop consumers and producers

Posted by Felix GV <fe...@mate1inc.com>.
The contrib code is simple and probably wouldn't require too much work to
fix, but it's a lot less robust than Camus, so you would ideally need to do
some work to make it solid against all edge cases, failure scenarios and
performance bottlenecks...

I would definitely recommend investing in Camus instead, since it already
covers a lot of the challenges I'm mentioning above, and also has more
community support behind it at the moment (as far as I can tell, anyway),
so it is more likely to keep getting improvements than the contrib code.

--
Felix


On Thu, Aug 8, 2013 at 9:28 AM, <ps...@gmail.com> wrote:

> We also have a need today to ETL from Kafka into Hadoop and we do not
> currently nor have any plans to use Avro.
>
> So is the official direction based on this discussion to ditch the Kafka
> contrib code and direct people to use Camus without Avro as Ken described
> or are both solutions going to survive?
>
> I can put time into the contrib code and/or work on documenting the
> tutorial on how to make Camus work without Avro.
>
> Which is the preferred route, for the long term?
>
> Thanks,
> Andrew
>
> On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> > Hi Andrew,
> >
> >
> >
> > Camus can be made to work without avro. You will need to implement a
> message decoder and a data writer. We need to add a better tutorial
> on how to do this, but it isn't that difficult. If you decide to go down
> this path, you can always ask questions on this list. I try to make sure
> each email gets answered. But it can take me a day or two.
> >
> >
> >
> > -Ken
> >
> >
> >
> > On Aug 7, 2013, at 9:33 AM, aotto@wikimedia.org wrote:
> >
> >
> >
> > > Hi all,
> >
> > >
> >
> > > Over at the Wikimedia Foundation, we're trying to figure out the best
> way to do our ETL from Kafka into Hadoop.  We don't currently use Avro and
> I'm not sure if we are going to.  I came across this post.
> >
> > >
> >
> > > If the plan is to remove the hadoop-consumer from Kafka contrib, do
> you think we should not consider it as one of our viable options?
> >
> > >
> >
> > > Thanks!
> >
> > > -Andrew
> >
> > >
> >
> > > --
> >
> > > You received this message because you are subscribed to the Google
> Groups "Camus - Kafka ETL for Hadoop" group.
> >
> > > To unsubscribe from this group and stop receiving emails from it, send
> an email to camus_etl+unsubscribe@googlegroups.com.
> >
> > > For more options, visit https://groups.google.com/groups/opt_out.
> >
> > >
> >
> > >
>
>

Re: Kafka/Hadoop consumers and producers

Posted by ps...@gmail.com.
We also have a need today to ETL from Kafka into Hadoop, and we do not currently use Avro, nor have any plans to.

So, based on this discussion, is the official direction to ditch the Kafka contrib code and direct people to use Camus without Avro, as Ken described, or are both solutions going to survive?

I can put time into the contrib code and/or work on documenting the tutorial on how to make Camus work without Avro. 

Which is the preferred route, for the long term?

Thanks,
Andrew

On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> Hi Andrew,
> 
> 
> 
> Camus can be made to work without avro. You will need to implement a message decoder and a data writer. We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered. But it can take me a day or two.
> 
> 
> 
> -Ken
> 
> 
> 
> On Aug 7, 2013, at 9:33 AM, aotto@wikimedia.org wrote:
> 
> 
> 
> > Hi all,
> 
> > 
> 
> > Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop.  We don't currently use Avro and I'm not sure if we are going to.  I came across this post.
> 
> > 
> 
> > If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?
> 
> > 
> 
> > Thanks!
> 
> > -Andrew
> 
> > 
> 
> > -- 
> 
> > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> 
> > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> 
> > For more options, visit https://groups.google.com/groups/opt_out.
> 
> > 
> 
> >
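[Editor's note: Ken's recipe above — implement a message decoder and a data writer — boils down to two small plug-in classes. The sketch below shows the shape of that work for plain UTF-8 text. The interfaces here are simplified stand-ins with hypothetical names, not Camus's actual plug-in API; it only illustrates the pattern, under those assumptions.]

```java
import java.nio.charset.StandardCharsets;

public class PlainTextCamusSketch {

    // Hypothetical stand-in for Camus's message-decoder plug-in point.
    interface MessageDecoder<R> {
        R decode(byte[] payload);
    }

    // Hypothetical stand-in for the record-writer side: turns a decoded
    // record into the line that gets appended to the HDFS output file.
    interface RecordFormatter<R> {
        String format(R record);
    }

    // A decoder for plain UTF-8 text messages -- no Avro involved.
    static class StringMessageDecoder implements MessageDecoder<String> {
        @Override
        public String decode(byte[] payload) {
            return new String(payload, StandardCharsets.UTF_8);
        }
    }

    // Emits one decoded message per output line, as a text ETL would.
    static class LineFormatter implements RecordFormatter<String> {
        @Override
        public String format(String record) {
            // Flatten embedded newlines so each record stays on one line.
            return record.replace('\n', ' ');
        }
    }

    public static void main(String[] args) {
        MessageDecoder<String> decoder = new StringMessageDecoder();
        RecordFormatter<String> formatter = new LineFormatter();
        byte[] raw = "hello\nkafka".getBytes(StandardCharsets.UTF_8);
        System.out.println(formatter.format(decoder.decode(raw)));
    }
}
```

The real work Ken alludes to is wiring classes like these into Camus's job configuration and hardening them against malformed payloads.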


Re: Kafka/Hadoop consumers and producers

Posted by Abhi Basu <90...@gmail.com>.
I agree with you. We are looking for a simple solution to get data from Kafka
into Hadoop. I have tried using Camus (non-Avro) before, and the documentation
is too sparse to make it work correctly; we also do not want to introduce
another component into the solution. In the meantime, can the Kafka Hadoop
consumer/producer be documented well so we can try it out ASAP? :)  Thanks.

On Friday, August 9, 2013 12:27:12 PM UTC-7, Ken Goodhope wrote:
>
> I just checked and that patch is in .8 branch.   Thanks for working on 
> back porting it Andrew.  We'd be happy to commit that work to master.
>
> As for the kafka contrib project vs Camus, they are similar but not quite 
> identical.  Camus is intended to be a high throughput ETL for bulk 
> ingestion of Kafka data into HDFS.  Where as what we have in contrib is 
> more of a simple KafkaInputFormat.  Neither can really replace the other.  
> If you had a complex hadoop workflow and wanted to introduce some Kafka 
> data into that workflow, using Camus would be a gigantic overkill and a 
> pain to setup.  On the flipside, if what you want is frequent reliable 
> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you 
> with that.
>
> I think it would be preferable to simplify the existing contrib 
> Input/OutputFormats by refactoring them to use the more stable higher level 
> Kafka APIs.  Currently they use the lower level APIs.  This should make 
> them easier to maintain, and user friendly enough to avoid the need for 
> extensive documentation.
>
> Ken
>
>
> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <psaltis...@gmail.com> wrote:
>
>> Dibyendu,
>> According to the pull request https://github.com/linkedin/camus/pull/15,
>> it was merged into the camus-kafka-0.8 branch. I have not checked whether
>> the code was subsequently removed; however, at least one of the important
>> files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
>> is still present.
>>
>> Thanks,
>> Andrew
>>
>>
>>  On Fri, Aug 9, 2013 at 9:39 AM, <dibyendu.b...@pearson.com> wrote:
>>
>>>  Hi Ken,
>>>
>>> I am also working on making the Camus fit for Non Avro message for our 
>>> requirement.
>>>
>>> I see you mentioned this patch
>>> (https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8),
>>> which supports a custom data writer for Camus. But this patch has not been
>>> pulled into the camus-kafka-0.8 branch. Is there any plan to do so?
>>>
>>> Regards,
>>> Dibyendu
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> camus_etl+...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to camus_etl+...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>

Re: Kafka/Hadoop consumers and producers

Posted by Kam Kasravi <ka...@yahoo.com>.
Thanks Andrew. I'll upgrade it to use the high level consumer.
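[Editor's note: for anyone implementing this from scratch, the core of any Kafka-to-HDFS consumer, high level or not, is committing an offset only after the fetched data has been durably written. A minimal, self-contained sketch of that pattern follows; `fetch_batch` is a hypothetical stand-in for a real Kafka fetch, not an actual Kafka API.]

```python
# Sketch of the commit-after-write pattern a Kafka-to-HDFS consumer uses.
# fetch_batch() is a hypothetical stand-in for a real Kafka fetch; a real
# high level consumer would stream messages and commit offsets to ZooKeeper.
import os
import tempfile

LOG = [b"event-%d" % i for i in range(10)]  # fake topic-partition log

def fetch_batch(offset, max_messages=4):
    """Return (messages, next_offset) starting at `offset`."""
    batch = LOG[offset:offset + max_messages]
    return batch, offset + len(batch)

def consume_to_files(out_dir, start_offset=0):
    """Drain the log into files, advancing the committed offset only
    after each batch has been written -- at-least-once delivery."""
    offset = start_offset
    while True:
        batch, next_offset = fetch_batch(offset)
        if not batch:
            return offset  # fully caught up
        path = os.path.join(out_dir, "part-%010d" % offset)
        with open(path, "wb") as f:
            f.write(b"\n".join(batch) + b"\n")
        offset = next_offset  # "commit" only after the write succeeded

out = tempfile.mkdtemp()
print(consume_to_files(out))  # prints 10: all ten fake messages consumed
```

If the process dies between the write and the offset commit, the batch is re-fetched and re-written on restart, which is why these pipelines are at-least-once rather than exactly-once.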


________________________________
 From: Andrew Psaltis <An...@Webtrends.com>
To: "users@kafka.apache.org" <us...@kafka.apache.org>; Kam Kasravi <ka...@yahoo.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com> 
Cc: Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com> 
Sent: Monday, August 12, 2013 8:20 PM
Subject: Re: Kafka/Hadoop consumers and producers
 

Kam,
I am perfectly fine if you pick this up. After thinking about it for a
while, we are going to upgrade to Kafka 0.8.0 and also use Camus as it
more closely matches our use case, with the caveat that we do not use Avro.
With that said, I will try and work on the back-port of custom data writer
patch[1], however, I am not sure how quickly I will get this done as we
are going to work towards upgrading our Kafka cluster.

Thanks,
Andrew

[1] 
https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af5
2f7aa8





On 8/12/13 6:16 PM, "Kam Kasravi" <ka...@yahoo.com> wrote:

>I would like to do this refactoring since I did a high level consumer a
>while ago. 
>A few weeks ago I had opened KAFKA-949 Kafka on Yarn which I was also
>hoping to add to contribute.
>It's almost done. KAFKA-949 is paired with BIGTOP-989 which adds kafka
>0.8 to the bigtop distribution.
>KAFKA-949 basically allows kafka brokers to be started up using sysvinit
>services and would ease some of the
>startup/configuration issues that newbies have when getting started with
>kafka. Ideally I would like to
>fold a number of kafka/bin/* commands into the kafka service. Andrew
>please let me know if you would like to
>pick this up instead. Thanks!
>
>Kam
>
>
>________________________________
> From: Jay Kreps <ja...@gmail.com>
>To: Ken Goodhope <ke...@gmail.com>
>Cc: Andrew Psaltis <ps...@gmail.com>;
>dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com"
><ca...@googlegroups.com>; "aotto@wikimedia.org"
><ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene
><cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>;
>"users@kafka.apache.org" <us...@kafka.apache.org>
>Sent: Saturday, August 10, 2013 3:30 PM
>Subject: Re: Kafka/Hadoop consumers and producers
> 
>
>So guys, just to throw my 2 cents in:
>
>1. We aren't deprecating anything. I just noticed that the Hadoop contrib
>package wasn't getting as much attention as it should.
>
>2. Andrew or anyone--if there is anyone using the contrib package who
>would
>be willing to volunteer to kind of adopt it that would be great. I am
>happy
>to help in whatever way I can. The practical issue is that most of the
>committers are either using Camus or not using Hadoop at all so we just
>haven't been doing a good job of documenting, bug fixing, and supporting
>the contrib packages.
>
>3. Ken, if you could document how to use Camus that would likely make it a
>lot more useful to people. I think most people would want a full-fledged
>ETL solution and would likely prefer Camus, but very few people are using
>Avro.
>
>-Jay
>
>
>On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com>
>wrote:
>
>> I just checked and that patch is in .8 branch.   Thanks for working on
>> back porting it Andrew.  We'd be happy to commit that work to master.
>>
>> As for the kafka contrib project vs Camus, they are similar but not
>>quite
>> identical.  Camus is intended to be a high throughput ETL for bulk
>> ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
>> more of a simple KafkaInputFormat.  Neither can really replace the
>>other.
>> If you had a complex hadoop workflow and wanted to introduce some Kafka
>> data into that workflow, using Camus would be a gigantic overkill and a
>> pain to setup.  On the flipside, if what you want is frequent reliable
>> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
>> with that.
>>
>> I think it would be preferable to simplify the existing contrib
>> Input/OutputFormats by refactoring them to use the more stable higher
>>level
>> Kafka APIs.  Currently they use the lower level APIs.  This should make
>> them easier to maintain, and user friendly enough to avoid the need for
>> extensive documentation.
>>
>> Ken
>>
>>
>> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis
>><ps...@gmail.com>wrote:
>>
>>> Dibyendu,
>>> According to the pull request:
>>>https://github.com/linkedin/camus/pull/15 it was merged into the
>>>camus-kafka-0.8
>>> branch. I have not checked if the code was subsequently removed,
>>>however,
>>> at least one of the important files from this patch
>>>(camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.jav
>>>a)
>>> is still present.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>  On Fri, Aug 9, 2013 at 9:39 AM,
>>><di...@pearson.com>wrote:
>>>
>>>>  Hi Ken,
>>>>
>>>> I am also working on making the Camus fit for Non Avro message for our
>>>> requirement.
>>>>
>>>> I see you mentioned about this patch (
>>>> 
>>>>https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f646
>>>>3af52f7aa8)
>>>> which supports custom data writer for Camus. But this patch is not
>>>>pulled
>>>> into camus-kafka-0.8 branch. Is there any plan for doing the same ?
>>>>
>>>> Regards,
>>>> Dibyendu
>>>>
>>>> --
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> camus_etl+unsubscribe@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>>Groups
>>> "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>>an
>>> email to camus_etl+unsubscribe@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google
>>Groups
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>>an
>> email to camus_etl+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.

Re: Kafka/Hadoop consumers and producers

Posted by Andrew Otto <ot...@wikimedia.org>.
Andrew,

I'm about to dive into figuring out how to use Camus without Avro.  Perhaps we should join forces?  (Be warned though! My java fu is low at the moment. :) ).

-Ao


On Aug 12, 2013, at 11:20 PM, Andrew Psaltis <An...@Webtrends.com> wrote:

> Kam,
> I am perfectly fine if you pick this up. After thinking about it for a
> while, we are going to upgrade to Kafka 0.8.0 and also use Camus as it
> more closely matches our use case, with the caveat that we do not use Avro.
> With that said, I will try and work on the back-port of custom data writer
> patch[1], however, I am not sure how quickly I will get this done as we
> are going to work towards upgrading our Kafka cluster.
> 
> Thanks,
> Andrew
> 
> [1] 
> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af5
> 2f7aa8
> 
> 
> 
> 
> 
> On 8/12/13 6:16 PM, "Kam Kasravi" <ka...@yahoo.com> wrote:
> 
>> I would like to do this refactoring since I did a high level consumer a
>> while ago. 
>> A few weeks ago I had opened KAFKA-949 Kafka on Yarn which I was also
>> hoping to add to contribute.
>> It's almost done. KAFKA-949 is paired with BIGTOP-989 which adds kafka
>> 0.8 to the bigtop distribution.
>> KAFKA-949 basically allows kafka brokers to be started up using sysvinit
>> services and would ease some of the
>> startup/configuration issues that newbies have when getting started with
>> kafka. Ideally I would like to
>> fold a number of kafka/bin/* commands into the kafka service. Andrew
>> please let me know if you would like to
>> pick this up instead. Thanks!
>> 
>> Kam
>> 
>> 
>> ________________________________
>> From: Jay Kreps <ja...@gmail.com>
>> To: Ken Goodhope <ke...@gmail.com>
>> Cc: Andrew Psaltis <ps...@gmail.com>;
>> dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com"
>> <ca...@googlegroups.com>; "aotto@wikimedia.org"
>> <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene
>> <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>;
>> "users@kafka.apache.org" <us...@kafka.apache.org>
>> Sent: Saturday, August 10, 2013 3:30 PM
>> Subject: Re: Kafka/Hadoop consumers and producers
>> 
>> 
>> So guys, just to throw my 2 cents in:
>> 
>> 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
>> package wasn't getting as much attention as it should.
>> 
>> 2. Andrew or anyone--if there is anyone using the contrib package who
>> would
>> be willing to volunteer to kind of adopt it that would be great. I am
>> happy
>> to help in whatever way I can. The practical issue is that most of the
>> committers are either using Camus or not using Hadoop at all so we just
>> haven't been doing a good job of documenting, bug fixing, and supporting
>> the contrib packages.
>> 
>> 3. Ken, if you could document how to use Camus that would likely make it a
>> lot more useful to people. I think most people would want a full-fledged
>> ETL solution and would likely prefer Camus, but very few people are using
>> Avro.
>> 
>> -Jay
>> 
>> 
>> On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com>
>> wrote:
>> 
>>> I just checked and that patch is in .8 branch.   Thanks for working on
>>> back porting it Andrew.  We'd be happy to commit that work to master.
>>> 
>>> As for the kafka contrib project vs Camus, they are similar but not
>>> quite
>>> identical.  Camus is intended to be a high throughput ETL for bulk
>>> ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
>>> more of a simple KafkaInputFormat.  Neither can really replace the
>>> other.
>>> If you had a complex hadoop workflow and wanted to introduce some Kafka
>>> data into that workflow, using Camus would be a gigantic overkill and a
>>> pain to setup.  On the flipside, if what you want is frequent reliable
>>> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
>>> with that.
>>> 
>>> I think it would be preferable to simplify the existing contrib
>>> Input/OutputFormats by refactoring them to use the more stable higher
>>> level
>>> Kafka APIs.  Currently they use the lower level APIs.  This should make
>>> them easier to maintain, and user friendly enough to avoid the need for
>>> extensive documentation.
>>> 
>>> Ken
>>> 
>>> 
>>> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis
>>> <ps...@gmail.com>wrote:
>>> 
>>>> Dibyendu,
>>>> According to the pull request:
>>>> https://github.com/linkedin/camus/pull/15 it was merged into the
>>>> camus-kafka-0.8
>>>> branch. I have not checked if the code was subsequently removed,
>>>> however,
>>>> at least one of the important files from this patch
>>>> (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.jav
>>>> a)
>>>> is still present.
>>>> 
>>>> Thanks,
>>>> Andrew
>>>> 
>>>> 
>>>> On Fri, Aug 9, 2013 at 9:39 AM,
>>>> <di...@pearson.com>wrote:
>>>> 
>>>>> Hi Ken,
>>>>> 
>>>>> I am also working on making the Camus fit for Non Avro message for our
>>>>> requirement.
>>>>> 
>>>>> I see you mentioned about this patch (
>>>>> 
>>>>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f646
>>>>> 3af52f7aa8)
>>>>> which supports custom data writer for Camus. But this patch is not
>>>>> pulled
>>>>> into camus-kafka-0.8 branch. Is there any plan for doing the same ?
>>>>> 
>>>>> Regards,
>>>>> Dibyendu
>>>>> 
>>>>> --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> camus_etl+unsubscribe@googlegroups.com.
>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>> 
>>>> 
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups
>>>> "Camus - Kafka ETL for Hadoop" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an
>>>> email to camus_etl+unsubscribe@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>> 
>>>> 
>>>> 
>>> 
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups
>>> "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an
>>> email to camus_etl+unsubscribe@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 


Re: Kafka/Hadoop consumers and producers

Posted by Andrew Psaltis <An...@Webtrends.com>.
Kam,
I am perfectly fine if you pick this up. After thinking about it for a
while, we are going to upgrade to Kafka 0.8.0 and also use Camus as it
more closely matches our use case, with the caveat that we do not use Avro.
With that said, I will try and work on the back-port of custom data writer
patch[1], however, I am not sure how quickly I will get this done as we
are going to work towards upgrading our Kafka cluster.

Thanks,
Andrew

[1] 
https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af5
2f7aa8
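
[Editor's note: for readers unfamiliar with the patch above, it makes Camus's output writer pluggable so non-Avro payloads can be written as-is. Below is a rough illustrative sketch of the idea in Python; every name in it is invented and does not match Camus's actual Java RecordWriterProvider API.]

```python
# Hypothetical sketch of a pluggable record-writer provider, in the spirit
# of the custom data writer patch; all class and method names are invented.
import os
import tempfile

class PlainTextWriter:
    """Writes raw message bytes, one record per line (no Avro container)."""
    def __init__(self, path):
        self._f = open(path, "wb")
    def write(self, record):
        self._f.write(record + b"\n")
    def close(self):
        self._f.close()

class PlainTextWriterProvider:
    """The pluggable piece: the ETL job asks the provider for a writer
    instead of hard-coding Avro output."""
    extension = ".txt"
    def get_writer(self, out_dir, topic):
        return PlainTextWriter(os.path.join(out_dir, topic + self.extension))

provider = PlainTextWriterProvider()
d = tempfile.mkdtemp()
w = provider.get_writer(d, "webrequest")
for msg in (b"line one", b"line two"):
    w.write(msg)
w.close()
print(open(os.path.join(d, "webrequest.txt"), "rb").read())  # b'line one\nline two\n'
```

Swapping in an Avro, JSON, or SequenceFile writer then only means supplying a different provider, which is the point of the patch.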
 




On 8/12/13 6:16 PM, "Kam Kasravi" <ka...@yahoo.com> wrote:

>I would like to do this refactoring since I did a high level consumer a
>while ago. 
>A few weeks ago I had opened KAFKA-949 Kafka on Yarn which I was also
>hoping to add to contribute.
>It's almost done. KAFKA-949 is paired with BIGTOP-989 which adds kafka
>0.8 to the bigtop distribution.
>KAFKA-949 basically allows kafka brokers to be started up using sysvinit
>services and would ease some of the
>startup/configuration issues that newbies have when getting started with
>kafka. Ideally I would like to
>fold a number of kafka/bin/* commands into the kafka service. Andrew
>please let me know if you would like to
>pick this up instead. Thanks!
>
>Kam
>
>
>________________________________
> From: Jay Kreps <ja...@gmail.com>
>To: Ken Goodhope <ke...@gmail.com>
>Cc: Andrew Psaltis <ps...@gmail.com>;
>dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com"
><ca...@googlegroups.com>; "aotto@wikimedia.org"
><ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene
><cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>;
>"users@kafka.apache.org" <us...@kafka.apache.org>
>Sent: Saturday, August 10, 2013 3:30 PM
>Subject: Re: Kafka/Hadoop consumers and producers
> 
>
>So guys, just to throw my 2 cents in:
>
>1. We aren't deprecating anything. I just noticed that the Hadoop contrib
>package wasn't getting as much attention as it should.
>
>2. Andrew or anyone--if there is anyone using the contrib package who
>would
>be willing to volunteer to kind of adopt it that would be great. I am
>happy
>to help in whatever way I can. The practical issue is that most of the
>committers are either using Camus or not using Hadoop at all so we just
>haven't been doing a good job of documenting, bug fixing, and supporting
>the contrib packages.
>
>3. Ken, if you could document how to use Camus that would likely make it a
>lot more useful to people. I think most people would want a full-fledged
>ETL solution and would likely prefer Camus, but very few people are using
>Avro.
>
>-Jay
>
>
>On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com>
>wrote:
>
>> I just checked and that patch is in .8 branch.   Thanks for working on
>> back porting it Andrew.  We'd be happy to commit that work to master.
>>
>> As for the kafka contrib project vs Camus, they are similar but not
>>quite
>> identical.  Camus is intended to be a high throughput ETL for bulk
>> ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
>> more of a simple KafkaInputFormat.  Neither can really replace the
>>other.
>> If you had a complex hadoop workflow and wanted to introduce some Kafka
>> data into that workflow, using Camus would be a gigantic overkill and a
>> pain to setup.  On the flipside, if what you want is frequent reliable
>> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
>> with that.
>>
>> I think it would be preferable to simplify the existing contrib
>> Input/OutputFormats by refactoring them to use the more stable higher
>>level
>> Kafka APIs.  Currently they use the lower level APIs.  This should make
>> them easier to maintain, and user friendly enough to avoid the need for
>> extensive documentation.
>>
>> Ken
>>
>>
>> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis
>><ps...@gmail.com>wrote:
>>
>>> Dibyendu,
>>> According to the pull request:
>>>https://github.com/linkedin/camus/pull/15 it was merged into the
>>>camus-kafka-0.8
>>> branch. I have not checked if the code was subsequently removed,
>>>however,
>>> at least one of the important files from this patch
>>>(camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.jav
>>>a)
>>> is still present.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>  On Fri, Aug 9, 2013 at 9:39 AM,
>>> <di...@pearson.com> wrote:
>>>
>>>>  Hi Ken,
>>>>
>>>> I am also working on making Camus fit non-Avro messages for our
>>>> requirements.
>>>>
>>>> I see you mentioned this patch
>>>> (https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8),
>>>> which supports a custom data writer for Camus. But this patch has not
>>>> been pulled into the camus-kafka-0.8 branch. Is there any plan to do
>>>> the same?
>>>>
>>>> Regards,
>>>> Dibyendu
>>>>
>>>> --
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> camus_etl+unsubscribe@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>>Groups
>>> "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>>an
>>> email to camus_etl+unsubscribe@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google
>>Groups
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>>an
>> email to camus_etl+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
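Ken's distinction above can be made concrete. If all you need is a one-off pull of a topic onto HDFS, rather than Camus-style scheduled ETL, the stock tooling can be chained together. A rough sketch follows; the topic name, ZooKeeper address, and HDFS path are invented for illustration, and the flags are the 0.8-era console-consumer flags:

```shell
# Crude stopgap, not an ETL: drain a topic with the stock console consumer
# and stream it into HDFS. All names and paths below are invented examples.
# The console consumer runs until killed, so this is a manual, one-shot
# style of ingestion with none of Camus's offset/partition management.
bin/kafka-console-consumer.sh \
  --zookeeper zk1.example.org:2181 \
  --topic webrequest \
  --from-beginning \
| hadoop fs -put - /user/etl/webrequest/$(date +%Y-%m-%d-%H)
```

Everything beyond this (hourly partitions, reliable offset handling, restarts) is exactly the gap that Camus and the contrib InputFormat are meant to fill.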


Re: Kafka/Hadoop consumers and producers

Posted by Andrew Psaltis <An...@Webtrends.com>.
Kam,
I am perfectly fine if you pick this up. After thinking about it for a
while, we are going to upgrade to Kafka 0.8.0 and also use Camus, as it
more closely matches our use case, with the caveat that we do not use Avro.
With that said, I will try to work on the back-port of the custom data
writer patch [1]; however, I am not sure how quickly I will get this done,
as we are going to work towards upgrading our Kafka cluster.

Thanks,
Andrew

[1]
https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8
 




On 8/12/13 6:16 PM, "Kam Kasravi" <ka...@yahoo.com> wrote:

>I would like to do this refactoring, since I did a high-level consumer a
>while ago.
>A few weeks ago I opened KAFKA-949 (Kafka on Yarn), which I was also
>hoping to contribute.
>It's almost done. KAFKA-949 is paired with BIGTOP-989 which adds kafka
>0.8 to the bigtop distribution.
>KAFKA-949 basically allows kafka brokers to be started up using sysvinit
>services and would ease some of the
>startup/configuration issues that newbies have when getting started with
>kafka. Ideally I would like to
>fold a number of kafka/bin/* commands into the kafka service. Andrew,
>please let me know if you would like to pick this up instead. Thanks!
>
>Kam
>
>
>________________________________
> From: Jay Kreps <ja...@gmail.com>
>To: Ken Goodhope <ke...@gmail.com>
>Cc: Andrew Psaltis <ps...@gmail.com>;
>dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com"
><ca...@googlegroups.com>; "aotto@wikimedia.org"
><ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene
><cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>;
>"users@kafka.apache.org" <us...@kafka.apache.org>
>Sent: Saturday, August 10, 2013 3:30 PM
>Subject: Re: Kafka/Hadoop consumers and producers
> 
>
>So guys, just to throw my 2 cents in:
>
>1. We aren't deprecating anything. I just noticed that the Hadoop contrib
>package wasn't getting as much attention as it should.
>
>2. Andrew or anyone--if there is anyone using the contrib package who
>would be willing to volunteer to kind of adopt it, that would be great. I
>am happy to help in whatever way I can. The practical issue is that most
>of the
>committers are either using Camus or not using Hadoop at all so we just
>haven't been doing a good job of documenting, bug fixing, and supporting
>the contrib packages.
>
>3. Ken, if you could document how to use Camus that would likely make it a
>lot more useful to people. I think most people would want a full-fledged
>ETL solution and would likely prefer Camus, but very few people are using
>Avro.
>
>-Jay
>
>
>On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com>
>wrote:
>
>> I just checked, and that patch is in the 0.8 branch.  Thanks for working
>> on back-porting it, Andrew.  We'd be happy to commit that work to master.
>>
>> As for the Kafka contrib project vs. Camus, they are similar but not
>> quite identical.  Camus is intended to be a high-throughput ETL for bulk
>> ingestion of Kafka data into HDFS, whereas what we have in contrib is
>> more of a simple KafkaInputFormat.  Neither can really replace the
>> other.  If you had a complex Hadoop workflow and wanted to introduce
>> some Kafka data into that workflow, using Camus would be gigantic
>> overkill and a pain to set up.  On the flip side, if what you want is
>> frequent, reliable ingest of Kafka data into HDFS, a simple InputFormat
>> doesn't provide you with that.
>>
>> I think it would be preferable to simplify the existing contrib
>> Input/OutputFormats by refactoring them to use the more stable,
>> higher-level Kafka APIs.  Currently they use the lower-level APIs.  This
>> should make them easier to maintain, and user-friendly enough to avoid
>> the need for extensive documentation.
>>
>> Ken
>>
>>
>> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis
>> <ps...@gmail.com> wrote:
>>
>>> Dibyendu,
>>> According to the pull request
>>> https://github.com/linkedin/camus/pull/15 it was merged into the
>>> camus-kafka-0.8 branch. I have not checked whether the code was
>>> subsequently removed; however, at least one of the important files from
>>> this patch
>>> (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
>>> is still present.
>>>
>>> Thanks,
>>> Andrew
>>>
>>>
>>>  On Fri, Aug 9, 2013 at 9:39 AM,
>>> <di...@pearson.com> wrote:
>>>
>>>>  Hi Ken,
>>>>
>>>> I am also working on making Camus fit non-Avro messages for our
>>>> requirements.
>>>>
>>>> I see you mentioned this patch
>>>> (https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8),
>>>> which supports a custom data writer for Camus. But this patch has not
>>>> been pulled into the camus-kafka-0.8 branch. Is there any plan to do
>>>> the same?
>>>>
>>>> Regards,
>>>> Dibyendu
>>>>
>>>> --
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> camus_etl+unsubscribe@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>
>>>
>>>  --
>>> You received this message because you are subscribed to the Google
>>>Groups
>>> "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>>an
>>> email to camus_etl+unsubscribe@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google
>>Groups
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send
>>an
>> email to camus_etl+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.


Re: Kafka/Hadoop consumers and producers

Posted by Andrew Otto <ot...@wikimedia.org>.
>  may merge a bit of your work into bigtop-989 if that's ok with you. 
Merge away!  Happy to help. :)

> I'll ask on bigtop regarding the .deb requirement - it seems they don't abide by this.

Yeah, there seems to be a constant struggle between the 'java way' of doing things, e.g. Maven downloading the internet, and the 'debian way', e.g. be paranoid about everything, make sure the build process is 100% repeatable.

Bigtop should definitely do whatever Bigtop thinks is best.  This Makefile technique works for us now, but probably will require a lot of manual maintenance as Kafka grows.








On Aug 13, 2013, at 6:03 PM, Kam Kasravi <ka...@yahoo.com> wrote:

> Thanks - I'll ask on bigtop regarding the .deb requirement - it seems they don't abide by this.
> I may merge a bit of your work into bigtop-989 if that's ok with you. I do know the bigtop folks 
> would like to see sbt support.
> 
> From: Andrew Otto <ot...@wikimedia.org>
> To: Kam Kasravi <ka...@yahoo.com> 
> Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> Sent: Tuesday, August 13, 2013 1:03 PM
> Subject: Re: Kafka/Hadoop consumers and producers
> 
> > What installs all the kafka dependencies under /usr/share/java?
> 
> 
> The debian/ work was done mostly by another WMF staffer.  We tried and tried to make sbt behave with debian standards, most importantly the one that requires that .debs can be created without needing to connect to the internet, aside from official apt repositories.
> 
> Many of the /usr/share/java dependencies are handled by apt.  Any that aren't available in an official apt somewhere have been manually added to the ext/ directory.
> 
> The sbt build system has been replaced with Make:
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/patches/our-own-build-system.patch
> 
> You should be able to build a .deb by checking out the debian branch and running:
> 
>   git-buildpackage -uc -us
> 
> -Ao
> 
> 
> 
> 
> 
> 
> On Aug 13, 2013, at 1:34 PM, Kam Kasravi <ka...@yahoo.com> wrote:
> 
> > Thanks Andrew - I like the shell wrapper - very clean and simple. 
> > What installs all the kafka dependencies under /usr/share/java?
> > 
> > From: Andrew Otto <ot...@wikimedia.org>
> > To: Kam Kasravi <ka...@yahoo.com> 
> > Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> > Sent: Monday, August 12, 2013 7:00 PM
> > Subject: Re: Kafka/Hadoop consumers and producers
> > 
> > We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service.
> > 
> > https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian
> > 
> > Most relevant, Ken, is an init script for Kafka:
> >  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/kafka.init
> > 
> > And a bin/kafka shell wrapper for the kafka/bin/*.sh scripts:
> >  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka
> > 
> > I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.
> > 
> > 
> > On Aug 12, 2013, at 8:16 PM, Kam Kasravi <ka...@yahoo.com> wrote:
> > 
> > > I would like to do this refactoring, since I did a high-level consumer a while ago.
> > > A few weeks ago I opened KAFKA-949 (Kafka on Yarn), which I was also hoping to contribute.
> > > It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds kafka 0.8 to the bigtop distribution.
> > > KAFKA-949 basically allows kafka brokers to be started up using sysvinit services and would ease some of the
> > > startup/configuration issues that newbies have when getting started with kafka. Ideally I would like to
> > > fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to
> > > pick this up instead. Thanks!
> > > 
> > > Kam
> > > 
> > > From: Jay Kreps <ja...@gmail.com>
> > > To: Ken Goodhope <ke...@gmail.com> 
> > > Cc: Andrew Psaltis <ps...@gmail.com>; dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> > > Sent: Saturday, August 10, 2013 3:30 PM
> > > Subject: Re: Kafka/Hadoop consumers and producers
> > > 
> > > So guys, just to throw my 2 cents in:
> > > 
> > > 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
> > > package wasn't getting as much attention as it should.
> > > 
> > > 2. Andrew or anyone--if there is anyone using the contrib package who would
> > > be willing to volunteer to kind of adopt it, that would be great. I am happy
> > > to help in whatever way I can. The practical issue is that most of the
> > > committers are either using Camus or not using Hadoop at all so we just
> > > haven't been doing a good job of documenting, bug fixing, and supporting
> > > the contrib packages.
> > > 
> > > 3. Ken, if you could document how to use Camus that would likely make it a
> > > lot more useful to people. I think most people would want a full-fledged
> > > ETL solution and would likely prefer Camus, but very few people are using
> > > Avro.
> > > 
> > > -Jay
> > > 
> > > 
> > > On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:
> > > 
> > > > I just checked, and that patch is in the 0.8 branch.  Thanks for working
> > > > on back-porting it, Andrew.  We'd be happy to commit that work to master.
> > > >
> > > > As for the Kafka contrib project vs. Camus, they are similar but not quite
> > > > identical.  Camus is intended to be a high-throughput ETL for bulk
> > > > ingestion of Kafka data into HDFS, whereas what we have in contrib is
> > > > more of a simple KafkaInputFormat.  Neither can really replace the other.
> > > > If you had a complex Hadoop workflow and wanted to introduce some Kafka
> > > > data into that workflow, using Camus would be gigantic overkill and a
> > > > pain to set up.  On the flip side, if what you want is frequent, reliable
> > > > ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> > > > with that.
> > > >
> > > > I think it would be preferable to simplify the existing contrib
> > > > Input/OutputFormats by refactoring them to use the more stable, higher-level
> > > > Kafka APIs.  Currently they use the lower-level APIs.  This should make
> > > > them easier to maintain, and user-friendly enough to avoid the need for
> > > > extensive documentation.
> > > >
> > > > Ken
> > > >
> > > >
> > > > On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com> wrote:
> > > >
> > > >> Dibyendu,
> > > >> According to the pull request https://github.com/linkedin/camus/pull/15
> > > >> it was merged into the camus-kafka-0.8 branch. I have not checked whether
> > > >> the code was subsequently removed; however, at least one of the important
> > > >> files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
> > > >> is still present.
> > > >>
> > > >> Thanks,
> > > >> Andrew
> > > >>
> > > >>
> > > >>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com> wrote:
> > > >>
> > > >>>  Hi Ken,
> > > >>>
> > > >>> I am also working on making Camus fit non-Avro messages for our
> > > >>> requirements.
> > > >>>
> > > >>> I see you mentioned this patch
> > > >>> (https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8),
> > > >>> which supports a custom data writer for Camus. But this patch has not been
> > > >>> pulled into the camus-kafka-0.8 branch. Is there any plan to do the same?
> > > >>>
> > > >>> Regards,
> > > >>> Dibyendu
> > > >>>
> > > >>> --
> > > >>> You received this message because you are subscribed to a topic in the
> > > >>> Google Groups "Camus - Kafka ETL for Hadoop" group.
> > > >>> To unsubscribe from this topic, visit
> > > >>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
> > > >>> To unsubscribe from this group and all its topics, send an email to
> > > >>> camus_etl+unsubscribe@googlegroups.com.
> > > >>> For more options, visit https://groups.google.com/groups/opt_out.
> > > >>>
> > > >>
> > > >>  --
> > > >> You received this message because you are subscribed to the Google Groups
> > > >> "Camus - Kafka ETL for Hadoop" group.
> > > >> To unsubscribe from this group and stop receiving emails from it, send an
> > > >> email to camus_etl+unsubscribe@googlegroups.com.
> > > >> For more options, visit https://groups.google.com/groups/opt_out.
> > > >>
> > > >>
> > > >>
> > > >
> > > >  --
> > > > You received this message because you are subscribed to the Google Groups
> > > > "Camus - Kafka ETL for Hadoop" group.
> > > > To unsubscribe from this group and stop receiving emails from it, send an
> > > > email to camus_etl+unsubscribe@googlegroups.com.
> > > > For more options, visit https://groups.google.com/groups/opt_out.
> > > >
> > > 
> > > 
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> > > For more options, visit https://groups.google.com/groups/opt_out.
> > >  
> > >  
> > 
> > 
> 
> 
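The bin/kafka wrapper linked in the thread amounts to a small dispatcher around the stock kafka-*.sh scripts. A minimal sketch as a shell function, where the `KAFKA_BIN` default and the `kafka-<cmd>.sh` naming convention are assumptions rather than the exact wikimedia layout:

```shell
# Sketch of a bin/kafka-style wrapper: "kafka <cmd> [args...]" dispatches
# to the matching kafka-<cmd>.sh script. KAFKA_BIN is a hypothetical
# install location; the real script in the wikimedia repo does more.
kafka() {
  _bin="${KAFKA_BIN:-/usr/share/kafka/bin}"
  _cmd="$1"
  if [ -z "$_cmd" ]; then
    echo "Usage: kafka <command> [args...]" >&2
    return 1
  fi
  shift
  _script="$_bin/kafka-$_cmd.sh"
  if [ ! -x "$_script" ]; then
    echo "kafka: unknown command '$_cmd' (no $_script)" >&2
    return 1
  fi
  "$_script" "$@"
}
```

With something like this installed, "kafka console-consumer --topic t ..." runs kafka-console-consumer.sh, which is friendlier than remembering the full script names and makes it easy to fold new kafka/bin/* commands into one entry point.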


Re: Kafka/Hadoop consumers and producers

Posted by Kam Kasravi <ka...@yahoo.com>.
Thanks - I'll ask on bigtop regarding the .deb requirement - it seems they don't abide by this.
I may merge a bit of your work into bigtop-989 if that's ok with you. I do know the bigtop folks 
would like to see sbt support.


________________________________
 From: Andrew Otto <ot...@wikimedia.org>
To: Kam Kasravi <ka...@yahoo.com> 
Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
Sent: Tuesday, August 13, 2013 1:03 PM
Subject: Re: Kafka/Hadoop consumers and producers
 

> What installs all the kafka dependencies under /usr/share/java?


The debian/ work was done mostly by another WMF staffer.  We tried and tried to make sbt behave with debian standards, most importantly the one that requires that .debs can be created without needing to connect to the internet, aside from official apt repositories.

Many of the /usr/share/java dependencies are handled by apt.  Any that aren't available in an official apt somewhere have been manually added to the ext/ directory.

The sbt build system has been replaced with Make:
https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/patches/our-own-build-system.patch

You should be able to build a .deb by checking out the debian branch and running:

  git-buildpackage -uc -us

-Ao






On Aug 13, 2013, at 1:34 PM, Kam Kasravi <ka...@yahoo.com> wrote:

> Thanks Andrew - I like the shell wrapper - very clean and simple. 
> What installs all the kafka dependencies under /usr/share/java?
> 
> From: Andrew Otto <ot...@wikimedia.org>
> To: Kam Kasravi <ka...@yahoo.com> 
> Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> Sent: Monday, August 12, 2013 7:00 PM
> Subject: Re: Kafka/Hadoop consumers and producers
> 
> We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service.
> 
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian
> 
> Most relevant, Ken, is an init script for Kafka:
>  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/kafka.init
> 
> And a bin/kafka shell wrapper for the kafka/bin/*.sh scripts:
>  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka
> 
> I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.
> 
> 
> On Aug 12, 2013, at 8:16 PM, Kam Kasravi <ka...@yahoo.com> wrote:
> 
> > I would like to do this refactoring, since I did a high-level consumer a while ago.
> > A few weeks ago I opened KAFKA-949 (Kafka on Yarn), which I was also hoping to contribute.
> > It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds kafka 0.8 to the bigtop distribution.
> > KAFKA-949 basically allows kafka brokers to be started up using sysvinit services and would ease some of the
> > startup/configuration issues that newbies have when getting started with kafka. Ideally I would like to
> > fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to
> > pick this up instead. Thanks!
> > 
> > Kam
> > 
> > From: Jay Kreps <ja...@gmail.com>
> > To: Ken Goodhope <ke...@gmail.com> 
> > Cc: Andrew Psaltis <ps...@gmail.com>; dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> > Sent: Saturday, August 10, 2013 3:30 PM
> > Subject: Re: Kafka/Hadoop consumers and producers
> > 
> > So guys, just to throw my 2 cents in:
> > 
> > 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
> > package wasn't getting as much attention as it should.
> > 
> > 2. Andrew or anyone--if there is anyone using the contrib package who would
> > be willing to volunteer to kind of adopt it, that would be great. I am happy
> > to help in whatever way I can. The practical issue is that most of the
> > committers are either using Camus or not using Hadoop at all so we just
> > haven't been doing a good job of documenting, bug fixing, and supporting
> > the contrib packages.
> > 
> > 3. Ken, if you could document how to use Camus, that would likely make it a
> > lot more useful to people. I think most people would want a full-fledged
> > ETL solution and would likely prefer Camus, but very few people are using
> > Avro.
> > 
> > -Jay
> > 
> > 
> > On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:
> > 
> > > I just checked and that patch is in the 0.8 branch.  Thanks for working on
> > > back porting it Andrew.  We'd be happy to commit that work to master.
> > >
> > > As for the kafka contrib project vs Camus, they are similar but not quite
> > > identical.  Camus is intended to be a high throughput ETL for bulk
> > > ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
> > > more of a simple KafkaInputFormat.  Neither can really replace the other.
> > > If you had a complex hadoop workflow and wanted to introduce some Kafka
> > > data into that workflow, using Camus would be gigantic overkill and a
> > > pain to set up.  On the flip side, if what you want is frequent, reliable
> > > ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> > > with that.
> > >
> > > I think it would be preferable to simplify the existing contrib
> > > Input/OutputFormats by refactoring them to use the more stable higher level
> > > Kafka APIs.  Currently they use the lower level APIs.  This should make
> > > them easier to maintain, and user friendly enough to avoid the need for
> > > extensive documentation.
> > >
> > > Ken
> > >
> > >
> > > On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com>wrote:
> > >
> > >> Dibyendu,
> > >> According to the pull request: https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
> > >> branch. I have not checked if the code was subsequently removed; however,
> > >> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
> > >> is still present.
> > >>
> > >> Thanks,
> > >> Andrew
> > >>
> > >>
> > >>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com>wrote:
> > >>
> > >>>  Hi Ken,
> > >>>
> > >>> I am also working on making Camus fit for non-Avro messages for our
> > >>> requirements.
> > >>>
> > >>> I see you mentioned this patch (
> > >>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
> > >>> which supports a custom data writer for Camus. But this patch is not pulled
> > >>> into the camus-kafka-0.8 branch. Is there any plan to do so?
> > >>>
> > >>> Regards,
> > >>> Dibyendu
> > >>>
> > >>> --
> > >>> You received this message because you are subscribed to a topic in the
> > >>> Google Groups "Camus - Kafka ETL for Hadoop" group.
> > >>> To unsubscribe from this topic, visit
> > >>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
> > >>> To unsubscribe from this group and all its topics, send an email to
> > >>> camus_etl+unsubscribe@googlegroups.com.
> > >>> For more options, visit https://groups.google.com/groups/opt_out.
> > >>>
> > >>
> > >>  --
> > >> You received this message because you are subscribed to the Google Groups
> > >> "Camus - Kafka ETL for Hadoop" group.
> > >> To unsubscribe from this group and stop receiving emails from it, send an
> > >> email to camus_etl+unsubscribe@googlegroups.com.
> > >> For more options, visit https://groups.google.com/groups/opt_out.
> > >>
> > >>
> > >>
> > >
> > >  --
> > > You received this message because you are subscribed to the Google Groups
> > > "Camus - Kafka ETL for Hadoop" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an
> > > email to camus_etl+unsubscribe@googlegroups.com.
> > > For more options, visit https://groups.google.com/groups/opt_out.
> > >
> > 
> > 
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
> >  
> >  
> 
> 

Re: Kafka/Hadoop consumers and producers

Posted by Kam Kasravi <ka...@yahoo.com>.
Thanks - I'll ask on bigtop regarding the .deb requirement - it seems they don't abide by this.
I may merge a bit of your work into bigtop-989 if that's ok with you. I do know the bigtop folks 
would like to see sbt support.


________________________________
 From: Andrew Otto <ot...@wikimedia.org>
To: Kam Kasravi <ka...@yahoo.com> 
Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
Sent: Tuesday, August 13, 2013 1:03 PM
Subject: Re: Kafka/Hadoop consumers and producers
 

> What installs all the kafka dependencies under /usr/share/java?


The debian/ work was done mostly by another WMF staffer.  We tried and tried to make sbt comply with Debian standards, most importantly the one requiring that .debs can be built without connecting to the internet, aside from official apt repositories.

Many of the /usr/share/java dependencies are handled by apt.  Any that aren't available in an official apt somewhere have been manually added to the ext/ directory.

The sbt build system has been replaced with Make:
https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/patches/our-own-build-system.patch

You should be able to build a .deb by checking out the debian branch and running:

  git-buildpackage -uc -us

-Ao






On Aug 13, 2013, at 1:34 PM, Kam Kasravi <ka...@yahoo.com> wrote:

> Thanks Andrew - I like the shell wrapper - very clean and simple. 
> What installs all the kafka dependencies under /usr/share/java?
> 
> From: Andrew Otto <ot...@wikimedia.org>
> To: Kam Kasravi <ka...@yahoo.com> 
> Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> Sent: Monday, August 12, 2013 7:00 PM
> Subject: Re: Kafka/Hadoop consumers and producers
> 
> We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service.
> 
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian
> 
> Most relevant, Ken, is an init script for Kafka:
>  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/kafka.init
> 
> And a bin/kafka shell wrapper for the kafka/bin/*.sh scripts:
>  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka
> 
> I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.
> 
> 
> On Aug 12, 2013, at 8:16 PM, Kam Kasravi <ka...@yahoo.com> wrote:
> 
> > I would like to do this refactoring since I did a high level consumer a while ago. 
> > A few weeks ago I had opened KAFKA-949 Kafka on Yarn which I was also hoping to add to contribute.
> > It's almost done. KAFKA-949 is paired with BIGTOP-989 which adds kafka 0.8 to the bigtop distribution.
> > KAFKA-949 basically allows kafka brokers to be started up using sysvinit services and would ease some of the 
> > startup/configuration issues that newbies have when getting started with kafka. Ideally I would like to 
> > fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to
> > pick this up instead. Thanks!
> > 
> > Kam
> > 
> > From: Jay Kreps <ja...@gmail.com>
> > To: Ken Goodhope <ke...@gmail.com> 
> > Cc: Andrew Psaltis <ps...@gmail.com>; dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> > Sent: Saturday, August 10, 2013 3:30 PM
> > Subject: Re: Kafka/Hadoop consumers and producers
> > 
> > So guys, just to throw my 2 cents in:
> > 
> > 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
> > package wasn't getting as much attention as it should.
> > 
> > 2. Andrew or anyone--if there is anyone using the contrib package who would
> > be willing to volunteer to kind of adopt it, that would be great. I am happy
> > to help in whatever way I can. The practical issue is that most of the
> > committers are either using Camus or not using Hadoop at all so we just
> > haven't been doing a good job of documenting, bug fixing, and supporting
> > the contrib packages.
> > 
> > 3. Ken, if you could document how to use Camus, that would likely make it a
> > lot more useful to people. I think most people would want a full-fledged
> > ETL solution and would likely prefer Camus, but very few people are using
> > Avro.
> > 
> > -Jay
> > 
> > 
> > On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:
> > 
> > > I just checked and that patch is in the 0.8 branch.  Thanks for working on
> > > back porting it Andrew.  We'd be happy to commit that work to master.
> > >
> > > As for the kafka contrib project vs Camus, they are similar but not quite
> > > identical.  Camus is intended to be a high throughput ETL for bulk
> > > ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
> > > more of a simple KafkaInputFormat.  Neither can really replace the other.
> > > If you had a complex hadoop workflow and wanted to introduce some Kafka
> > > data into that workflow, using Camus would be gigantic overkill and a
> > > pain to set up.  On the flip side, if what you want is frequent, reliable
> > > ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> > > with that.
> > >
> > > I think it would be preferable to simplify the existing contrib
> > > Input/OutputFormats by refactoring them to use the more stable higher level
> > > Kafka APIs.  Currently they use the lower level APIs.  This should make
> > > them easier to maintain, and user friendly enough to avoid the need for
> > > extensive documentation.
> > >
> > > Ken
> > >
> > >
> > > On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com>wrote:
> > >
> > >> Dibyendu,
> > >> According to the pull request: https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
> > >> branch. I have not checked if the code was subsequently removed; however,
> > >> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
> > >> is still present.
> > >>
> > >> Thanks,
> > >> Andrew
> > >>
> > >>
> > >>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com>wrote:
> > >>
> > >>>  Hi Ken,
> > >>>
> > >>> I am also working on making Camus fit for non-Avro messages for our
> > >>> requirements.
> > >>>
> > >>> I see you mentioned this patch (
> > >>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
> > >>> which supports a custom data writer for Camus. But this patch is not pulled
> > >>> into the camus-kafka-0.8 branch. Is there any plan to do so?
> > >>>
> > >>> Regards,
> > >>> Dibyendu
> > >>>
> > >>> --
> > >>> You received this message because you are subscribed to a topic in the
> > >>> Google Groups "Camus - Kafka ETL for Hadoop" group.
> > >>> To unsubscribe from this topic, visit
> > >>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
> > >>> To unsubscribe from this group and all its topics, send an email to
> > >>> camus_etl+unsubscribe@googlegroups.com.
> > >>> For more options, visit https://groups.google.com/groups/opt_out.
> > >>>
> > >>
> > >>  --
> > >> You received this message because you are subscribed to the Google Groups
> > >> "Camus - Kafka ETL for Hadoop" group.
> > >> To unsubscribe from this group and stop receiving emails from it, send an
> > >> email to camus_etl+unsubscribe@googlegroups.com.
> > >> For more options, visit https://groups.google.com/groups/opt_out.
> > >>
> > >>
> > >>
> > >
> > >  --
> > > You received this message because you are subscribed to the Google Groups
> > > "Camus - Kafka ETL for Hadoop" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an
> > > email to camus_etl+unsubscribe@googlegroups.com.
> > > For more options, visit https://groups.google.com/groups/opt_out.
> > >
> > 
> > 
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
> >  
> >  
> 
> 

Re: Kafka/Hadoop consumers and producers

Posted by Andrew Otto <ot...@wikimedia.org>.
> What installs all the kafka dependencies under /usr/share/java?


The debian/ work was done mostly by another WMF staffer.  We tried and tried to make sbt comply with Debian standards, most importantly the one requiring that .debs can be built without connecting to the internet, aside from official apt repositories.

Many of the /usr/share/java dependencies are handled by apt.  Any that aren't available in an official apt somewhere have been manually added to the ext/ directory.

The sbt build system has been replaced with Make:
https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/patches/our-own-build-system.patch

You should be able to build a .deb by checking out the debian branch and running:

  git-buildpackage -uc -us

-Ao






On Aug 13, 2013, at 1:34 PM, Kam Kasravi <ka...@yahoo.com> wrote:

> Thanks Andrew - I like the shell wrapper - very clean and simple. 
> What installs all the kafka dependencies under /usr/share/java?
> 
> From: Andrew Otto <ot...@wikimedia.org>
> To: Kam Kasravi <ka...@yahoo.com> 
> Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> Sent: Monday, August 12, 2013 7:00 PM
> Subject: Re: Kafka/Hadoop consumers and producers
> 
> We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service.
> 
> https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian
> 
> Most relevant, Ken, is an init script for Kafka:
>   https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/kafka.init
> 
> And a bin/kafka shell wrapper for the kafka/bin/*.sh scripts:
>   https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka
> 
> I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.
> 
> 
> On Aug 12, 2013, at 8:16 PM, Kam Kasravi <ka...@yahoo.com> wrote:
> 
> > I would like to do this refactoring since I did a high level consumer a while ago. 
> > A few weeks ago I had opened KAFKA-949 Kafka on Yarn which I was also hoping to add to contribute.
> > It's almost done. KAFKA-949 is paired with BIGTOP-989 which adds kafka 0.8 to the bigtop distribution.
> > KAFKA-949 basically allows kafka brokers to be started up using sysvinit services and would ease some of the 
> > startup/configuration issues that newbies have when getting started with kafka. Ideally I would like to 
> > fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to
> > pick this up instead. Thanks!
> > 
> > Kam
> > 
> > From: Jay Kreps <ja...@gmail.com>
> > To: Ken Goodhope <ke...@gmail.com> 
> > Cc: Andrew Psaltis <ps...@gmail.com>; dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> > Sent: Saturday, August 10, 2013 3:30 PM
> > Subject: Re: Kafka/Hadoop consumers and producers
> > 
> > So guys, just to throw my 2 cents in:
> > 
> > 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
> > package wasn't getting as much attention as it should.
> > 
> > 2. Andrew or anyone--if there is anyone using the contrib package who would
> > be willing to volunteer to kind of adopt it, that would be great. I am happy
> > to help in whatever way I can. The practical issue is that most of the
> > committers are either using Camus or not using Hadoop at all so we just
> > haven't been doing a good job of documenting, bug fixing, and supporting
> > the contrib packages.
> > 
> > 3. Ken, if you could document how to use Camus, that would likely make it a
> > lot more useful to people. I think most people would want a full-fledged
> > ETL solution and would likely prefer Camus, but very few people are using
> > Avro.
> > 
> > -Jay
> > 
> > 
> > On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:
> > 
> > > I just checked and that patch is in the 0.8 branch.  Thanks for working on
> > > back porting it Andrew.  We'd be happy to commit that work to master.
> > >
> > > As for the kafka contrib project vs Camus, they are similar but not quite
> > > identical.  Camus is intended to be a high throughput ETL for bulk
> > > ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
> > > more of a simple KafkaInputFormat.  Neither can really replace the other.
> > > If you had a complex hadoop workflow and wanted to introduce some Kafka
> > > data into that workflow, using Camus would be gigantic overkill and a
> > > pain to set up.  On the flip side, if what you want is frequent, reliable
> > > ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> > > with that.
> > >
> > > I think it would be preferable to simplify the existing contrib
> > > Input/OutputFormats by refactoring them to use the more stable higher level
> > > Kafka APIs.  Currently they use the lower level APIs.  This should make
> > > them easier to maintain, and user friendly enough to avoid the need for
> > > extensive documentation.
> > >
> > > Ken
> > >
> > >
> > > On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com>wrote:
> > >
> > >> Dibyendu,
> > >> According to the pull request: https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
> > >> branch. I have not checked if the code was subsequently removed; however,
> > >> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
> > >> is still present.
> > >>
> > >> Thanks,
> > >> Andrew
> > >>
> > >>
> > >>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com>wrote:
> > >>
> > >>>  Hi Ken,
> > >>>
> > >>> I am also working on making Camus fit for non-Avro messages for our
> > >>> requirements.
> > >>>
> > >>> I see you mentioned this patch (
> > >>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
> > >>> which supports a custom data writer for Camus. But this patch is not pulled
> > >>> into the camus-kafka-0.8 branch. Is there any plan to do so?
> > >>>
> > >>> Regards,
> > >>> Dibyendu
> > >>>
> > >>> --
> > >>> You received this message because you are subscribed to a topic in the
> > >>> Google Groups "Camus - Kafka ETL for Hadoop" group.
> > >>> To unsubscribe from this topic, visit
> > >>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
> > >>> To unsubscribe from this group and all its topics, send an email to
> > >>> camus_etl+unsubscribe@googlegroups.com.
> > >>> For more options, visit https://groups.google.com/groups/opt_out.
> > >>>
> > >>
> > >>  --
> > >> You received this message because you are subscribed to the Google Groups
> > >> "Camus - Kafka ETL for Hadoop" group.
> > >> To unsubscribe from this group and stop receiving emails from it, send an
> > >> email to camus_etl+unsubscribe@googlegroups.com.
> > >> For more options, visit https://groups.google.com/groups/opt_out.
> > >>
> > >>
> > >>
> > >
> > >  --
> > > You received this message because you are subscribed to the Google Groups
> > > "Camus - Kafka ETL for Hadoop" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an
> > > email to camus_etl+unsubscribe@googlegroups.com.
> > > For more options, visit https://groups.google.com/groups/opt_out.
> > >
> > 
> > 
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
> >  
> >  
> 
> 



Re: Kafka/Hadoop consumers and producers

Posted by Kam Kasravi <ka...@yahoo.com>.
Thanks Andrew - I like the shell wrapper - very clean and simple. 
What installs all the kafka dependencies under /usr/share/java?
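
[Editor's note: the bin/kafka wrapper being praised here follows a simple dispatch pattern. A hedged sketch is below; KAFKA_BIN, the script-naming convention, and the command names are assumptions for illustration, not the actual WMF script.]

```shell
#!/bin/sh
# Illustrative sketch of a "kafka <command> ..." dispatch wrapper:
# it maps the first argument to a matching kafka-<command>.sh script
# and forwards the remaining arguments to it.
KAFKA_BIN="${KAFKA_BIN:-/usr/share/kafka/bin}"   # assumed install prefix

kafka() {
    cmd="$1"
    shift
    script="$KAFKA_BIN/kafka-$cmd.sh"
    if [ -x "$script" ]; then
        "$script" "$@"                # delegate to the real shell script
    else
        echo "kafka: unknown command '$cmd'" >&2
        return 1
    fi
}
```

Used as `kafka list-topics --zookeeper localhost:2181`, the wrapper keeps the `.sh` scripts out of users' PATHs while exposing one tidy entry point.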


________________________________
 From: Andrew Otto <ot...@wikimedia.org>
To: Kam Kasravi <ka...@yahoo.com> 
Cc: "dev@kafka.apache.org" <de...@kafka.apache.org>; Ken Goodhope <ke...@gmail.com>; Andrew Psaltis <ps...@gmail.com>; "dibyendu.bhattacharya@pearson.com" <di...@pearson.com>; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "users@kafka.apache.org" <us...@kafka.apache.org> 
Sent: Monday, August 12, 2013 7:00 PM
Subject: Re: Kafka/Hadoop consumers and producers
 

We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service.

https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian

Most relevant, Ken, is an init script for Kafka:
  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/kafka.init

And a bin/kafka shell wrapper for the kafka/bin/*.sh scripts:
  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka

I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.


On Aug 12, 2013, at 8:16 PM, Kam Kasravi <ka...@yahoo.com> wrote:

> I would like to do this refactoring since I did a high level consumer a while ago. 
> A few weeks ago I opened KAFKA-949 (Kafka on Yarn), which I was also hoping to contribute.
> It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds Kafka 0.8 to the Bigtop distribution.
> KAFKA-949 basically allows Kafka brokers to be started up as sysvinit services and would ease some of the 
> startup/configuration issues that newbies have when getting started with Kafka. Ideally I would like to 
> fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to 
> pick this up instead. Thanks!
> 
> Kam
> 
> From: Jay Kreps <ja...@gmail.com>
> To: Ken Goodhope <ke...@gmail.com> 
> Cc: Andrew Psaltis <ps...@gmail.com>; dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> Sent: Saturday, August 10, 2013 3:30 PM
> Subject: Re: Kafka/Hadoop consumers and producers
> 
> So guys, just to throw my 2 cents in:
> 
> 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
> package wasn't getting as much attention as it should.
> 
> 2. Andrew or anyone--if there is anyone using the contrib package who would
> be willing to volunteer to kind of adopt it that would be great. I am happy
> to help in whatever way I can. The practical issue is that most of the
> committers are either using Camus or not using Hadoop at all so we just
> haven't been doing a good job of documenting, bug fixing, and supporting
> the contrib packages.
> 
> 3. Ken, if you could document how to use Camus that would likely make it a
> lot more useful to people. I think most people would want a full-fledged
> ETL solution and would likely prefer Camus, but very few people are using
> Avro.
> 
> -Jay
> 
> 
> On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:
> 
> > I just checked and that patch is in the 0.8 branch.  Thanks for working on
> > backporting it, Andrew.  We'd be happy to commit that work to master.
> >
> > As for the kafka contrib project vs Camus, they are similar but not quite
> > identical.  Camus is intended to be a high throughput ETL for bulk
> > ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
> > more of a simple KafkaInputFormat.  Neither can really replace the other.
> > If you had a complex hadoop workflow and wanted to introduce some Kafka
> > data into that workflow, using Camus would be a gigantic overkill and a
> > pain to set up.  On the flipside, if what you want is frequent reliable
> > ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> > with that.
> >
> > I think it would be preferable to simplify the existing contrib
> > Input/OutputFormats by refactoring them to use the more stable higher level
> > Kafka APIs.  Currently they use the lower level APIs.  This should make
> > them easier to maintain, and user friendly enough to avoid the need for
> > extensive documentation.
> >
> > Ken
> >
> >
> > On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com> wrote:
> >
> >> Dibyendu,
> >> According to the pull request: https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
> >> branch. I have not checked if the code was subsequently removed; however,
> >> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
> >> is still present.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com> wrote:
> >>
> >>>  Hi Ken,
> >>>
> >>> I am also working on making Camus fit for non-Avro messages for our
> >>> requirements.
> >>>
> >>> I see you mentioned this patch (
> >>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
> >>> which supports a custom data writer for Camus. But this patch has not been pulled
> >>> into the camus-kafka-0.8 branch. Is there any plan to do the same?
> >>>
> >>> Regards,
> >>> Dibyendu
> >>>
> >>> --
> >>> You received this message because you are subscribed to a topic in the
> >>> Google Groups "Camus - Kafka ETL for Hadoop" group.
> >>> To unsubscribe from this topic, visit
> >>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
> >>> To unsubscribe from this group and all its topics, send an email to
> >>> camus_etl+unsubscribe@googlegroups.com.
> >>> For more options, visit https://groups.google.com/groups/opt_out.
> >>>
> >>
> >>  --
> >> You received this message because you are subscribed to the Google Groups
> >> "Camus - Kafka ETL for Hadoop" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an
> >> email to camus_etl+unsubscribe@googlegroups.com.
> >> For more options, visit https://groups.google.com/groups/opt_out.
> >>
> >>
> >>
> >
> >  --
> > You received this message because you are subscribed to the Google Groups
> > "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to camus_etl+unsubscribe@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
> >
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  


Re: Kafka/Hadoop consumers and producers

Posted by Andrew Otto <ot...@wikimedia.org>.
We've done a bit of work over at Wikimedia to debianize Kafka and make it behave like a regular service.

https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian

Most relevant, Ken, is an init script for Kafka:
  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/kafka.init

And a bin/kafka shell wrapper for the kafka/bin/*.sh scripts:
  https://github.com/wikimedia/operations-debs-kafka/blob/debian/debian/bin/kafka

I'm about to add an init script for MirrorMaker as well, so mirroring can be daemonized and run as a service.
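
For reference, a wrapper like the bin/kafka script linked above is essentially a
dispatcher: it maps "kafka <command> [args]" onto the stock kafka-<command>.sh
scripts shipped in Kafka's bin/ directory. A minimal sketch of the idea (the
KAFKA_BIN default is an assumed install path for illustration, not what the
Wikimedia package actually uses):

```shell
#!/bin/sh
# Sketch of a bin/kafka dispatcher: "kafka console-producer --topic t"
# runs $KAFKA_BIN/kafka-console-producer.sh --topic t.
# Illustrative only; the KAFKA_BIN default is an assumption.
KAFKA_BIN="${KAFKA_BIN:-/usr/lib/kafka/bin}"

kafka_dispatch() {
    cmd="$1"
    shift
    script="$KAFKA_BIN/kafka-$cmd.sh"
    if [ ! -x "$script" ]; then
        echo "unknown kafka command: $cmd" >&2
        return 1
    fi
    # Hand off to the stock Kafka script with the remaining arguments.
    exec "$script" "$@"
}

# Dispatch only when arguments were given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    kafka_dispatch "$@"
fi
```

With the Kafka scripts installed, "kafka console-producer --topic test" would
then be shorthand for kafka-console-producer.sh --topic test.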


On Aug 12, 2013, at 8:16 PM, Kam Kasravi <ka...@yahoo.com> wrote:

> I would like to do this refactoring since I did a high level consumer a while ago. 
> A few weeks ago I opened KAFKA-949 (Kafka on Yarn), which I was also hoping to contribute.
> It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds Kafka 0.8 to the Bigtop distribution.
> KAFKA-949 basically allows Kafka brokers to be started up as sysvinit services and would ease some of the 
> startup/configuration issues that newbies have when getting started with Kafka. Ideally I would like to 
> fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to 
> pick this up instead. Thanks!
> 
> Kam
> 
> From: Jay Kreps <ja...@gmail.com>
> To: Ken Goodhope <ke...@gmail.com> 
> Cc: Andrew Psaltis <ps...@gmail.com>; dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; "users@kafka.apache.org" <us...@kafka.apache.org> 
> Sent: Saturday, August 10, 2013 3:30 PM
> Subject: Re: Kafka/Hadoop consumers and producers
> 
> So guys, just to throw my 2 cents in:
> 
> 1. We aren't deprecating anything. I just noticed that the Hadoop contrib
> package wasn't getting as much attention as it should.
> 
> 2. Andrew or anyone--if there is anyone using the contrib package who would
> be willing to volunteer to kind of adopt it that would be great. I am happy
> to help in whatever way I can. The practical issue is that most of the
> committers are either using Camus or not using Hadoop at all so we just
> haven't been doing a good job of documenting, bug fixing, and supporting
> the contrib packages.
> 
> 3. Ken, if you could document how to use Camus that would likely make it a
> lot more useful to people. I think most people would want a full-fledged
> ETL solution and would likely prefer Camus, but very few people are using
> Avro.
> 
> -Jay
> 
> 
> On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:
> 
> > I just checked and that patch is in the 0.8 branch.  Thanks for working on
> > backporting it, Andrew.  We'd be happy to commit that work to master.
> >
> > As for the kafka contrib project vs Camus, they are similar but not quite
> > identical.  Camus is intended to be a high throughput ETL for bulk
> > ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
> > more of a simple KafkaInputFormat.  Neither can really replace the other.
> > If you had a complex hadoop workflow and wanted to introduce some Kafka
> > data into that workflow, using Camus would be a gigantic overkill and a
> > pain to set up.  On the flipside, if what you want is frequent reliable
> > ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> > with that.
> >
> > I think it would be preferable to simplify the existing contrib
> > Input/OutputFormats by refactoring them to use the more stable higher level
> > Kafka APIs.  Currently they use the lower level APIs.  This should make
> > them easier to maintain, and user friendly enough to avoid the need for
> > extensive documentation.
> >
> > Ken
> >
> >
> > On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com> wrote:
> >
> >> Dibyendu,
> >> According to the pull request: https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
> >> branch. I have not checked if the code was subsequently removed; however,
> >> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
> >> is still present.
> >>
> >> Thanks,
> >> Andrew
> >>
> >>
> >>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com> wrote:
> >>
> >>>  Hi Ken,
> >>>
> >>> I am also working on making Camus fit for non-Avro messages for our
> >>> requirements.
> >>>
> >>> I see you mentioned this patch (
> >>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
> >>> which supports a custom data writer for Camus. But this patch has not been pulled
> >>> into the camus-kafka-0.8 branch. Is there any plan to do the same?
> >>>
> >>> Regards,
> >>> Dibyendu
> >>>
> >>> --
> >>> You received this message because you are subscribed to a topic in the
> >>> Google Groups "Camus - Kafka ETL for Hadoop" group.
> >>> To unsubscribe from this topic, visit
> >>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
> >>> To unsubscribe from this group and all its topics, send an email to
> >>> camus_etl+unsubscribe@googlegroups.com.
> >>> For more options, visit https://groups.google.com/groups/opt_out.
> >>>
> >>
> >>  --
> >> You received this message because you are subscribed to the Google Groups
> >> "Camus - Kafka ETL for Hadoop" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an
> >> email to camus_etl+unsubscribe@googlegroups.com.
> >> For more options, visit https://groups.google.com/groups/opt_out.
> >>
> >>
> >>
> >
> >  --
> > You received this message because you are subscribed to the Google Groups
> > "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to camus_etl+unsubscribe@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.
> >
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  




Re: Kafka/Hadoop consumers and producers

Posted by Kam Kasravi <ka...@yahoo.com>.
I would like to do this refactoring since I did a high level consumer a while ago. 
A few weeks ago I opened KAFKA-949 (Kafka on Yarn), which I was also hoping to contribute.
It's almost done. KAFKA-949 is paired with BIGTOP-989, which adds Kafka 0.8 to the Bigtop distribution.
KAFKA-949 basically allows Kafka brokers to be started up as sysvinit services and would ease some of the 
startup/configuration issues that newbies have when getting started with Kafka. Ideally I would like to 
fold a number of kafka/bin/* commands into the kafka service. Andrew, please let me know if you would like to 
pick this up instead. Thanks!

Kam
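
A sysvinit service of the kind KAFKA-949 describes boils down to a
start/stop/status wrapper around the broker process. The skeleton below is
only an illustrative sketch under assumed paths (it is not the actual
KAFKA-949 or Bigtop script); DAEMON and PIDFILE are overridable so the
skeleton can be exercised without a Kafka install:

```shell
#!/bin/sh
# Minimal sysvinit-style service skeleton for a Kafka broker.
# Illustrative sketch only; paths and defaults are assumptions, not the
# KAFKA-949 script. DAEMON and PIDFILE may be overridden for testing.
DAEMON="${DAEMON:-/usr/lib/kafka/bin/kafka-server-start.sh /etc/kafka/server.properties}"
PIDFILE="${PIDFILE:-/var/run/kafka.pid}"

kafka_status() {
    # "Running" means the recorded PID still answers signal 0.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

kafka_start() {
    if kafka_status; then
        echo "kafka is already running"
        return 0
    fi
    # Launch the broker in the background and record its PID.
    # A real script would redirect to a log file instead of /dev/null.
    $DAEMON >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
    echo "kafka started"
}

kafka_stop() {
    if kafka_status; then
        kill "$(cat "$PIDFILE")"
        rm -f "$PIDFILE"
        echo "kafka stopped"
    else
        echo "kafka is not running"
    fi
}

# Dispatch only when invoked with an action, so the file can also be sourced.
if [ $# -gt 0 ]; then
    case "$1" in
        start)   kafka_start ;;
        stop)    kafka_stop ;;
        status)  kafka_status && echo "kafka is running" || echo "kafka is not running" ;;
        restart) kafka_stop; kafka_start ;;
        *)       echo "Usage: $0 {start|stop|status|restart}" >&2; exit 1 ;;
    esac
fi
```

Folding the various kafka/bin/* commands into one service entry point, as
proposed above, would sit naturally on top of a skeleton like this.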


________________________________
 From: Jay Kreps <ja...@gmail.com>
To: Ken Goodhope <ke...@gmail.com> 
Cc: Andrew Psaltis <ps...@gmail.com>; dibyendu.bhattacharya@pearson.com; "camus_etl@googlegroups.com" <ca...@googlegroups.com>; "aotto@wikimedia.org" <ao...@wikimedia.org>; Felix GV <fe...@mate1inc.com>; Cosmin Lehene <cl...@adobe.com>; "dev@kafka.apache.org" <de...@kafka.apache.org>; "users@kafka.apache.org" <us...@kafka.apache.org> 
Sent: Saturday, August 10, 2013 3:30 PM
Subject: Re: Kafka/Hadoop consumers and producers
 

So guys, just to throw my 2 cents in:

1. We aren't deprecating anything. I just noticed that the Hadoop contrib
package wasn't getting as much attention as it should.

2. Andrew or anyone--if there is anyone using the contrib package who would
be willing to volunteer to kind of adopt it that would be great. I am happy
to help in whatever way I can. The practical issue is that most of the
committers are either using Camus or not using Hadoop at all so we just
haven't been doing a good job of documenting, bug fixing, and supporting
the contrib packages.

3. Ken, if you could document how to use Camus that would likely make it a
lot more useful to people. I think most people would want a full-fledged
ETL solution and would likely prefer Camus, but very few people are using
Avro.

-Jay


On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:

> I just checked and that patch is in the 0.8 branch.  Thanks for working on
> backporting it, Andrew.  We'd be happy to commit that work to master.
>
> As for the kafka contrib project vs Camus, they are similar but not quite
> identical.  Camus is intended to be a high throughput ETL for bulk
> ingestion of Kafka data into HDFS.  Whereas what we have in contrib is
> more of a simple KafkaInputFormat.  Neither can really replace the other.
> If you had a complex hadoop workflow and wanted to introduce some Kafka
> data into that workflow, using Camus would be a gigantic overkill and a
> pain to set up.  On the flipside, if what you want is frequent reliable
> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> with that.
>
> I think it would be preferable to simplify the existing contrib
> Input/OutputFormats by refactoring them to use the more stable higher level
> Kafka APIs.  Currently they use the lower level APIs.  This should make
> them easier to maintain, and user friendly enough to avoid the need for
> extensive documentation.
>
> Ken
>
>
> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com> wrote:
>
>> Dibyendu,
>> According to the pull request: https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
>> branch. I have not checked if the code was subsequently removed; however,
>> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
>> is still present.
>>
>> Thanks,
>> Andrew
>>
>>
>>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com> wrote:
>>
>>>  Hi Ken,
>>>
>>> I am also working on making Camus fit for non-Avro messages for our
>>> requirements.
>>>
>>> I see you mentioned this patch (
>>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
>>> which supports a custom data writer for Camus. But this patch has not been pulled
>>> into the camus-kafka-0.8 branch. Is there any plan to do the same?
>>>
>>> Regards,
>>> Dibyendu
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> camus_etl+unsubscribe@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to camus_etl+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

>
>> Dibyendu,
>> According to the pull request https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
>> branch. I have not checked whether the code was subsequently removed; however,
>> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
>> is still present.
>>
>> Thanks,
>> Andrew
>>
>>
>>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com>wrote:
>>
>>>  Hi Ken,
>>>
>>> I am also working on making the Camus fit for Non Avro message for our
>>> requirement.
>>>
>>> I see you mentioned about this patch (
>>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
>>> which supports custom data writer for Camus. But this patch is not pulled
>>> into camus-kafka-0.8 branch. Is there any plan for doing the same ?
>>>
>>> Regards,
>>> Dibyendu
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> camus_etl+unsubscribe@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to camus_etl+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

Re: Kafka/Hadoop consumers and producers

Posted by Jay Kreps <ja...@gmail.com>.
So guys, just to throw my 2 cents in:

1. We aren't deprecating anything. I just noticed that the Hadoop contrib
package wasn't getting as much attention as it should.

2. Andrew or anyone: if anyone using the contrib package would be willing
to volunteer to adopt it, that would be great. I am happy
to help in whatever way I can. The practical issue is that most of the
committers are either using Camus or not using Hadoop at all so we just
haven't been doing a good job of documenting, bug fixing, and supporting
the contrib packages.

3. Ken, if you could document how to use Camus that would likely make it a
lot more useful to people. I think most people would want a full-fledged
ETL solution and would likely prefer Camus, but very few people are using
Avro.

-Jay


On Fri, Aug 9, 2013 at 12:27 PM, Ken Goodhope <ke...@gmail.com> wrote:

> I just checked and that patch is in .8 branch.   Thanks for working on
> back porting it Andrew.  We'd be happy to commit that work to master.
>
> As for the kafka contrib project vs Camus, they are similar but not quite
> identical.  Camus is intended to be a high throughput ETL for bulk
> ingestion of Kafka data into HDFS.  Where as what we have in contrib is
> more of a simple KafkaInputFormat.  Neither can really replace the other.
> If you had a complex hadoop workflow and wanted to introduce some Kafka
> data into that workflow, using Camus would be a gigantic overkill and a
> pain to setup.  On the flipside, if what you want is frequent reliable
> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
> with that.
>
> I think it would be preferable to simplify the existing contrib
> Input/OutputFormats by refactoring them to use the more stable higher level
> Kafka APIs.  Currently they use the lower level APIs.  This should make
> them easier to maintain, and user friendly enough to avoid the need for
> extensive documentation.
>
> Ken
>
>
> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com>wrote:
>
>> Dibyendu,
>> According to the pull request https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
>> branch. I have not checked whether the code was subsequently removed; however,
>> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
>> is still present.
>>
>> Thanks,
>> Andrew
>>
>>
>>  On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com>wrote:
>>
>>>  Hi Ken,
>>>
>>> I am also working on making the Camus fit for Non Avro message for our
>>> requirement.
>>>
>>> I see you mentioned about this patch (
>>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
>>> which supports custom data writer for Camus. But this patch is not pulled
>>> into camus-kafka-0.8 branch. Is there any plan for doing the same ?
>>>
>>> Regards,
>>> Dibyendu
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> camus_etl+unsubscribe@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to camus_etl+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

Re: Kafka/Hadoop consumers and producers

Posted by Abhi Basu <90...@gmail.com>.
I agree with you. We are looking for a simple solution for getting data from Kafka 
into Hadoop. I tried using Camus earlier (non-Avro), and the documentation is 
too sparse to make it work correctly; we also do not want to introduce another 
component into the solution. In the meantime, can the Kafka Hadoop 
Consumer/Producer be documented well so we can try it out ASAP? :)  Thanks.

On Friday, August 9, 2013 12:27:12 PM UTC-7, Ken Goodhope wrote:
>
> I just checked and that patch is in .8 branch.   Thanks for working on 
> back porting it Andrew.  We'd be happy to commit that work to master.
>
> As for the kafka contrib project vs Camus, they are similar but not quite 
> identical.  Camus is intended to be a high throughput ETL for bulk 
> ingestion of Kafka data into HDFS.  Where as what we have in contrib is 
> more of a simple KafkaInputFormat.  Neither can really replace the other.  
> If you had a complex hadoop workflow and wanted to introduce some Kafka 
> data into that workflow, using Camus would be a gigantic overkill and a 
> pain to setup.  On the flipside, if what you want is frequent reliable 
> ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you 
> with that.
>
> I think it would be preferable to simplify the existing contrib 
> Input/OutputFormats by refactoring them to use the more stable higher level 
> Kafka APIs.  Currently they use the lower level APIs.  This should make 
> them easier to maintain, and user friendly enough to avoid the need for 
> extensive documentation.
>
> Ken
>
>
> On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <psaltis...@gmail.com> wrote:
>
>> Dibyendu,
>> According to the pull request https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8 
>> branch. I have not checked whether the code was subsequently removed; however, 
>> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java) 
>> is still present.
>>
>> Thanks,
>> Andrew
>>
>>
>>  On Fri, Aug 9, 2013 at 9:39 AM, <dibyendu.b...@pearson.com> wrote:
>>
>>>  Hi Ken,
>>>
>>> I am also working on making the Camus fit for Non Avro message for our 
>>> requirement.
>>>
>>> I see you mentioned about this patch (
>>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8) 
>>> which supports custom data writer for Camus. But this patch is not pulled 
>>> into camus-kafka-0.8 branch. Is there any plan for doing the same ?
>>>
>>> Regards,
>>> Dibyendu
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the 
>>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>>> To unsubscribe from this topic, visit 
>>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to 
>>> camus_etl+...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to camus_etl+...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>

Re: Kafka/Hadoop consumers and producers

Posted by Ken Goodhope <ke...@gmail.com>.
I just checked and that patch is in the 0.8 branch.  Thanks for working on
back-porting it, Andrew.  We'd be happy to commit that work to master.

As for the kafka contrib project vs Camus, they are similar but not quite
identical.  Camus is intended to be a high-throughput ETL tool for bulk
ingestion of Kafka data into HDFS, whereas what we have in contrib is
more of a simple KafkaInputFormat.  Neither can really replace the other.
If you had a complex Hadoop workflow and wanted to introduce some Kafka
data into that workflow, using Camus would be gigantic overkill and a
pain to set up.  On the flip side, if what you want is frequent, reliable
ingest of Kafka data into HDFS, a simple InputFormat doesn't provide you
with that.

I think it would be preferable to simplify the existing contrib
Input/OutputFormats by refactoring them to use the more stable, higher-level
Kafka APIs.  Currently they use the lower-level APIs.  This should make
them easier to maintain, and user-friendly enough to avoid the need for
extensive documentation.

Ken
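
[Editor's sketch] To make the refactoring Ken describes concrete: with a high-level consumer, the per-stream ingest loop reduces to "iterate, decode, write", because offset tracking and rebalancing are handled by the consumer itself. The sketch below is self-contained and uses illustrative stand-ins (`RecordWriter`, the fake byte[] stream); it is not the real kafka.consumer or Hadoop API — with the actual Kafka 0.8 high-level consumer you would iterate a KafkaStream obtained from a ConsumerConnector, and the writer would target HDFS.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IngestLoopSketch {

    // Stand-in for an HDFS-backed record writer.
    interface RecordWriter {
        void write(String record);
    }

    // With a high-level consumer, the ingest loop is just: iterate, decode,
    // write. No manual broker discovery or offset management, which is what
    // would make the contrib code easier to maintain.
    static void drain(Iterator<byte[]> stream, RecordWriter writer) {
        while (stream.hasNext()) {
            writer.write(new String(stream.next(), StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        // Fake "stream" standing in for messages consumed from a Kafka topic.
        List<byte[]> messages = Arrays.asList(
                "line one".getBytes(StandardCharsets.UTF_8),
                "line two".getBytes(StandardCharsets.UTF_8));
        List<String> sink = new ArrayList<>();
        drain(messages.iterator(), sink::add);
        System.out.println(sink); // prints [line one, line two]
    }
}
```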


On Fri, Aug 9, 2013 at 8:52 AM, Andrew Psaltis <ps...@gmail.com>wrote:

> Dibyendu,
> According to the pull request https://github.com/linkedin/camus/pull/15 it was merged into the camus-kafka-0.8
> branch. I have not checked whether the code was subsequently removed; however,
> at least one of the important files from this patch (camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
> is still present.
>
> Thanks,
> Andrew
>
>
> On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com> wrote:
>
>> Hi Ken,
>>
>> I am also working on making the Camus fit for Non Avro message for our
>> requirement.
>>
>> I see you mentioned about this patch (
>> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
>> which supports custom data writer for Camus. But this patch is not pulled
>> into camus-kafka-0.8 branch. Is there any plan for doing the same ?
>>
>> Regards,
>> Dibyendu
>>
>> --
>> You received this message because you are subscribed to a topic in the
>> Google Groups "Camus - Kafka ETL for Hadoop" group.
>> To unsubscribe from this topic, visit
>> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to
>> camus_etl+unsubscribe@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>

Re: Kafka/Hadoop consumers and producers

Posted by Andrew Psaltis <ps...@gmail.com>.
Dibyendu,
According to the pull request: https://github.com/linkedin/camus/pull/15 it
was merged into the camus-kafka-0.8 branch. I have not checked whether the
code was subsequently removed; however, at least one of the important files
from this patch
(camus-api/src/main/java/com/linkedin/camus/etl/RecordWriterProvider.java)
is still present.

Thanks,
Andrew


On Fri, Aug 9, 2013 at 9:39 AM, <di...@pearson.com> wrote:

> Hi Ken,
>
> I am also working on making the Camus fit for Non Avro message for our
> requirement.
>
> I see you mentioned about this patch (
> https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8)
> which supports custom data writer for Camus. But this patch is not pulled
> into camus-kafka-0.8 branch. Is there any plan for doing the same ?
>
> Regards,
> Dibyendu
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/camus_etl/KKS6t5-O-Ng/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>

Re: Kafka/Hadoop consumers and producers

Posted by di...@pearson.com.
Hi Ken, 

I am also working on making Camus fit non-Avro messages for our requirements. 

I see you mentioned this patch (https://github.com/linkedin/camus/commit/87917a2aea46da9d21c8f67129f6463af52f7aa8), which adds support for a custom data writer in Camus. But this patch has not been pulled into the camus-kafka-0.8 branch. Is there any plan to do so?

Regards, 
Dibyendu

Re: Kafka/Hadoop consumers and producers

Posted by ps...@gmail.com.
We also have a need today to ETL from Kafka into Hadoop, and we do not currently use Avro, nor do we have any plans to. 

So is the official direction, based on this discussion, to ditch the Kafka contrib code and direct people to use Camus without Avro as Ken described, or are both solutions going to survive? 

I can put time into the contrib code and/or work on documenting a tutorial on how to make Camus work without Avro. 

Which is the preferred route for the long term?

Thanks,
Andrew

On Wednesday, August 7, 2013 10:50:53 PM UTC-6, Ken Goodhope wrote:
> Hi Andrew,
> 
> Camus can be made to work without avro. You will need to implement a message decoder and a data writer. We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered. But it can take me a day or two. 
> 
> -Ken
> 
> On Aug 7, 2013, at 9:33 AM, aotto@wikimedia.org wrote:
> 
> > Hi all,
> > 
> > Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop.  We don't currently use Avro and I'm not sure if we are going to.  I came across this post.
> > 
> > If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?
> > 
> > Thanks!
> > -Andrew
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> > For more options, visit https://groups.google.com/groups/opt_out.


Re: Kafka/Hadoop consumers and producers

Posted by Ken <ke...@gmail.com>.
Hi Andrew,

Camus can be made to work without avro. You will need to implement a message decoder and a data writer.   We need to add a better tutorial on how to do this, but it isn't that difficult. If you decide to go down this path, you can always ask questions on this list. I try to make sure each email gets answered. But it can take me a day or two. 

-Ken
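
[Editor's sketch] As a rough illustration of what "a message decoder" means here: the self-contained sketch below only mirrors the shape of Camus's decoder contract. The `MessageDecoder` interface is an illustrative stand-in, not the real Camus class (the real one ships in the Camus jar and differs in details, e.g. it wraps the decoded record).

```java
import java.nio.charset.StandardCharsets;

public class DecoderSketch {

    // Illustrative stand-in mirroring the shape of Camus's decoder contract;
    // the real abstract class ships in the Camus jar and differs in details.
    interface MessageDecoder<R> {
        R decode(byte[] payload);
    }

    // A non-Avro decoder: treat each Kafka payload as a UTF-8 text record
    // (e.g. a tab-separated log line). A real one might parse JSON instead.
    static class PlainTextDecoder implements MessageDecoder<String> {
        @Override
        public String decode(byte[] payload) {
            return new String(payload, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "2013-08-07T18:33:38\tGET\t/wiki/Kafka"
                .getBytes(StandardCharsets.UTF_8);
        // Decode a raw payload into a text record ready for a data writer.
        System.out.println(new PlainTextDecoder().decode(raw));
    }
}
```

The data writer half is analogous: take each decoded record and append it to the output file instead of serializing it as Avro.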

On Aug 7, 2013, at 9:33 AM, aotto@wikimedia.org wrote:

> Hi all,
> 
> Over at the Wikimedia Foundation, we're trying to figure out the best way to do our ETL from Kafka into Hadoop.  We don't currently use Avro and I'm not sure if we are going to.  I came across this post.
> 
> If the plan is to remove the hadoop-consumer from Kafka contrib, do you think we should not consider it as one of our viable options?
> 
> Thanks!
> -Andrew
> 
> -- 
> You received this message because you are subscribed to the Google Groups "Camus - Kafka ETL for Hadoop" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to camus_etl+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
> 
> 
