Posted to users@kafka.apache.org by Raimon Bosch <ra...@gmail.com> on 2011/11/07 15:38:22 UTC

hadoop-consumer never finishing

Hi,

I have just compiled Kafka from https://github.com/kafka-dev/kafka and
executed the DataGenerator:

./run-class.sh kafka.etl.impl.DataGenerator test/test.properties

After that, I executed the hadoop-consumer:

./run-class.sh kafka.etl.impl.SimpleKafkaETLJob test/test.properties


The hadoop-consumer is generating a file in the specified output directory,
but it never finishes, even when I generate only 1 event in
test/test.properties. The file keeps growing and growing; my guess is that
it is always re-reading from offset 0?
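
Roughly the behaviour I suspect, as an illustrative sketch in plain Java
(this is NOT the real kafka.etl code; fetchBatch is a made-up stand-in for
the actual fetch request):

// Illustrative sketch only: if the fetch offset is never advanced after
// writing, every request starts again at offset 0, there is no stop
// condition, and the output file grows forever.
public class StuckOffsetSketch {

    // Hypothetical stand-in for a Kafka fetch; returns a batch size.
    static int fetchBatch(long offset) {
        return 1; // pretend the broker returns the same single event
    }

    public static void main(String[] args) {
        long offset = 0;  // starting offset
        long written = 0;
        while (true) {    // nothing ever makes this loop exit
            written += fetchBatch(offset);
            // the suspected bug: 'offset' is never updated here, so the
            // next fetch re-reads the same event(s) from offset 0
            System.out.println("wrote " + written + " events, offset still " + offset);
        }
    }
}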

This is my test.properties:

# name of test topic
kafka.etl.topic=SimpleTestEvent5

# hdfs location of jars
hdfs.default.classpath.dir=/tmp/kafka/lib

# number of test events to be generated
event.count=1

# hadoop id and group
hadoop.job.ugi=kafka,hadoop

# kafka server uri
kafka.server.uri=tcp://localhost:9092

# hdfs location of input directory
input=/tmp/kafka/data

# hdfs location of output directory
output=/tmp/kafka/output

# limit the number of events to be fetched;
# value -1 means no limitation
kafka.request.limit=-1

# kafka parameters
client.buffer.size=1048576
client.so.timeout=60000


Any ideas where the problem might be?

Re: hadoop-consumer never finishing

Posted by Raimon Bosch <ra...@gmail.com>.
Thanks!

Using the trunk:

svn co http://svn.apache.org/repos/asf/incubator/kafka/trunk kafka

this problem does not occur.

2011/11/7 Felix GV <fe...@mate1inc.com>

> I think I've had the same bug. It's a known issue that is fixed in the
> trunk.
>
> You should check out Kafka from the (Apache) trunk and use the
> hadoop-consumer provided there in the contrib directory. If I'm not
> mistaken, that version is more up to date than the one you mentioned on
> GitHub...
>
> --
> Felix
>
> On Monday, November 7, 2011, Raimon Bosch <ra...@gmail.com> wrote:
> > [...]

Re: hadoop-consumer never finishing

Posted by Felix GV <fe...@mate1inc.com>.
I think I've had the same bug. It's a known issue that is fixed in the
trunk.

You should check out Kafka from the (Apache) trunk and use the
hadoop-consumer provided there in the contrib directory. If I'm not
mistaken, that version is more up to date than the one you mentioned on
GitHub...

--
Felix

On Monday, November 7, 2011, Raimon Bosch <ra...@gmail.com> wrote:
> [...]

--
Felix

Re: hadoop-consumer never finishing

Posted by Raimon Bosch <ra...@gmail.com>.
Problem solved! It was a configuration issue.

Trying with:
event.count=1000
kafka.request.limit=1000

The mapper stopped and generated a file with 1000 events. But if we use
kafka.request.limit=-1, the same events are fetched over and over again,
which is why my hadoop-consumer could never stop.
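
For anyone hitting the same thing, here is a minimal sketch of the
difference in plain Java (illustrative only, not the real ETL code;
shouldStop is a made-up helper mirroring the "-1 means no limitation"
semantics):

// Illustrative sketch: with a positive kafka.request.limit the reader
// can stop once it has emitted 'limit' records; with -1 the stop
// condition below is never true, so the mapper never finishes.
public class RequestLimitSketch {

    // Made-up helper: -1 (or any negative value) means no limitation.
    static boolean shouldStop(long recordsRead, long limit) {
        return limit >= 0 && recordsRead >= limit;
    }

    public static void main(String[] args) {
        long limit = 1000;      // kafka.request.limit=1000
        long recordsRead = 0;
        while (!shouldStop(recordsRead, limit)) {
            recordsRead++;      // pretend we consumed one event
        }
        System.out.println("stopped after " + recordsRead + " events");
        // with limit = -1, shouldStop(...) never returns true and the
        // loop above runs forever, just like my mapper did
    }
}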

2011/11/7 Raimon Bosch <ra...@gmail.com>

> [...]
>