Posted to users@kafka.apache.org by Banias H <ba...@gmail.com> on 2016/08/05 15:47:48 UTC

How to leverage kafka-connect-hdfs in HDP

Hi,

We are using Hortonworks HDP 2.4 with Apache Kafka 0.9 and we have an
in-house solution to pull messages from Kafka to HDFS. I would like to try
using kafka-connect-hdfs to push messages to HDFS. As far as I know,
Apache Kafka 0.9 doesn't come with kafka-connect-hdfs. What is a solid
way to run kafka-connect-hdfs in HDP? I won't be able to install the
Confluent Platform there though... I would appreciate any pointers. Thanks.

-B

Re: How to leverage kafka-connect-hdfs in HDP

Posted by Gwen Shapira <gw...@confluent.io>.
This is weird. It looks like an Ambari class ended up on the classpath
and we are somehow trying to load it?

Perhaps Sriharsha or one of the HDP dudes can help.

Does it happen without the Connector too? It looks like it has to do
with Kafka broker metrics in general:
https://issues.apache.org/jira/browse/AMBARI-9185
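
A minimal sketch of how one might check whether the broker itself is configured with the Ambari metrics reporter, independent of Connect. The property names (kafka.metrics.reporters, kafka.timeline.metrics.*) and the server.properties path in the usage note are assumptions based on how Ambari-managed brokers are commonly configured; neither is confirmed in this thread.

```shell
#!/usr/bin/env bash

# Print any metrics-reporter settings found in a broker properties file.
# ASSUMPTION: the key prefixes below are what an Ambari-managed broker uses.
show_metrics_settings() {
  local props="$1"
  grep -E '^(kafka\.metrics\.reporters|kafka\.timeline\.metrics\.)' "$props" || true
}

# Return 0 if host:port accepts a TCP connection within 3 seconds.
# Uses bash's /dev/tcp redirection and the coreutils `timeout` command.
can_connect() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, `show_metrics_settings /usr/hdp/current/kafka-broker/config/server.properties` (hypothetical path) would list the reporter class and the collector host and port if they are set, and `can_connect <collector-host> <collector-port>` would show whether that endpoint is reachable. A refused connection there would be consistent with the "Connection refused" in the stack trace, whether or not the connector is running.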

Gwen




-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog

Re: How to leverage kafka-connect-hdfs in HDP

Posted by Banias H <ba...@gmail.com>.
Thanks Gwen. I went with Confluent 2.0 as it has Kafka 0.9, which matches
the version in HDP 2.4. I installed confluent-kafka-connect-hdfs and
confluent-common and soft-linked a couple of jars into Kafka's libs/.

I was able to start Kafka Connect but kafka.out was showing the following
error:

[2016-08-05 20:57:01,187] ERROR Exception emitting metrics (org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter)
org.apache.hadoop.metrics2.sink.timeline.UnableToConnectException: java.net.ConnectException: Connection refused
    at org.apache.hadoop.metrics2.sink.timeline.AbstractTimelineMetricsSink.emitMetrics(AbstractTimelineMetricsSink.java:87)
    at org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter.access$200(KafkaTimelineMetricsReporter.java:58)
    at org.apache.hadoop.metrics2.sink.kafka.KafkaTimelineMetricsReporter$TimelineScheduledReporter.report(KafkaTimelineMetricsReporter.java:253)
    at org.apache.hadoop.metrics2.sink.kafka.ScheduledReporter.report(ScheduledReporter.java:185)
    at org.apache.hadoop.metrics2.sink.kafka.ScheduledReporter$1.run(ScheduledReporter.java:137)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Is there any configuration I should also look into? I started with

./connect-standalone.sh ../config/connect-standalone.properties /etc/kafka-connect-hdfs/quickstart-hdfs.properties

And here is my quickstart-hdfs.properties:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=hdfs
hdfs.url=hdfs://sandbox.hortonworks.com:8020
flush.size=3
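
Before launching, a quick sanity check of the connector file can rule out a missing setting. The sketch below only looks for the keys that appear in the quickstart file above; it is not an authoritative list of what the connector requires, and the function name is made up for illustration.

```shell
#!/usr/bin/env bash

# Print each key from the quickstart file above that is absent from the
# given properties file. Keys are compared exactly against the text
# before the first '=' on each line.
missing_keys() {
  local props="$1" key
  for key in name connector.class tasks.max topics hdfs.url flush.size; do
    awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$props" \
      || echo "$key"
  done
}
```

Running `missing_keys /etc/kafka-connect-hdfs/quickstart-hdfs.properties` prints nothing when all six keys are present, and one key name per line otherwise.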

Thanks,
-B


Re: How to leverage kafka-connect-hdfs in HDP

Posted by Gwen Shapira <gw...@confluent.io>.
The installation instructions from Confluent will still work for you :)

If you are using deb/rpm packages, basically add the repositories as
explained here:
http://docs.confluent.io/3.0.0/installation.html#rpm-packages-via-yum

and then:
sudo yum install confluent-kafka-connect-hdfs
or
sudo apt-get install confluent-kafka-connect-hdfs

This will put the connector config in /etc/kafka-connect-hdfs and the
connector jars in /usr/share/java/

You may need to move the jar so it is on the classpath for connect
(I'm not sure what's the default kafka classpath for HDP).
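
One way to get the connector jars onto the classpath without moving them is to symlink them into Kafka's libs directory. This is a rough sketch, not a tested HDP procedure: the function name is made up, and the directories in the usage note are assumptions. It leaves alone any jar name that already exists in the target, since overwriting jars that ship with the broker could cause version conflicts.

```shell
#!/usr/bin/env bash

# Symlink every *.jar from a source directory into a destination directory.
# Files that already exist in the destination are left untouched.
link_jars() {
  local src="$1" dest="$2" jar name
  for jar in "$src"/*.jar; do
    [ -e "$jar" ] || continue            # glob matched nothing
    name=$(basename "$jar")
    [ -e "$dest/$name" ] && continue     # do not shadow an existing jar
    ln -s "$jar" "$dest/$name"
  done
}
```

For example, `link_jars /usr/share/java/kafka-connect-hdfs /usr/hdp/current/kafka-broker/libs` (both paths hypothetical; check where the package and HDP actually put things on your machine).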

BTW, we (Confluent) are testing the HDFS connector with HDP (we
basically install Confluent Platform on one machine and HDP on another
and use Connect to move data), so this setup should work :)

Gwen




