Posted to commits@hudi.apache.org by "Raymond Xu (Jira)" <ji...@apache.org> on 2022/03/30 04:13:00 UTC

[jira] [Closed] (HUDI-3610) Validate Hudi Kafka Connect Sink writing to S3

     [ https://issues.apache.org/jira/browse/HUDI-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu closed HUDI-3610.
----------------------------
    Fix Version/s:     (was: 0.11.0)
       Resolution: Not A Problem

> Validate Hudi Kafka Connect Sink writing to S3
> ----------------------------------------------
>
>                 Key: HUDI-3610
>                 URL: https://issues.apache.org/jira/browse/HUDI-3610
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: kafka-connect
>            Reporter: Ethan Guo
>            Assignee: Rajesh Mahindra
>            Priority: Critical
>
> From community:
> Hi guys, I'm trying to implement this architecture with Hudi:
> DB table --Debezium--> Kafka --Hudi sink connector--> S3 bucket
> My setup:
> Kafka version 2.4
> Hudi version 0.10.1
> HDFS 2 sink connector version 10.1.4
> I'm encountering this error:
> {code:java}
> ERROR WorkerSinkTask{id=<XXX>} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
> java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
> at org.apache.hudi.connect.HoodieSinkTask.start(HoodieSinkTask.java:80)
> at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:312)
> at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
> at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
> at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
> at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:103)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
> ... 9 more {code}
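> The stack trace points to a classpath problem rather than a connector bug: org.apache.hadoop.fs.FSDataInputStream lives in hadoop-common, and the sink task fails as soon as it touches the Hadoop FileSystem API on startup. A quick check, sketched below, is to see whether the bundle jar the worker loads actually packages that class; the jar name and plugins directory are the ones set in the Dockerfile further down, and unzip is assumed to be available wherever the check is run.
> {code:bash}
> # List the bundle's contents and look for the Hadoop filesystem class.
> # Run this inside the built image, or against the jar copied out of it.
> unzip -l /usr/share/java/hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar \
>   | grep org/apache/hadoop/fs/FSDataInputStream
> {code}
> If the class is missing from the bundle, the Hadoop client jars have to be provided to the Connect worker separately, for example via its CLASSPATH or alongside the plugin.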
> This is the Dockerfile I used to build the custom image:
> {code:java}
> #==================
> FROM maven:3.8.4-openjdk-8-slim as build-hudi
> ENV HUDI_VERSION=0.10.1
> RUN mkdir /home/hudi && \
>     curl -L https://github.com/apache/hudi/archive/refs/tags/release-$HUDI_VERSION.tar.gz \
>     > hudi-release-$HUDI_VERSION.tar.gz && \
>     tar -xzvf ./hudi-release-$HUDI_VERSION.tar.gz -C /home/hudi && \
>     rm ./hudi-release-$HUDI_VERSION.tar.gz && \
>     cd /home/hudi/hudi-release-$HUDI_VERSION && \
>     mvn package -DskipTests -pl packaging/hudi-kafka-connect-bundle -am
> #==================
> FROM confluentinc/cp-kafka-connect:7.0.1
> ENV DEBEZIUM_VERSION=1.4.1.Final \
>     MAVEN_REPO_CORE="https://repo1.maven.org/maven2" \
>     CONNECTOR=mysql \
>     KAFKA_CONNECT_PLUGINS_DIR=/usr/share/java \
>     DATAGEN_VERSION=0.5.3 \
>     ADX_SINK_CONNECTOR_VERSION=2.2.0 \
>     AMAZON_S3_SINK_CONNECTOR_VERSION=10.0.3 \
>     HDFS2_SINK_CONNECTOR_VERSION=10.1.4 \
>     HUDI_OUTPUT_JAR_FILE="hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar" \
>     HUDI_VERSION=0.10.1
> RUN curl -fSL -o /tmp/plugin.tar.gz \
>   $MAVEN_REPO_CORE/io/debezium/debezium-connector-$CONNECTOR/$DEBEZIUM_VERSION/debezium-connector-$CONNECTOR-$DEBEZIUM_VERSION-plugin.tar.gz && \
>   tar -xzf /tmp/plugin.tar.gz -C $KAFKA_CONNECT_PLUGINS_DIR && \
>   rm -f /tmp/plugin.tar.gz
> RUN confluent-hub install --no-prompt confluentinc/kafka-connect-datagen:$DATAGEN_VERSION && \
>     confluent-hub install --no-prompt microsoftcorporation/kafka-sink-azure-kusto:$ADX_SINK_CONNECTOR_VERSION && \
>     confluent-hub install --no-prompt confluentinc/kafka-connect-s3:$AMAZON_S3_SINK_CONNECTOR_VERSION && \
>     confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:$HDFS2_SINK_CONNECTOR_VERSION
> COPY --from=build-hudi /home/hudi/hudi-release-$HUDI_VERSION/packaging/hudi-kafka-connect-bundle/target/hudi-kafka-connect-bundle-$HUDI_VERSION.jar $KAFKA_CONNECT_PLUGINS_DIR/$HUDI_OUTPUT_JAR_FILE {code}
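> A possible adjustment to this Dockerfile, sketched on the assumption that the NoClassDefFoundError comes from Hadoop client classes missing from the Connect worker's classpath: add a Hadoop binary distribution to the image and expose its jars through CLASSPATH. The Hadoop version, download URL, and install path below are illustrative assumptions, not values taken from the build above.
> {code:bash}
> FROM confluentinc/cp-kafka-connect:7.0.1
> ENV HADOOP_VERSION=2.10.1 \
>     HADOOP_HOME=/opt/hadoop
> # Fetch hadoop-common (which contains org.apache.hadoop.fs.FSDataInputStream) and its
> # dependencies. Depending on the base image's default user, a USER root directive may be
> # needed before this step.
> RUN curl -fSL https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz \
>       | tar -xz -C /opt && \
>     ln -s /opt/hadoop-$HADOOP_VERSION $HADOOP_HOME
> # Assuming the worker launch scripts honor the CLASSPATH environment variable, these jars
> # become visible to the Hudi sink task when it starts.
> ENV CLASSPATH="$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*"
> {code}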



--
This message was sent by Atlassian Jira
(v8.20.1#820001)