Posted to user@spark.apache.org by Spico Florin <sp...@gmail.com> on 2019/07/30 13:38:43 UTC
Kafka Integration libraries put in the fat jar
Hello!
I would like to use Spark Structured Streaming integrated with Kafka, as described here:
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
but I get the following error:
Caused by: org.apache.spark.sql.AnalysisException: Failed to find data
source: kafka. Please deploy the application as per the deployment section
of "Structured Streaming + Kafka Integration Guide".;
even though I have added the Kafka SQL dependency to the generated fat jar:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.4.3</version>
  <scope>compile</scope>
</dependency>
When I submit with the command

spark-submit --master spark://spark-master:7077 --class myClass \
  --deploy-mode client \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 \
  my-fat-jar-with-dependencies.jar

the problem is gone.
Since the --packages option requires downloading the libraries from an
environment with internet access, which I do not have, can you please advise
how I can add the Kafka dependencies to the fat jar, or suggest another
solution?
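(One offline workaround sketch, assuming the connector jar can be fetched once on a machine that does have internet access and then copied over; the paths below are hypothetical examples:)

```shell
# On a machine WITH internet access: fetch the connector jar once
# (it lands under ~/.m2/repository by default).
mvn dependency:get -Dartifact=org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3

# Copy the jar (and its kafka-clients dependency) to the offline cluster,
# then pass it explicitly instead of using --packages:
spark-submit --master spark://spark-master:7077 --class myClass \
  --deploy-mode client \
  --jars /opt/jars/spark-sql-kafka-0-10_2.11-2.4.3.jar,/opt/jars/kafka-clients-2.0.0.jar \
  my-fat-jar-with-dependencies.jar
```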
Thank you.
Regards,
Florin
Re: Kafka Integration libraries put in the fat jar
Posted by Spico Florin <sp...@gmail.com>.
Hi!
Thanks to Jacek Laskowski
<https://stackoverflow.com/users/1305344/jacek-laskowski>, I found the
answer here
https://stackoverflow.com/questions/51792203/how-to-get-spark-kafka-org-apache-sparkspark-sql-kafka-0-10-2-112-1-0-dependen
Just add the Maven Shade plugin:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>org.apache.spark.examples.sql.streaming.JavaStructuredKafkaWordCount</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
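(For background on why the AppendingTransformer line is the important part: Spark discovers data sources via java.util.ServiceLoader, and both spark-sql and spark-sql-kafka ship a service file with the same path, META-INF/services/org.apache.spark.sql.sources.DataSourceRegister. A naive jar merge keeps only one copy, so the "kafka" provider entry is lost; the AppendingTransformer concatenates the copies instead. A minimal Python sketch of that difference, where the merge functions and entry lists are illustrative, not the shade plugin's actual code:)

```python
# Illustrative only: why "first file wins" merging breaks data source
# discovery, while appending keeps the Kafka provider visible.
spark_sql_entries = ["org.apache.spark.sql.execution.datasources.csv.CSVFileFormat"]
kafka_entries = ["org.apache.spark.sql.kafka010.KafkaSourceProvider"]

def first_wins(*service_files):
    # Naive jar merge: the first duplicate resource is kept, the rest dropped.
    return list(service_files[0])

def appending(*service_files):
    # AppendingTransformer behavior: concatenate all copies of the resource.
    merged = []
    for entries in service_files:
        merged.extend(entries)
    return merged

naive = first_wins(spark_sql_entries, kafka_entries)
merged = appending(spark_sql_entries, kafka_entries)

# With the naive merge, ServiceLoader never sees the Kafka provider,
# which is exactly the "Failed to find data source: kafka" failure mode.
assert "org.apache.spark.sql.kafka010.KafkaSourceProvider" not in naive
assert "org.apache.spark.sql.kafka010.KafkaSourceProvider" in merged
```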
On Tue, Jul 30, 2019 at 4:38 PM Spico Florin <sp...@gmail.com> wrote:
> [quoted original message trimmed]