Posted to user@spark.apache.org by Spico Florin <sp...@gmail.com> on 2019/07/30 13:38:43 UTC

Kafka Integration libraries put in the fat jar

Hello!

I would like to use Spark Structured Streaming integrated with Kafka as
described here:
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
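
For reference, this is the kind of read that guide describes (a minimal
sketch; the broker address and topic name below are just placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("KafkaStructuredStreaming")
  .getOrCreate()

// Create a streaming DataFrame from a Kafka topic; the lookup of the
// "kafka" data source at load() is what fails with the AnalysisException below.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka-broker:9092")
  .option("subscribe", "my-topic")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")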


but I got the following error:

Caused by: org.apache.spark.sql.AnalysisException: Failed to find data
source: kafka. Please deploy the application as per the deployment section
of "Structured Streaming + Kafka Integration Guide".;

even though I've added the spark-sql-kafka dependency to the generated fat jar:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.4.3</version>
    <scope>compile</scope>
</dependency>

When I submit with the command

spark-submit --master spark://spark-master:7077 --class myClass \
  --deploy-mode client \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.3 \
  my-fat-jar-with-dependencies.jar

the problem is gone.

Since the --packages option needs to download the libraries, which requires an
environment with internet access that I don't have, can you please advise what
I can do to add the Kafka dependencies to the fat jar, or suggest another
solution?

Thank you.

Regards,

Florin

Re: Kafka Integration libraries put in the fat jar

Posted by Spico Florin <sp...@gmail.com>.
Hi!
Thanks to Jacek Laskowski
(https://stackoverflow.com/users/1305344/jacek-laskowski), I found the
answer here:

https://stackoverflow.com/questions/51792203/how-to-get-spark-kafka-org-apache-sparkspark-sql-kafka-0-10-2-112-1-0-dependen

Just add the Maven Shade plugin, configured with an AppendingTransformer that
merges the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
entries from all the jars (without it, only one jar's service file survives in
the fat jar, so the kafka source is never registered):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.0.0</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <!-- Strip signature files so the shaded jar is not
                             rejected for an invalid signature. -->
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <!-- Concatenate the DataSourceRegister service files from
                         spark-sql and spark-sql-kafka instead of keeping only one. -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>META-INF/services/org.apache.spark.sql.sources.DataSourceRegister</resource>
                    </transformer>
                    <!-- Set mainClass to your own application's entry point. -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>org.apache.spark.examples.sql.streaming.JavaStructuredKafkaWordCount</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
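
The AppendingTransformer above concatenates the DataSourceRegister service
files rather than letting one overwrite the other. If you would rather not
spell out the resource name, the Shade plugin also has a
ServicesResourceTransformer that merges every META-INF/services file; a
minimal sketch of that variant (either transformer works for this case):

<transformers>
    <!-- Merges all META-INF/services files, including
         org.apache.spark.sql.sources.DataSourceRegister. -->
    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>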

