Posted to user@spark.apache.org by satyajit vegesna <sa...@gmail.com> on 2017/06/28 19:00:38 UTC
Building Kafka 0.10 Source for Structured Streaming Error.
Hi All,
I am trying to build the kafka-0-10-sql module under the external folder in the Apache Spark source code.
Once I generate the jar file using
build/mvn package -DskipTests -pl external/kafka-0-10-sql
I get a jar file created under external/kafka-0-10-sql/target.
I then try to run spark-shell with the jars created in the target folder, as below:
bin/spark-shell --jars $SPARK_HOME/external/kafka-0-10-sql/target/*.jar
and get the following error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/28 11:54:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://10.1.10.241:4040
Spark context available as 'sc' (master = local[*], app id = local-1498676043936).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val lines = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "test").load()

java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
  at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:378)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:325)
  at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:60)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:150)
  ... 48 elided
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 57 more
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I have tried building the jar with dependencies, but still face the same error.
But when I use --packages with spark-shell, as in
bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0
it works fine.
The reason I am trying to build from source is that I want to try pushing DataFrame data into a Kafka topic, based on
https://github.com/apache/spark/commit/b0a5cd89097c563e9949d8cfcf84d18b03b8d24c,
which doesn't work with version 2.1.0.
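For reference, this is the kind of write I want to get working based on that commit; a rough sketch only, with a placeholder topic name and checkpoint path:

val query = lines
  .selectExpr("CAST(value AS STRING) AS value")          // the Kafka sink reads a "value" column
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "output-topic")                       // placeholder topic name
  .option("checkpointLocation", "/tmp/kafka-sink-ckpt")  // any writable path
  .start()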
Any help would be highly appreciated.
Regards,
Satyajit.
Re: Building Kafka 0.10 Source for Structured Streaming Error.
Posted by ayan guha <gu...@gmail.com>.
--jars does not do wildcard expansion. List the jars out, comma-separated.
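For example, pointing at the assembly jar you built (adjust the name to whatever is actually in your target folder):

bin/spark-shell --jars $SPARK_HOME/external/kafka-0-10-sql/target/spark-sql-kafka-0-10_2.11-2.3.0-SNAPSHOT-jar-with-dependencies.jar

Multiple jars go in one comma-separated list with no spaces, e.g. --jars first.jar,second.jar.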
--
Best Regards,
Ayan Guha
Re: Building Kafka 0.10 Source for Structured Streaming Error.
Posted by satyajit vegesna <sa...@gmail.com>.
I have updated the pom.xml in the external/kafka-0-10-sql folder as below (the maven-assembly-plugin section near the end is the addition) and have run the command
build/mvn package -DskipTests -pl external/kafka-0-10-sql
which generated
spark-sql-kafka-0-10_2.11-2.3.0-SNAPSHOT-jar-with-dependencies.jar
<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one or more
  ~ contributor license agreements. See the NOTICE file distributed with
  ~ this work for additional information regarding copyright ownership.
  ~ The ASF licenses this file to You under the Apache License, Version 2.0
  ~ (the "License"); you may not use this file except in compliance with
  ~ the License. You may obtain a copy of the License at
  ~
  ~    http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing, software
  ~ distributed under the License is distributed on an "AS IS" BASIS,
  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  ~ See the License for the specific language governing permissions and
  ~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-parent_2.11</artifactId>
    <version>2.3.0-SNAPSHOT</version>
    <relativePath>../../pom.xml</relativePath>
  </parent>

  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <properties>
    <sbt.project.name>sql-kafka-0-10</sbt.project.name>
  </properties>
  <packaging>jar</packaging>
  <name>Kafka 0.10 Source for Structured Streaming</name>
  <url>http://spark.apache.org/</url>

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-catalyst_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <version>${project.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>0.10.0.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_${scala.binary.version}</artifactId>
      <version>0.10.0.1</version>
    </dependency>
    <dependency>
      <groupId>net.sf.jopt-simple</groupId>
      <artifactId>jopt-simple</artifactId>
      <version>3.2</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalacheck</groupId>
      <artifactId>scalacheck_${scala.binary.version}</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-tags_${scala.binary.version}</artifactId>
    </dependency>

    <!--
      This spark-tags test-dep is needed even though it isn't used in this
      module, otherwise testing-cmds that exclude them will yield errors.
    -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-tags_${scala.binary.version}</artifactId>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
    <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.0.0</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id> <!-- this is used for inheritance merges -->
            <phase>package</phase> <!-- bind to the packaging phase -->
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>
Regards,
Satyajit.
On Wed, Jun 28, 2017 at 12:12 PM, Shixiong(Ryan) Zhu <shixiong@databricks.com> wrote:

> "--packages" will add transitive dependencies that are not in
> "$SPARK_HOME/external/kafka-0-10-sql/target/*.jar".
>
> > I have tried building the jar with dependencies, but still face the
> > same error.
>
> What's the command you used?
Re: Building Kafka 0.10 Source for Structured Streaming Error.
Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
"--packages" will add transitive dependencies that are not in
"$SPARK_HOME/external/kafka-0-10-sql/target/*.jar".

> I have tried building the jar with dependencies, but still face the same
> error.

What's the command you used?
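Also, a quick way to check whether the Kafka classes actually made it into the jar-with-dependencies you built (adjust the path to whatever jar you are passing):

jar tf external/kafka-0-10-sql/target/spark-sql-kafka-0-10_2.11-2.3.0-SNAPSHOT-jar-with-dependencies.jar | grep ByteArrayDeserializer

If that prints nothing, kafka-clients is not inside the jar and the NoClassDefFoundError is expected.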