Posted to user@spark.apache.org by Stefan Schadwinkel <st...@smaato.com> on 2016/01/12 10:30:01 UTC

Using lz4 in Kafka seems to be broken by jpountz dependency upgrade in Spark 1.5.x+

Hi all,

we'd like to upgrade one of our Spark jobs from 1.4.1 to 1.5.2 (we run
Spark on Amazon EMR).

The job consumes and pushes lz4 compressed data from/to Kafka.

When upgrading to 1.5.2 everything works fine, except we get the following
exception:

java.lang.NoSuchMethodError: net.jpountz.util.Utils.checkRange([BII)V
    at org.apache.kafka.common.message.KafkaLZ4BlockOutputStream.write(KafkaLZ4BlockOutputStream.java:179)
    at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
    at org.apache.kafka.common.record.Compressor.putLong(Compressor.java:132)
    at org.apache.kafka.common.record.MemoryRecords.append(MemoryRecords.java:85)
    at org.apache.kafka.clients.producer.internals.RecordBatch.tryAppend(RecordBatch.java:63)
    at org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:171)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:338)


Some digging yields:

- The net.jpountz.lz4/lz4 dependency was upgraded from 1.2.0 to 1.3.0 in
Spark 1.5+ to fix some issues with the IBM JDK:
https://issues.apache.org/jira/browse/SPARK-7063

- lz4-java 1.3.0 moved the checkRange helpers from net.jpountz.util.Utils
to net.jpountz.util.SafeUtils, which yields the inconsistency above (a
quick classpath probe for this is sketched after this list):
https://github.com/jpountz/lz4-java/blob/1.3.0/src/java/net/jpountz/util/SafeUtils.java

- Kafka on GitHub up to 0.9.0.0 uses jpountz 1.2.0
(https://github.com/apache/kafka/blob/0.9.0.0/build.gradle); the latest
trunk, however, upgrades to 1.3.0.
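
To see which lz4-java ends up visible to the Kafka client at runtime, a
quick reflective probe helps. A minimal sketch (the class name
Lz4ClasspathProbe is made up for illustration); run it on the same
classpath as the Spark job:

    import java.lang.reflect.Method;

    public class Lz4ClasspathProbe {
        public static void main(String[] args) {
            // Kafka 0.8.x links against Utils.checkRange([BII)V; lz4-java
            // 1.3.0 moved that method to SafeUtils, so the two checks below
            // reveal which version wins on this classpath.
            report("net.jpountz.util.Utils");      // has checkRange in 1.2.0 only
            report("net.jpountz.util.SafeUtils");  // exists in 1.3.0 only
        }

        private static void report(String className) {
            try {
                Class<?> cls = Class.forName(className);
                boolean hasCheckRange = false;
                for (Method m : cls.getDeclaredMethods()) {
                    if (m.getName().equals("checkRange")) {
                        hasCheckRange = true;
                    }
                }
                System.out.println(className + ": on classpath, checkRange=" + hasCheckRange);
            } catch (ClassNotFoundException e) {
                System.out.println(className + ": not on classpath");
            }
        }
    }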


Patching Kafka to use SafeUtils is easy (the swap is sketched below), as
is building a Jar for our projects that includes the correct
dependencies; compiling Spark 1.5.x against jpountz 1.2.0 should also
work, but I haven't tried that yet.
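
For reference, the Kafka-side patch boils down to replacing the
net.jpountz.util.Utils calls in KafkaLZ4BlockInputStream and
KafkaLZ4BlockOutputStream with net.jpountz.util.SafeUtils. A minimal
sketch, assuming lz4-java 1.3.0 on the classpath, showing that
SafeUtils.checkRange provides the contract those classes need (the class
name SafeUtilsDemo is mine):

    import net.jpountz.util.SafeUtils;

    public class SafeUtilsDemo {
        public static void main(String[] args) {
            byte[] buf = new byte[16];
            // Valid range: returns normally, like Utils.checkRange in 1.2.0.
            SafeUtils.checkRange(buf, 0, 16);
            try {
                // off + len exceeds buf.length, so the range check trips.
                SafeUtils.checkRange(buf, 8, 16);
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("out-of-range rejected as expected: " + e);
            }
        }
    }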

The main problem is that Spark 1.5.x+ and all Kafka 0.8 releases are
incompatible with regard to lz4 compression, and we would like to avoid
provisioning EMR with a custom Spark build through bootstrapping due to
the operational overhead.

One could try to play with the classpath and a Jar file with compatible
dependencies (one such experiment is sketched below), but I was
wondering: does nobody else use Kafka with lz4 and Spark, and has anyone
run into the same issue?
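
The classpath experiment I have in mind, untested so far: bundle
lz4-java 1.2.0 into the application Jar and ask Spark to prefer the user
classpath over its own. These two properties do exist in Spark 1.5,
though this also hands Spark's own code the 1.2.0 classes, so it may
merely move the conflict (the class name ClasspathFirstSketch is made
up):

    import org.apache.spark.SparkConf;

    public class ClasspathFirstSketch {
        public static void main(String[] args) {
            // Prefer classes from the user Jar over Spark's bundled copies
            // on the executors, so Kafka resolves the lz4-java 1.2.0 that we
            // ship ourselves.
            SparkConf conf = new SparkConf()
                .setAppName("kafka-lz4-job")
                .set("spark.executor.userClassPathFirst", "true");
            // The driver-side equivalent must be set before the driver JVM
            // starts, e.g.:
            //   spark-submit --conf spark.driver.userClassPathFirst=true ...
        }
    }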

Maybe there's also an easier way to reconcile the situation?

BTW: There's a similar issue regarding Druid as well, but no reconciliation
beyond patching Kafka was discussed:
https://groups.google.com/forum/#!topic/druid-user/ZW_Clovf42k

Any input would be highly appreciated.


Best regards,
Stefan


-- 

*Dr. Stefan Schadwinkel*
Senior Big Data Developer
stefan.schadwinkel@smaato.com

Smaato Inc.
San Francisco - New York - Hamburg - Singapore
www.smaato.com

Germany:
Valentinskamp 70, Emporio, 19th Floor
20355 Hamburg

T +49 (40) 3480 949 0
F +49 (40) 492 19 055

Re: Using lz4 in Kafka seems to be broken by jpountz dependency upgrade in Spark 1.5.x+

Posted by Marcin Kuthan <ma...@gmail.com>.
Hi Stefan

Have you got any response from the Spark team regarding LZ4 library
compatibility? To avoid this kind of problem, lz4 should be shaded in
the Spark distribution, IMHO.

Currently I'm not able to update Spark in my application due to this
issue: it is not possible to consume lz4-compressed topics using Spark
Streaming 1.5 or higher :-(

As a temporary workaround I could patch the LZ4 net.jpountz.util.Utils
class (sketched below) or Kafka's KafkaLZ4BlockInputStream and
KafkaLZ4BlockOutputStream classes. Neither is elegant nor safe.
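
A sketch of the Utils-side patch, with caveats: the 1.3.0 jar still
ships a net.jpountz.util.Utils class (hence the NoSuchMethodError rather
than a NoClassDefFoundError), so this would have to be merged into the
existing Utils source when recompiling lz4-java, not dropped in as a
second copy, or class-loading order will bite. Shown as a standalone
class only for brevity:

    package net.jpountz.util;

    public final class Utils {

        private Utils() {}

        // Restores the entry point Kafka 0.8's KafkaLZ4Block*Stream classes
        // link against -- net.jpountz.util.Utils.checkRange([BII)V -- by
        // delegating to the implementation that 1.3.0 moved to SafeUtils.
        public static void checkRange(byte[] buf, int off, int len) {
            SafeUtils.checkRange(buf, off, len);
        }
    }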

Marcin


On 12 January 2016 at 10:30, Stefan Schadwinkel <stefan.schadwinkel@smaato.com> wrote:

> [original message quoted in full; snipped]