Posted to issues@spark.apache.org by "Sean Owen (Jira)" <ji...@apache.org> on 2019/08/31 15:31:00 UTC

[jira] [Resolved] (SPARK-28903) Fix AWS JDK version conflict that breaks Pyspark Kinesis tests

     [ https://issues.apache.org/jira/browse/SPARK-28903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-28903.
-------------------------------
    Fix Version/s: 3.0.0
                   2.4.5
       Resolution: Fixed

Issue resolved by pull request 25559
[https://github.com/apache/spark/pull/25559]

> Fix AWS JDK version conflict that breaks Pyspark Kinesis tests
> --------------------------------------------------------------
>
>                 Key: SPARK-28903
>                 URL: https://issues.apache.org/jira/browse/SPARK-28903
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.0.0, 2.4.3
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Major
>             Fix For: 2.4.5, 3.0.0
>
>
> The Pyspark Kinesis tests are failing, at least in master:
> {code}
> ======================================================================
> ERROR: test_kinesis_stream (pyspark.streaming.tests.test_kinesis.KinesisStreamTests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/pyspark/streaming/tests/test_kinesis.py", line 44, in test_kinesis_stream
>     kinesisTestUtils = self.ssc._jvm.org.apache.spark.streaming.kinesis.KinesisTestUtils(2)
>   File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1554, in __call__
>     answer, self._gateway_client, None, self._fqn)
>   File "/home/jenkins/workspace/SparkPullRequestBuilder@2/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
>     format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.kinesis.KinesisTestUtils.
> : java.lang.NoSuchMethodError: com.amazonaws.regions.Region.getAvailableEndpoints()Ljava/util/Collection;
> 	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1(KinesisTestUtils.scala:211)
> 	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$1$adapted(KinesisTestUtils.scala:211)
> 	at scala.collection.Iterator.find(Iterator.scala:993)
> 	at scala.collection.Iterator.find$(Iterator.scala:990)
> 	at scala.collection.AbstractIterator.find(Iterator.scala:1429)
> 	at scala.collection.IterableLike.find(IterableLike.scala:81)
> 	at scala.collection.IterableLike.find$(IterableLike.scala:80)
> 	at scala.collection.AbstractIterable.find(Iterable.scala:56)
> 	at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:211)
> 	at org.apache.spark.streaming.kinesis.KinesisTestUtils.<init>(KinesisTestUtils.scala:46)
> ...
> {code}
> The non-Python Kinesis tests pass, though. The difference is that the PySpark tests run against the output of the Spark assembly, which pulls in hadoop-cloud, which in turn pulls in an old AWS Java SDK. That older SDK is presumably the one missing the Region.getAvailableEndpoints() method named in the NoSuchMethodError above.
> Per [~stevel@apache.org], it seems we can resolve this simply by excluding the aws-java-sdk dependency. See the attached PR for more detail about the debugging and other options.
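
A rough way to check for this kind of conflict is to look for more than one AWS Java SDK version among the jars that the assembly produces. The sketch below is not taken from the PR. The function names are hypothetical, the jar file names in the usage note are made up for illustration, and it assumes jars follow the usual artifact-version.jar naming.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches names such as "aws-java-sdk-core-<version>.jar" or
# "aws-java-sdk-<version>.jar" and captures (artifact, version).
# Assumption: the version starts with a digit.
AWS_SDK_JAR = re.compile(r"^(aws-java-sdk(?:-[a-z0-9]+)*?)-(\d[\w.]*)\.jar$")


def aws_sdk_versions(jar_names):
    """Group AWS Java SDK artifacts by the version found in the jar name."""
    by_version = defaultdict(list)
    for name in jar_names:
        match = AWS_SDK_JAR.match(name)
        if match:
            artifact, version = match.groups()
            by_version[version].append(artifact)
    return dict(by_version)


def has_conflict(jar_names):
    """Return True if more than one AWS SDK version appears among the jars."""
    return len(aws_sdk_versions(jar_names)) > 1


def scan_jars_dir(jars_dir):
    """Return the sorted jar file names in a directory chosen by the caller."""
    return sorted(p.name for p in Path(jars_dir).glob("*.jar"))
```

For example, has_conflict(["aws-java-sdk-core-1.11.271.jar", "aws-java-sdk-1.7.4.jar"]) flags a mix of two SDK versions, while a list in which every aws-java-sdk jar carries the same version does not. To check a real build, pass the result of scan_jars_dir on the assembly's jars directory; the location of that directory depends on the build and is not specified here.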



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
