You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Nick Pentreath <ni...@gmail.com> on 2015/12/11 11:07:30 UTC

Re: Spark streaming with Kinesis broken?

cc'ing dev list

Ok, looks like when the KCL version was updated in
https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
probably leading to dependency conflict, though as Burak mentions its hard
to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
in driver or worker logs, so any exception is getting swallowed somewhere.

Run starting. Expected test count is: 4
KinesisStreamSuite:
Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
Kinesis streams for tests.
- KinesisUtils API
- RDD generation
- basic operation *** FAILED ***
  The code passed to eventually never returned normally. Attempted 13 times
over 2.047777 minutes. Last failure message: Set() did not equal Set(5, 10,
1, 6, 9, 2, 7, 3, 8, 4)
  Data received does not match data sent. (KinesisStreamSuite.scala:188)
- failure recovery *** FAILED ***
  The code passed to eventually never returned normally. Attempted 63 times
over 2.0286383166666666 minutes. Last failure message: isCheckpointPresent
was true, but 0 was not greater than 10. (KinesisStreamSuite.scala:228)
Run completed in 5 minutes, 0 seconds.
Total number of tests run: 4
Suites: completed 1, aborted 0
Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
*** 2 TESTS FAILED ***
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------


KCL 1.3.0 depends on *1.9.37* SDK (
https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
while the Spark Kinesis dependency was kept at *1.9.16.*

I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
1.9.37 and everything works.

Run starting. Expected test count is: 28
KinesisBackedBlockRDDSuite:
Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
Kinesis streams for tests.
- Basic reading from Kinesis
- Read data available in both block manager and Kinesis
- Read data available only in block manager, not in Kinesis
- Read data available only in Kinesis, not in block manager
- Read data available partially in block manager, rest in Kinesis
- Test isBlockValid skips block fetching from block manager
- Test whether RDD is valid after removing blocks from block anager
KinesisStreamSuite:
- KinesisUtils API
- RDD generation
- basic operation
- failure recovery
KinesisReceiverSuite:
- check serializability of SerializableAWSCredentials
- process records including store and checkpoint
- shouldn't store and checkpoint when receiver is stopped
- shouldn't checkpoint when exception occurs during store
- should set checkpoint time to currentTime + checkpoint interval upon
instantiation
- should checkpoint if we have exceeded the checkpoint interval
- shouldn't checkpoint if we have not exceeded the checkpoint interval
- should add to time when advancing checkpoint
- shutdown should checkpoint if the reason is TERMINATE
- shutdown should not checkpoint if the reason is something other than
TERMINATE
- retry success on first attempt
- retry success on second attempt after a Kinesis throttling exception
- retry success on second attempt after a Kinesis dependency exception
- retry failed after a shutdown exception
- retry failed after an invalid state exception
- retry failed after unexpected exception
- retry failed after exhausing all retries
Run completed in 3 minutes, 28 seconds.
Total number of tests run: 28
Suites: completed 4, aborted 0
Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
file a JIRA for this?

@dev-list, since KCL brings in AWS SDK dependencies itself, is it necessary
to declare an explicit dependency on aws-java-sdk in the Kinesis POM? Also,
from KCL 1.5.0+, only the relevant components used from the AWS SDKs are
brought in, making things a bit leaner (this can be upgraded in Spark
1.7/2.0 perhaps). All local tests (and integration tests) pass with
removing the explicit dependency and only depending on KCL. Is aws-java-sdk
used anywhere else (AFAIK it is not, but in case I missed something let me
know any good reason to keep the explicit dependency)?

N



On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <ni...@gmail.com>
wrote:

> Yeah also the integration tests need to be specifically run - I would have
> thought the contributor would have run those tests and also tested the
> change themselves using live Kinesis :(
>
> —
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
>
> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <br...@gmail.com> wrote:
>
>> I don't think the Kinesis tests specifically ran when that was merged
>> into 1.5.2 :(
>> https://github.com/apache/spark/pull/8957
>>
>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>
>> AFAIK pom changes don't trigger the Kinesis tests.
>>
>> Burak
>>
>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <nick.pentreath@gmail.com
>> > wrote:
>>
>>> Yup also works for me on master branch as I've been testing DynamoDB
>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>>> using.
>>>
>>> So theKCL version does seem like it could be the issue - somewhere
>>> along the line an exception must be getting swallowed. Though the tests
>>> should have picked this up? Will dig deeper.
>>>
>>> —
>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>
>>>
>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <br...@gmail.com>
>>> wrote:
>>>
>>>> Yes, it worked in the 1.6 branch as of commit
>>>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>>>> serious of an issue, although it would be nice to know what the root cause
>>>> is to avoid a regression.
>>>>
>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <br...@gmail.com> wrote:
>>>>
>>>>> I've noticed this happening when there was some dependency conflicts,
>>>>> and it is super hard to debug.
>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>> I feel like that seems to be the problem...
>>>>>
>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>
>>>>> Thanks,
>>>>> Burak
>>>>>
>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <brianmlondon@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Nick's symptoms sound identical to mine.  I should mention that I
>>>>>> just pulled the latest version from github and it seems to be working
>>>>>> there.  To reproduce:
>>>>>>
>>>>>>
>>>>>>    1. Download spark 1.5.2 from
>>>>>>    http://spark.apache.org/downloads.html
>>>>>>    2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
>>>>>>    -DskipTests clean package
>>>>>>    3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>    4. Then run simultaneously:
>>>>>>    1. bin/run-example streaming.KinesisWordCountASL [Kinesis app
>>>>>>       name] [Kinesis stream name] [endpoint URL]
>>>>>>       2.   bin/run-example streaming.KinesisWordProducerASL [Kinesis
>>>>>>       stream name] [endpoint URL] 100 10
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Nick,
>>>>>>>
>>>>>>> Just to be sure: don't you see some ClassCastException in the log ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>> > Could you provide an example / test case and more detail on what
>>>>>>> issue
>>>>>>> > you're facing?
>>>>>>> >
>>>>>>> > I've just tested a simple program reading from a dev Kinesis
>>>>>>> stream and
>>>>>>> > using stream.print() to show the records, and it works under 1.5.1
>>>>>>> but
>>>>>>> > doesn't appear to be working under 1.5.2.
>>>>>>> >
>>>>>>> > UI for 1.5.2:
>>>>>>> >
>>>>>>> > Inline image 1
>>>>>>> >
>>>>>>> > UI for 1.5.1:
>>>>>>> >
>>>>>>> > Inline image 2
>>>>>>> >
>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
>>>>>>> brianmlondon@gmail.com
>>>>>>> > <ma...@gmail.com>> wrote:
>>>>>>> >
>>>>>>> >     Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
>>>>>>> >     Kinesis ASL that ships with 1.5.2 appears to not work for me
>>>>>>> >     although 1.5.1 is fine. I spent some time with Amazon earlier
>>>>>>> in the
>>>>>>> >     week and the only thing we could do to make it work is to
>>>>>>> change the
>>>>>>> >     version to 1.5.1.  Can someone please attempt to reproduce
>>>>>>> before I
>>>>>>> >     open a JIRA issue for it?
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbonofre@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>

Re: Spark streaming with Kinesis broken?

Posted by Brian London <br...@gmail.com>.
Yes, it's against master: https://github.com/apache/spark/pull/10256

I'll push the KCL version bump after my local tests finish.

On Fri, Dec 11, 2015 at 10:42 AM Nick Pentreath <ni...@gmail.com>
wrote:

> Is that PR against master branch?
>
> S3 read comes from Hadoop / jet3t afaik
>
> —
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
>
> On Fri, Dec 11, 2015 at 5:38 PM, Brian London <br...@gmail.com>
> wrote:
>
>> That's good news  I've got a PR in to up the SDK version to 1.10.40 and
>> the KCL to 1.6.1 which I'm running tests on locally now.
>>
>> Is the AWS SDK not used for reading/writing from S3 or do we get that for
>> free from the Hadoop dependencies?
>>
>> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <ni...@gmail.com>
>> wrote:
>>
>>> cc'ing dev list
>>>
>>> Ok, looks like when the KCL version was updated in
>>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
>>> probably leading to dependency conflict, though as Burak mentions its hard
>>> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
>>> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
>>> in driver or worker logs, so any exception is getting swallowed somewhere.
>>>
>>> Run starting. Expected test count is: 4
>>> KinesisStreamSuite:
>>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>>> Kinesis streams for tests.
>>> - KinesisUtils API
>>> - RDD generation
>>> - basic operation *** FAILED ***
>>>   The code passed to eventually never returned normally. Attempted 13
>>> times over 2.047777 minutes. Last failure message: Set() did not equal
>>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>>>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
>>> - failure recovery *** FAILED ***
>>>   The code passed to eventually never returned normally. Attempted 63
>>> times over 2.0286383166666666 minutes. Last failure message:
>>> isCheckpointPresent was true, but 0 was not greater than 10.
>>> (KinesisStreamSuite.scala:228)
>>> Run completed in 5 minutes, 0 seconds.
>>> Total number of tests run: 4
>>> Suites: completed 1, aborted 0
>>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
>>> *** 2 TESTS FAILED ***
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] BUILD FAILURE
>>> [INFO]
>>> ------------------------------------------------------------------------
>>>
>>>
>>> KCL 1.3.0 depends on *1.9.37* SDK (
>>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
>>> while the Spark Kinesis dependency was kept at *1.9.16.*
>>>
>>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS
>>> SDK 1.9.37 and everything works.
>>>
>>> Run starting. Expected test count is: 28
>>> KinesisBackedBlockRDDSuite:
>>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>>> Kinesis streams for tests.
>>> - Basic reading from Kinesis
>>> - Read data available in both block manager and Kinesis
>>> - Read data available only in block manager, not in Kinesis
>>> - Read data available only in Kinesis, not in block manager
>>> - Read data available partially in block manager, rest in Kinesis
>>> - Test isBlockValid skips block fetching from block manager
>>> - Test whether RDD is valid after removing blocks from block anager
>>> KinesisStreamSuite:
>>> - KinesisUtils API
>>> - RDD generation
>>> - basic operation
>>> - failure recovery
>>> KinesisReceiverSuite:
>>> - check serializability of SerializableAWSCredentials
>>> - process records including store and checkpoint
>>> - shouldn't store and checkpoint when receiver is stopped
>>> - shouldn't checkpoint when exception occurs during store
>>> - should set checkpoint time to currentTime + checkpoint interval upon
>>> instantiation
>>> - should checkpoint if we have exceeded the checkpoint interval
>>> - shouldn't checkpoint if we have not exceeded the checkpoint interval
>>> - should add to time when advancing checkpoint
>>> - shutdown should checkpoint if the reason is TERMINATE
>>> - shutdown should not checkpoint if the reason is something other than
>>> TERMINATE
>>> - retry success on first attempt
>>> - retry success on second attempt after a Kinesis throttling exception
>>> - retry success on second attempt after a Kinesis dependency exception
>>> - retry failed after a shutdown exception
>>> - retry failed after an invalid state exception
>>> - retry failed after unexpected exception
>>> - retry failed after exhausing all retries
>>> Run completed in 3 minutes, 28 seconds.
>>> Total number of tests run: 28
>>> Suites: completed 4, aborted 0
>>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
>>> All tests passed.
>>>
>>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can
>>> you file a JIRA for this?
>>>
>>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
>>> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
>>> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
>>> SDKs are brought in, making things a bit leaner (this can be upgraded in
>>> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
>>> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
>>> used anywhere else (AFAIK it is not, but in case I missed something let me
>>> know any good reason to keep the explicit dependency)?
>>>
>>> N
>>>
>>>
>>>
>>> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <
>>> nick.pentreath@gmail.com> wrote:
>>>
>>>> Yeah also the integration tests need to be specifically run - I would
>>>> have thought the contributor would have run those tests and also tested the
>>>> change themselves using live Kinesis :(
>>>>
>>>> —
>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>
>>>>
>>>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <br...@gmail.com> wrote:
>>>>
>>>>> I don't think the Kinesis tests specifically ran when that was merged
>>>>> into 1.5.2 :(
>>>>> https://github.com/apache/spark/pull/8957
>>>>>
>>>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>>>>
>>>>> AFAIK pom changes don't trigger the Kinesis tests.
>>>>>
>>>>> Burak
>>>>>
>>>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <
>>>>> nick.pentreath@gmail.com> wrote:
>>>>>
>>>>>> Yup also works for me on master branch as I've been testing DynamoDB
>>>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>>>>>> using.
>>>>>>
>>>>>> So theKCL version does seem like it could be the issue - somewhere
>>>>>> along the line an exception must be getting swallowed. Though the tests
>>>>>> should have picked this up? Will dig deeper.
>>>>>>
>>>>>> —
>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <
>>>>>> brianmlondon@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, it worked in the 1.6 branch as of commit
>>>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>>>>>>> serious of an issue, although it would be nice to know what the root cause
>>>>>>> is to avoid a regression.
>>>>>>>
>>>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <br...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've noticed this happening when there was some dependency
>>>>>>>> conflicts, and it is super hard to debug.
>>>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>>>>> I feel like that seems to be the problem...
>>>>>>>>
>>>>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Burak
>>>>>>>>
>>>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <
>>>>>>>> brianmlondon@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Nick's symptoms sound identical to mine.  I should mention that I
>>>>>>>>> just pulled the latest version from github and it seems to be working
>>>>>>>>> there.  To reproduce:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    1. Download spark 1.5.2 from
>>>>>>>>>    http://spark.apache.org/downloads.html
>>>>>>>>>    2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
>>>>>>>>>    -DskipTests clean package
>>>>>>>>>    3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>>>>    4. Then run simultaneously:
>>>>>>>>>    1. bin/run-example streaming.KinesisWordCountASL [Kinesis app
>>>>>>>>>       name] [Kinesis stream name] [endpoint URL]
>>>>>>>>>       2.   bin/run-example streaming.KinesisWordProducerASL
>>>>>>>>>       [Kinesis stream name] [endpoint URL] 100 10
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <
>>>>>>>>> jb@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Nick,
>>>>>>>>>>
>>>>>>>>>> Just to be sure: don't you see some ClassCastException in the log
>>>>>>>>>> ?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>>>>> > Could you provide an example / test case and more detail on
>>>>>>>>>> what issue
>>>>>>>>>> > you're facing?
>>>>>>>>>> >
>>>>>>>>>> > I've just tested a simple program reading from a dev Kinesis
>>>>>>>>>> stream and
>>>>>>>>>> > using stream.print() to show the records, and it works under
>>>>>>>>>> 1.5.1 but
>>>>>>>>>> > doesn't appear to be working under 1.5.2.
>>>>>>>>>> >
>>>>>>>>>> > UI for 1.5.2:
>>>>>>>>>> >
>>>>>>>>>> > Inline image 1
>>>>>>>>>> >
>>>>>>>>>> > UI for 1.5.1:
>>>>>>>>>> >
>>>>>>>>>> > Inline image 2
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
>>>>>>>>>> brianmlondon@gmail.com
>>>>>>>>>> > <ma...@gmail.com>> wrote:
>>>>>>>>>> >
>>>>>>>>>> >     Has anyone managed to run the Kinesis demo in Spark 1.5.2?
>>>>>>>>>> The
>>>>>>>>>> >     Kinesis ASL that ships with 1.5.2 appears to not work for me
>>>>>>>>>> >     although 1.5.1 is fine. I spent some time with Amazon
>>>>>>>>>> earlier in the
>>>>>>>>>> >     week and the only thing we could do to make it work is to
>>>>>>>>>> change the
>>>>>>>>>> >     version to 1.5.1.  Can someone please attempt to reproduce
>>>>>>>>>> before I
>>>>>>>>>> >     open a JIRA issue for it?
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>> jbonofre@apache.org
>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Re: Spark streaming with Kinesis broken?

Posted by Nick Pentreath <ni...@gmail.com>.
Is that PR against master branch?




S3 read comes from Hadoop / jet3t afaik



—
Sent from Mailbox

On Fri, Dec 11, 2015 at 5:38 PM, Brian London <br...@gmail.com>
wrote:

> That's good news  I've got a PR in to up the SDK version to 1.10.40 and the
> KCL to 1.6.1 which I'm running tests on locally now.
> Is the AWS SDK not used for reading/writing from S3 or do we get that for
> free from the Hadoop dependencies?
> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <ni...@gmail.com>
> wrote:
>> cc'ing dev list
>>
>> Ok, looks like when the KCL version was updated in
>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
>> probably leading to dependency conflict, though as Burak mentions its hard
>> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
>> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
>> in driver or worker logs, so any exception is getting swallowed somewhere.
>>
>> Run starting. Expected test count is: 4
>> KinesisStreamSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - KinesisUtils API
>> - RDD generation
>> - basic operation *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 13
>> times over 2.047777 minutes. Last failure message: Set() did not equal
>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
>> - failure recovery *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 63
>> times over 2.0286383166666666 minutes. Last failure message:
>> isCheckpointPresent was true, but 0 was not greater than 10.
>> (KinesisStreamSuite.scala:228)
>> Run completed in 5 minutes, 0 seconds.
>> Total number of tests run: 4
>> Suites: completed 1, aborted 0
>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
>> *** 2 TESTS FAILED ***
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] BUILD FAILURE
>> [INFO]
>> ------------------------------------------------------------------------
>>
>>
>> KCL 1.3.0 depends on *1.9.37* SDK (
>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
>> while the Spark Kinesis dependency was kept at *1.9.16.*
>>
>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
>> 1.9.37 and everything works.
>>
>> Run starting. Expected test count is: 28
>> KinesisBackedBlockRDDSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - Basic reading from Kinesis
>> - Read data available in both block manager and Kinesis
>> - Read data available only in block manager, not in Kinesis
>> - Read data available only in Kinesis, not in block manager
>> - Read data available partially in block manager, rest in Kinesis
>> - Test isBlockValid skips block fetching from block manager
>> - Test whether RDD is valid after removing blocks from block anager
>> KinesisStreamSuite:
>> - KinesisUtils API
>> - RDD generation
>> - basic operation
>> - failure recovery
>> KinesisReceiverSuite:
>> - check serializability of SerializableAWSCredentials
>> - process records including store and checkpoint
>> - shouldn't store and checkpoint when receiver is stopped
>> - shouldn't checkpoint when exception occurs during store
>> - should set checkpoint time to currentTime + checkpoint interval upon
>> instantiation
>> - should checkpoint if we have exceeded the checkpoint interval
>> - shouldn't checkpoint if we have not exceeded the checkpoint interval
>> - should add to time when advancing checkpoint
>> - shutdown should checkpoint if the reason is TERMINATE
>> - shutdown should not checkpoint if the reason is something other than
>> TERMINATE
>> - retry success on first attempt
>> - retry success on second attempt after a Kinesis throttling exception
>> - retry success on second attempt after a Kinesis dependency exception
>> - retry failed after a shutdown exception
>> - retry failed after an invalid state exception
>> - retry failed after unexpected exception
>> - retry failed after exhausing all retries
>> Run completed in 3 minutes, 28 seconds.
>> Total number of tests run: 28
>> Suites: completed 4, aborted 0
>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
>> All tests passed.
>>
>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
>> file a JIRA for this?
>>
>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
>> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
>> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
>> SDKs are brought in, making things a bit leaner (this can be upgraded in
>> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
>> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
>> used anywhere else (AFAIK it is not, but in case I missed something let me
>> know any good reason to keep the explicit dependency)?
>>
>> N
>>
>>
>>
>> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <ni...@gmail.com>
>> wrote:
>>
>>> Yeah also the integration tests need to be specifically run - I would
>>> have thought the contributor would have run those tests and also tested the
>>> change themselves using live Kinesis :(
>>>
>>> —
>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>
>>>
>>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <br...@gmail.com> wrote:
>>>
>>>> I don't think the Kinesis tests specifically ran when that was merged
>>>> into 1.5.2 :(
>>>> https://github.com/apache/spark/pull/8957
>>>>
>>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>>>
>>>> AFAIK pom changes don't trigger the Kinesis tests.
>>>>
>>>> Burak
>>>>
>>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <
>>>> nick.pentreath@gmail.com> wrote:
>>>>
>>>>> Yup also works for me on master branch as I've been testing DynamoDB
>>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>>>>> using.
>>>>>
>>>>> So theKCL version does seem like it could be the issue - somewhere
>>>>> along the line an exception must be getting swallowed. Though the tests
>>>>> should have picked this up? Will dig deeper.
>>>>>
>>>>> —
>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>
>>>>>
>>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <br...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, it worked in the 1.6 branch as of commit
>>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>>>>>> serious of an issue, although it would be nice to know what the root cause
>>>>>> is to avoid a regression.
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <br...@gmail.com> wrote:
>>>>>>
>>>>>>> I've noticed this happening when there was some dependency conflicts,
>>>>>>> and it is super hard to debug.
>>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>>>> I feel like that seems to be the problem...
>>>>>>>
>>>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Burak
>>>>>>>
>>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <
>>>>>>> brianmlondon@gmail.com> wrote:
>>>>>>>
>>>>>>>> Nick's symptoms sound identical to mine.  I should mention that I
>>>>>>>> just pulled the latest version from github and it seems to be working
>>>>>>>> there.  To reproduce:
>>>>>>>>
>>>>>>>>
>>>>>>>>    1. Download spark 1.5.2 from
>>>>>>>>    http://spark.apache.org/downloads.html
>>>>>>>>    2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
>>>>>>>>    -DskipTests clean package
>>>>>>>>    3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>>>    4. Then run simultaneously:
>>>>>>>>    1. bin/run-example streaming.KinesisWordCountASL [Kinesis app
>>>>>>>>       name] [Kinesis stream name] [endpoint URL]
>>>>>>>>       2.   bin/run-example streaming.KinesisWordProducerASL
>>>>>>>>       [Kinesis stream name] [endpoint URL] 100 10
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <
>>>>>>>> jb@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Hi Nick,
>>>>>>>>>
>>>>>>>>> Just to be sure: don't you see some ClassCastException in the log ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>>>> > Could you provide an example / test case and more detail on what
>>>>>>>>> issue
>>>>>>>>> > you're facing?
>>>>>>>>> >
>>>>>>>>> > I've just tested a simple program reading from a dev Kinesis
>>>>>>>>> stream and
>>>>>>>>> > using stream.print() to show the records, and it works under
>>>>>>>>> 1.5.1 but
>>>>>>>>> > doesn't appear to be working under 1.5.2.
>>>>>>>>> >
>>>>>>>>> > UI for 1.5.2:
>>>>>>>>> >
>>>>>>>>> > Inline image 1
>>>>>>>>> >
>>>>>>>>> > UI for 1.5.1:
>>>>>>>>> >
>>>>>>>>> > Inline image 2
>>>>>>>>> >
>>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
>>>>>>>>> brianmlondon@gmail.com
>>>>>>>>> > <ma...@gmail.com>> wrote:
>>>>>>>>> >
>>>>>>>>> >     Has anyone managed to run the Kinesis demo in Spark 1.5.2?
>>>>>>>>> The
>>>>>>>>> >     Kinesis ASL that ships with 1.5.2 appears to not work for me
>>>>>>>>> >     although 1.5.1 is fine. I spent some time with Amazon earlier
>>>>>>>>> in the
>>>>>>>>> >     week and the only thing we could do to make it work is to
>>>>>>>>> change the
>>>>>>>>> >     version to 1.5.1.  Can someone please attempt to reproduce
>>>>>>>>> before I
>>>>>>>>> >     open a JIRA issue for it?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbonofre@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>

Re: Spark streaming with Kinesis broken?

Posted by Nick Pentreath <ni...@gmail.com>.
Is that PR against master branch?




S3 read comes from Hadoop / jet3t afaik



—
Sent from Mailbox

On Fri, Dec 11, 2015 at 5:38 PM, Brian London <br...@gmail.com>
wrote:

> That's good news  I've got a PR in to up the SDK version to 1.10.40 and the
> KCL to 1.6.1 which I'm running tests on locally now.
> Is the AWS SDK not used for reading/writing from S3 or do we get that for
> free from the Hadoop dependencies?
> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <ni...@gmail.com>
> wrote:
>> cc'ing dev list
>>
>> Ok, looks like when the KCL version was updated in
>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
>> probably leading to dependency conflict, though as Burak mentions its hard
>> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
>> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
>> in driver or worker logs, so any exception is getting swallowed somewhere.
>>
>> Run starting. Expected test count is: 4
>> KinesisStreamSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - KinesisUtils API
>> - RDD generation
>> - basic operation *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 13
>> times over 2.047777 minutes. Last failure message: Set() did not equal
>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
>> - failure recovery *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 63
>> times over 2.0286383166666666 minutes. Last failure message:
>> isCheckpointPresent was true, but 0 was not greater than 10.
>> (KinesisStreamSuite.scala:228)
>> Run completed in 5 minutes, 0 seconds.
>> Total number of tests run: 4
>> Suites: completed 1, aborted 0
>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
>> *** 2 TESTS FAILED ***
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] BUILD FAILURE
>> [INFO]
>> ------------------------------------------------------------------------
>>
>>
>> KCL 1.3.0 depends on *1.9.37* SDK (
>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
>> while the Spark Kinesis dependency was kept at *1.9.16.*
>>
>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
>> 1.9.37 and everything works.
>>
>> Run starting. Expected test count is: 28
>> KinesisBackedBlockRDDSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - Basic reading from Kinesis
>> - Read data available in both block manager and Kinesis
>> - Read data available only in block manager, not in Kinesis
>> - Read data available only in Kinesis, not in block manager
>> - Read data available partially in block manager, rest in Kinesis
>> - Test isBlockValid skips block fetching from block manager
>> - Test whether RDD is valid after removing blocks from block anager
>> KinesisStreamSuite:
>> - KinesisUtils API
>> - RDD generation
>> - basic operation
>> - failure recovery
>> KinesisReceiverSuite:
>> - check serializability of SerializableAWSCredentials
>> - process records including store and checkpoint
>> - shouldn't store and checkpoint when receiver is stopped
>> - shouldn't checkpoint when exception occurs during store
>> - should set checkpoint time to currentTime + checkpoint interval upon
>> instantiation
>> - should checkpoint if we have exceeded the checkpoint interval
>> - shouldn't checkpoint if we have not exceeded the checkpoint interval
>> - should add to time when advancing checkpoint
>> - shutdown should checkpoint if the reason is TERMINATE
>> - shutdown should not checkpoint if the reason is something other than
>> TERMINATE
>> - retry success on first attempt
>> - retry success on second attempt after a Kinesis throttling exception
>> - retry success on second attempt after a Kinesis dependency exception
>> - retry failed after a shutdown exception
>> - retry failed after an invalid state exception
>> - retry failed after unexpected exception
>> - retry failed after exhausing all retries
>> Run completed in 3 minutes, 28 seconds.
>> Total number of tests run: 28
>> Suites: completed 4, aborted 0
>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
>> All tests passed.
>>
>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
>> file a JIRA for this?
>>
>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
>> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
>> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
>> SDKs are brought in, making things a bit leaner (this can be upgraded in
>> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
>> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
>> used anywhere else (AFAIK it is not, but in case I missed something let me
>> know any good reason to keep the explicit dependency)?
>>
>> N
>>
>>
>>
>> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <ni...@gmail.com>
>> wrote:
>>
>>> Yeah also the integration tests need to be specifically run - I would
>>> have thought the contributor would have run those tests and also tested the
>>> change themselves using live Kinesis :(
>>>
>>> —
>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>
>>>
>>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <br...@gmail.com> wrote:
>>>
>>>> I don't think the Kinesis tests specifically ran when that was merged
>>>> into 1.5.2 :(
>>>> https://github.com/apache/spark/pull/8957
>>>>
>>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>>>
>>>> AFAIK pom changes don't trigger the Kinesis tests.
>>>>
>>>> Burak
>>>>
>>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <
>>>> nick.pentreath@gmail.com> wrote:
>>>>
>>>>> Yup also works for me on master branch as I've been testing DynamoDB
>>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>>>>> using.
>>>>>
>>>>> So theKCL version does seem like it could be the issue - somewhere
>>>>> along the line an exception must be getting swallowed. Though the tests
>>>>> should have picked this up? Will dig deeper.
>>>>>
>>>>> —
>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>
>>>>>
>>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <br...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, it worked in the 1.6 branch as of commit
>>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>>>>>> serious of an issue, although it would be nice to know what the root cause
>>>>>> is to avoid a regression.
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <br...@gmail.com> wrote:
>>>>>>
>>>>>>> I've noticed this happening when there was some dependency conflicts,
>>>>>>> and it is super hard to debug.
>>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>>>> I feel like that seems to be the problem...
>>>>>>>
>>>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Burak
>>>>>>>
>>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <
>>>>>>> brianmlondon@gmail.com> wrote:
>>>>>>>
>>>>>>>> Nick's symptoms sound identical to mine.  I should mention that I
>>>>>>>> just pulled the latest version from github and it seems to be working
>>>>>>>> there.  To reproduce:
>>>>>>>>
>>>>>>>>
>>>>>>>>    1. Download spark 1.5.2 from
>>>>>>>>    http://spark.apache.org/downloads.html
>>>>>>>>    2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
>>>>>>>>    -DskipTests clean package
>>>>>>>>    3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>>>    4. Then run simultaneously:
>>>>>>>>    1. bin/run-example streaming.KinesisWordCountASL [Kinesis app
>>>>>>>>       name] [Kinesis stream name] [endpoint URL]
>>>>>>>>       2.   bin/run-example streaming.KinesisWordProducerASL
>>>>>>>>       [Kinesis stream name] [endpoint URL] 100 10
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <
>>>>>>>> jb@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Hi Nick,
>>>>>>>>>
>>>>>>>>> Just to be sure: don't you see some ClassCastException in the log ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>>>> > Could you provide an example / test case and more detail on what
>>>>>>>>> issue
>>>>>>>>> > you're facing?
>>>>>>>>> >
>>>>>>>>> > I've just tested a simple program reading from a dev Kinesis
>>>>>>>>> stream and
>>>>>>>>> > using stream.print() to show the records, and it works under
>>>>>>>>> 1.5.1 but
>>>>>>>>> > doesn't appear to be working under 1.5.2.
>>>>>>>>> >
>>>>>>>>> > UI for 1.5.2:
>>>>>>>>> >
>>>>>>>>> > Inline image 1
>>>>>>>>> >
>>>>>>>>> > UI for 1.5.1:
>>>>>>>>> >
>>>>>>>>> > Inline image 2
>>>>>>>>> >
>>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
>>>>>>>>> brianmlondon@gmail.com
>>>>>>>>> > <ma...@gmail.com>> wrote:
>>>>>>>>> >
>>>>>>>>> >     Has anyone managed to run the Kinesis demo in Spark 1.5.2?
>>>>>>>>> The
>>>>>>>>> >     Kinesis ASL that ships with 1.5.2 appears to not work for me
>>>>>>>>> >     although 1.5.1 is fine. I spent some time with Amazon earlier
>>>>>>>>> in the
>>>>>>>>> >     week and the only thing we could do to make it work is to
>>>>>>>>> change the
>>>>>>>>> >     version to 1.5.1.  Can someone please attempt to reproduce
>>>>>>>>> before I
>>>>>>>>> >     open a JIRA issue for it?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbonofre@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>

Re: Spark streaming with Kinesis broken?

Posted by Brian London <br...@gmail.com>.
That's good news  I've got a PR in to up the SDK version to 1.10.40 and the
KCL to 1.6.1 which I'm running tests on locally now.

Is the AWS SDK not used for reading/writing from S3 or do we get that for
free from the Hadoop dependencies?

On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <ni...@gmail.com>
wrote:

> cc'ing dev list
>
> Ok, looks like when the KCL version was updated in
> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
> probably leading to dependency conflict, though as Burak mentions its hard
> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
> in driver or worker logs, so any exception is getting swallowed somewhere.
>
> Run starting. Expected test count is: 4
> KinesisStreamSuite:
> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
> Kinesis streams for tests.
> - KinesisUtils API
> - RDD generation
> - basic operation *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 13
> times over 2.047777 minutes. Last failure message: Set() did not equal
> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
> - failure recovery *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 63
> times over 2.0286383166666666 minutes. Last failure message:
> isCheckpointPresent was true, but 0 was not greater than 10.
> (KinesisStreamSuite.scala:228)
> Run completed in 5 minutes, 0 seconds.
> Total number of tests run: 4
> Suites: completed 1, aborted 0
> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
> *** 2 TESTS FAILED ***
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
>
>
> KCL 1.3.0 depends on *1.9.37* SDK (
> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
> while the Spark Kinesis dependency was kept at *1.9.16.*
>
> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
> 1.9.37 and everything works.
>
> Run starting. Expected test count is: 28
> KinesisBackedBlockRDDSuite:
> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
> Kinesis streams for tests.
> - Basic reading from Kinesis
> - Read data available in both block manager and Kinesis
> - Read data available only in block manager, not in Kinesis
> - Read data available only in Kinesis, not in block manager
> - Read data available partially in block manager, rest in Kinesis
> - Test isBlockValid skips block fetching from block manager
> - Test whether RDD is valid after removing blocks from block anager
> KinesisStreamSuite:
> - KinesisUtils API
> - RDD generation
> - basic operation
> - failure recovery
> KinesisReceiverSuite:
> - check serializability of SerializableAWSCredentials
> - process records including store and checkpoint
> - shouldn't store and checkpoint when receiver is stopped
> - shouldn't checkpoint when exception occurs during store
> - should set checkpoint time to currentTime + checkpoint interval upon
> instantiation
> - should checkpoint if we have exceeded the checkpoint interval
> - shouldn't checkpoint if we have not exceeded the checkpoint interval
> - should add to time when advancing checkpoint
> - shutdown should checkpoint if the reason is TERMINATE
> - shutdown should not checkpoint if the reason is something other than
> TERMINATE
> - retry success on first attempt
> - retry success on second attempt after a Kinesis throttling exception
> - retry success on second attempt after a Kinesis dependency exception
> - retry failed after a shutdown exception
> - retry failed after an invalid state exception
> - retry failed after unexpected exception
> - retry failed after exhausing all retries
> Run completed in 3 minutes, 28 seconds.
> Total number of tests run: 28
> Suites: completed 4, aborted 0
> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
> All tests passed.
>
> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
> file a JIRA for this?
>
> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
> SDKs are brought in, making things a bit leaner (this can be upgraded in
> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
> used anywhere else (AFAIK it is not, but in case I missed something let me
> know any good reason to keep the explicit dependency)?
>
> N
>
>
>
> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <ni...@gmail.com>
> wrote:
>
>> Yeah also the integration tests need to be specifically run - I would
>> have thought the contributor would have run those tests and also tested the
>> change themselves using live Kinesis :(
>>
>> —
>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>
>>
>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <br...@gmail.com> wrote:
>>
>>> I don't think the Kinesis tests specifically ran when that was merged
>>> into 1.5.2 :(
>>> https://github.com/apache/spark/pull/8957
>>>
>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>>
>>> AFAIK pom changes don't trigger the Kinesis tests.
>>>
>>> Burak
>>>
>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <
>>> nick.pentreath@gmail.com> wrote:
>>>
>>>> Yup also works for me on master branch as I've been testing DynamoDB
>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>>>> using.
>>>>
>>>> So theKCL version does seem like it could be the issue - somewhere
>>>> along the line an exception must be getting swallowed. Though the tests
>>>> should have picked this up? Will dig deeper.
>>>>
>>>> —
>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>
>>>>
>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <br...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, it worked in the 1.6 branch as of commit
>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>>>>> serious of an issue, although it would be nice to know what the root cause
>>>>> is to avoid a regression.
>>>>>
>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <br...@gmail.com> wrote:
>>>>>
>>>>>> I've noticed this happening when there was some dependency conflicts,
>>>>>> and it is super hard to debug.
>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>>> I feel like that seems to be the problem...
>>>>>>
>>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>>
>>>>>> Thanks,
>>>>>> Burak
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <
>>>>>> brianmlondon@gmail.com> wrote:
>>>>>>
>>>>>>> Nick's symptoms sound identical to mine.  I should mention that I
>>>>>>> just pulled the latest version from github and it seems to be working
>>>>>>> there.  To reproduce:
>>>>>>>
>>>>>>>
>>>>>>>    1. Download spark 1.5.2 from
>>>>>>>    http://spark.apache.org/downloads.html
>>>>>>>    2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
>>>>>>>    -DskipTests clean package
>>>>>>>    3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>>    4. Then run simultaneously:
>>>>>>>    1. bin/run-example streaming.KinesisWordCountASL [Kinesis app
>>>>>>>       name] [Kinesis stream name] [endpoint URL]
>>>>>>>       2.   bin/run-example streaming.KinesisWordProducerASL
>>>>>>>       [Kinesis stream name] [endpoint URL] 100 10
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <
>>>>>>> jb@nanthrax.net> wrote:
>>>>>>>
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> Just to be sure: don't you see some ClassCastException in the log ?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>>> > Could you provide an example / test case and more detail on what
>>>>>>>> issue
>>>>>>>> > you're facing?
>>>>>>>> >
>>>>>>>> > I've just tested a simple program reading from a dev Kinesis
>>>>>>>> stream and
>>>>>>>> > using stream.print() to show the records, and it works under
>>>>>>>> 1.5.1 but
>>>>>>>> > doesn't appear to be working under 1.5.2.
>>>>>>>> >
>>>>>>>> > UI for 1.5.2:
>>>>>>>> >
>>>>>>>> > Inline image 1
>>>>>>>> >
>>>>>>>> > UI for 1.5.1:
>>>>>>>> >
>>>>>>>> > Inline image 2
>>>>>>>> >
>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
>>>>>>>> brianmlondon@gmail.com
>>>>>>>> > <ma...@gmail.com>> wrote:
>>>>>>>> >
>>>>>>>> >     Has anyone managed to run the Kinesis demo in Spark 1.5.2?
>>>>>>>> The
>>>>>>>> >     Kinesis ASL that ships with 1.5.2 appears to not work for me
>>>>>>>> >     although 1.5.1 is fine. I spent some time with Amazon earlier
>>>>>>>> in the
>>>>>>>> >     week and the only thing we could do to make it work is to
>>>>>>>> change the
>>>>>>>> >     version to 1.5.1.  Can someone please attempt to reproduce
>>>>>>>> before I
>>>>>>>> >     open a JIRA issue for it?
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbonofre@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>
>