You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Asim Jalis <as...@gmail.com> on 2015/08/14 22:04:30 UTC

QueueStream Does Not Support Checkpointing

I want to test some Spark Streaming code that is using
reduceByKeyAndWindow. If I do not enable checkpointing, I get the error:

java.lang.IllegalArgumentException: requirement failed: The checkpoint
> directory has not been set. Please set it by StreamingContext.checkpoint().


But if I enable checkpointing I get

queueStream doesn't support checkpointing


Is there a workaround for this?

My goal is to test that the windowing logic in my code is correct. Is there
a way to disable these strict checks or a different dstream I can use that
I can populate programmatically and then use for testing?

Thanks.

Asim

Re: QueueStream Does Not Support Checkpointing

Posted by Asim Jalis <as...@gmail.com>.
Another fix might be to remove the exception that is thrown when windowing
and other stateful operations are used without checkpointing.

On Fri, Aug 14, 2015 at 5:43 PM, Asim Jalis <as...@gmail.com> wrote:

> I feel the real fix here is to remove the exception from QueueInputDStream
> class by reverting the fix of
> https://issues.apache.org/jira/browse/SPARK-8630
>
> I can write another class that is identical to the QueueInputDStream class
> except it does not throw the exception. But this feels like a convoluted
> solution.
>
> Throwing exceptions to forbid behavior in code is risky because it can
> easily break legitimate uses of a class.
>
> Is there a way to reopen https://issues.apache.org/jira/browse/SPARK-8630.
> I have added a comment to it, but I am not sure if that will have that
> effect.
>
> Thanks.
>
> Asim
>
> On Fri, Aug 14, 2015 at 4:03 PM, Holden Karau <ho...@pigscanfly.ca>
> wrote:
>
>> I just pushed some code that does this for spark-testing-base (
>> https://github.com/holdenk/spark-testing-base )  (its in master) and
>> will publish an updated artifact with it for tonight.
>>
>> On Fri, Aug 14, 2015 at 3:35 PM, Tathagata Das <td...@databricks.com>
>> wrote:
>>
>>> A hacky workaround is to create a customer InputDStream that creates the
>>> right RDDs based on a function. The TestInputDStream
>>> <https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala#L61>
>>> does something similar for Spark Streaming unit tests.
>>>
>>> TD
>>>
>>> On Fri, Aug 14, 2015 at 1:04 PM, Asim Jalis <as...@gmail.com> wrote:
>>>
>>>> I want to test some Spark Streaming code that is using
>>>> reduceByKeyAndWindow. If I do not enable checkpointing, I get the error:
>>>>
>>>> java.lang.IllegalArgumentException: requirement failed: The checkpoint
>>>>> directory has not been set. Please set it by StreamingContext.checkpoint().
>>>>
>>>>
>>>> But if I enable checkpointing I get
>>>>
>>>> queueStream doesn't support checkpointing
>>>>
>>>>
>>>> Is there a workaround for this?
>>>>
>>>> My goal is to test that the windowing logic in my code is correct. Is
>>>> there a way to disable these strict checks or a different dstream I can use
>>>> that I can populate programmatically and then use for testing?
>>>>
>>>> Thanks.
>>>>
>>>> Asim
>>>>
>>>>
>>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>> Linked In: https://www.linkedin.com/in/holdenkarau
>>
>
>

Re: QueueStream Does Not Support Checkpointing

Posted by Asim Jalis <as...@gmail.com>.
I feel the real fix here is to remove the exception from QueueInputDStream
class by reverting the fix of
https://issues.apache.org/jira/browse/SPARK-8630

I can write another class that is identical to the QueueInputDStream class
except it does not throw the exception. But this feels like a convoluted
solution.

Throwing exceptions to forbid behavior in code is risky because it can
easily break legitimate uses of a class.

Is there a way to reopen https://issues.apache.org/jira/browse/SPARK-8630.
I have added a comment to it, but I am not sure if that will have that
effect.

Thanks.

Asim

On Fri, Aug 14, 2015 at 4:03 PM, Holden Karau <ho...@pigscanfly.ca> wrote:

> I just pushed some code that does this for spark-testing-base (
> https://github.com/holdenk/spark-testing-base )  (its in master) and will
> publish an updated artifact with it for tonight.
>
> On Fri, Aug 14, 2015 at 3:35 PM, Tathagata Das <td...@databricks.com>
> wrote:
>
>> A hacky workaround is to create a customer InputDStream that creates the
>> right RDDs based on a function. The TestInputDStream
>> <https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala#L61>
>> does something similar for Spark Streaming unit tests.
>>
>> TD
>>
>> On Fri, Aug 14, 2015 at 1:04 PM, Asim Jalis <as...@gmail.com> wrote:
>>
>>> I want to test some Spark Streaming code that is using
>>> reduceByKeyAndWindow. If I do not enable checkpointing, I get the error:
>>>
>>> java.lang.IllegalArgumentException: requirement failed: The checkpoint
>>>> directory has not been set. Please set it by StreamingContext.checkpoint().
>>>
>>>
>>> But if I enable checkpointing I get
>>>
>>> queueStream doesn't support checkpointing
>>>
>>>
>>> Is there a workaround for this?
>>>
>>> My goal is to test that the windowing logic in my code is correct. Is
>>> there a way to disable these strict checks or a different dstream I can use
>>> that I can populate programmatically and then use for testing?
>>>
>>> Thanks.
>>>
>>> Asim
>>>
>>>
>>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
> Linked In: https://www.linkedin.com/in/holdenkarau
>

Re: QueueStream Does Not Support Checkpointing

Posted by Holden Karau <ho...@pigscanfly.ca>.
I just pushed some code that does this for spark-testing-base (
https://github.com/holdenk/spark-testing-base )  (its in master) and will
publish an updated artifact with it for tonight.

On Fri, Aug 14, 2015 at 3:35 PM, Tathagata Das <td...@databricks.com> wrote:

> A hacky workaround is to create a customer InputDStream that creates the
> right RDDs based on a function. The TestInputDStream
> <https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala#L61>
> does something similar for Spark Streaming unit tests.
>
> TD
>
> On Fri, Aug 14, 2015 at 1:04 PM, Asim Jalis <as...@gmail.com> wrote:
>
>> I want to test some Spark Streaming code that is using
>> reduceByKeyAndWindow. If I do not enable checkpointing, I get the error:
>>
>> java.lang.IllegalArgumentException: requirement failed: The checkpoint
>>> directory has not been set. Please set it by StreamingContext.checkpoint().
>>
>>
>> But if I enable checkpointing I get
>>
>> queueStream doesn't support checkpointing
>>
>>
>> Is there a workaround for this?
>>
>> My goal is to test that the windowing logic in my code is correct. Is
>> there a way to disable these strict checks or a different dstream I can use
>> that I can populate programmatically and then use for testing?
>>
>> Thanks.
>>
>> Asim
>>
>>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Linked In: https://www.linkedin.com/in/holdenkarau

Re: QueueStream Does Not Support Checkpointing

Posted by Tathagata Das <td...@databricks.com>.
A hacky workaround is to create a customer InputDStream that creates the
right RDDs based on a function. The TestInputDStream
<https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala#L61>
does something similar for Spark Streaming unit tests.

TD

On Fri, Aug 14, 2015 at 1:04 PM, Asim Jalis <as...@gmail.com> wrote:

> I want to test some Spark Streaming code that is using
> reduceByKeyAndWindow. If I do not enable checkpointing, I get the error:
>
> java.lang.IllegalArgumentException: requirement failed: The checkpoint
>> directory has not been set. Please set it by StreamingContext.checkpoint().
>
>
> But if I enable checkpointing I get
>
> queueStream doesn't support checkpointing
>
>
> Is there a workaround for this?
>
> My goal is to test that the windowing logic in my code is correct. Is
> there a way to disable these strict checks or a different dstream I can use
> that I can populate programmatically and then use for testing?
>
> Thanks.
>
> Asim
>
>