Posted to user@spark.apache.org by Joey Echeverria <jo...@rocana.com> on 2016/10/07 18:39:39 UTC

Map with state keys serialization

Looking at the source code for StateMap[1], which is used by
JavaPairDStream#mapWithState(), it looks like state keys are
serialized using an ObjectOutputStream. I couldn't find a reference to
this restriction in the documentation. Did I miss that?

Unless I'm mistaken, I'm guessing there isn't a way to use Kryo for
this serialization?
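
For context, the job is shaped roughly like the sketch below (a simplified
example: the String key and all class and method names here are placeholders
standing in for our actual Avro-generated key class):

import com.google.common.base.Optional;   // the 1.6.x Java API uses Guava's Optional
import org.apache.spark.api.java.function.Function3;
import org.apache.spark.streaming.State;
import org.apache.spark.streaming.StateSpec;
import org.apache.spark.streaming.api.java.JavaMapWithStateDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

public class StatefulCounts {
  // Keeps a running count per key; in our real job the key is an Avro-generated class.
  public static JavaMapWithStateDStream<String, Long, Long, Tuple2<String, Long>> withState(
      JavaPairDStream<String, Long> events) {
    Function3<String, Optional<Long>, State<Long>, Tuple2<String, Long>> mappingFunc =
        (key, value, state) -> {
          long sum = (value.isPresent() ? value.get() : 0L)
              + (state.exists() ? state.get() : 0L);
          state.update(sum);              // state values go through the configured serializer
          return new Tuple2<>(key, sum);  // keys end up stored in the internal StateMap
        };
    return events.mapWithState(StateSpec.function(mappingFunc));
  }
}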

Thanks!

-Joey

[1] https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/util/StateMap.scala#L251



Re: Map with state keys serialization

Posted by Joey Echeverria <jo...@rocana.com>.
That fixed it! I still had the serializer registered as a workaround
for SPARK-12591.

Thanks so much for your help Ryan!

-Joey

On Wed, Oct 12, 2016 at 2:16 PM, Shixiong(Ryan) Zhu
<sh...@databricks.com> wrote:
> Oh, OpenHashMapBasedStateMap is serialized using Kryo's
> "com.esotericsoftware.kryo.serializers.JavaSerializer". Did you set it for
> OpenHashMapBasedStateMap? You don't need to set anything for Spark's classes
> in 1.6.2.


Re: Map with state keys serialization

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
Oh, OpenHashMapBasedStateMap is serialized using Kryo's
"com.esotericsoftware.kryo.serializers.JavaSerializer". Did you set it for
OpenHashMapBasedStateMap? You don't need to set anything for Spark's
classes in 1.6.2.
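
For reference, setting it looks roughly like the sketch below (an illustrative
example of the SPARK-12591-era workaround; the exact classes people registered
may vary). On 1.6.2 this kind of registration should simply be removed:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.serializers.JavaSerializer;
import org.apache.spark.serializer.KryoRegistrator;

// Illustrative sketch of a leftover SPARK-12591-era workaround that forces Spark's
// internal state map class through Kryo's JavaSerializer. Not needed on 1.6.2.
public class LegacyStateMapRegistrator implements KryoRegistrator {
  @Override
  public void registerClasses(Kryo kryo) {
    try {
      kryo.register(
          Class.forName("org.apache.spark.streaming.util.OpenHashMapBasedStateMap"),
          new JavaSerializer());
    } catch (ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }
}

A registrator like this would typically be wired in through the
spark.kryo.registrator setting, which is the place to check for leftovers.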


On Wed, Oct 12, 2016 at 7:11 AM, Joey Echeverria <jo...@rocana.com> wrote:

> I tried with 1.6.2 and saw the same behavior.
>
> -Joey

Re: Map with state keys serialization

Posted by Joey Echeverria <jo...@rocana.com>.
I tried with 1.6.2 and saw the same behavior.

-Joey

On Tue, Oct 11, 2016 at 5:18 PM, Shixiong(Ryan) Zhu
<sh...@databricks.com> wrote:
> There are some known issues in 1.6.0, e.g.,
> https://issues.apache.org/jira/browse/SPARK-12591
>
> Could you try 1.6.1?


Re: Map with state keys serialization

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
There are some known issues in 1.6.0, e.g.,
https://issues.apache.org/jira/browse/SPARK-12591

Could you try 1.6.1?

On Tue, Oct 11, 2016 at 9:55 AM, Joey Echeverria <jo...@rocana.com> wrote:

> I tried wrapping my Tuple class (which is generated by Avro) in a
> class that implements Serializable, but now I'm getting a
> ClassNotFoundException in my Spark application. The exception is
> thrown while trying to deserialize checkpoint state:
>
> https://gist.github.com/joey/7b374a2d483e25f15e20c0c4cb81b5cf
>
> I set some flags[1] on the JVM and I can see the class get loaded in the
> logs.
>
> Does anyone have any suggestions/recommendations for debugging class
> loading issues during checkpoint deserialization?
>
> I also looked into switching to byte[] for the state keys, but byte[]
> doesn't implement value-based equals() or hashCode(). I can't use
> ByteBuffer because it doesn't implement Serializable. Spark has a
> SerializableBuffer class that wraps ByteBuffer, but it also doesn't
> have value-based equals() or hashCode().
>
> -Joey
>
> [1] -verbose:class -Dsun.misc.URLClassPath.debug

Re: Map with state keys serialization

Posted by Joey Echeverria <jo...@rocana.com>.
I tried wrapping my Tuple class (which is generated by Avro) in a
class that implements Serializable, but now I'm getting a
ClassNotFoundException in my Spark application. The exception is
thrown while trying to deserialize checkpoint state:

https://gist.github.com/joey/7b374a2d483e25f15e20c0c4cb81b5cf

I set some flags[1] on the JVM and I can see the class get loaded in the logs.

Does anyone have any suggestions/recommendations for debugging class
loading issues during checkpoint deserialization?

I also looked into switching to byte[] for the state keys, but byte[]
doesn't implement value-based equals() or hashCode(). I can't use
ByteBuffer because it doesn't implement Serializable. Spark has a
SerializableBuffer class that wraps ByteBuffer, but it also doesn't
have value-based equals() or hashCode().
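
For what it's worth, the byte[] wrapper I was considering would look roughly
like the sketch below (illustrative only, not something we shipped):

import java.io.Serializable;
import java.util.Arrays;

// Sketch of a Serializable key wrapper around byte[] with value-based
// equals()/hashCode(), which byte[] and Spark's SerializableBuffer both lack.
public final class BytesKey implements Serializable {
  private static final long serialVersionUID = 1L;
  private final byte[] bytes;

  public BytesKey(byte[] bytes) {
    this.bytes = bytes.clone();   // defensive copy so the key stays immutable
  }

  public byte[] bytes() {
    return bytes.clone();
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof BytesKey && Arrays.equals(bytes, ((BytesKey) o).bytes);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }
}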

-Joey

[1] -verbose:class -Dsun.misc.URLClassPath.debug

On Mon, Oct 10, 2016 at 11:28 AM, Joey Echeverria <jo...@rocana.com> wrote:
> I do, I get the stack trace in this gist:
>
> https://gist.github.com/joey/d3bf040af31e854b3be374e2c016d7e1
>
> The class it references, com.rocana.data.Tuple, is registered with
> Kryo. Also, this is with 1.6.0 so if this behavior changed/got fixed
> in a later release let me know.
>
> -Joey


Re: Map with state keys serialization

Posted by Joey Echeverria <jo...@rocana.com>.
I do, I get the stack trace in this gist:

https://gist.github.com/joey/d3bf040af31e854b3be374e2c016d7e1

The class it references, com.rocana.data.Tuple, is registered with
Kryo. Also, this is with 1.6.0 so if this behavior changed/got fixed
in a later release let me know.

-Joey

On Mon, Oct 10, 2016 at 9:54 AM, Shixiong(Ryan) Zhu
<sh...@databricks.com> wrote:
> That's enough. Did you see any error?


Re: Map with state keys serialization

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
That's enough. Did you see any error?

On Mon, Oct 10, 2016 at 5:08 AM, Joey Echeverria <jo...@rocana.com> wrote:

> Hi Ryan!
>
> Do you know where I need to configure Kryo for this? I already have
> spark.serializer=org.apache.spark.serializer.KryoSerializer in my
> SparkConf and I registered the class. Is there a different
> configuration setting for the state map keys?
>
> Thanks!
>
> -Joey

Re: Map with state keys serialization

Posted by Joey Echeverria <jo...@rocana.com>.
Hi Ryan!

Do you know where I need to configure Kryo for this? I already have
spark.serializer=org.apache.spark.serializer.KryoSerializer in my
SparkConf and I registered the class. Is there a different
configuration setting for the state map keys?
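
Concretely, the relevant part of the setup looks something like the snippet
below (simplified; the app name is made up and the registered class is our
Avro-generated key class):

import org.apache.spark.SparkConf;

// Simplified sketch of the settings described above.
SparkConf conf = new SparkConf()
    .setAppName("streaming-job")   // illustrative
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // register our Avro-generated key class with Kryo
    .registerKryoClasses(new Class<?>[] { com.rocana.data.Tuple.class });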

Thanks!

-Joey

On Sun, Oct 9, 2016 at 10:58 PM, Shixiong(Ryan) Zhu
<sh...@databricks.com> wrote:
> You can use Kryo. The StateMap implementation also implements
> KryoSerializable, which Kryo supports.


Re: Map with state keys serialization

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
You can use Kryo. The StateMap implementation also implements
KryoSerializable, which Kryo supports.
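
As a general illustration (the class and its fields below are made up, not
Spark's or yours), a key type that implements KryoSerializable controls its own
Kryo wire format like this:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.KryoSerializable;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// Hypothetical key class that defines its own Kryo wire format. Kryo typically
// needs a no-arg constructor (or a configured instantiator) to create instances.
public class EventKey implements KryoSerializable {
  private String id;
  private long timestamp;

  public EventKey() {}

  public EventKey(String id, long timestamp) {
    this.id = id;
    this.timestamp = timestamp;
  }

  @Override
  public void write(Kryo kryo, Output output) {
    output.writeString(id);
    output.writeLong(timestamp);
  }

  @Override
  public void read(Kryo kryo, Input input) {
    id = input.readString();
    timestamp = input.readLong();
  }

  // Value-based equality matters for state keys.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof EventKey)) return false;
    EventKey other = (EventKey) o;
    return timestamp == other.timestamp
        && (id == null ? other.id == null : id.equals(other.id));
  }

  @Override
  public int hashCode() {
    return 31 * Long.hashCode(timestamp) + (id == null ? 0 : id.hashCode());
  }
}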

On Fri, Oct 7, 2016 at 11:39 AM, Joey Echeverria <jo...@rocana.com> wrote:

> Looking at the source code for StateMap[1], which is used by
> JavaPairDStream#mapWithState(), it looks like state keys are
> serialized using an ObjectOutputStream. I couldn't find a reference to
> this restriction in the documentation. Did I miss that?
>
> Unless I'm mistaken, I'm guessing there isn't a way to use Kryo for
> this serialization?
>
> Thanks!
>
> -Joey
>
> [1] https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/util/StateMap.scala#L251