Posted to user@spark.apache.org by Aris <ar...@gmail.com> on 2016/09/01 20:58:26 UTC

Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

Hello Spark community -

Do Spark 2.0 Datasets *not support* Scala value classes (basically
classes that extend AnyVal, which come with a number of limitations)?

I am trying to do something like this:

case class FeatureId(value: Int) extends AnyVal
val seq = Seq(FeatureId(1),FeatureId(2),FeatureId(3))
import spark.implicits._
val ds = spark.createDataset(seq)
ds.count


This compiles, but it breaks at runtime with a cryptic error along the
lines of "cannot find int at value". If I remove the "extends AnyVal"
part, everything works.
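For what it's worth, the same value class behaves normally in plain Scala
collections, so the failure seems specific to the Dataset encoder / code
generation path. A minimal non-Spark sketch (my own illustration, not from
the Spark docs) that works fine:

```scala
// A value class wrapping an Int; in ordinary Scala code it works as expected.
final case class FeatureId(value: Int) extends AnyVal

object ValueClassDemo {
  // Unwrap the value classes and sum the underlying Ints.
  def sumIds(ids: Seq[FeatureId]): Int = ids.map(_.value).sum

  def main(args: Array[String]): Unit = {
    val ids = Seq(FeatureId(1), FeatureId(2), FeatureId(3))
    println(sumIds(ids)) // prints 6
  }
}
```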

Value classes are a great performance boost / static type checking feature
in Scala, but are they prohibited in Spark Datasets?
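If anyone needs a stopgap in the meantime, here is one possible workaround
sketch (my own idea, not an official recommendation): keep a plain case
class for the Dataset and convert at the boundary. The FeatureIdRow and
FeatureIdCodec names below are made up for illustration:

```scala
// Value class used throughout application code for type safety.
final case class FeatureId(value: Int) extends AnyVal

// Plain case class (no AnyVal) used only at the Dataset boundary;
// Spark's encoders handle ordinary case classes without trouble.
final case class FeatureIdRow(value: Int)

object FeatureIdCodec {
  // Convert to the encodable representation before creating a Dataset.
  def toRow(id: FeatureId): FeatureIdRow = FeatureIdRow(id.value)
  // Convert back to the value class after reading rows out.
  def fromRow(row: FeatureIdRow): FeatureId = FeatureId(row.value)
}
```

With a SparkSession in scope, something like
spark.createDataset(seq.map(FeatureIdCodec.toRow)) should then work,
mapping back with fromRow where needed.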

Thanks!

Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

Posted by Jakob Odersky <ja...@odersky.com>.
I'm not sure how the shepherd thing works, but just FYI, Michael
Armbrust originally wrote Catalyst, the engine behind Datasets.

You can find a list of all committers here:
https://cwiki.apache.org/confluence/display/SPARK/Committers. Another
good resource is https://spark-prs.appspot.com/, where you can filter
pull requests by component. People frequently involved in comments on a
specific component will likely be quite knowledgeable in that area.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

Posted by Aris <ar...@gmail.com>.
On a more serious note: yes, Datasets break with Scala value classes in
Spark 2.0.0 and Spark 1.6.1. I wrote up a JIRA bug and hope some more
knowledgeable people can look at it.

Sean Owen has commented on other code generation errors before, so I put
him as shepherd in JIRA. Michael Armbrust has expressed interest in code
generation / encoder problems I have found recently.

Here is the problem:

https://issues.apache.org/jira/browse/SPARK-17368

Thank you




Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

Posted by Aris <ar...@gmail.com>.
Thank you, Jakob, on two counts.

1. Yes, thanks for pointing out that spark-shell cannot handle value
classes; that was an additional source of confusion for me!

2. We have a Spark 2.0 project that is definitely breaking at runtime
with a Dataset of value classes. I am not sure if this is also the case
in Spark 1.6; I'm going to verify.

Once I can make a trivial example of value classes breaking, I will open
a JIRA for Spark.

Is Martin Odersky your father? Does this mean Scala is your sister? And
does that mean Spark is your cousin? ;-)



Re: Possible Code Generation Bug: Can Spark 2.0 Datasets handle Scala Value Classes?

Posted by Jakob Odersky <ja...@odersky.com>.
Hi Aris,
thanks for sharing this issue. I can confirm that value classes
currently don't work; however, I can't think of a reason why they
shouldn't be supported. I would therefore recommend that you report
this as a bug.

(Btw, value classes also currently aren't definable in the REPL. See
https://issues.apache.org/jira/browse/SPARK-17367)

regards,
--Jakob

