Posted to dev@spark.apache.org by Xiangrui Meng <me...@gmail.com> on 2015/03/05 02:11:01 UTC

enum-like types in Spark

Hi all,

There are many places where we use enum-like types in Spark, but in
different ways. Every approach has both pros and cons. I wonder
whether there should be an “official” approach for enum-like types in
Spark.

1. Scala’s Enumeration (e.g., SchedulingMode, WorkerState, etc)

* All types show up as Enumeration.Value in Java.
http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html

2. Java’s Enum (e.g., SaveMode, IOMode)

* Implementation must be in a Java file.
* Values don't show up in the ScalaDoc:
http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode

3. Static fields in Java (e.g., TripletFields)

* Implementation must be in a Java file.
* Doesn’t need “()” in Java code.
* Values don't show up in the ScalaDoc:
http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields

4. Objects in Scala (e.g., StorageLevel)

* Needs “()” in Java code.
* Values show up in both ScalaDoc and JavaDoc:
  http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
  http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html

It would be great if we had an "official" approach for this, as well
as a naming convention for enum-like values ("MEMORY_ONLY" or
"MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any thoughts?

Best,
Xiangrui
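
For reference alongside the list above, minimal sketches of options 1
and 4 (illustrative shapes only, not the actual Spark definitions):

// Option 1: Scala's Enumeration. Concise in Scala, but Java callers see
// every value only as the erased type Enumeration.Value.
object SchedulingMode extends Enumeration {
  type SchedulingMode = Value
  val FAIR, FIFO, NONE = Value
}

// Option 4: values as vals on a Scala object. Each val compiles to an
// accessor method, which is why Java code needs the trailing "()",
// e.g. StorageLevel.MEMORY_ONLY().
class StorageLevel private (val useDisk: Boolean, val useMemory: Boolean)
object StorageLevel {
  val MEMORY_ONLY = new StorageLevel(false, true)
  val DISK_ONLY = new StorageLevel(true, false)
}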



Re: enum-like types in Spark

Posted by Patrick Wendell <pw...@gmail.com>.
Yes - only new or internal APIs. I doubt we'd break any exposed APIs for
the purpose of cleanup.

Patrick

Re: enum-like types in Spark

Posted by Mridul Muralidharan <mr...@gmail.com>.
While I don't have any strong opinions about how we handle enums
either way in Spark, I assume the discussion is targeted at (new) APIs
being designed in Spark.
Rewiring what we already have exposed will lead to an incompatible API
change (StorageLevel, for example, is in 1.0).

Regards,
Mridul



Re: enum-like types in Spark

Posted by Aaron Davidson <il...@gmail.com>.
That's kinda annoying, but it's just a little extra boilerplate. Can you
call it as StorageLevel.DiskOnly() from Java? Would it also work if they
were case classes with empty constructors, without the field?
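
For concreteness, a sketch of the variant Aaron is asking about (an
assumed shape, not actual Spark code):

sealed abstract class StorageLevel
// Case classes with empty parameter lists instead of private case objects:
case class MemoryOnly() extends StorageLevel
case class DiskOnly() extends StorageLevel

// Scala: val level = MemoryOnly()
// Java:  StorageLevel level = new MemoryOnly();
// Note: each call constructs a fresh instance; MemoryOnly() == MemoryOnly()
// still holds only because case-class equality is structural.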

On Wed, Mar 4, 2015 at 11:35 PM, Xiangrui Meng <me...@gmail.com> wrote:

> `case object` inside an `object` doesn't show up in Java. This is the
> minimal code I found to make everything show up correctly in both
> Scala and Java:
>
> sealed abstract class StorageLevel // cannot be a trait
>
> object StorageLevel {
>   private[this] case object _MemoryOnly extends StorageLevel
>   final val MemoryOnly: StorageLevel = _MemoryOnly
>
>   private[this] case object _DiskOnly extends StorageLevel
>   final val DiskOnly: StorageLevel = _DiskOnly
> }
>
> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com> wrote:
> > I like #4 as well and agree with Aaron's suggestion.
> >
> > - Patrick
> >
> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com> wrote:
> > > I'm cool with #4 as well, but make sure we dictate that the values
> > > should be defined within an object with the same name as the
> > > enumeration (like we do for StorageLevel). Otherwise we may pollute a
> > > higher namespace.
> > >
> > > e.g. we SHOULD do:
> > >
> > > trait StorageLevel
> > > object StorageLevel {
> > >   case object MemoryOnly extends StorageLevel
> > >   case object DiskOnly extends StorageLevel
> > > }
> > >
> > > On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <michael@databricks.com> wrote:
> > > > #4 with a preference for CamelCaseEnums
> > > >
> > > > On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <jo...@databricks.com> wrote:
> > > > > another vote for #4
> > > > > People are already used to adding "()" in Java.
> > > > >
> > > > > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <ja...@gmail.com> wrote:
> > > > > > #4 but with MemoryOnly (more scala-like)
> > > > > >
> > > > > > http://docs.scala-lang.org/style/naming-conventions.html
> > > > > >
> > > > > > Constants, Values, Variable and Methods
> > > > > >
> > > > > > Constant names should be in upper camel case. That is, if the
> > > > > > member is final, immutable and it belongs to a package object or
> > > > > > an object, it may be considered a constant (similar to Java's
> > > > > > static final members):
> > > > > >
> > > > > > object Container {
> > > > > >   val MyConstant = ...
> > > > > > }

Re: enum-like types in Spark

Posted by Sean Owen <so...@cloudera.com>.
This has some disadvantages for Java, I think. You can't switch on an
object defined like this, but you can with an enum. And although the
Scala compiler understands that the set of values is fixed because of
'sealed', and so can warn about missing cases, the JVM won't know this
and can't do the same.
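
To make the trade-off concrete, a sketch assuming values are exposed
directly as case objects (with the val-wrapped variant proposed below,
scalac sees only vals and cannot check exhaustiveness either):

sealed trait Level
object Level {
  case object MemoryOnly extends Level
  case object DiskOnly extends Level
}

// scalac warns here: "match may not be exhaustive. It would fail on: DiskOnly"
def describe(l: Level): String = l match {
  case Level.MemoryOnly => "in memory"
}

// Java gets no equivalent check: these are ordinary objects, so a Java
// switch statement does not apply and a missing case surfaces only at runtime.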



Re: enum-like types in Spark

Posted by Xiangrui Meng <me...@gmail.com>.
For #4, my previous proposal may confuse IDEs with the additional
types generated by the case objects, and their toString values contain
the underscore. The following works better:

sealed abstract class StorageLevel

object StorageLevel {
  final val MemoryOnly: StorageLevel = {
    case object MemoryOnly extends StorageLevel
    MemoryOnly
  }

  final val DiskOnly: StorageLevel = {
    case object DiskOnly extends StorageLevel
    DiskOnly
  }
}

MemoryOnly and DiskOnly can be used in pattern matching. If people are
okay with this approach, I can add it to the code style guide.
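
A usage sketch of the proposal above (hypothetical match arms): the
vals are stable identifiers, so they match by equality, but the
compiler cannot prove exhaustiveness over plain vals, hence the wildcard:

import StorageLevel._

def describe(level: StorageLevel): String = level match {
  case MemoryOnly => "stored deserialized in JVM memory"
  case DiskOnly   => "stored serialized on local disk"
  case _          => "unknown"  // exhaustiveness is not checked for vals
}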

Imran, this is not just for internal APIs, which are relatively more
flexible. It is good to use the same approach to implement public
enum-like types from now on.

Best,
Xiangrui

On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <ir...@cloudera.com> wrote:
> I have a very strong dislike for #1 (scala enumerations).   I'm ok with #4
> (with Xiangrui's final suggestion, especially making it sealed & available
> in Java), but I really think #2, java enums, are the best option.
>
> Java enums actually have some very real advantages over the other
> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There has
> been endless debate in the Scala community about the problems with the
> approaches in Scala.  Very smart, level-headed Scala gurus have complained
> about their short-comings (Rex Kerr's name is coming to mind, though I'm
> not positive about that); there have been numerous well-thought out
> proposals to give Scala a better enum.  But the powers-that-be in Scala
> always reject them.  IIRC the explanation for rejecting is basically that
> (a) enums aren't important enough for introducing some new special feature,
> scala's got bigger things to work on and (b) if you really need a good
> enum, just use java's enum.
>
> I doubt it really matters that much for Spark internals, which is why I
> think #4 is fine.  But I figured I'd give my spiel, because every developer
> loves language wars :)
>
> Imran
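
As a concrete reference for the conveniences Imran lists, a small
sketch driven from Scala, with the JDK's TimeUnit standing in for a
hypothetical Spark enum:

import java.util.concurrent.TimeUnit
import java.util.EnumSet

object EnumGoodies extends App {
  val all    = TimeUnit.values()            // values(): every constant, in order
  val parsed = TimeUnit.valueOf("SECONDS")  // valueOf(): parse from the name
  val set    = EnumSet.of(TimeUnit.SECONDS, TimeUnit.MINUTES) // compact set impl
  println(s"${all.length} constants; parsed $parsed; set $set")
}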


Re: enum-like types in Spark

Posted by Xiangrui Meng <me...@gmail.com>.
In MLlib, we use strings for enum-like types in Python APIs, which is
quite common in Python and easy for Py4J. On the JVM side, we
implement `fromString` to convert them back to enums. -Xiangrui
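
A sketch of the string round trip described above (hypothetical names
and spellings; MLlib's actual code may differ):

sealed abstract class Level
object Level {
  final val MemoryOnly: Level = {
    case object MemoryOnly extends Level
    MemoryOnly
  }
  final val DiskOnly: Level = {
    case object DiskOnly extends Level
    DiskOnly
  }

  // The JVM-side bridge: Python passes a plain string over Py4J and this
  // maps it back to the enum-like value.
  def fromString(s: String): Level = s match {
    case "memoryOnly" => MemoryOnly
    case "diskOnly"   => DiskOnly
    case other        => throw new IllegalArgumentException(s"Unknown level: $other")
  }
}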



Re: enum-like types in Spark

Posted by RJ Nowling <rn...@gmail.com>.
How do these proposals affect PySpark?  I think compatibility with PySpark
through Py4J should be considered.


Re: enum-like types in Spark

Posted by Patrick Wendell <pw...@gmail.com>.
Does this matter for our own internal types in Spark? I don't think
any of these types are designed to be used in RDD records, for
instance.

On Mon, Mar 9, 2015 at 6:25 PM, Aaron Davidson <il...@gmail.com> wrote:
> Perhaps the problem with Java enums that was brought up was actually that
> their hashCode is not stable across JVMs, as it depends on the memory
> location of the enum itself.

Re: enum-like types in Spark

Posted by Aaron Davidson <il...@gmail.com>.
Perhaps the problem with Java enums that was brought up was actually that
their hashCode is not stable across JVMs, as it depends on the memory
location of the enum itself.
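
A quick way to see it (a sketch, using java.util.concurrent.TimeUnit as a
stand-in Java enum, since a Java enum can't be declared in Scala itself):
Enum does not override Object.hashCode(), so what you get is an identity
hash that can change from one JVM instance to the next, even for the same
constant.

import java.util.concurrent.TimeUnit

object EnumHashCodeDemo {
  def main(args: Array[String]): Unit = {
    // identity-based hash: run this program twice and the number will
    // usually differ, which is exactly what breaks cross-JVM hashing
    println(TimeUnit.SECONDS.hashCode())
  }
}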


Re: enum-like types in Spark

Posted by Imran Rashid <ir...@cloudera.com>.
Can you expand on the serde issues w/ java enums at all?  I haven't heard
of any problems specific to enums.  The java object serialization rules
seem very clear and it doesn't seem like different jvms should have a
choice on what they do:

http://docs.oracle.com/javase/6/docs/platform/serialization/spec/serial-arch.html#6469

(in a nutshell, serialization must use enum.name())
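
For example, a minimal round-trip sketch (using TimeUnit as a stand-in java
enum) shows that deserialization resolves the constant by name, back to the
same singleton:

import java.io._
import java.util.concurrent.TimeUnit

object EnumSerDeDemo {
  def main(args: Array[String]): Unit = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(TimeUnit.SECONDS)  // the stream records the constant's name
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    val back = in.readObject()
    in.close()
    println(back eq TimeUnit.SECONDS)  // true: resolved by name() to the same instance
  }
}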

of course there are plenty of ways the user could screw this up (e.g. rename
the enums, or change their meaning, or remove them).  But then again, all
of java serialization has issues the user has to be aware of.  E.g., if we
go with case objects, then java serialization blows up if you add another
helper method, even if that helper method is completely compatible.
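
If we did go with case objects, one mitigation -- sketched here as a
suggestion, not established practice -- would be to pin the serialVersionUID,
so that adding a compatible helper method no longer changes the computed UID:

@SerialVersionUID(1L)
sealed abstract class Level extends Serializable

object Level {
  @SerialVersionUID(1L)
  case object MemoryOnly extends Level {
    // without the annotation, adding a method like this changes the
    // default, computed serialVersionUID of the generated class
    def description: String = "store deserialized in memory"
  }
}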

Some prior debate in the scala community:

https://groups.google.com/d/msg/scala-internals/8RWkccSRBxQ/AN5F_ZbdKIsJ

SO post on which version to use in scala:

http://stackoverflow.com/questions/1321745/how-to-model-type-safe-enum-types

SO post about the macro-craziness people try to add to scala to make them
almost as good as a simple java enum:
(NB: the accepted answer doesn't actually work in all cases ...)

http://stackoverflow.com/questions/20089920/custom-scala-enum-most-elegant-version-searched

Another proposal to add better enums built into scala ... but seems to be
dormant:

https://groups.google.com/forum/#!topic/scala-sips/Bf82LxK02Kk



On Thu, Mar 5, 2015 at 10:49 PM, Mridul Muralidharan <mr...@gmail.com>
wrote:

>   I have a strong dislike for java enums due to the fact that they
> are not stable across JVMs - if one undergoes serde, you can end up with
> unpredictable results at times [1].
> This is one of the reasons why we prevent enums from being keys: though it
> is highly possible users might depend on them internally and shoot
> themselves in the foot.
>
> Would be better to keep away from them in general and use something more
> stable.
>
> Regards,
> Mridul
>
> [1] Having had to debug this issue for 2 weeks - I really really hate it.
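
For reference, the keying failure mode described above would presumably look
something like this (a hypothetical sketch, not Spark's partitioning code):

import java.util.concurrent.TimeUnit  // stand-in java enum

object EnumKeyDemo {
  // toy stand-in for hash-partitioning a key into numPartitions buckets
  def partitionFor(key: AnyRef, numPartitions: Int): Int =
    math.abs(key.hashCode % numPartitions)

  def main(args: Array[String]): Unit = {
    // identity hash: the JVM that wrote the data and the JVM that reads
    // it can compute different answers here for the "same" key
    println(partitionFor(TimeUnit.SECONDS, 4))
    // keying on the stable name() sidesteps the problem
    println(partitionFor(TimeUnit.SECONDS.name(), 4))
  }
}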

Re: enum-like types in Spark

Posted by Aaron Davidson <il...@gmail.com>.
The only issue I knew of with Java enums was that they do not appear in the
Scala documentation.

On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen <so...@cloudera.com> wrote:

> Yeah the fully realized #4, which gets back the ability to use it in
> switch statements (? in Scala but not Java?) does end up being kind of
> huge.
>
> I confess I'm swayed a bit back to Java enums, seeing what it
> involves. The hashCode() issue can be 'solved' with the hash of the
> String representation.

Re: enum-like types in Spark

Posted by Imran Rashid <ir...@cloudera.com>.
Hi Stephen,

I'm not sure which link you are referring to for the example code -- but
yes, the recommendation is that you create the enum in Java, e.g. see

https://github.com/apache/spark/blob/v1.4.0/core/src/main/java/org/apache/spark/status/api/v1/StageStatus.java

Then nothing special is required to use it in scala.  This method both uses
the overall type of the enum in the return value, and uses specific values
in the body:

https://github.com/apache/spark/blob/v1.4.0/core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala#L114

(I did delete the branches for the code that is *not* recommended anymore)
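
In case that link moves again, the shape of the interop is roughly this (a
hedged sketch -- the method and the strings are illustrative, not the exact
code in AllStagesResource):

import org.apache.spark.status.api.v1.StageStatus

object StageStatusInterop {
  // the Java enum is an ordinary type in a Scala signature...
  def describe(status: StageStatus): String = status match {
    // ...and its constants are stable identifiers, so they match directly
    case StageStatus.ACTIVE   => "stage is running"
    case StageStatus.COMPLETE => "stage finished"
    case other                => other.name()
  }
}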

Imran


On Wed, Jul 1, 2015 at 5:53 PM, Stephen Boesch <ja...@gmail.com> wrote:

> I am reviving an old thread here. The link to the example code for the
> java-enum-based solution is now dead: would someone please post an updated
> link showing the proper interop?
>
> Specifically: it is my understanding that java enums may not be created
> within Scala.  So does the proposed solution require dropping out into Java
> to create the enums?

Re: enum-like types in Spark

Posted by Stephen Boesch <ja...@gmail.com>.
I am reviving an old thread here. The link to the example code for the
java-enum-based solution is now dead: would someone please post an updated
link showing the proper interop?

Specifically: it is my understanding that java enums may not be created
within Scala.  So does the proposed solution require dropping out into Java
to create the enums?

2015-04-09 17:16 GMT-07:00 Xiangrui Meng <me...@gmail.com>:

> Using Java enums sounds good. We can list the values in the JavaDoc and
> hope Scala will be able to correctly generate docs for Java enums in
> the future. -Xiangrui

Re: enum-like types in Spark

Posted by Xiangrui Meng <me...@gmail.com>.
Using Java enums sounds good. We can list the values in the JavaDoc and
hope Scala will be able to correctly generate docs for Java enums in
the future. -Xiangrui

On Thu, Apr 9, 2015 at 10:59 AM, Imran Rashid <ir...@cloudera.com> wrote:
> any update here?  This is relevant for a currently open PR of mine -- I've
> got a bunch of new public constants defined w/ format #4, but I'd gladly
> switch to java enums.  (Even if we are just going to postpone this decision,
> I'm still inclined to switch to java enums ...)
>
> just to be clear about the existing problem with enums & scaladoc: right
> now, the scaladoc knows about the enum class, and generates a page for it,
> but it does not display the enum constants.  It is at least labeled as a
> java enum, though, so a savvy user could switch to the javadocs to see the
> constants.

Re: enum-like types in Spark

Posted by Imran Rashid <ir...@cloudera.com>.
any update here?  This is relevant for a currently open PR of mine -- I've
got a bunch of new public constants defined w/ format #4, but I'd gladly
switch to java enums.  (Even if we are just going to postpone this
decision, I'm still inclined to switch to java enums ...)

just to be clear about the existing problem with enums & scaladoc: right
now, the scaladoc knows about the enum class, and generates a page for it,
but it does not display the enum constants.  It is at least labeled as a
java enum, though, so a savvy user could switch to the javadocs to see the
constants.



On Mon, Mar 23, 2015 at 4:50 PM, Imran Rashid <ir...@cloudera.com> wrote:

> well, perhaps I overstated things a little -- I wouldn't call it the
> "official" solution, just a recommendation in the never-ending debate (and
> the recommendation from folks with their hands on scala itself).
>
> Even if we do get this fixed in scaladoc eventually -- as it's not in the
> current versions, where does that leave this proposal?  Personally I'd
> *still* prefer java enums, even if it doesn't get into scaladoc.  btw, even
> with sealed traits, the scaladoc still isn't great -- you don't see the
> values from the class, you only see them listed from the companion object.
>  (though, that is somewhat standard for scaladoc, so maybe I'm reaching a
> little)

Re: enum-like types in Spark

Posted by Imran Rashid <ir...@cloudera.com>.
well, perhaps I overstated things a little -- I wouldn't call it the
"official" solution, just a recommendation in the never-ending debate (and
the recommendation from folks with their hands on scala itself).

Even if we do get this fixed in scaladoc eventually -- as it's not in the
current versions, where does that leave this proposal?  Personally I'd
*still* prefer java enums, even if it doesn't get into scaladoc.  btw, even
with sealed traits, the scaladoc still isn't great -- you don't see the
values from the class, you only see them listed from the companion object.
 (though, that is somewhat standard for scaladoc, so maybe I'm reaching a
little)
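
For reference, the sealed-trait shape being compared here is roughly the
following (a simplified sketch, not Spark's actual code); note that both the
values and the hand-maintained list of them live in the companion object,
which is all scaladoc shows:

sealed abstract class Status

object Status {
  case object Active extends Status
  case object Complete extends Status

  // unlike a Java enum's values(), this list has to be kept in sync by hand
  val values: Seq[Status] = Seq(Active, Complete)
}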



On Mon, Mar 23, 2015 at 4:11 PM, Patrick Wendell <pw...@gmail.com> wrote:

> If the official solution from the Scala community is to use Java
> enums, then it seems strange they aren't generated in scaladoc? Maybe
> we can just fix that w/ Typesafe's help and then we can use them.

Re: enum-like types in Spark

Posted by Reynold Xin <rx...@databricks.com>.
If scaladoc can show Java enum types, then I do think the best way is just
to use Java enums.


On Mon, Mar 23, 2015 at 2:11 PM, Patrick Wendell <pw...@gmail.com> wrote:

> If the official solution from the Scala community is to use Java
> enums, then it seems strange they aren't generated in scaladoc? Maybe
> we can just fix that w/ Typesafe's help and then we can use them.

Re: enum-like types in Spark

Posted by Patrick Wendell <pw...@gmail.com>.
If the official solution from the Scala community is to use Java
enums, then it seems strange they aren't generated in scaladoc? Maybe
we can just fix that w/ Typesafe's help and then we can use them.

On Mon, Mar 23, 2015 at 1:46 PM, Sean Owen <so...@cloudera.com> wrote:
> Yeah the fully realized #4, which gets back the ability to use it in
> switch statements (? in Scala but not Java?) does end up being kind of
> huge.
>
> I confess I'm swayed a bit back to Java enums, seeing what it
> involves. The hashCode() issue can be 'solved' with the hash of the
> String representation.
>
> On Mon, Mar 23, 2015 at 8:33 PM, Imran Rashid <ir...@cloudera.com> wrote:
>> I've just switched some of my code over to the new format, and I just want
>> to make sure everyone realizes what we are getting into.  I went from 10
>> lines as java enums
>>
>> https://github.com/squito/spark/blob/fef66058612ebf225e58dd5f5fea6bae1afd5b31/core/src/main/java/org/apache/spark/status/api/StageStatus.java#L20
>>
>> to 30 lines with the new format:
>>
>> https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/status/api/v1/api.scala#L250
>>
>> It's not just that it's verbose: each name has to be repeated 4 times, with
>> potential typos in some locations that won't be caught by the compiler.
>> Also, you have to manually maintain the "values" as you update the set of
>> enums, the compiler won't do it for you.
>>
>> The only downside I've heard for Java enums is Enum.hashCode().  OTOH, the
>> downsides for this version are: maintainability / verbosity, no values(),
>> more cumbersome to use from Java, no EnumMap / EnumSet.
>>
>> I did put together a little util to at least get back the equivalent of
>> enum.valueOf() with this format
>>
>> https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/util/SparkEnum.scala
>>
>> I'm not trying to prevent us from moving forward on this, it's fine if this
>> is still what everyone wants, but I feel pretty strongly java enums make
>> more sense.
>>
>> thanks,
>> Imran
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Sean Owen <so...@cloudera.com>.
Yeah the fully realized #4, which gets back the ability to use it in
switch statements (? in Scala but not Java?) does end up being kind of
huge.

I confess I'm swayed a bit back to Java enums, seeing what it
involves. The hashCode() issue can be 'solved' with the hash of the
String representation.
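
To make that concrete, a minimal sketch (illustrative only; TimeUnit is
just a convenient JDK enum, not anything proposed in this thread):

~~~
import java.util.concurrent.TimeUnit

// Java enums inherit identity-based hashCode(), which can differ from
// one JVM run to the next; hashing the stable name() string instead
// yields the same value everywhere, since String.hashCode is specified.
def stableHash(e: java.lang.Enum[_]): Int = e.name.hashCode

val h = stableHash(TimeUnit.SECONDS)  // always equals "SECONDS".hashCode
~~~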

On Mon, Mar 23, 2015 at 8:33 PM, Imran Rashid <ir...@cloudera.com> wrote:
> I've just switched some of my code over to the new format, and I just want
> to make sure everyone realizes what we are getting into.  I went from 10
> lines as java enums
>
> https://github.com/squito/spark/blob/fef66058612ebf225e58dd5f5fea6bae1afd5b31/core/src/main/java/org/apache/spark/status/api/StageStatus.java#L20
>
> to 30 lines with the new format:
>
> https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/status/api/v1/api.scala#L250
>
> It's not just that it's verbose: each name has to be repeated 4 times, with
> potential typos in some locations that won't be caught by the compiler.
> Also, you have to manually maintain the "values" as you update the set of
> enums, the compiler won't do it for you.
>
> The only downside I've heard for Java enums is Enum.hashCode().  OTOH, the
> downsides for this version are: maintainability / verbosity, no values(),
> more cumbersome to use from Java, no EnumMap / EnumSet.
>
> I did put together a little util to at least get back the equivalent of
> enum.valueOf() with this format
>
> https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/util/SparkEnum.scala
>
> I'm not trying to prevent us from moving forward on this, it's fine if this
> is still what everyone wants, but I feel pretty strongly java enums make
> more sense.
>
> thanks,
> Imran

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Imran Rashid <ir...@cloudera.com>.
I've just switched some of my code over to the new format, and I just want
to make sure everyone realizes what we are getting into.  I went from 10
lines as java enums

https://github.com/squito/spark/blob/fef66058612ebf225e58dd5f5fea6bae1afd5b31/core/src/main/java/org/apache/spark/status/api/StageStatus.java#L20

to 30 lines with the new format:

https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/status/api/v1/api.scala#L250

It's not just that it's verbose: each name has to be repeated 4 times, with
potential typos in some locations that won't be caught by the compiler.
Also, you have to manually maintain the "values" as you update the set of
enums, the compiler won't do it for you.

The only downside I've heard for Java enums is Enum.hashCode().  OTOH, the
downsides for this version are: maintainability / verbosity, no values(),
more cumbersome to use from Java, no EnumMap / EnumSet.

I did put together a little util to at least get back the equivalent of
enum.valueOf() with this format

https://github.com/squito/spark/blob/SPARK-3454_w_jersey/core/src/main/scala/org/apache/spark/util/SparkEnum.scala
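
For readers without the link handy, a valueOf()-style helper for this
pattern might look roughly like the sketch below (hypothetical value
names; this is not the linked file itself):

~~~
sealed abstract class StageStatus

object StageStatus {
  case object Active extends StageStatus
  case object Complete extends StageStatus

  // has to be maintained by hand -- exactly the downside noted above
  val values: Seq[StageStatus] = Seq(Active, Complete)

  // rough equivalent of Enum.valueOf(): look a value up by its name
  def fromString(name: String): StageStatus =
    values.find(_.toString == name).getOrElse(
      throw new IllegalArgumentException(s"Unknown StageStatus: $name"))
}
~~~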

I'm not trying to prevent us from moving forward on this, it's fine if this
is still what everyone wants, but I feel pretty strongly java enums make
more sense.

thanks,
Imran


On Tue, Mar 17, 2015 at 2:07 PM, Xiangrui Meng <me...@gmail.com> wrote:

> Let me put a quick summary. #4 got majority vote with CamelCase but
> not UPPERCASE. The following is a minimal implementation that works
> for both Scala and Java. In Python, we use string for enums. This
> proposal is only for new public APIs. We are not going to change
> existing ones. -Xiangrui
>
> ~~~
> sealed abstract class StorageLevel
>
> object StorageLevel {
>
>   def fromString(name: String): StorageLevel = ???
>
>   val MemoryOnly: StorageLevel = {
>     case object MemoryOnly extends StorageLevel
>     MemoryOnly
>   }
>
>   val DiskOnly: StorageLevel = {
>     case object DiskOnly extends StorageLevel
>     DiskOnly
>   }
> }
> ~~~
>
> On Mon, Mar 16, 2015 at 3:04 PM, Aaron Davidson <il...@gmail.com>
> wrote:
> > It's unrelated to the proposal, but Enum#ordinal() should be much faster,
> > assuming it's not serialized to JVMs with different versions of the enum
> :)
> >
> > On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <ke...@oracle.com>
> > wrote:
> >
> >> In some applications, I have rather heavy use of Java enums which are
> >> needed for related Java APIs that the application uses.  And
> unfortunately,
> >> they are also used as keys.  As such, using the native hashcodes makes
> any
> >> function over keys unstable and unpredictable, so we now use
> Enum.name() as
> >> the key instead.  Oh well.  But it works and seems to work well.
> >>
> >> Kevin
> >>
> >>
> >> On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
> >>
> >>>    I have a strong dislike for Java enums due to the fact that they
> >>> are not stable across JVMs - if one undergoes serde, you end up with
> >>> unpredictable results at times [1].
> >>> This is one of the reasons why we prevent enums from being keys: it is
> >>> highly possible users might depend on that behavior internally and
> >>> shoot themselves in the foot.
> >>>
> >>> Would be better to keep away from them in general and use something
> more
> >>> stable.
> >>>
> >>> Regards,
> >>> Mridul
> >>>
> >>> [1] Having had to debug this issue for 2 weeks - I really really hate
> it.
> >>>
> >>>
> >>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <ir...@cloudera.com>
> >>> wrote:
> >>>
> >>>> I have a very strong dislike for #1 (scala enumerations).   I'm ok
> with
> >>>> #4
> >>>> (with Xiangrui's final suggestion, especially making it sealed &
> >>>> available
> >>>> in Java), but I really think #2, java enums, are the best option.
> >>>>
> >>>> Java enums actually have some very real advantages over the other
> >>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.
> There
> >>>> has
> >>>> been endless debate in the Scala community about the problems with the
> >>>> approaches in Scala.  Very smart, level-headed Scala gurus have
> >>>> complained
> >>>> about their short-comings (Rex Kerr's name is coming to mind, though
> I'm
> >>>> not positive about that); there have been numerous well-thought out
> >>>> proposals to give Scala a better enum.  But the powers-that-be in
> Scala
> >>>> always reject them.  IIRC the explanation for rejecting is basically
> that
> >>>> (a) enums aren't important enough for introducing some new special
> >>>> feature,
> >>>> scala's got bigger things to work on and (b) if you really need a good
> >>>> enum, just use java's enum.
> >>>>
> >>>> I doubt it really matters that much for Spark internals, which is why
> I
> >>>> think #4 is fine.  But I figured I'd give my spiel, because every
> >>>> developer
> >>>> loves language wars :)
> >>>>
> >>>> Imran
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <me...@gmail.com>
> wrote:
> >>>>
> >>>>  `case object` inside an `object` doesn't show up in Java. This is the
> >>>>> minimal code I found to make everything show up correctly in both
> >>>>> Scala and Java:
> >>>>>
> >>>>> sealed abstract class StorageLevel // cannot be a trait
> >>>>>
> >>>>> object StorageLevel {
> >>>>>    private[this] case object _MemoryOnly extends StorageLevel
> >>>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
> >>>>>
> >>>>>    private[this] case object _DiskOnly extends StorageLevel
> >>>>>    final val DiskOnly: StorageLevel = _DiskOnly
> >>>>> }
> >>>>>
> >>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I like #4 as well and agree with Aaron's suggestion.
> >>>>>>
> >>>>>> - Patrick
> >>>>>>
> >>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com>
> >>>>>>
> >>>>> wrote:
> >>>>>
> >>>>>> I'm cool with #4 as well, but make sure we dictate that the values
> >>>>>>>
> >>>>>> should
> >>>>>
> >>>>>> be defined within an object with the same name as the enumeration
> (like
> >>>>>>>
> >>>>>> we
> >>>>>
> >>>>>> do for StorageLevel). Otherwise we may pollute a higher namespace.
> >>>>>>>
> >>>>>>> e.g. we SHOULD do:
> >>>>>>>
> >>>>>>> trait StorageLevel
> >>>>>>> object StorageLevel {
> >>>>>>>    case object MemoryOnly extends StorageLevel
> >>>>>>>    case object DiskOnly extends StorageLevel
> >>>>>>> }
> >>>>>>>
> >>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
> >>>>>>>
> >>>>>> michael@databricks.com>
> >>>>>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>  #4 with a preference for CamelCaseEnums
> >>>>>>>>
> >>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <
> >>>>>>>> joseph@databricks.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>  another vote for #4
> >>>>>>>>> People are already used to adding "()" in Java.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <
> javadba@gmail.com>
> >>>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> #4 but with MemoryOnly (more scala-like)
> >>>>>>>>>>
> >>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
> >>>>>>>>>>
> >>>>>>>>>> Constants, Values, Variable and Methods
> >>>>>>>>>>
> >>>>>>>>>> Constant names should be in upper camel case. That is, if the
> >>>>>>>>>>
> >>>>>>>>> member is
> >>>>>
> >>>>>> final, immutable and it belongs to a package object or an object,
> >>>>>>>>>>
> >>>>>>>>> it
> >>>>>
> >>>>>> may
> >>>>>>>>
> >>>>>>>>> be
> >>>>>>>>>
> >>>>>>>>>> considered a constant (similar to Java's static final members):
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>     1. object Container {
> >>>>>>>>>>     2.     val MyConstant = ...
> >>>>>>>>>>     3. }
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <me...@gmail.com>:
> >>>>>>>>>>
> >>>>>>>>>>  Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> There are many places where we use enum-like types in Spark,
> but
> >>>>>>>>>>>
> >>>>>>>>>> in
> >>>>>
> >>>>>> different ways. Every approach has both pros and cons. I wonder
> >>>>>>>>>>> whether there should be an "official" approach for enum-like
> >>>>>>>>>>>
> >>>>>>>>>> types in
> >>>>>
> >>>>>> Spark.
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
> >>>>>>>>>>>
> >>>>>>>>>>> * All types show up as Enumeration.Value in Java.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
> >>>>> spark/scheduler/SchedulingMode.html
> >>>>>
> >>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
> >>>>>>>>>>>
> >>>>>>>>>>> * Implementation must be in a Java file.
> >>>>>>>>>>> * Values don't show up in the ScalaDoc:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
> >>>>> spark.network.util.IOMode
> >>>>>
> >>>>>> 3. Static fields in Java (e.g., TripletFields)
> >>>>>>>>>>>
> >>>>>>>>>>> * Implementation must be in a Java file.
> >>>>>>>>>>> * Doesn't need "()" in Java code.
> >>>>>>>>>>> * Values don't show up in the ScalaDoc:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
> >>>>> spark.graphx.TripletFields
> >>>>>
> >>>>>> 4. Objects in Scala. (e.g., StorageLevel)
> >>>>>>>>>>>
> >>>>>>>>>>> * Needs "()" in Java code.
> >>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
> >>>>> spark.storage.StorageLevel$
> >>>>>
> >>>>>>
> >>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
> >>>>> spark/storage/StorageLevel.html
> >>>>>
> >>>>>> It would be great if we have an "official" approach for this as
> >>>>>>>>>>>
> >>>>>>>>>> well
> >>>>>
> >>>>>> as the naming convention for enum-like values ("MEMORY_ONLY" or
> >>>>>>>>>>> "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
> >>>>>>>>>>>
> >>>>>>>>>> thoughts?
> >>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>>> Xiangrui
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  ------------------------------------------------------------
> >>>>> ---------
> >>>>>
> >>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >>>>>>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>  ------------------------------------------------------------
> >>>>> ---------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >>>>> For additional commands, e-mail: dev-help@spark.apache.org
> >>>>>
> >>>>>
> >>>>>
> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >>> For additional commands, e-mail: dev-help@spark.apache.org
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> >> For additional commands, e-mail: dev-help@spark.apache.org
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: enum-like types in Spark

Posted by Xiangrui Meng <me...@gmail.com>.
Let me put a quick summary. #4 got majority vote with CamelCase but
not UPPERCASE. The following is a minimal implementation that works
for both Scala and Java. In Python, we use string for enums. This
proposal is only for new public APIs. We are not going to change
existing ones. -Xiangrui

~~~
sealed abstract class StorageLevel

object StorageLevel {

  def fromString(name: String): StorageLevel = ???

  val MemoryOnly: StorageLevel = {
    case object MemoryOnly extends StorageLevel
    MemoryOnly
  }

  val DiskOnly: StorageLevel = {
    case object DiskOnly extends StorageLevel
    DiskOnly
  }
}
~~~
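
For illustration only, the `???` stub above could be filled in along
these lines inside the same object (hypothetical, not part of the
proposal itself):

~~~
def fromString(name: String): StorageLevel = name match {
  case "MemoryOnly" => MemoryOnly
  case "DiskOnly"   => DiskOnly
  case other =>
    throw new IllegalArgumentException(s"Unknown StorageLevel: $other")
}
~~~

From Java, the values are then reached as StorageLevel.MemoryOnly() and
StorageLevel.DiskOnly(), which is the "()" trade-off of option #4 noted
at the start of the thread.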

On Mon, Mar 16, 2015 at 3:04 PM, Aaron Davidson <il...@gmail.com> wrote:
> It's unrelated to the proposal, but Enum#ordinal() should be much faster,
> assuming it's not serialized to JVMs with different versions of the enum :)
>
> On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <ke...@oracle.com>
> wrote:
>
>> In some applications, I have rather heavy use of Java enums which are
>> needed for related Java APIs that the application uses.  And unfortunately,
>> they are also used as keys.  As such, using the native hashcodes makes any
>> function over keys unstable and unpredictable, so we now use Enum.name() as
>> the key instead.  Oh well.  But it works and seems to work well.
>>
>> Kevin
>>
>>
>> On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
>>
>>>    I have a strong dislike for Java enums due to the fact that they
>>> are not stable across JVMs - if one undergoes serde, you end up with
>>> unpredictable results at times [1].
>>> This is one of the reasons why we prevent enums from being keys: it is
>>> highly possible users might depend on that behavior internally and
>>> shoot themselves in the foot.
>>>
>>> Would be better to keep away from them in general and use something more
>>> stable.
>>>
>>> Regards,
>>> Mridul
>>>
>>> [1] Having had to debug this issue for 2 weeks - I really really hate it.
>>>
>>>
>>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <ir...@cloudera.com>
>>> wrote:
>>>
>>>> I have a very strong dislike for #1 (scala enumerations).   I'm ok with
>>>> #4
>>>> (with Xiangrui's final suggestion, especially making it sealed &
>>>> available
>>>> in Java), but I really think #2, java enums, are the best option.
>>>>
>>>> Java enums actually have some very real advantages over the other
>>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There
>>>> has
>>>> been endless debate in the Scala community about the problems with the
>>>> approaches in Scala.  Very smart, level-headed Scala gurus have
>>>> complained
>>>> about their short-comings (Rex Kerr's name is coming to mind, though I'm
>>>> not positive about that); there have been numerous well-thought out
>>>> proposals to give Scala a better enum.  But the powers-that-be in Scala
>>>> always reject them.  IIRC the explanation for rejecting is basically that
>>>> (a) enums aren't important enough for introducing some new special
>>>> feature,
>>>> scala's got bigger things to work on and (b) if you really need a good
>>>> enum, just use java's enum.
>>>>
>>>> I doubt it really matters that much for Spark internals, which is why I
>>>> think #4 is fine.  But I figured I'd give my spiel, because every
>>>> developer
>>>> loves language wars :)
>>>>
>>>> Imran
>>>>
>>>>
>>>>
>>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <me...@gmail.com> wrote:
>>>>
>>>>  `case object` inside an `object` doesn't show up in Java. This is the
>>>>> minimal code I found to make everything show up correctly in both
>>>>> Scala and Java:
>>>>>
>>>>> sealed abstract class StorageLevel // cannot be a trait
>>>>>
>>>>> object StorageLevel {
>>>>>    private[this] case object _MemoryOnly extends StorageLevel
>>>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
>>>>>
>>>>>    private[this] case object _DiskOnly extends StorageLevel
>>>>>    final val DiskOnly: StorageLevel = _DiskOnly
>>>>> }
>>>>>
>>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I like #4 as well and agree with Aaron's suggestion.
>>>>>>
>>>>>> - Patrick
>>>>>>
>>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com>
>>>>>>
>>>>> wrote:
>>>>>
>>>>>> I'm cool with #4 as well, but make sure we dictate that the values
>>>>>>>
>>>>>> should
>>>>>
>>>>>> be defined within an object with the same name as the enumeration (like
>>>>>>>
>>>>>> we
>>>>>
>>>>>> do for StorageLevel). Otherwise we may pollute a higher namespace.
>>>>>>>
>>>>>>> e.g. we SHOULD do:
>>>>>>>
>>>>>>> trait StorageLevel
>>>>>>> object StorageLevel {
>>>>>>>    case object MemoryOnly extends StorageLevel
>>>>>>>    case object DiskOnly extends StorageLevel
>>>>>>> }
>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
>>>>>>>
>>>>>> michael@databricks.com>
>>>>>
>>>>>> wrote:
>>>>>>>
>>>>>>>  #4 with a preference for CamelCaseEnums
>>>>>>>>
>>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <
>>>>>>>> joseph@databricks.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>  another vote for #4
>>>>>>>>> People are already used to adding "()" in Java.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <ja...@gmail.com>
>>>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> #4 but with MemoryOnly (more scala-like)
>>>>>>>>>>
>>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
>>>>>>>>>>
>>>>>>>>>> Constants, Values, Variable and Methods
>>>>>>>>>>
>>>>>>>>>> Constant names should be in upper camel case. That is, if the
>>>>>>>>>>
>>>>>>>>> member is
>>>>>
>>>>>> final, immutable and it belongs to a package object or an object,
>>>>>>>>>>
>>>>>>>>> it
>>>>>
>>>>>> may
>>>>>>>>
>>>>>>>>> be
>>>>>>>>>
>>>>>>>>>> considered a constant (similar to Java's static final members):
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>     1. object Container {
>>>>>>>>>>     2.     val MyConstant = ...
>>>>>>>>>>     3. }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <me...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>  Hi all,
>>>>>>>>>>>
>>>>>>>>>>> There are many places where we use enum-like types in Spark, but
>>>>>>>>>>>
>>>>>>>>>> in
>>>>>
>>>>>> different ways. Every approach has both pros and cons. I wonder
>>>>>>>>>>> whether there should be an "official" approach for enum-like
>>>>>>>>>>>
>>>>>>>>>> types in
>>>>>
>>>>>> Spark.
>>>>>>>>>>>
>>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
>>>>>>>>>>>
>>>>>>>>>>> * All types show up as Enumeration.Value in Java.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
>>>>> spark/scheduler/SchedulingMode.html
>>>>>
>>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
>>>>>>>>>>>
>>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
>>>>> spark.network.util.IOMode
>>>>>
>>>>>> 3. Static fields in Java (e.g., TripletFields)
>>>>>>>>>>>
>>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>>> * Doesn't need "()" in Java code.
>>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
>>>>> spark.graphx.TripletFields
>>>>>
>>>>>> 4. Objects in Scala. (e.g., StorageLevel)
>>>>>>>>>>>
>>>>>>>>>>> * Needs "()" in Java code.
>>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
>>>>> spark.storage.StorageLevel$
>>>>>
>>>>>>
>>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
>>>>> spark/storage/StorageLevel.html
>>>>>
>>>>>> It would be great if we have an "official" approach for this as
>>>>>>>>>>>
>>>>>>>>>> well
>>>>>
>>>>>> as the naming convention for enum-like values ("MEMORY_ONLY" or
>>>>>>>>>>> "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
>>>>>>>>>>>
>>>>>>>>>> thoughts?
>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>>> Xiangrui
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  ------------------------------------------------------------
>>>>> ---------
>>>>>
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  ------------------------------------------------------------
>>>>> ---------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>
>>>>>
>>>>>  ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Aaron Davidson <il...@gmail.com>.
It's unrelated to the proposal, but Enum#ordinal() should be much faster,
assuming it's not serialized to JVMs with different versions of the enum :)
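
A quick illustration of that trade-off, with a JDK enum standing in:

~~~
import java.util.concurrent.TimeUnit

// ordinal() is a cheap, dense int, but it encodes declaration order:
// it silently changes if constants are reordered or new ones are
// inserted in a later version of the enum.
val fast: Int = TimeUnit.SECONDS.ordinal()

// name() survives reordering, so it is the safer key once a value
// crosses a serialization boundary.
val stable: String = TimeUnit.SECONDS.name()
~~~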

On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <ke...@oracle.com>
wrote:

> In some applications, I have rather heavy use of Java enums which are
> needed for related Java APIs that the application uses.  And unfortunately,
> they are also used as keys.  As such, using the native hashcodes makes any
> function over keys unstable and unpredictable, so we now use Enum.name() as
> the key instead.  Oh well.  But it works and seems to work well.
>
> Kevin
>
>
> On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
>
>>    I have a strong dislike for Java enums due to the fact that they
>> are not stable across JVMs - if one undergoes serde, you end up with
>> unpredictable results at times [1].
>> This is one of the reasons why we prevent enums from being keys: it is
>> highly possible users might depend on that behavior internally and
>> shoot themselves in the foot.
>>
>> Would be better to keep away from them in general and use something more
>> stable.
>>
>> Regards,
>> Mridul
>>
>> [1] Having had to debug this issue for 2 weeks - I really really hate it.
>>
>>
>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <ir...@cloudera.com>
>> wrote:
>>
>>> I have a very strong dislike for #1 (scala enumerations).   I'm ok with
>>> #4
>>> (with Xiangrui's final suggestion, especially making it sealed &
>>> available
>>> in Java), but I really think #2, java enums, are the best option.
>>>
>>> Java enums actually have some very real advantages over the other
>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There
>>> has
>>> been endless debate in the Scala community about the problems with the
>>> approaches in Scala.  Very smart, level-headed Scala gurus have
>>> complained
>>> about their short-comings (Rex Kerr's name is coming to mind, though I'm
>>> not positive about that); there have been numerous well-thought out
>>> proposals to give Scala a better enum.  But the powers-that-be in Scala
>>> always reject them.  IIRC the explanation for rejecting is basically that
>>> (a) enums aren't important enough for introducing some new special
>>> feature,
>>> scala's got bigger things to work on and (b) if you really need a good
>>> enum, just use java's enum.
>>>
>>> I doubt it really matters that much for Spark internals, which is why I
>>> think #4 is fine.  But I figured I'd give my spiel, because every
>>> developer
>>> loves language wars :)
>>>
>>> Imran
>>>
>>>
>>>
>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <me...@gmail.com> wrote:
>>>
>>>  `case object` inside an `object` doesn't show up in Java. This is the
>>>> minimal code I found to make everything show up correctly in both
>>>> Scala and Java:
>>>>
>>>> sealed abstract class StorageLevel // cannot be a trait
>>>>
>>>> object StorageLevel {
>>>>    private[this] case object _MemoryOnly extends StorageLevel
>>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
>>>>
>>>>    private[this] case object _DiskOnly extends StorageLevel
>>>>    final val DiskOnly: StorageLevel = _DiskOnly
>>>> }
>>>>
>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com>
>>>> wrote:
>>>>
>>>>> I like #4 as well and agree with Aaron's suggestion.
>>>>>
>>>>> - Patrick
>>>>>
>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com>
>>>>>
>>>> wrote:
>>>>
>>>>> I'm cool with #4 as well, but make sure we dictate that the values
>>>>>>
>>>>> should
>>>>
>>>>> be defined within an object with the same name as the enumeration (like
>>>>>>
>>>>> we
>>>>
>>>>> do for StorageLevel). Otherwise we may pollute a higher namespace.
>>>>>>
>>>>>> e.g. we SHOULD do:
>>>>>>
>>>>>> trait StorageLevel
>>>>>> object StorageLevel {
>>>>>>    case object MemoryOnly extends StorageLevel
>>>>>>    case object DiskOnly extends StorageLevel
>>>>>> }
>>>>>>
>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
>>>>>>
>>>>> michael@databricks.com>
>>>>
>>>>> wrote:
>>>>>>
>>>>>>  #4 with a preference for CamelCaseEnums
>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <
>>>>>>> joseph@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>  another vote for #4
>>>>>>>> People are already used to adding "()" in Java.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <ja...@gmail.com>
>>>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> #4 but with MemoryOnly (more scala-like)
>>>>>>>>>
>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
>>>>>>>>>
>>>>>>>>> Constants, Values, Variable and Methods
>>>>>>>>>
>>>>>>>>> Constant names should be in upper camel case. That is, if the
>>>>>>>>>
>>>>>>>> member is
>>>>
>>>>> final, immutable and it belongs to a package object or an object,
>>>>>>>>>
>>>>>>>> it
>>>>
>>>>> may
>>>>>>>
>>>>>>>> be
>>>>>>>>
>>>>>>>>> considered a constant (similar to Java's static final members):
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     1. object Container {
>>>>>>>>>     2.     val MyConstant = ...
>>>>>>>>>     3. }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <me...@gmail.com>:
>>>>>>>>>
>>>>>>>>>  Hi all,
>>>>>>>>>>
>>>>>>>>>> There are many places where we use enum-like types in Spark, but
>>>>>>>>>>
>>>>>>>>> in
>>>>
>>>>> different ways. Every approach has both pros and cons. I wonder
>>>>>>>>>> whether there should be an "official" approach for enum-like
>>>>>>>>>>
>>>>>>>>> types in
>>>>
>>>>> Spark.
>>>>>>>>>>
>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
>>>>>>>>>>
>>>>>>>>>> * All types show up as Enumeration.Value in Java.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
>>>> spark/scheduler/SchedulingMode.html
>>>>
>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
>>>> spark.network.util.IOMode
>>>>
>>>>> 3. Static fields in Java (e.g., TripletFields)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Doesn't need "()" in Java code.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
>>>> spark.graphx.TripletFields
>>>>
>>>>> 4. Objects in Scala. (e.g., StorageLevel)
>>>>>>>>>>
>>>>>>>>>> * Needs "()" in Java code.
>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  http://spark.apache.org/docs/latest/api/scala/#org.apache.
>>>> spark.storage.StorageLevel$
>>>>
>>>>>
>>>>>>>>>>  http://spark.apache.org/docs/latest/api/java/org/apache/
>>>> spark/storage/StorageLevel.html
>>>>
>>>>> It would be great if we have an "official" approach for this as
>>>>>>>>>>
>>>>>>>>> well
>>>>
>>>>> as the naming convention for enum-like values ("MEMORY_ONLY" or
>>>>>>>>>> "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
>>>>>>>>>>
>>>>>>>>> thoughts?
>>>>>>>
>>>>>>>> Best,
>>>>>>>>>> Xiangrui
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  ------------------------------------------------------------
>>>> ---------
>>>>
>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  ------------------------------------------------------------
>>>> ---------
>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>
>>>>
>>>>  ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: enum-like types in Spark

Posted by Patrick Wendell <pw...@gmail.com>.
Hey Xiangrui,

Do you want to write up a straw man proposal based on this line of discussion?

- Patrick

On Mon, Mar 16, 2015 at 12:12 PM, Kevin Markey <ke...@oracle.com> wrote:
> In some applications, I have rather heavy use of Java enums which are needed
> for related Java APIs that the application uses.  And unfortunately, they
> are also used as keys.  As such, using the native hashcodes makes any
> function over keys unstable and unpredictable, so we now use Enum.name() as
> the key instead.  Oh well.  But it works and seems to work well.
>
> Kevin
>
>
> On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
>>
>>    I have a strong dislike for Java enums due to the fact that they
>> are not stable across JVMs - if one undergoes serde, you end up with
>> unpredictable results at times [1].
>> This is one of the reasons why we prevent enums from being keys: it is
>> highly possible users might depend on that behavior internally and
>> shoot themselves in the foot.
>>
>> Would be better to keep away from them in general and use something more
>> stable.
>>
>> Regards,
>> Mridul
>>
>> [1] Having had to debug this issue for 2 weeks - I really really hate it.
>>
>>
>> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <ir...@cloudera.com> wrote:
>>>
>>> I have a very strong dislike for #1 (scala enumerations).   I'm ok with
>>> #4
>>> (with Xiangrui's final suggestion, especially making it sealed &
>>> available
>>> in Java), but I really think #2, java enums, are the best option.
>>>
>>> Java enums actually have some very real advantages over the other
>>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There
>>> has
>>> been endless debate in the Scala community about the problems with the
>>> approaches in Scala.  Very smart, level-headed Scala gurus have
>>> complained
>>> about their short-comings (Rex Kerr's name is coming to mind, though I'm
>>> not positive about that); there have been numerous well-thought out
>>> proposals to give Scala a better enum.  But the powers-that-be in Scala
>>> always reject them.  IIRC the explanation for rejecting is basically that
>>> (a) enums aren't important enough for introducing some new special
>>> feature,
>>> scala's got bigger things to work on and (b) if you really need a good
>>> enum, just use java's enum.
>>>
>>> I doubt it really matters that much for Spark internals, which is why I
>>> think #4 is fine.  But I figured I'd give my spiel, because every
>>> developer
>>> loves language wars :)
>>>
>>> Imran
>>>
>>>
>>>
>>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <me...@gmail.com> wrote:
>>>
>>>> `case object` inside an `object` doesn't show up in Java. This is the
>>>> minimal code I found to make everything show up correctly in both
>>>> Scala and Java:
>>>>
>>>> sealed abstract class StorageLevel // cannot be a trait
>>>>
>>>> object StorageLevel {
>>>>    private[this] case object _MemoryOnly extends StorageLevel
>>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
>>>>
>>>>    private[this] case object _DiskOnly extends StorageLevel
>>>>    final val DiskOnly: StorageLevel = _DiskOnly
>>>> }
>>>>
>>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com>
>>>> wrote:
>>>>>
>>>>> I like #4 as well and agree with Aaron's suggestion.
>>>>>
>>>>> - Patrick
>>>>>
>>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com>
>>>>
>>>> wrote:
>>>>>>
>>>>>> I'm cool with #4 as well, but make sure we dictate that the values
>>>>
>>>> should
>>>>>>
>>>>>> be defined within an object with the same name as the enumeration
>>>>>> (like
>>>>
>>>> we
>>>>>>
>>>>>> do for StorageLevel). Otherwise we may pollute a higher namespace.
>>>>>>
>>>>>> e.g. we SHOULD do:
>>>>>>
>>>>>> trait StorageLevel
>>>>>> object StorageLevel {
>>>>>>    case object MemoryOnly extends StorageLevel
>>>>>>    case object DiskOnly extends StorageLevel
>>>>>> }
>>>>>>
>>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
>>>>
>>>> michael@databricks.com>
>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> #4 with a preference for CamelCaseEnums
>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley
>>>>>>> <jo...@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> another vote for #4
>>>>>>>> People are already used to adding "()" in Java.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <ja...@gmail.com>
>>>>>>>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> #4 but with MemoryOnly (more scala-like)
>>>>>>>>>
>>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
>>>>>>>>>
>>>>>>>>> Constants, Values, Variable and Methods
>>>>>>>>>
>>>>>>>>> Constant names should be in upper camel case. That is, if the
>>>>
>>>> member is
>>>>>>>>>
>>>>>>>>> final, immutable and it belongs to a package object or an object,
>>>>
>>>> it
>>>>>>>
>>>>>>> may
>>>>>>>>
>>>>>>>> be
>>>>>>>>>
>>>>>>>>> considered a constant (similar to Java's static final members):
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     1. object Container {
>>>>>>>>>     2.     val MyConstant = ...
>>>>>>>>>     3. }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <me...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> There are many places where we use enum-like types in Spark, but
>>>>
>>>> in
>>>>>>>>>>
>>>>>>>>>> different ways. Every approach has both pros and cons. I wonder
>>>>>>>>>> whether there should be an "official" approach for enum-like
>>>>
>>>> types in
>>>>>>>>>>
>>>>>>>>>> Spark.
>>>>>>>>>>
>>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
>>>>>>>>>>
>>>>>>>>>> * All types show up as Enumeration.Value in Java.
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
>>>>>>>>>>
>>>>>>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
>>>>>>>>>>
>>>>>>>>>> 3. Static fields in Java (e.g., TripletFields)
>>>>>>>>>>
>>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>>> * Doesn't need "()" in Java code.
>>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
>>>>>>>>>>
>>>>>>>>>> 4. Objects in Scala. (e.g., StorageLevel)
>>>>>>>>>>
>>>>>>>>>> * Needs "()" in Java code.
>>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
>>>>>>>>>>
>>>>>>>>>> It would be great if we have an "official" approach for this as
>>>>
>>>> well
>>>>>>>>>>
>>>>>>>>>> as the naming convention for enum-like values ("MEMORY_ONLY" or
>>>>>>>>>> "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
>>>>>>>
>>>>>>> thoughts?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Xiangrui
>>>>>>>>>>
>>>>>>>>>>
>>>> ---------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>
>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Kevin Markey <ke...@oracle.com>.
In some applications, I have rather heavy use of Java enums which are 
needed for related Java APIs that the application uses.  And 
unfortunately, they are also used as keys.  As such, using the native 
hashcodes makes any function over keys unstable and unpredictable, so we 
now use Enum.name() as the key instead.  Oh well.  But it works and 
seems to work well.
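
A minimal sketch of that workaround (TimeUnit stands in for the
application's enums, purely for illustration):

~~~
import java.util.concurrent.TimeUnit

// Key by the stable name() string rather than by the enum itself, so
// hashing never touches the enum's identity-based hashCode().
val costs: Map[String, Long] = Map(
  TimeUnit.SECONDS.name() -> 1L,
  TimeUnit.MINUTES.name() -> 60L
)

// Round-trip back to the enum where the real value is needed.
val unit: TimeUnit = TimeUnit.valueOf("SECONDS")
val perSecond: Long = costs(unit.name())
~~~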

Kevin

On 03/05/2015 09:49 PM, Mridul Muralidharan wrote:
>    I have a strong dislike for Java enums due to the fact that they
> are not stable across JVMs - if one undergoes serde, you end up with
> unpredictable results at times [1].
> This is one of the reasons why we prevent enums from being keys: it is
> highly possible users might depend on that behavior internally and
> shoot themselves in the foot.
>
> Would be better to keep away from them in general and use something more stable.
>
> Regards,
> Mridul
>
> [1] Having had to debug this issue for 2 weeks - I really really hate it.
>
>
> On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <ir...@cloudera.com> wrote:
>> I have a very strong dislike for #1 (scala enumerations).   I'm ok with #4
>> (with Xiangrui's final suggestion, especially making it sealed & available
>> in Java), but I really think #2, java enums, are the best option.
>>
>> Java enums actually have some very real advantages over the other
>> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There has
>> been endless debate in the Scala community about the problems with the
>> approaches in Scala.  Very smart, level-headed Scala gurus have complained
>> about their short-comings (Rex Kerr's name is coming to mind, though I'm
>> not positive about that); there have been numerous well-thought out
>> proposals to give Scala a better enum.  But the powers-that-be in Scala
>> always reject them.  IIRC the explanation for rejecting is basically that
>> (a) enums aren't important enough for introducing some new special feature,
>> scala's got bigger things to work on and (b) if you really need a good
>> enum, just use java's enum.
>>
>> I doubt it really matters that much for Spark internals, which is why I
>> think #4 is fine.  But I figured I'd give my spiel, because every developer
>> loves language wars :)
>>
>> Imran
>>
>>
>>
>> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <me...@gmail.com> wrote:
>>
>>> `case object` inside an `object` doesn't show up in Java. This is the
>>> minimal code I found to make everything show up correctly in both
>>> Scala and Java:
>>>
>>> sealed abstract class StorageLevel // cannot be a trait
>>>
>>> object StorageLevel {
>>>    private[this] case object _MemoryOnly extends StorageLevel
>>>    final val MemoryOnly: StorageLevel = _MemoryOnly
>>>
>>>    private[this] case object _DiskOnly extends StorageLevel
>>>    final val DiskOnly: StorageLevel = _DiskOnly
>>> }
>>>
>>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com>
>>> wrote:
>>>> I like #4 as well and agree with Aaron's suggestion.
>>>>
>>>> - Patrick
>>>>
>>>> On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com>
>>> wrote:
>>>>> I'm cool with #4 as well, but make sure we dictate that the values
>>> should
>>>>> be defined within an object with the same name as the enumeration (like
>>> we
>>>>> do for StorageLevel). Otherwise we may pollute a higher namespace.
>>>>>
>>>>> e.g. we SHOULD do:
>>>>>
>>>>> trait StorageLevel
>>>>> object StorageLevel {
>>>>>    case object MemoryOnly extends StorageLevel
>>>>>    case object DiskOnly extends StorageLevel
>>>>> }
>>>>>
>>>>> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
>>> michael@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> #4 with a preference for CamelCaseEnums
>>>>>>
>>>>>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <jo...@databricks.com>
>>>>>> wrote:
>>>>>>
>>>>>>> another vote for #4
>>>>>>> People are already used to adding "()" in Java.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <ja...@gmail.com>
>>>>>> wrote:
>>>>>>>> #4 but with MemoryOnly (more scala-like)
>>>>>>>>
>>>>>>>> http://docs.scala-lang.org/style/naming-conventions.html
>>>>>>>>
>>>>>>>> Constants, Values, Variable and Methods
>>>>>>>>
>>>>>>>> Constant names should be in upper camel case. That is, if the
>>> member is
>>>>>>>> final, immutable and it belongs to a package object or an object,
>>> it
>>>>>> may
>>>>>>> be
>>>>>>>> considered a constant (similar to Java's static final members):
>>>>>>>>
>>>>>>>>
>>>>>>>>     1. object Container {
>>>>>>>>     2.     val MyConstant = ...
>>>>>>>>     3. }
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <me...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> There are many places where we use enum-like types in Spark, but
>>> in
>>>>>>>>> different ways. Every approach has both pros and cons. I wonder
>>>>>>>>> whether there should be an "official" approach for enum-like
>>> types in
>>>>>>>>> Spark.
>>>>>>>>>
>>>>>>>>> 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
>>>>>>>>>
>>>>>>>>> * All types show up as Enumeration.Value in Java.
>>>>>>>>>
>>>>>>>>>
>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
>>>>>>>>> 2. Java's Enum (e.g., SaveMode, IOMode)
>>>>>>>>>
>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>
>>>>>>>>>
>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
>>>>>>>>> 3. Static fields in Java (e.g., TripletFields)
>>>>>>>>>
>>>>>>>>> * Implementation must be in a Java file.
>>>>>>>>> * Doesn't need "()" in Java code.
>>>>>>>>> * Values don't show up in the ScalaDoc:
>>>>>>>>>
>>>>>>>>>
>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
>>>>>>>>> 4. Objects in Scala. (e.g., StorageLevel)
>>>>>>>>>
>>>>>>>>> * Needs "()" in Java code.
>>>>>>>>> * Values show up in both ScalaDoc and JavaDoc:
>>>>>>>>>
>>>>>>>>>
>>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
>>>>>>>>>
>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
>>>>>>>>> It would be great if we have an "official" approach for this as
>>> well
>>>>>>>>> as the naming convention for enum-like values ("MEMORY_ONLY" or
>>>>>>>>> "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
>>>>>> thoughts?
>>>>>>>>> Best,
>>>>>>>>> Xiangrui
>>>>>>>>>
>>>>>>>>>
>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>>>>>
>>>>>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Mridul Muralidharan <mr...@gmail.com>.
  I have a strong dislike for Java enums due to the fact that they
are not stable across JVMs - if one undergoes serde, you end up with
unpredictable results at times [1].
This is one of the reasons why we prevent enums from being keys: it is
highly possible users might depend on that behavior internally and
shoot themselves in the foot.

Would be better to keep away from them in general and use something more stable.

Regards,
Mridul

[1] Having had to debug this issue for 2 weeks - I really really hate it.
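
(In practice "something more stable" usually means shipping the name()
string across the wire and parsing it back on the far side; sketched
here with a JDK enum purely for illustration:)

~~~
import java.util.concurrent.TimeUnit

def encode(u: TimeUnit): String = u.name()            // version-stable
def decode(s: String): TimeUnit = TimeUnit.valueOf(s) // parse it back

assert(decode(encode(TimeUnit.HOURS)) == TimeUnit.HOURS)
~~~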


On Thu, Mar 5, 2015 at 1:08 PM, Imran Rashid <ir...@cloudera.com> wrote:
> I have a very strong dislike for #1 (scala enumerations).   I'm ok with #4
> (with Xiangrui's final suggestion, especially making it sealed & available
> in Java), but I really think #2, java enums, are the best option.
>
> Java enums actually have some very real advantages over the other
> approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There has
> been endless debate in the Scala community about the problems with the
> approaches in Scala.  Very smart, level-headed Scala gurus have complained
> about their short-comings (Rex Kerr's name is coming to mind, though I'm
> not positive about that); there have been numerous well-thought out
> proposals to give Scala a better enum.  But the powers-that-be in Scala
> always reject them.  IIRC the explanation for rejecting is basically that
> (a) enums aren't important enough for introducing some new special feature,
> scala's got bigger things to work on and (b) if you really need a good
> enum, just use java's enum.
>
> I doubt it really matters that much for Spark internals, which is why I
> think #4 is fine.  But I figured I'd give my spiel, because every developer
> loves language wars :)
>
> Imran
>
>
>
> On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <me...@gmail.com> wrote:
>
>> `case object` inside an `object` doesn't show up in Java. This is the
>> minimal code I found to make everything show up correctly in both
>> Scala and Java:
>>
>> sealed abstract class StorageLevel // cannot be a trait
>>
>> object StorageLevel {
>>   private[this] case object _MemoryOnly extends StorageLevel
>>   final val MemoryOnly: StorageLevel = _MemoryOnly
>>
>>   private[this] case object _DiskOnly extends StorageLevel
>>   final val DiskOnly: StorageLevel = _DiskOnly
>> }
>>
>> On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com>
>> wrote:
>> > I like #4 as well and agree with Aaron's suggestion.
>> >
>> > - Patrick
>> >
>> > On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com>
>> wrote:
>> >> I'm cool with #4 as well, but make sure we dictate that the values
>> should
>> >> be defined within an object with the same name as the enumeration (like
>> we
>> >> do for StorageLevel). Otherwise we may pollute a higher namespace.
>> >>
>> >> e.g. we SHOULD do:
>> >>
>> >> trait StorageLevel
>> >> object StorageLevel {
>> >>   case object MemoryOnly extends StorageLevel
>> >>   case object DiskOnly extends StorageLevel
>> >> }
>> >>
>> >> On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <
>> michael@databricks.com>
>> >> wrote:
>> >>
>> >>> #4 with a preference for CamelCaseEnums
>> >>>
>> >>> On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <jo...@databricks.com>
>> >>> wrote:
>> >>>
>> >>> > another vote for #4
>> >>> > People are already used to adding "()" in Java.
>> >>> >
>> >>> >
>> >>> > On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <ja...@gmail.com>
>> >>> wrote:
>> >>> >
>> >>> > > #4 but with MemoryOnly (more scala-like)
>> >>> > >
>> >>> > > http://docs.scala-lang.org/style/naming-conventions.html
>> >>> > >
>> >>> > > Constants, Values, Variable and Methods
>> >>> > >
>> >>> > > Constant names should be in upper camel case. That is, if the
>> member is
>> >>> > > final, immutable and it belongs to a package object or an object,
>> it
>> >>> may
>> >>> > be
>> >>> > > considered a constant (similar to Java's static final members):
>> >>> > >
>> >>> > >
>> >>> > >    1. object Container {
>> >>> > >    2.     val MyConstant = ...
>> >>> > >    3. }
>> >>> > >
>> >>> > >
>> >>> > > 2015-03-04 17:11 GMT-08:00 Xiangrui Meng <me...@gmail.com>:
>> >>> > >
>> >>> > > > Hi all,
>> >>> > > >
>> >>> > > > There are many places where we use enum-like types in Spark, but
>> in
>> >>> > > > different ways. Every approach has both pros and cons. I wonder
>> >>> > > > whether there should be an "official" approach for enum-like
>> types in
>> >>> > > > Spark.
>> >>> > > >
>> >>> > > > 1. Scala's Enumeration (e.g., SchedulingMode, WorkerState, etc)
>> >>> > > >
>> >>> > > > * All types show up as Enumeration.Value in Java.
>> >>> > > >
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/scheduler/SchedulingMode.html
>> >>> > > >
>> >>> > > > 2. Java's Enum (e.g., SaveMode, IOMode)
>> >>> > > >
>> >>> > > > * Implementation must be in a Java file.
>> >>> > > > * Values don't show up in the ScalaDoc:
>> >>> > > >
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.network.util.IOMode
>> >>> > > >
>> >>> > > > 3. Static fields in Java (e.g., TripletFields)
>> >>> > > >
>> >>> > > > * Implementation must be in a Java file.
>> >>> > > > * Doesn't need "()" in Java code.
>> >>> > > > * Values don't show up in the ScalaDoc:
>> >>> > > >
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.graphx.TripletFields
>> >>> > > >
>> >>> > > > 4. Objects in Scala. (e.g., StorageLevel)
>> >>> > > >
>> >>> > > > * Needs "()" in Java code.
>> >>> > > > * Values show up in both ScalaDoc and JavaDoc:
>> >>> > > >
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>> http://spark.apache.org/docs/latest/api/scala/#org.apache.spark.storage.StorageLevel$
>> >>> > > >
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/storage/StorageLevel.html
>> >>> > > >
>> >>> > > > It would be great if we have an "official" approach for this as
>> well
>> >>> > > > as the naming convention for enum-like values ("MEMORY_ONLY" or
>> >>> > > > "MemoryOnly"). Personally, I like 4) with "MEMORY_ONLY". Any
>> >>> thoughts?
>> >>> > > >
>> >>> > > > Best,
>> >>> > > > Xiangrui
>> >>> > > >
>> >>> > > >
>> ---------------------------------------------------------------------
>> >>> > > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> >>> > > > For additional commands, e-mail: dev-help@spark.apache.org
>> >>> > > >
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Imran Rashid <ir...@cloudera.com>.
I have a very strong dislike for #1 (scala enumerations).   I'm ok with #4
(with Xiangrui's final suggestion, especially making it sealed & available
in Java), but I really think #2, java enums, are the best option.

Java enums actually have some very real advantages over the other
approaches -- you get values(), valueOf(), EnumSet, and EnumMap.  There has
been endless debate in the Scala community about the problems with the
approaches in Scala.  Very smart, level-headed Scala gurus have complained
about their short-comings (Rex Kerr's name is coming to mind, though I'm
not positive about that); there have been numerous well-thought out
proposals to give Scala a better enum.  But the powers-that-be in Scala
always reject them.  IIRC the explanation for rejecting is basically that
(a) enums aren't important enough for introducing some new special feature,
scala's got bigger things to work on and (b) if you really need a good
enum, just use java's enum.
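
(All four are usable from Scala as well; a small illustration with a
JDK enum, TimeUnit, standing in:)

~~~
import java.util.concurrent.TimeUnit
import java.util.{EnumMap, EnumSet}

val all = TimeUnit.values()                   // compiler-maintained values()
val one = TimeUnit.valueOf("SECONDS")         // valueOf() lookup by name
val set = EnumSet.of(TimeUnit.SECONDS, TimeUnit.MINUTES)    // bit-set backed
val map = new EnumMap[TimeUnit, String](classOf[TimeUnit])  // array backed
map.put(TimeUnit.SECONDS, "s")
~~~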

I doubt it really matters that much for Spark internals, which is why I
think #4 is fine.  But I figured I'd give my spiel, because every developer
loves language wars :)

Imran



On Thu, Mar 5, 2015 at 1:35 AM, Xiangrui Meng <me...@gmail.com> wrote:

> `case object` inside an `object` doesn't show up in Java. This is the
> minimal code I found to make everything show up correctly in both
> Scala and Java:
>
> sealed abstract class StorageLevel // cannot be a trait
>
> object StorageLevel {
>   private[this] case object _MemoryOnly extends StorageLevel
>   final val MemoryOnly: StorageLevel = _MemoryOnly
>
>   private[this] case object _DiskOnly extends StorageLevel
>   final val DiskOnly: StorageLevel = _DiskOnly
> }

Re: enum-like types in Spark

Posted by Xiangrui Meng <me...@gmail.com>.
`case object` inside an `object` doesn't show up in Java. This is the
minimal code I found to make everything show up correctly in both
Scala and Java:

sealed abstract class StorageLevel // cannot be a trait

object StorageLevel {
  // The case objects stay private[this] so the concrete subtypes never leak
  // into the public API; callers only ever see the StorageLevel-typed vals.
  private[this] case object _MemoryOnly extends StorageLevel
  final val MemoryOnly: StorageLevel = _MemoryOnly

  private[this] case object _DiskOnly extends StorageLevel
  final val DiskOnly: StorageLevel = _DiskOnly
}
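
A sketch of a call site (illustrative only): since the concrete case objects
are private, a match compares against the public vals by equality, and the
compiler generally cannot prove exhaustiveness, so a catch-all keeps it quiet:

def describe(level: StorageLevel): String = level match {
  case StorageLevel.MemoryOnly => "deserialized, in memory"
  case StorageLevel.DiskOnly   => "serialized, on disk"
  case _                       => "unknown" // unreachable today, but not provable by the compiler
}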

On Wed, Mar 4, 2015 at 8:10 PM, Patrick Wendell <pw...@gmail.com> wrote:
> I like #4 as well and agree with Aaron's suggestion.
>
> - Patrick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Patrick Wendell <pw...@gmail.com>.
I like #4 as well and agree with Aaron's suggestion.

- Patrick

On Wed, Mar 4, 2015 at 6:07 PM, Aaron Davidson <il...@gmail.com> wrote:
> I'm cool with #4 as well, but make sure we dictate that the values should
> be defined within an object with the same name as the enumeration (like we
> do for StorageLevel). Otherwise we may pollute a higher namespace.
>
> e.g. we SHOULD do:
>
> trait StorageLevel
> object StorageLevel {
>   case object MemoryOnly extends StorageLevel
>   case object DiskOnly extends StorageLevel
> }

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: enum-like types in Spark

Posted by Aaron Davidson <il...@gmail.com>.
I'm cool with #4 as well, but make sure we dictate that the values should
be defined within an object with the same name as the enumeration (like we
do for StorageLevel). Otherwise we may pollute a higher namespace.

e.g. we SHOULD do:

trait StorageLevel
object StorageLevel {
  case object MemoryOnly extends StorageLevel
  case object DiskOnly extends StorageLevel
}
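
And, for contrast, a sketch of what to avoid (names are illustrative):

// Anti-pattern: top-level case objects become siblings of every other type
// in the enclosing package, polluting that namespace.
sealed trait StorageLevel
case object MemoryOnly extends StorageLevel
case object DiskOnly extends StorageLevel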

On Wed, Mar 4, 2015 at 5:37 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> #4 with a preference for CamelCaseEnums

Re: enum-like types in Spark

Posted by Michael Armbrust <mi...@databricks.com>.
#4 with a preference for CamelCaseEnums

On Wed, Mar 4, 2015 at 5:29 PM, Joseph Bradley <jo...@databricks.com>
wrote:

> another vote for #4
> People are already used to adding "()" in Java.

Re: enum-like types in Spark

Posted by Joseph Bradley <jo...@databricks.com>.
another vote for #4
People are already used to adding "()" in Java.


On Wed, Mar 4, 2015 at 5:14 PM, Stephen Boesch <ja...@gmail.com> wrote:

> #4 but with MemoryOnly (more Scala-like)
>
> http://docs.scala-lang.org/style/naming-conventions.html
>
> Constants, Values, Variable and Methods
>
> Constant names should be in upper camel case. That is, if the member is
> final, immutable and it belongs to a package object or an object, it may be
> considered a constant (similar to Java's static final members):
>
>    object Container {
>      val MyConstant = ...
>    }
>

Re: enum-like types in Spark

Posted by Stephen Boesch <ja...@gmail.com>.
#4 but with MemoryOnly (more Scala-like)

http://docs.scala-lang.org/style/naming-conventions.html

Constants, Values, Variable and Methods

Constant names should be in upper camel case. That is, if the member is
final, immutable and it belongs to a package object or an object, it may be
considered a constant (similar to Java's static final members):

   object Container {
     val MyConstant = ...
   }
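
Applied to the question at hand, the two candidate spellings would look
roughly like this (a sketch only; neither is a settled Spark convention):

sealed trait StorageLevel
object StorageLevel {
  case object MemoryOnly extends StorageLevel  // upper camel case, per the style guide
  val MEMORY_ONLY: StorageLevel = MemoryOnly   // Java-style constant naming, for comparison
}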

