Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2015/10/29 16:17:35 UTC

[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/9358

    [SPARK-11269][SQL][WIP] Java API support & test cases

    I followed `DataFrame` in adding Java-friendly methods to `Dataset`, but should we create a `JavaDataset` like `JavaRDD`? `Dataset` is effectively a combination of typed `RDD` and untyped `DataFrame`, so I'm not sure which side to follow.
    
    TODO:
    
    * add more Java-friendly methods
    * add more tests
    * Javadoc

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark java

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9358
    
----
commit c0fa79e42c56e88711c9ba2eabfb1afd859309fd
Author: Wenchen Fan <we...@databricks.com>
Date:   2015-10-29T15:13:07Z

    add java API for Dataset

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43820035
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -499,6 +499,10 @@ class SQLContext private[sql](
         new Dataset[T](this, plan)
       }
     
    +  def createDataset[T : Encoder](data: java.util.List[T]): Dataset[T] = {
    --- End diff --
    
    Yeah, let's do that in another PR please.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153395653
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44920/
    Test PASSed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r44097675
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -148,18 +152,37 @@ class Dataset[T] private(
       def transform[U](t: Dataset[T] => Dataset[U]): Dataset[U] = t(this)
     
       /**
    +   * (Scala-specific)
        * Returns a new [[Dataset]] that only contains elements where `func` returns `true`.
        * @since 1.6.0
        */
       def filter(func: T => Boolean): Dataset[T] = mapPartitions(_.filter(func))
     
       /**
    +   * (Java-specific)
    +   * Returns a new [[Dataset]] that only contains elements where `func` returns `true`.
    +   * @since 1.6.0
    +   */
    +  def filter(func: JFunction[T, java.lang.Boolean]): Dataset[T] =
    +    filter(t => func.call(t).booleanValue())
    +
    +  /**
    +   * (Scala-specific)
        * Returns a new [[Dataset]] that contains the result of applying `func` to each element.
        * @since 1.6.0
        */
       def map[U : Encoder](func: T => U): Dataset[U] = mapPartitions(_.map(func))
     
       /**
    +   * (Java-specific)
    +   * Returns a new [[Dataset]] that contains the result of applying `func` to each element.
    +   * @since 1.6.0
    +   */
    +  def map[U](func: JFunction[T, U], encoder: Encoder[U]): Dataset[U] =
    --- End diff --
    
    Same here, `MapFunction`
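
To make the suggestion concrete, here is a minimal plain-Java sketch of Java-facing `filter` and `map` methods that take dedicated single-method interfaces. It is not Spark code: `FilterFunction` and `MapFunction` are declared locally as assumptions about the interface shape, `MiniDataset` is a hypothetical in-memory stand-in for `Dataset`, and the `Encoder` argument from the diff is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed shapes of the dedicated functional interfaces (declared locally for this sketch).
interface FilterFunction<T> {
    boolean call(T value);
}

interface MapFunction<T, U> {
    U call(T value);
}

// Hypothetical in-memory stand-in for a typed dataset; no Spark involved.
final class MiniDataset<T> {
    private final List<T> elements;

    MiniDataset(List<T> elements) {
        this.elements = new ArrayList<>(elements);
    }

    // Keeps only the elements for which func returns true.
    MiniDataset<T> filter(FilterFunction<T> func) {
        List<T> out = new ArrayList<>();
        for (T e : elements) {
            if (func.call(e)) {
                out.add(e);
            }
        }
        return new MiniDataset<>(out);
    }

    // Applies func to each element and returns the results as a new dataset.
    <U> MiniDataset<U> map(MapFunction<T, U> func) {
        List<U> out = new ArrayList<>();
        for (T e : elements) {
            out.add(func.call(e));
        }
        return new MiniDataset<>(out);
    }

    // Returns a copy of the elements as a typed list.
    List<T> collectAsList() {
        return new ArrayList<>(elements);
    }

    public static void main(String[] args) {
        MiniDataset<Integer> ds = new MiniDataset<>(Arrays.asList(1, 2, 3, 4));
        List<String> out = ds.filter(x -> x % 2 == 0).map(x -> "n" + x).collectAsList();
        System.out.println(out);
    }
}
```

Because each method takes its own interface, a Java lambda passed to `filter` or `map` has exactly one target type in this sketch.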



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153661955
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r46911787
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def BOOLEAN: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def BYTE: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def SHORT: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def INT: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def LONG: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def FLOAT: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def DOUBLE: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def STRING: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    --- End diff --
    
    Will do it soon. Thanks! 



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153613475
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152212904
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43841085
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -441,6 +537,17 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  /**
    +   * (Java-specific)
    +   * Collects the elements to a Java list.
    +   *
    +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    --- End diff --
    
    RDD holds a class tag of the element type that it uses to construct the
    correct type of array when you do a collect.
    On Nov 4, 2015 4:57 AM, "Wenchen Fan" <no...@github.com> wrote:
    
    > In sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
    > <https://github.com/apache/spark/pull/9358#discussion_r43840774>:
    >
    > > @@ -441,6 +537,17 @@ class Dataset[T] private(
    > >    /** Collects the elements to an Array. */
    > >    def collect(): Array[T] = rdd.collect()
    > >
    > > +  /**
    > > +   * (Java-specific)
    > > +   * Collects the elements to a Java list.
    > > +   *
    > > +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    >
    > Will the class tag do the trick? I tried to define a generic class with
    > ClassTag:
    >
    > class MyTest[T : ClassTag] {
    >   def t(): Array[T] = null
    > }
    >
    > object MyTest {
    >   def apply[T](cls: Class[T]): MyTest[T] = {
    >     new MyTest[T]()(ClassTag(cls))
    >   }
    > }
    >
    > The return type of MyTest.t() is still Object on the Java side.
    > I also tried using a Scala RDD from the Java side; the return type of
    > RDD.collect() is also Object.
    >
    > One possible solution is to define T <: AnyRef, but I think that would be
    > hard to do for Dataset or RDD.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/9358/files#r43840774>.
    >
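
The exchange above can be illustrated from the Java side without Spark. The sketch below is a plain-Java analogy, not Spark code: `ClassTokenArraySketch` and `filled` are hypothetical names, and `java.lang.reflect.Array.newInstance` stands in for what a class tag does in Scala. The array that comes back has the correct runtime type, but its static type is `Object` because the element type may be a primitive, so the caller still has to cast.

```java
import java.lang.reflect.Array;

// Plain-Java analogy with hypothetical names; no Spark involved.
final class ClassTokenArraySketch {

    // Builds an array whose runtime component type is cls and fills it with value.
    // The static return type has to be Object: the result may be an int[] or a
    // String[], and those two share no more specific supertype.
    static Object filled(Class<?> cls, int n, Object value) {
        Object arr = Array.newInstance(cls, n);
        for (int i = 0; i < n; i++) {
            Array.set(arr, i, value); // unwraps boxed values for primitive arrays
        }
        return arr;
    }

    public static void main(String[] args) {
        Object strings = filled(String.class, 2, "x");
        Object ints = filled(int.class, 2, 7);

        // The runtime types are the right ones...
        System.out.println(strings.getClass().getSimpleName());
        System.out.println(ints.getClass().getSimpleName());

        // ...but the caller must cast before it can use the elements.
        String[] s = (String[]) strings;
        int[] p = (int[]) ints;
        System.out.println(s[0] + p[1]);
    }
}
```

Restricting the element type to reference types (the `T <: AnyRef` idea mentioned in the quoted message) would allow a more specific static type, at the cost of excluding primitives.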




[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152215303
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153395647
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43842039
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -441,6 +537,17 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  /**
    +   * (Java-specific)
    +   * Collects the elements to a Java list.
    +   *
    +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    --- End diff --
    
    I see, then we should have collect as list too.
    On Nov 4, 2015 5:19 AM, "Wenchen Fan" <no...@github.com> wrote:
    
    > In sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
    > <https://github.com/apache/spark/pull/9358#discussion_r43841577>:
    >
    > > @@ -441,6 +537,17 @@ class Dataset[T] private(
    > >    /** Collects the elements to an Array. */
    > >    def collect(): Array[T] = rdd.collect()
    > >
    > > +  /**
    > > +   * (Java-specific)
    > > +   * Collects the elements to a Java list.
    > > +   *
    > > +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    >
    > We can construct the right type of array when calling RDD.collect; the
    > problem is the interface. On the Java side, the return type of
    > RDD.collect() is java.lang.Object, so callers need a type cast, which is
    > not user-friendly.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/9358/files#r43841577>.
    >




[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43848905
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    --- End diff --
    
    INT sounds good...




[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152542269
  
    **[Test build #44686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44686/consoleFull)** for PR 9358 at commit [`9c089de`](https://github.com/apache/spark/commit/9c089de3af31d554e0f0effd94cc78665da72b66).



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153931876
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r46652462
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def BOOLEAN: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def BYTE: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def SHORT: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def INT: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def LONG: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def FLOAT: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def DOUBLE: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def STRING: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def tuple[T1, T2](enc1: Encoder[T1], enc2: Encoder[T2]): Encoder[(T1, T2)] = {
    +    tuple(Seq(enc1, enc2).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2)]]
    +  }
    +
    +  def tuple[T1, T2, T3](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3]): Encoder[(T1, T2, T3)] = {
    +    tuple(Seq(enc1, enc2, enc3).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4]): Encoder[(T1, T2, T3, T4)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4, T5](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4],
    +      enc5: Encoder[T5]): Encoder[(T1, T2, T3, T4, T5)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4, enc5).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4, T5)]]
    +  }
    +
    +  private def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    --- End diff --
    
    We can hold off until a use case comes up that needs more than Tuple5.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r46657105
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def BOOLEAN: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def BYTE: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def SHORT: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def INT: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def LONG: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def FLOAT: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def DOUBLE: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def STRING: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def tuple[T1, T2](enc1: Encoder[T1], enc2: Encoder[T2]): Encoder[(T1, T2)] = {
    +    tuple(Seq(enc1, enc2).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2)]]
    +  }
    +
    +  def tuple[T1, T2, T3](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3]): Encoder[(T1, T2, T3)] = {
    +    tuple(Seq(enc1, enc2, enc3).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4]): Encoder[(T1, T2, T3, T4)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4, T5](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4],
    +      enc5: Encoder[T5]): Encoder[(T1, T2, T3, T4, T5)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4, enc5).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4, T5)]]
    +  }
    +
    +  private def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    --- End diff --
    
    Thank you! 



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153354839
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r46650956
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def BOOLEAN: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def BYTE: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def SHORT: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def INT: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def LONG: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def FLOAT: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def DOUBLE: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def STRING: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def tuple[T1, T2](enc1: Encoder[T1], enc2: Encoder[T2]): Encoder[(T1, T2)] = {
    +    tuple(Seq(enc1, enc2).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2)]]
    +  }
    +
    +  def tuple[T1, T2, T3](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3]): Encoder[(T1, T2, T3)] = {
    +    tuple(Seq(enc1, enc2, enc3).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4]): Encoder[(T1, T2, T3, T4)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4, T5](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4],
    +      enc5: Encoder[T5]): Encoder[(T1, T2, T3, T4, T5)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4, enc5).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4, T5)]]
    +  }
    +
    +  private def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    --- End diff --
    
    @cloud-fan, does that mean the limit will be 22? Do you think we should at least add overloads up to Tuple22, which is Scala's limit?



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152215307
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44605/
    Test FAILed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43954409
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -17,9 +17,13 @@
     
     package org.apache.spark.sql
     
    +import scala.collection.JavaConverters._
    --- End diff --
    
    Let's use explicit conversions instead of implicit ones.




[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43841577
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -441,6 +537,17 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  /**
    +   * (Java-specific)
    +   * Collects the elements to a Java list.
    +   *
    +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    --- End diff --
    
    We can construct the right type of array when calling `RDD.collect`; the problem is the interface. On the Java side, the return type of `RDD.collect()` is `java.lang.Object`, so users need a type cast, which is not friendly.
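
To make the interface problem concrete, here is a minimal sketch in plain Java. `CollectSketch` is a hypothetical stand-in, not Spark's `Dataset` or `RDD`, and the method name `collectAsList` is an assumption for illustration. It only mimics the situation described above, where the array-returning method is seen by Java callers as returning `java.lang.Object`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Scala-defined collection; not Spark's actual class.
public class CollectSketch<T> {
    private final List<T> elements;

    public CollectSketch(List<T> elements) {
        this.elements = new ArrayList<>(elements);
    }

    // Mimics the Java-side view of a generic array return type: plain Object.
    public Object collect() {
        return elements.toArray();
    }

    // Java-friendly variant (name assumed): the element type stays in the signature.
    public List<T> collectAsList() {
        return new ArrayList<>(elements);
    }

    public static void main(String[] args) {
        CollectSketch<String> ds = new CollectSketch<>(Arrays.asList("a", "b"));

        // With the Object-typed method, the caller has to cast twice.
        Object[] raw = (Object[]) ds.collect();
        String first = (String) raw[0];

        // With the list-returning method, no cast is needed.
        List<String> typed = ds.collectAsList();
        String second = typed.get(1);

        System.out.println(first + " " + second);
    }
}
```

The list-returning variant keeps the generic type visible to the Java compiler, which is the usability point being made in this comment.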



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43841129
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -441,6 +537,17 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  /**
    +   * (Java-specific)
    +   * Collects the elements to a Java list.
    +   *
    +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    --- End diff --
    
    cc @rxin 



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43820879
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def forByte: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def forShort: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def forInt: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def forLong: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def forFloat: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def forDouble: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def forString: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def typeTagOfTuple2[T1 : TypeTag, T2 : TypeTag]: TypeTag[(T1, T2)] = typeTag[(T1, T2)]
    --- End diff --
    
    `private`



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153280889
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153517822
  
    It would be really great to also try to create a test suite that uses Java 8 lambdas (though we may need to pull this into a separate PR, as I'm not sure how many build changes we will need).



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152215296
  
    **[Test build #44605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44605/consoleFull)** for PR 9358 at commit [`c0fa79e`](https://github.com/apache/spark/commit/c0fa79e42c56e88711c9ba2eabfb1afd859309fd).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43820849
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def forByte: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def forShort: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def forInt: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def forLong: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def forFloat: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def forDouble: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def forString: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def typeTagOfTuple2[T1 : TypeTag, T2 : TypeTag]: TypeTag[(T1, T2)] = typeTag[(T1, T2)]
    +
    +  private def getTypeTag[T](c: Class[T]): TypeTag[T] = {
    +    import scala.reflect.api
    +
    +    // val mirror = runtimeMirror(c.getClassLoader)
    +    val mirror = rootMirror
    +    val sym = mirror.staticClass(c.getName)
    +    val tpe = sym.selfType
    +    TypeTag(mirror, new api.TypeCreator {
    +      def apply[U <: api.Universe with Singleton](m: api.Mirror[U]) =
    +        if (m eq mirror) tpe.asInstanceOf[U # Type]
    +        else throw new IllegalArgumentException(
    +          s"Type tag defined in $mirror cannot be migrated to other mirrors.")
    +    })
    +  }
    +
    +  def forTuple2[T1, T2](c1: Class[T1], c2: Class[T2]): Encoder[(T1, T2)] = {
    --- End diff --
    
    How about just `forTuple`? The type of tuple returned is obvious from the number of arguments.  We should also add overloads at least up to tuple 5.
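
A rough sketch of the overloading idea in plain Java follows. Everything here (`ForTupleSketch`, the nested `Encoder`, `Tuple2`, `Tuple3`) is a hypothetical placeholder rather than Spark's or Scala's actual types. It only shows that a single method name can be overloaded by arity, so the compiler picks the tuple type from the number of `Class` arguments.

```java
// Hypothetical placeholders; these are not Spark's Encoder or Scala's tuple classes.
public class ForTupleSketch {
    public static final class Tuple2<A, B> {}
    public static final class Tuple3<A, B, C> {}

    // Minimal encoder stand-in that only remembers the field classes it was built from.
    public static final class Encoder<T> {
        private final Class<?>[] fieldClasses;

        private Encoder(Class<?>... fieldClasses) {
            this.fieldClasses = fieldClasses;
        }

        public int arity() {
            return fieldClasses.length;
        }
    }

    // Two arguments select the Tuple2 overload.
    public static <A, B> Encoder<Tuple2<A, B>> forTuple(Class<A> c1, Class<B> c2) {
        return new Encoder<>(c1, c2);
    }

    // Three arguments select the Tuple3 overload; higher arities would follow the same pattern.
    public static <A, B, C> Encoder<Tuple3<A, B, C>> forTuple(
            Class<A> c1, Class<B> c2, Class<C> c3) {
        return new Encoder<>(c1, c2, c3);
    }

    public static void main(String[] args) {
        Encoder<Tuple2<Integer, String>> e2 = forTuple(Integer.class, String.class);
        Encoder<Tuple3<Integer, String, Long>> e3 =
            forTuple(Integer.class, String.class, Long.class);
        System.out.println(e2.arity() + " " + e3.arity());
    }
}
```

The trade-off is one overload per supported arity, which is why the question of how many arities to cover comes up in this review.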



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152580649
  
    **[Test build #44686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44686/consoleFull)** for PR 9358 at commit [`9c089de`](https://github.com/apache/spark/commit/9c089de3af31d554e0f0effd94cc78665da72b66).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153613507
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r46657310
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def BOOLEAN: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def BYTE: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def SHORT: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def INT: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def LONG: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def FLOAT: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def DOUBLE: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def STRING: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    --- End diff --
    
    @cloud-fan Could you share why we do not add encoders for the other basic types, such as DecimalType, DateType and TimestampType? Thank you!
    
    DecimalType -> java.math.BigDecimal
    DateType -> java.sql.Date
    TimestampType -> java.sql.Timestamp



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r46650981
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def forByte: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def forShort: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def forInt: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def forLong: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def forFloat: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def forDouble: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def forString: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def typeTagOfTuple2[T1 : TypeTag, T2 : TypeTag]: TypeTag[(T1, T2)] = typeTag[(T1, T2)]
    +
    +  private def getTypeTag[T](c: Class[T]): TypeTag[T] = {
    +    import scala.reflect.api
    +
    +    // val mirror = runtimeMirror(c.getClassLoader)
    +    val mirror = rootMirror
    +    val sym = mirror.staticClass(c.getName)
    +    val tpe = sym.selfType
    +    TypeTag(mirror, new api.TypeCreator {
    +      def apply[U <: api.Universe with Singleton](m: api.Mirror[U]) =
    +        if (m eq mirror) tpe.asInstanceOf[U # Type]
    +        else throw new IllegalArgumentException(
    +          s"Type tag defined in $mirror cannot be migrated to other mirrors.")
    +    })
    +  }
    +
    +  def forTuple2[T1, T2](c1: Class[T1], c2: Class[T2]): Encoder[(T1, T2)] = {
    --- End diff --
    
    @marmbrus Any reason why it is tuple 5, instead of tuple 22 which is the current limit of Scala?



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r44097663
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -148,18 +152,37 @@ class Dataset[T] private(
       def transform[U](t: Dataset[T] => Dataset[U]): Dataset[U] = t(this)
     
       /**
    +   * (Scala-specific)
        * Returns a new [[Dataset]] that only contains elements where `func` returns `true`.
        * @since 1.6.0
        */
       def filter(func: T => Boolean): Dataset[T] = mapPartitions(_.filter(func))
     
       /**
    +   * (Java-specific)
    +   * Returns a new [[Dataset]] that only contains elements where `func` returns `true`.
    +   * @since 1.6.0
    +   */
    +  def filter(func: JFunction[T, java.lang.Boolean]): Dataset[T] =
    --- End diff --
    
    After talking with @rxin, we should probably create a `FilterFunction`, both because it lets us avoid boxing here and because it might be less confusing to Java users.
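
The boxing point can be sketched in plain Java. The name `FilterFunction` comes from the comment above, but its exact shape here (a single `call` method returning primitive `boolean`) is an assumption, and `BoxedFunction`, `filterBoxed`, and `filter` are hypothetical helpers that operate on a `List` rather than a `Dataset`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilterSketch {
    // Generic function shape: a predicate must return a boxed java.lang.Boolean.
    public interface BoxedFunction<T, R> {
        R call(T value);
    }

    // Dedicated predicate shape (assumed): returns a primitive boolean, so nothing is boxed.
    public interface FilterFunction<T> {
        boolean call(T value);
    }

    public static <T> List<T> filterBoxed(List<T> in, BoxedFunction<T, Boolean> f) {
        List<T> out = new ArrayList<>();
        for (T t : in) {
            // The Boolean result is auto-unboxed on every element.
            if (f.call(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static <T> List<T> filter(List<T> in, FilterFunction<T> f) {
        List<T> out = new ArrayList<>();
        for (T t : in) {
            // Primitive result: no wrapper object is involved.
            if (f.call(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4);
        System.out.println(filterBoxed(nums, x -> x % 2 == 0));
        System.out.println(filter(nums, x -> x % 2 == 0));
    }
}
```

Both calls select the same elements; the difference is only in the declared return type of the callback, which is also what makes the dedicated interface read more naturally as a predicate.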



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153954466
  
    **[Test build #45086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45086/consoleFull)** for PR 9358 at commit [`d8d5a19`](https://github.com/apache/spark/commit/d8d5a19075d2749b841db1f0aa781cb10efd4078).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `case class GetInternalRowField(child: Expression, ordinal: Int, dataType: DataType)`



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152723021
  
    ping @marmbrus 



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153354862
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153954542
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153283702
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44904/
    Test FAILed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153395466
  
    **[Test build #44920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44920/consoleFull)** for PR 9358 at commit [`0eea82c`](https://github.com/apache/spark/commit/0eea82ce19c8676580ca5279ecfe70e81063b373).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153661962
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45003/
    Test PASSed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43838362
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    --- End diff --
    
    Encoder.int seems simpler.




[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153616627
  
    **[Test build #45003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45003/consoleFull)** for PR 9358 at commit [`fbf791e`](https://github.com/apache/spark/commit/fbf791ebaf535b4426efcb4612bd11788b5e97d6).



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43814406
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -441,6 +537,17 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  /**
    +   * (Java-specific)
    +   * Collects the elements to a Java list.
    +   *
    +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    +   * Java side is `java.lang.Object`, which is not easy to use.  Java user can use this method
    +   * instead and keep the generic type for result.
    --- End diff --
    
    @since



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153281190
  
    **[Test build #44904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44904/consoleFull)** for PR 9358 at commit [`16a4401`](https://github.com/apache/spark/commit/16a4401b27f57a1fb13127fbcb36fc15c191ec4b).



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153661694
  
    **[Test build #45003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45003/consoleFull)** for PR 9358 at commit [`fbf791e`](https://github.com/apache/spark/commit/fbf791ebaf535b4426efcb4612bd11788b5e97d6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `case class GetInternalRowField(child: Expression, ordinal: Int, dataType: DataType)`



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r46874701
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def BOOLEAN: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def BYTE: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def SHORT: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def INT: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def LONG: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def FLOAT: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def DOUBLE: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def STRING: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    --- End diff --
    
    We should add these.



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by yhuai <gi...@git.apache.org>.

Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43685538
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -409,6 +443,9 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  def jcollect(): java.util.List[T] =
    --- End diff --
    
    Shouldn't that `Array` be a Java array? Why doesn't it work?



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152540472
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43840774
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -441,6 +537,17 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  /**
    +   * (Java-specific)
    +   * Collects the elements to a Java list.
    +   *
    +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    --- End diff --
    
    Will the class tag do the trick? I tried defining a generic class with a `ClassTag`:
    ```
    import scala.reflect.ClassTag
    
    // Generic class that requires a ClassTag for T.
    class MyTest[T : ClassTag] {
      def t(): Array[T] = null
    }
    
    object MyTest {
      // Builds a MyTest from a Java Class object, supplying the ClassTag explicitly.
      def apply[T](cls: Class[T]): MyTest[T] = {
        new MyTest[T]()(ClassTag(cls))
      }
    }
    ```
    
    The return type of `MyTest.t()` is still `Object` on the Java side.
    I also tried using a Scala RDD from Java; the return type of `RDD.collect()` is also `Object`.
    
    One possible solution is to declare `T <: AnyRef`, but I think that is hard to do for `Dataset` or `RDD`.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43820335
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -499,6 +499,10 @@ class SQLContext private[sql](
         new Dataset[T](this, plan)
       }
     
    +  def createDataset[T : Encoder](data: java.util.List[T]): Dataset[T] = {
    --- End diff --
    
    Oh, you already did it :)



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43507552
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -409,6 +443,9 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  def jcollect(): java.util.List[T] =
    --- End diff --
    
    I do want to avoid this `jcollect`, but our `collect`, with return type `Array[T]`, does not seem to work on the Java side. Any ideas?



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153954545
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45086/
    Test PASSed.



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152214696
  
    **[Test build #44605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44605/consoleFull)** for PR 9358 at commit [`c0fa79e`](https://github.com/apache/spark/commit/c0fa79e42c56e88711c9ba2eabfb1afd859309fd).



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan closed the pull request at:

    https://github.com/apache/spark/pull/9358



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43849661
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +37,120 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def BOOLEAN: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def BYTE: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def SHORT: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def INT: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def LONG: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def FLOAT: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def DOUBLE: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def STRING: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def tuple[T1, T2](enc1: Encoder[T1], enc2: Encoder[T2]): Encoder[(T1, T2)] = {
    +    tuple(Seq(enc1, enc2).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2)]]
    +  }
    +
    +  def tuple[T1, T2, T3](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3]): Encoder[(T1, T2, T3)] = {
    +    tuple(Seq(enc1, enc2, enc3).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4]): Encoder[(T1, T2, T3, T4)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4)]]
    +  }
    +
    +  def tuple[T1, T2, T3, T4, T5](
    +      enc1: Encoder[T1],
    +      enc2: Encoder[T2],
    +      enc3: Encoder[T3],
    +      enc4: Encoder[T4],
    +      enc5: Encoder[T5]): Encoder[(T1, T2, T3, T4, T5)] = {
    +    tuple(Seq(enc1, enc2, enc3, enc4, enc5).map(_.asInstanceOf[ExpressionEncoder[_]]))
    +      .asInstanceOf[ExpressionEncoder[(T1, T2, T3, T4, T5)]]
    +  }
    +
    +  private def tuple(encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_] = {
    --- End diff --
    
    I came up with a new approach for creating encoders on the Java side, which also supports nested tuples. I kept the old code below so you can judge which way is better.
    
    cc @marmbrus  @rxin 
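
The idea of building tuple encoders by composition, including nested tuples, can be sketched roughly as follows. This is not the code from the PR: `Enc`, `INT`, `STRING`, and `tuple` are hypothetical stand-ins, the "encoder" only reports a schema string, and `Map.Entry` is used in place of `scala.Tuple2` so the example has no dependencies.

```java
import java.util.Map;

final class TupleEncoderDemo {
    // Hypothetical stand-in for an encoder: it only reports a schema string.
    interface Enc<T> {
        String schema();
    }

    static final Enc<Integer> INT = () -> "int";
    static final Enc<String> STRING = () -> "string";

    // Combines two encoders into an encoder for a pair.
    // The result is itself an Enc, so it can be passed back in to build nested pairs.
    static <A, B> Enc<Map.Entry<A, B>> tuple(Enc<A> first, Enc<B> second) {
        return () -> "struct<_1:" + first.schema() + ",_2:" + second.schema() + ">";
    }

    public static void main(String[] args) {
        Enc<Map.Entry<Integer, Map.Entry<String, Integer>>> nested =
            tuple(INT, tuple(STRING, INT));
        System.out.println(nested.schema());
    }
}
```

Because `tuple` accepts and returns the same encoder type, nesting falls out of ordinary composition rather than needing one overload per shape.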



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152580811
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153933812
  
    **[Test build #45086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45086/consoleFull)** for PR 9358 at commit [`d8d5a19`](https://github.com/apache/spark/commit/d8d5a19075d2749b841db1f0aa781cb10efd4078).



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43398228
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    +  def forByte: Encoder[java.lang.Byte] = ExpressionEncoder(flat = true)
    +  def forShort: Encoder[java.lang.Short] = ExpressionEncoder(flat = true)
    +  def forInt: Encoder[java.lang.Integer] = ExpressionEncoder(flat = true)
    +  def forLong: Encoder[java.lang.Long] = ExpressionEncoder(flat = true)
    +  def forFloat: Encoder[java.lang.Float] = ExpressionEncoder(flat = true)
    +  def forDouble: Encoder[java.lang.Double] = ExpressionEncoder(flat = true)
    +  def forString: Encoder[java.lang.String] = ExpressionEncoder(flat = true)
    +
    +  def typeTagOfTuple2[T1 : TypeTag, T2 : TypeTag]: TypeTag[(T1, T2)] = typeTag[(T1, T2)]
    +
    +  private def getTypeTag[T](c: Class[T]): TypeTag[T] = {
    --- End diff --
    
    This is really really hacky, I'm not sure if I did it right...



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153283701
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153931832
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152540428
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152212849
  
     Merged build triggered.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43820397
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -441,6 +537,17 @@ class Dataset[T] private(
       /** Collects the elements to an Array. */
       def collect(): Array[T] = rdd.collect()
     
    +  /**
    +   * (Java-specific)
    +   * Collects the elements to a Java list.
    +   *
    +   * Due to the incompatibility problem between Scala and Java, the return type of [[collect()]] at
    --- End diff --
    
    This just means that the RDD has the wrong classtag.  We need to find a way to pass the classtag from the encoder before calling collect.
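
For readers less familiar with class tags, a rough Java analogue of the point above: a correctly typed array can only be created at runtime if the element class is passed along explicitly. The sketch below is an illustration in plain Java, not Spark code; `TypedCollectDemo` and `collect` are hypothetical names, and the `Class<T>` argument plays the role the class tag would play on the Scala side.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

final class TypedCollectDemo {
    // Copies a list into an array whose runtime component type is elementClass.
    // Without the class token, the best we could allocate is an Object[].
    static <T> T[] collect(List<T> data, Class<T> elementClass) {
        @SuppressWarnings("unchecked")
        T[] out = (T[]) Array.newInstance(elementClass, data.size());
        return data.toArray(out);
    }

    public static void main(String[] args) {
        String[] words = collect(Arrays.asList("hello", "world"), String.class);
        System.out.println(words.getClass().getSimpleName());
        System.out.println(words.length);
    }
}
```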



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43820524
  
    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java ---
    @@ -0,0 +1,111 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package test.org.apache.spark.sql;
    +
    +import java.io.Serializable;
    +import java.util.Arrays;
    +import java.util.Iterator;
    +import java.util.LinkedList;
    +import java.util.List;
    +
    +import org.apache.spark.api.java.function.FlatMapFunction;
    +import org.apache.spark.sql.catalyst.encoders.Encoder;
    +import scala.Tuple2;
    +
    +import org.apache.spark.api.java.function.Function;
    +import org.junit.*;
    +
    +import org.apache.spark.SparkContext;
    +import org.apache.spark.api.java.JavaSparkContext;
    +import org.apache.spark.sql.catalyst.encoders.Encoder$;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.test.TestSQLContext;
    +
    +public class JavaDatasetSuite implements Serializable {
    +  private transient JavaSparkContext jsc;
    +  private transient TestSQLContext context;
    +
    +  @Before
    +  public void setUp() {
    +    // Trigger static initializer of TestData
    +    SparkContext sc = new SparkContext("local[*]", "testing");
    +    jsc = new JavaSparkContext(sc);
    +    context = new TestSQLContext(sc);
    +    context.loadTestData();
    +  }
    +
    +  @After
    +  public void tearDown() {
    +    context.sparkContext().stop();
    +    context = null;
    +    jsc = null;
    +  }
    +
    +  @Test
    +  public void testCommonOperation() {
    +    List<String> data = Arrays.asList("hello", "world");
    +    Dataset<String> ds = context.createDataset(data, Encoder$.MODULE$.forString());
    --- End diff --
    
    Yeah, Scala should create static methods so that you can just call `Encoder.forString()`.
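
To make the difference between `Encoder$.MODULE$.forString()` and `Encoder.forString()` concrete, here is a hand-written Java model of the two call shapes. All names (`Codecs`, `Codecs$`, `ForwarderDemo`) are hypothetical, and the classes only imitate the general layout of a compiled Scala `object` (a singleton reachable through a static `MODULE$` field, plus optional static forwarders); whether the compiler emits such forwarders for a given object is not something this sketch settles.

```java
final class ForwarderDemo {
    public static void main(String[] args) {
        // Verbose form: go through the singleton instance explicitly.
        System.out.println(Codecs$.MODULE$.forString());
        // Friendly form: a static forwarder hides the singleton.
        System.out.println(Codecs.forString());
    }
}

// Models the singleton class behind a Scala `object` (hypothetical name).
final class Codecs$ {
    public static final Codecs$ MODULE$ = new Codecs$();

    private Codecs$() {}

    public String forString() {
        return "string-encoder";
    }
}

// Models a class carrying static forwarders that delegate to the singleton.
final class Codecs {
    private Codecs() {}

    public static String forString() {
        return Codecs$.MODULE$.forString();
    }
}
```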



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153280909
  
    Merged build started.



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43507700
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -499,6 +499,10 @@ class SQLContext private[sql](
         new Dataset[T](this, plan)
       }
     
    +  def createDataset[T : Encoder](data: java.util.List[T]): Dataset[T] = {
    --- End diff --
    
    I found that we can't create a `Dataset` from an `RDD`. Should we add that?



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-154577600
  
    @cloud-fan can you close this one? I'm going to merge https://github.com/apache/spark/pull/9528 for you



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153283678
  
    **[Test build #44904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44904/consoleFull)** for PR 9358 at commit [`16a4401`](https://github.com/apache/spark/commit/16a4401b27f57a1fb13127fbcb36fc15c191ec4b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by marmbrus <gi...@git.apache.org>.

Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43821203
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    --- End diff --
    
    @rxin @mateiz thoughts on naming here?
    
    `forInt` `int` `INT`?
    
    `forTuple` `tuple`, `tuple2`?



[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153335332
  
    cc @yhuai @marmbrus: according to the Jenkins result, a Scala method that returns a generic array returns `Object` on the Java side. `DataFrame.collect` returns `Array[Row]` and has no problem.
    
    The reason is that in Scala, `Array[Int]` and `Array[String]` are both just `Array` with a type parameter, whereas in Java `int[]` and `String[]` are different types, and the only class they both extend is `java.lang.Object`.
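
The relationship between Java array types described above can be checked with a few reflection calls. This is a small standalone sketch; `ArraySupertypeDemo` and `assignable` are hypothetical names introduced only for the illustration.

```java
final class ArraySupertypeDemo {
    // True when a value of type `from` can be assigned to a variable of type `to`.
    static boolean assignable(Class<?> to, Class<?> from) {
        return to.isAssignableFrom(from);
    }

    public static void main(String[] args) {
        // A String[] can be used as an Object[] ...
        System.out.println(assignable(Object[].class, String[].class));
        // ... but an int[] cannot: primitive arrays are unrelated to Object[].
        System.out.println(assignable(Object[].class, int[].class));
        // Both array types can only be widened to the class java.lang.Object.
        System.out.println(assignable(Object.class, int[].class));
        System.out.println(int[].class.getSuperclass() == Object.class);
    }
}
```

This is why a method whose array element type is not fixed at compile time ends up with `Object` as the only return type that covers both the primitive and the reference case.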



[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43843372
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala ---
    @@ -37,3 +35,39 @@ trait Encoder[T] extends Serializable {
       /** A ClassTag that can be used to construct and Array to contain a collection of `T`. */
       def clsTag: ClassTag[T]
     }
    +
    +object Encoder {
    +  import scala.reflect.runtime.universe._
    +
    +  def forBoolean: Encoder[java.lang.Boolean] = ExpressionEncoder(flat = true)
    --- End diff --
    
    Actually we cannot use `int`, as it is a keyword in Java. Maybe `INT`?
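
To illustrate the constraint, here is a small sketch with hypothetical names (`EncoderNamingSketch`, `forInt`, `INT`); it is not the PR's API, and the return type is simplified to `String`. Java source cannot declare or call a method named `int` because `int` is a reserved word, whereas the other two spellings are ordinary identifiers.

```java
// Hypothetical illustration of the naming options discussed above.
final class EncoderNamingSketch {
    // static String int() { return "..."; }  // would not compile: `int` is a Java keyword

    // Prefixed name: always a legal identifier.
    static String forInt() {
        return "encoder for int";
    }

    // Upper-case name: legal because Java identifiers are case-sensitive.
    static String INT() {
        return "encoder for int";
    }

    public static void main(String[] args) {
        System.out.println(forInt());
        System.out.println(INT());
    }
}
```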


[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152580814
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44686/


[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-152542390
  
    Most of our `Dataset` APIs now work on the Java side. I'm waiting for feedback on my approach so that I can improve it (e.g. how to create encoders for tuples).


[GitHub] spark pull request: [SPARK-11269][SQL][WIP] Java API support & tes...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43398352
  
    --- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java ---
    @@ -0,0 +1,111 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package test.org.apache.spark.sql;
    +
    +import java.io.Serializable;
    +import java.util.Arrays;
    +import java.util.Iterator;
    +import java.util.LinkedList;
    +import java.util.List;
    +
    +import org.apache.spark.api.java.function.FlatMapFunction;
    +import org.apache.spark.sql.catalyst.encoders.Encoder;
    +import scala.Tuple2;
    +
    +import org.apache.spark.api.java.function.Function;
    +import org.junit.*;
    +
    +import org.apache.spark.SparkContext;
    +import org.apache.spark.api.java.JavaSparkContext;
    +import org.apache.spark.sql.catalyst.encoders.Encoder$;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.test.TestSQLContext;
    +
    +public class JavaDatasetSuite implements Serializable {
    +  private transient JavaSparkContext jsc;
    +  private transient TestSQLContext context;
    +
    +  @Before
    +  public void setUp() {
    +    // Trigger static initializer of TestData
    +    SparkContext sc = new SparkContext("local[*]", "testing");
    +    jsc = new JavaSparkContext(sc);
    +    context = new TestSQLContext(sc);
    +    context.loadTestData();
    +  }
    +
    +  @After
    +  public void tearDown() {
    +    context.sparkContext().stop();
    +    context = null;
    +    jsc = null;
    +  }
    +
    +  @Test
    +  public void testCommonOperation() {
    +    List<String> data = Arrays.asList("hello", "world");
    +    Dataset<String> ds = context.createDataset(data, Encoder$.MODULE$.forString());
    --- End diff --
    
    `Encoder$.MODULE$` looks very weird to me. Is there a better way to use a Scala object from Java?
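
For context, Scala compiles an `object` to a class whose name ends in `$` and that exposes its single instance through a static `MODULE$` field, which is why the call above looks the way it does from Java. The sketch below imitates that layout in plain Java and shows one possible way to hide it behind static forwarding methods. All names (`StringEncoders$`, `StringEncoders`, `forString`) are hypothetical, and the return type is simplified to `String`.

```java
// Imitation of the class layout that scalac produces for an `object`.
final class StringEncoders$ {
    // The singleton instance, reachable from Java as StringEncoders$.MODULE$.
    static final StringEncoders$ MODULE$ = new StringEncoders$();

    private StringEncoders$() {}

    String forString() {
        return "string encoder";
    }
}

// Hand-written facade: static forwarders so Java callers never see MODULE$.
final class StringEncoders {
    private StringEncoders() {}

    static String forString() {
        return StringEncoders$.MODULE$.forString();
    }
}

final class ScalaObjectFromJavaSketch {
    public static void main(String[] args) {
        // Direct access, as in the test above.
        System.out.println(StringEncoders$.MODULE$.forString());
        // Access through the forwarder.
        System.out.println(StringEncoders.forString());
    }
}
```

Both calls reach the same instance; the second form just reads like an ordinary static Java API.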


[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9358#issuecomment-153356427
  
    **[Test build #44920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44920/consoleFull)** for PR 9358 at commit [`0eea82c`](https://github.com/apache/spark/commit/0eea82ce19c8676580ca5279ecfe70e81063b373).


[GitHub] spark pull request: [SPARK-11269][SQL] Java API support & test cas...

Posted by JoshRosen <gi...@git.apache.org>.

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9358#discussion_r43954586
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -17,9 +17,13 @@
     
     package org.apache.spark.sql
     
    +import scala.collection.JavaConverters._
    --- End diff --
    
    `JavaConverters` _is_ the explicit one (`.asScala` / `.asJava`); the more implicit one was banned by me in a Scalastyle update.

