Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2014/10/24 05:42:54 UTC

[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/2919

    [SPARK-3572] [sql] [mllib]  User-Defined Types and MLlib Datasets

    This PR adds User-Defined Types (UDTs) to SQL.  It is a precursor to using SchemaRDD as a Dataset for the new MLlib API.
    
    ## Main additions
    
    Public API
    * SQL
     * Added annotation SQLUserDefinedType (DeveloperApi)
     * Added UDTRegistry (global object)
     * Added abstract class UserDefinedType
    * MLlib
     * Vector, DenseVector, SparseVector are annotated with SQLUserDefinedType
    
    Internals
    * Made MLlib depend on SparkSQL.
    * SQL
     * ScalaReflection
      * Methods for converting between Scala and Catalyst types now take DataType.
       * convertRowToScala added in several locations in SQL
      * schemaFor checks for SQLUserDefinedType annotation and checks UDTRegistry
    * MLlib
     * Added VectorUDT, DenseVectorUDT, SparseVectorUDT (private[spark])
    
    Examples
    * /examples/mllib/DatasetExample.scala: Demonstrates implicit conversion of RDD[LabeledPoint] to SchemaRDD
    
    Unit Tests
    * mllib/rdd/DatasetSuite.scala: Tests *VectorUDT
    * sql/UserDefinedTypeSuite.scala: Tests fake version of DenseVector
    
    ## Design decisions
    
    * UDTs override types natively recognized by SQL.
    
    * Question: Should users be able to override primitive or built-in types?
    
    ## Items left for future PRs
    
    * Java and Python APIs
    * Serialization (Parquet, etc.)
    
    CC: @mengxr @marmbrus
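
The idea behind a user-defined type, as described above, is to map a user class to and from a representation that SQL already understands. The sketch below illustrates that round trip in plain Scala. All names in it (`SimpleDataType`, `SimpleUDT`, `Point`, `PointUDT`, `UdtSketch`) are hypothetical stand-ins invented for this illustration; they are not the classes added by this PR, and the real `UserDefinedType` signature may differ.

```scala
// Hypothetical, simplified stand-ins. These are NOT Spark's classes.
sealed trait SimpleDataType
case object DoubleType extends SimpleDataType
final case class ArrayType(elementType: SimpleDataType) extends SimpleDataType

// A user-defined type converts a user class to and from a natively
// supported representation described by sqlType.
abstract class SimpleUDT[UserType] extends Serializable {
  def sqlType: SimpleDataType
  def serialize(obj: UserType): Any
  def deserialize(datum: Any): UserType
}

// Example user class (assumption: a 2-D point stored as two doubles).
final case class Point(x: Double, y: Double)

object PointUDT extends SimpleUDT[Point] {
  val sqlType: SimpleDataType = ArrayType(DoubleType)

  // Store the point as a sequence of doubles.
  def serialize(p: Point): Any = Seq(p.x, p.y)

  // Rebuild the point, rejecting anything that is not a pair of doubles.
  def deserialize(datum: Any): Point = datum match {
    case Seq(x: Double, y: Double) => Point(x, y)
    case other => throw new IllegalArgumentException(s"Cannot deserialize: $other")
  }
}

object UdtSketch {
  // Serialize and then deserialize; the result should equal the input.
  def roundTrip(p: Point): Point = PointUDT.deserialize(PointUDT.serialize(p))
}
```

Under these assumptions, `UdtSketch.roundTrip(Point(1.0, 2.0))` returns a value equal to the original point, which is the property a UDT's serializer and deserializer need to satisfy.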

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark sql-udt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2919.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2919
    
----
commit 7060cdd719a81b243f82f7e21a165a492daf79a1
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-03T02:06:49Z

    Adding UserDefinedType to SQL, not done yet.

commit 48d644de752df6e34b630991a296d46cb8049247
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-03T02:20:01Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit 5b6612848e30b9c415d46c93306f8cdacdc87ea7
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-03T19:47:29Z

    Merge remote-tracking branch 'upstream/master' into sql-udt
    
    Conflicts:
    	sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala

commit 3d94153b85b972a2aace5296eb6e11dceecbba8e
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-04T01:04:32Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit b9df66e6fc7dff838065212be5277406f562d6c6
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-06T16:54:51Z

    Still working on UDTs

commit 1dc68146fc85dc32abfe1fb389a263c48bcd7c3f
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-06T17:06:03Z

    Merge remote-tracking branch 'upstream/master' into sql-udt
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala

commit 92891b9ac2a11e80417aae371564f1b50725677e
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-06T20:18:26Z

    still working on UDTs

commit f91e6afd522b615c69f2a2bc815975044e3daa34
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-07T02:10:43Z

    still working on UDTs

commit d4b3209836c18dd6bacb13e6dfeeb90e6a53bc58
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-07T02:15:25Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit 2f835b78fbd7f49e80aceb6000292df8e9de4b54
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-07T19:25:45Z

    more udts...

commit 283a8aaf6a77496b7a0ff8d0c2a4fea9429924cc
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-07T22:51:07Z

    commented out convertRowToScala for debugging

commit 521eb945357805992040da917ed89b73f84fd089
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-07T23:43:30Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit 86815d1600c776b1d4cdc3d93c748729afd635ac
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-08T02:22:10Z

    basic UDT is working, but deserialization has yet to be done

commit 007c84fb7ba205083ba72086733219a9e5aa88f4
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-08T02:31:59Z

    removed old udt suite

commit 8aa3b20f825eb0469c6e13cb67d25047a58ddb43
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-09T03:33:06Z

    Merge remote-tracking branch 'upstream/master' into sql-udt
    
    Conflicts:
    	sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala

commit 19ae3f6a72b9b715360efa76a03a142e71d8c6be
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-09T19:39:44Z

    udts

commit ef010b553a2500758dedc19d8ef47c8db7d21ee9
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-09T19:39:49Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit 8b2222fab864047ce88860f9180b1fe6f9fd8258
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-09T20:09:15Z

    udts

commit f02b01def0731d956fd708df737c6043bb68b019
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-09T21:18:41Z

    udt finallly working

commit ceb886e6a1ad52e45d85eb76da28c9b172d7193c
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-09T21:48:33Z

    some cleanups

commit 47de90af0ab29cbaa712c6a4e13d312edd265108
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-09T21:56:09Z

    more cleanups

commit b8d0adeb9fbd6ed74a67bdfd8725c73d104aa86a
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-10T17:33:37Z

    Changing UDT to annotation

commit 7fae92842b3e9786a2bd6fec04b307a7023ad837
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-10T18:53:27Z

    udt annotation now working

commit 530022eb1f0243e38cf49a67df62d60c38cf975f
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-10T18:53:37Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit 77a03056f048f070eb288c27ca42da2fd57a72b1
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-10T20:13:35Z

    renamed UDT types

commit db093877d728e4d76e063eb63c1c91c938a3a63b
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-10T21:02:32Z

    blah

commit 494347741ec949c499faaf7baa130d12fd988d93
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-10T22:14:29Z

    Added MLlib dependency on SQL.

commit df1e069ed08eb55cc5b03388f784f97fc481d492
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-17T18:58:01Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit d41a5963e81337f80c9f7286b161a9fe18257e15
Author: Joseph K. Bradley <jo...@gmail.com>
Date:   2014-10-18T00:24:10Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

commit 24b054bca6bb43b236835ed7d9848064c5d5d130
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2014-10-20T19:39:42Z

    Merge remote-tracking branch 'upstream/master' into sql-udt

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61331079
  
      [Test build #22634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22634/consoleFull) for   PR 2919 at commit [`4d65933`](https://github.com/apache/spark/commit/4d65933e1462e33ce457809c981618ee46c93bcd).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61319910
  
      [Test build #22623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22623/consoleFull) for   PR 2919 at commit [`ce583e3`](https://github.com/apache/spark/commit/ce583e373bc69b7bad207e405423ecbf6aaca596).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class DataType implements Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61312119
  
      [Test build #22623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22623/consoleFull) for   PR 2919 at commit [`ce583e3`](https://github.com/apache/spark/commit/ce583e373bc69b7bad207e405423ecbf6aaca596).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19708019
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
    @@ -173,6 +173,8 @@ private[parquet] class RowWriteSupport extends WriteSupport[Row] with Logging {
       private[parquet] def writeValue(schema: DataType, value: Any): Unit = {
         if (value != null) {
           schema match {
    +        // Check UDT first since UDTs can override other types
    --- End diff --
    
    Same here: I don't think this has to be first.  (No reason to move it; I'd just remove the misleading comment.)
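
As a rough illustration of why case order may not matter here: when the cases of a `match` test disjoint types, any ordering gives the same result. The sketch below assumes a simplified, hypothetical hierarchy (`ColumnType`, `IntColumn`, `TextColumn`, `UserColumn`, `MatchOrderDemo`); it is not Spark's `DataType` hierarchy, and whether the real cases are disjoint is an assumption.

```scala
// Hypothetical, simplified type hierarchy. Not Spark's DataType classes.
sealed trait ColumnType
case object IntColumn extends ColumnType
case object TextColumn extends ColumnType
final case class UserColumn(name: String) extends ColumnType

object MatchOrderDemo {
  // User-defined case listed first.
  def describeUdtFirst(t: ColumnType): String = t match {
    case UserColumn(n) => s"udt:$n"
    case IntColumn     => "int"
    case TextColumn    => "text"
  }

  // User-defined case listed last. The cases do not overlap,
  // so the result is the same for every input.
  def describeUdtLast(t: ColumnType): String = t match {
    case IntColumn     => "int"
    case TextColumn    => "text"
    case UserColumn(n) => s"udt:$n"
  }
}
```

Order only becomes significant when an earlier case can also match values intended for a later one.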




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60992364
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22462/
    Test PASSed.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60984275
  
      [Test build #22462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22462/consoleFull) for   PR 2919 at commit [`a459956`](https://github.com/apache/spark/commit/a4599564766574613ec88a1abc8b806ef155c0e0).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61222239
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22597/
    Test FAILed.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61222236
  
      [Test build #22597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22597/consoleFull) for   PR 2919 at commit [`81ecfc3`](https://github.com/apache/spark/commit/81ecfc32cf7332fcc36da0d755fa66cfd1eb6da2).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19708010
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/java/Row.scala ---
    @@ -101,7 +101,7 @@ class Row(private[spark] val row: ScalaRow) extends Serializable {
       override def equals(other: Any): Boolean = other match {
         case that: Row =>
           (that canEqual this) &&
    -        row == that.row
    +        row == that.row // Should this be row.equals(that.row)?
    --- End diff --
    
    [In Scala `==` routes to equals](http://stackoverflow.com/questions/7681161/whats-the-difference-between-and-equals-in-scala)
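
A minimal sketch of the point made in the linked answer: in Scala, `==` on reference types delegates to `equals` (with a null check first), while `eq` tests reference identity. The `Wrapper` class and `EqualityDemo` object are hypothetical and exist only for this illustration; they are not the `Row` class from the diff.

```scala
// Hypothetical class for illustration; not the Row class from the diff.
final class Wrapper(val values: Seq[Int]) {
  override def equals(other: Any): Boolean = other match {
    case that: Wrapper => values == that.values
    case _             => false
  }
  override def hashCode: Int = values.hashCode
}

object EqualityDemo {
  val a = new Wrapper(Seq(1, 2))
  val b = new Wrapper(Seq(1, 2))
  val n: Wrapper = null

  val viaOperator: Boolean  = a == b       // true: delegates to a.equals(b)
  val viaEquals: Boolean    = a.equals(b)  // true: same call, written explicitly
  val viaReference: Boolean = a eq b       // false: two distinct instances
  val nullSafe: Boolean     = n == a       // false: == checks for null before calling equals
}
```

So `row == that.row` and `row.equals(that.row)` behave the same for non-null `row`, and the `==` form is also safe when the left-hand side is null.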




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19707996
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/java/JavaSQLContext.scala ---
    @@ -86,9 +87,13 @@ class JavaSQLContext(val sqlContext: SQLContext) extends UDFRegistration {
     
       /**
        * Applies a schema to an RDD of Java Beans.
    +   *
    +   * WARNING: The ordering of elements in the schema may differ from Scala.
    --- End diff --
    
    Thanks for discovering/adding this.  Instead of saying that it differs from Scala, just say that since there is no guaranteed ordering for fields in a Java Bean, `SELECT *` queries will return the columns in an undefined order.
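
To make the ordering concern concrete, here is a small sketch using the standard `java.beans.Introspector`. It assumes a hypothetical bean `Animal` whose getters are declared in the order zebra, apple; the order in which the introspector reports the properties is not tied to that declaration order, so a caller that needs a stable order has to impose one. `Animal` and `BeanOrderDemo` are invented for this illustration, and whether `JavaSQLContext` discovers fields exactly this way is an assumption.

```scala
import java.beans.Introspector

// Hypothetical bean; getters are declared in the order zebra, apple.
class Animal {
  def getZebra: String = "z"
  def getApple: Int = 1
}

object BeanOrderDemo {
  // Property names in whatever order the JavaBeans introspector reports them.
  // Passing classOf[Object] as the stop class excludes the inherited "class" property.
  def introspectedOrder: Seq[String] =
    Introspector.getBeanInfo(classOf[Animal], classOf[Object])
      .getPropertyDescriptors
      .map(_.getName)
      .toSeq

  // A caller that needs a predictable column order can sort (or list) the names explicitly.
  def stableOrder: Seq[String] = introspectedOrder.sorted
}
```

In query terms, this corresponds to naming the columns explicitly instead of relying on `SELECT *`.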




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60992355
  
      [Test build #22462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22462/consoleFull) for   PR 2919 at commit [`a459956`](https://github.com/apache/spark/commit/a4599564766574613ec88a1abc8b806ef155c0e0).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60827775
  
      [Test build #22375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22375/consoleFull) for   PR 2919 at commit [`bbb862a`](https://github.com/apache/spark/commit/bbb862aee87bc45338a2f44ed6d89378dfb1c503).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61164747
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22557/
    Test FAILed.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61395289
  
    @marmbrus  @mengxr Thanks for all the help with this PR!  I merged and updated based on comments.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61397679
  
      [Test build #22756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22756/consoleFull) for   PR 2919 at commit [`f8002b4`](https://github.com/apache/spark/commit/f8002b402848790b591ff20aeec00e4cb8f1a79c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60830627
  
      [Test build #22376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22376/consoleFull) for   PR 2919 at commit [`b74251d`](https://github.com/apache/spark/commit/b74251d05071a046e7b0e7eee406805f4fe83ff7).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61163804
  
      [Test build #22557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22557/consoleFull) for   PR 2919 at commit [`9c175e9`](https://github.com/apache/spark/commit/9c175e9220a5be1add0b2f7eb38e67a030720439).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61319918
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22623/
    Test PASSed.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60703036
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22315/
    Test PASSed.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60441226
  
      [Test build #22150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22150/consoleFull) for   PR 2919 at commit [`8ca2339`](https://github.com/apache/spark/commit/8ca2339a2eadb0304856c2d8029caa21043670a3).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61397090
  
      [Test build #22747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22747/consoleFull) for   PR 2919 at commit [`b4a2803`](https://github.com/apache/spark/commit/b4a2803fc43bf88147c5ae74ed2f1d2636c4f9a3).
     * This patch **passes all tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60427468
  
      [Test build #22148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22148/consoleFull) for   PR 2919 at commit [`716c19f`](https://github.com/apache/spark/commit/716c19f5782d3ba7a7307bf14c63fbf7a3b27719).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61223274
  
      [Test build #22599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22599/consoleFull) for   PR 2919 at commit [`94acb99`](https://github.com/apache/spark/commit/94acb9939357b5048bb9aad867566b70758c8a09).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class BernoulliCellSampler[T](lb: Double, ub: Double, complement: Boolean = false)`
      * `class BernoulliSampler[T: ClassTag](fraction: Double) extends RandomSampler[T, T] `
      * `class PoissonSampler[T: ClassTag](fraction: Double) extends RandomSampler[T, T] `
      * `class GapSamplingIterator[T: ClassTag](`
      * `class GapSamplingReplacementIterator[T: ClassTag](`
      * `class JavaModelWrapper(object):`
      * `class JavaVectorTransformer(JavaModelWrapper, VectorTransformer):`
      * `class StandardScalerModel(JavaVectorTransformer):`
      * `class IDFModel(JavaVectorTransformer):`
      * `class Word2VecModel(JavaVectorTransformer):`
      * `class MatrixFactorizationModel(JavaModelWrapper):`
      * `class MultivariateStatisticalSummary(JavaModelWrapper):`
      * `class DecisionTreeModel(JavaModelWrapper):`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class DataType implements Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61222747
  
      [Test build #22599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22599/consoleFull) for   PR 2919 at commit [`94acb99`](https://github.com/apache/spark/commit/94acb9939357b5048bb9aad867566b70758c8a09).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60429549
  
      [Test build #22150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22150/consoleFull) for   PR 2919 at commit [`8ca2339`](https://github.com/apache/spark/commit/8ca2339a2eadb0304856c2d8029caa21043670a3).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61319152
  
      [Test build #22628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22628/consoleFull) for   PR 2919 at commit [`1b8cda3`](https://github.com/apache/spark/commit/1b8cda3af82559636a370ceb14ec75791686d2f5).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61396751
  
    I see the big Decimal patch is in...merging now.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19708022
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTypes.scala ---
    @@ -253,6 +253,10 @@ private[parquet] object ParquetTypesConverter extends Logging {
             new ParquetPrimitiveType(repetition, primitiveType, name, originalType.orNull)
         }.getOrElse {
           ctype match {
    +        // Check UDT first since UDTs can override other types
    +        case udt: UserDefinedType[_] => {
    --- End diff --
    
    Same here.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60843333
  
      [Test build #22376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22376/consoleFull) for   PR 2919 at commit [`b74251d`](https://github.com/apache/spark/commit/b74251d05071a046e7b0e7eee406805f4fe83ff7).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61323390
  
      [Test build #22634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22634/consoleFull) for   PR 2919 at commit [`4d65933`](https://github.com/apache/spark/commit/4d65933e1462e33ce457809c981618ee46c93bcd).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19447499
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -492,3 +494,24 @@ case class MapType(
           ("valueType" -> valueType.jsonValue) ~
           ("valueContainsNull" -> valueContainsNull)
     }
    +
    +/**
    + * ::DeveloperApi::
    + * The data type for User Defined Types.
    + */
    +@DeveloperApi
    --- End diff --
    
    Will do




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60437120
  
      [Test build #22148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22148/consoleFull) for   PR 2919 at commit [`716c19f`](https://github.com/apache/spark/commit/716c19f5782d3ba7a7307bf14c63fbf7a3b27719).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61395355
  
      [Test build #22747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22747/consoleFull) for   PR 2919 at commit [`b4a2803`](https://github.com/apache/spark/commit/b4a2803fc43bf88147c5ae74ed2f1d2636c4f9a3).
     * This patch **does not merge cleanly**.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60345199
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22112/consoleFull) for   PR 2919 at commit [`3de3d76`](https://github.com/apache/spark/commit/3de3d768951f020bf0876da2f40cc098210fcf05).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60437129
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22148/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60840966
  
      [Test build #22375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22375/consoleFull) for   PR 2919 at commit [`bbb862a`](https://github.com/apache/spark/commit/bbb862aee87bc45338a2f44ed6d89378dfb1c503).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61432277
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22776/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61223276
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22599/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61429419
  
      [Test build #22776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22776/consoleFull) for   PR 2919 at commit [`e13cd8a`](https://github.com/apache/spark/commit/e13cd8ae5a5a9fae8b0dee1d2f6d890328b13210).
     * This patch **does not merge cleanly**.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60843341
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22376/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60341645
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22112/consoleFull) for   PR 2919 at commit [`3de3d76`](https://github.com/apache/spark/commit/3de3d768951f020bf0876da2f40cc098210fcf05).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61222349
  
    @marmbrus  @mengxr  I removed the Java UDT and made UDTs private for now.  I think it's good enough to work with MLlib Datasets, and we can take more time to design public UDTs.  I'll make a pass through the code early tomorrow morning.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61327183
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22628/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61331085
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22634/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19708016
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala ---
    @@ -75,6 +75,10 @@ private[sql] object CatalystConverter {
           parent: CatalystConverter): Converter = {
         val fieldType: DataType = field.dataType
         fieldType match {
    +      // Check UDT first since UDTs can override other types
    --- End diff --
    
    This doesn't actually have to be first.  None of the other cases can match UserDefinedType.
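
As an illustration of the point about match order, here is a minimal sketch using hypothetical stand-in types (`MatchOrderSketch`, `ToyVectorUDT`, and a toy `DataType` hierarchy; none of these are Spark's real classes). It assumes, as the comment states, that no other case pattern can match a `UserDefinedType` instance. Under that assumption, moving the UDT case does not change which branch is taken.

```scala
object MatchOrderSketch {
  // Toy hierarchy standing in for Catalyst's data types (assumption, not Spark code).
  sealed trait DataType
  case object DoubleType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // A UDT is a distinct subclass; it is neither DoubleType nor an ArrayType.
  abstract class UserDefinedType extends DataType {
    def sqlType: DataType
  }
  final class ToyVectorUDT extends UserDefinedType {
    val sqlType: DataType = ArrayType(DoubleType)
  }

  // UDT case listed first, as in the diff under review.
  def udtFirst(t: DataType): String = t match {
    case _: UserDefinedType => "udt"
    case DoubleType         => "double"
    case ArrayType(_)       => "array"
  }

  // UDT case listed last: same result, because the patterns are disjoint.
  def udtLast(t: DataType): String = t match {
    case DoubleType         => "double"
    case ArrayType(_)       => "array"
    case _: UserDefinedType => "udt"
  }
}
```

Ordering would start to matter only if some earlier pattern could also match a UDT instance, for example a catch-all `case _` or a pattern on a supertype of `UserDefinedType`.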


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60830162
  
    @marmbrus  Parquet support added by @mengxr so this should be ready for a pass.  Thanks both!




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60345200
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22112/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60840976
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22375/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60703033
  
      [Test build #22315 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22315/consoleFull) for   PR 2919 at commit [`7dd045a`](https://github.com/apache/spark/commit/7dd045ae91f83e5a596b871a39395eefda8dfb4a).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by etrain <gi...@git.apache.org>.
Github user etrain commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60969356
  
    I've looked at the code here, and it basically seems reasonable. One high-level concern I have is about the programming pattern this encourages: complex nesting of otherwise simple structures, which may make it difficult to program against Datasets in sufficiently complicated applications.
    
    A 'dataset' is now a collection of Row, where we have the guarantee that all rows in a Dataset conform to the same schema. A schema is a list of (name, type) pairs which describe the attributes available in the dataset. This seems like a good thing to me, and is pretty much what we described in MLI (and how conventional databases have been structured forever). So far, so good. 
    
    The concern that I have is that we are now encouraging these attributes to be complex types. For example, where I might have had
    val x = Schema(("a", classOf[String]), ("b", classOf[Double]), ..., ("z", classOf[Double]))
    this would become
    val x = Schema(("a", classOf[String]), ("bGroup", classOf[Vector]), ..., ("zGroup", classOf[Vector]))
    
    So, great, my schema now has these vector things in them, which I can create separately, pass around, etc.
    
    This clearly has its merits:
    1) Features are grouped together logically based on the process that creates them.
    2) Managing one short schema where each record comprises a few large objects (say, 4 vectors, each of length 1000) is probably easier than managing a really big schema made up of lots of small objects (say, 4000 doubles).
    
    But there are some major drawbacks:
    1) Why stop at only one level of nesting? Why not have Vector[Vector]?
    2) How do learning algorithms like SVM or PCA deal with these Datasets? Is there an implicit conversion that flattens these things to RDD[LabeledPoint]? Do we want to guarantee these semantics?
    3) Manipulating and subsetting nested schemas like this might be tricky. Where before I might be able to write:
    
    val x: Dataset = input.select(Seq(0,1,2,4,180,181,1000,1001,1002))
    now I might have to write
    val groupSelections = Seq(Seq(0,1,2,4),Seq(0,1),Seq(0,1,2))
    val x: Dataset = groupSelections.zip(input.columns).map {case (gs, col) => col(gs) }
    
    Ignoring raw syntax and semantics of how you might actually map an operation over the columns of a Dataset and get back a well-structured dataset, I think this makes two conflicting points:
    1) In the first example - presumably all the work goes into figuring out what the subset of features you want is in this really wide feature space.
    2) In the second example - there are a lot of gymnastics involved in subsetting feature groups. I think it’s clear that working with lots of feature groups might get unreasonable pretty quickly.
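
The two selection styles contrasted above can be sketched in plain Scala. This is only an illustration under stated assumptions: `SelectionSketch`, `selectFlat`, `selectGrouped`, and `locate` are hypothetical names, and records are modelled as `IndexedSeq[Double]` rather than any Spark or MLlib type.

```scala
// Hypothetical stand-ins for illustration; these are not Spark or MLlib APIs.
object SelectionSketch {

  // Flat layout: one record is a single wide row of doubles,
  // so a subset is just a list of column indices.
  def selectFlat(row: IndexedSeq[Double], indices: Seq[Int]): IndexedSeq[Double] =
    indices.map(i => row(i)).toIndexedSeq

  // Grouped layout: one record is a sequence of feature groups,
  // and each group needs its own list of within-group indices.
  def selectGrouped(
      groups: Seq[IndexedSeq[Double]],
      groupSelections: Seq[Seq[Int]]): Seq[IndexedSeq[Double]] =
    groups.zip(groupSelections).map { case (group, selection) =>
      selection.map(i => group(i)).toIndexedSeq
    }

  // Translate a flat column index into (group index, offset within group),
  // given the length of each group.
  def locate(flatIndex: Int, groupSizes: Seq[Int]): (Int, Int) = {
    val starts = groupSizes.scanLeft(0)(_ + _) // cumulative start offsets, plus total
    require(flatIndex >= 0 && flatIndex < starts.last, "index out of range")
    val group = starts.init.lastIndexWhere(_ <= flatIndex)
    (group, flatIndex - starts(group))
  }
}
```

With a helper like `locate`, a caller could keep thinking in flat indices while the data stays grouped, at the cost of carrying the group sizes around.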
    
    If we look at R or pandas/scikit-learn as examples of projects that have (arguably quite successfully) dealt with these interface issues, there is one basic pattern: learning algorithms expect big tables of numbers as input. Even here, there are some important differences:
    
    For example, in scikit-learn, categorical features aren’t supported directly by most learning algorithms. Instead, users are responsible for getting data from “table with heterogeneously typed columns” to “table of numbers” with something like OneHotEncoder and other feature transformers. In R, on the other hand, categorical features are (sometimes frustratingly) first-class citizens by virtue of the “factor” data type, which is essentially an enum. Most out-of-the-box learning algorithms (like glm()) accept data frames with categorical inputs and handle them sensibly, implicitly one-hot encoding (or creating dummy variables, if you prefer) the categorical features.
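
As a rough illustration of the "table of numbers" step mentioned above, here is a minimal one-hot encoding sketch in plain Scala. `OneHotSketch`, `fitCategories`, and `encode` are hypothetical names, not part of scikit-learn, R, or MLlib, and encoding unseen categories as all zeros is an assumption made for this sketch.

```scala
// Hypothetical helper for illustration; not an existing library API.
object OneHotSketch {

  // Map each distinct category to a column index. Sorting keeps the
  // column order deterministic across runs.
  def fitCategories(values: Seq[String]): Map[String, Int] =
    values.distinct.sorted.zipWithIndex.toMap

  // Encode one value as an indicator vector with a single 1.0.
  // A category that was not seen during fitting encodes as all zeros.
  def encode(value: String, index: Map[String, Int]): Array[Double] = {
    val out = Array.fill(index.size)(0.0)
    index.get(value).foreach(i => out(i) = 1.0)
    out
  }
}
```

Each categorical column becomes as many numeric columns as it has categories, which is one way a heterogeneously typed table ends up as a wide flat one.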
    
    While I have a slight preference for representing things as big flat tables, I would be fine coding either way - but I wanted to raise the issue for discussion here before the interfaces are set in stone.
    





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61164743
  
      [Test build #22557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22557/consoleFull) for   PR 2919 at commit [`9c175e9`](https://github.com/apache/spark/commit/9c175e9220a5be1add0b2f7eb38e67a030720439).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class RegressionMetrics(predictionAndObservations: RDD[(Double, Double)]) extends Logging `
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19448229
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -492,3 +494,24 @@ case class MapType(
           ("valueType" -> valueType.jsonValue) ~
           ("valueContainsNull" -> valueContainsNull)
     }
    +
    +/**
    + * ::DeveloperApi::
    + * The data type for User Defined Types.
    + */
    +@DeveloperApi
    +abstract class UserDefinedType[UserType] extends DataType with Serializable {
    +
    +  /** Underlying storage type for this UDT used by SparkSQL */
    +  def sqlType: DataType
    +
    +  /** Convert the user type to a Row object */
    +  // TODO: Can we make this take obj: UserType?  The issue is in ScalaReflection.convertToCatalyst,
    +  //       where we need to convert Any to UserType.
    --- End diff --
    
    I'll add the doc.  For now, it requires that the UDT be representable using SparkSQL's built-in types (since the `sqlType: DataType` field must be given).  It would be nice to permit this optimization later on, but I was not planning on doing it for this PR, unless there is a need to.
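
To make the constraint above concrete, here is a small self-contained sketch of a user type backed by a built-in storage type. Everything in it is a hypothetical stand-in: `SketchDataType`, `SketchUDT`, `toStorage`, and `fromStorage` only mirror the general shape of the diff and are not the Catalyst classes or method names from this PR. Taking `obj: Any` follows the TODO in the diff; copying on conversion is a choice made here, not something the PR specifies.

```scala
// Hypothetical stand-ins that mirror the shape of the diff above;
// these are NOT the Catalyst DataType or UserDefinedType classes.
sealed trait SketchDataType
case object SketchDouble extends SketchDataType
final case class SketchArray(elementType: SketchDataType) extends SketchDataType

abstract class SketchUDT[UserType] extends Serializable {
  /** Built-in storage type that the user type is represented as. */
  def sqlType: SketchDataType

  /** Convert a user object to its storage form. */
  def toStorage(obj: Any): Seq[Any]

  /** Rebuild a user object from its storage form. */
  def fromStorage(stored: Seq[Any]): UserType
}

// Example user type, stored as an array of doubles.
final case class MyDenseVector(values: Array[Double])

object MyDenseVectorUDT extends SketchUDT[MyDenseVector] {
  val sqlType: SketchDataType = SketchArray(SketchDouble)

  def toStorage(obj: Any): Seq[Any] = obj match {
    // toVector copies, so the stored form does not alias the user's array.
    case v: MyDenseVector => v.values.toVector
    case other =>
      throw new IllegalArgumentException(s"Expected MyDenseVector, got: $other")
  }

  def fromStorage(stored: Seq[Any]): MyDenseVector =
    MyDenseVector(stored.map(_.asInstanceOf[Double]).toArray)
}
```

Whether a real implementation may share the underlying array instead of copying is exactly the kind of detail the requested documentation would need to settle.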




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61693524
  
    This has been subsumed by other PRs, right?




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19373097
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -492,3 +494,24 @@ case class MapType(
           ("valueType" -> valueType.jsonValue) ~
           ("valueContainsNull" -> valueContainsNull)
     }
    +
    +/**
    + * ::DeveloperApi::
    + * The data type for User Defined Types.
    + */
    +@DeveloperApi
    +abstract class UserDefinedType[UserType] extends DataType with Serializable {
    +
    +  /** Underlying storage type for this UDT used by SparkSQL */
    +  def sqlType: DataType
    +
    +  /** Convert the user type to a Row object */
    +  // TODO: Can we make this take obj: UserType?  The issue is in ScalaReflection.convertToCatalyst,
    +  //       where we need to convert Any to UserType.
    --- End diff --
    
    Can this include some additional doc on when this conversion overhead will be incurred?  Is it kosher to return references to the same underlying structures or does everything need to be copied?




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60696165
  
      [Test build #22315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22315/consoleFull) for   PR 2919 at commit [`7dd045a`](https://github.com/apache/spark/commit/7dd045ae91f83e5a596b871a39395eefda8dfb4a).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61397519
  
      [Test build #22756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22756/consoleFull) for   PR 2919 at commit [`f8002b4`](https://github.com/apache/spark/commit/f8002b402848790b591ff20aeec00e4cb8f1a79c).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60982156
  
    I'm about to remove the mllib/ part of this PR; that can be put in after more discussion and any needed modifications.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61432276
  
      [Test build #22776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22776/consoleFull) for   PR 2919 at commit [`e13cd8a`](https://github.com/apache/spark/commit/e13cd8ae5a5a9fae8b0dee1d2f6d890328b13210).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `  case class Params(`
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61164357
  
    @marmbrus  Just pushed a WIP update to include Java support, but I currently have an issue with accessing the Scala UserDefinedType (in catalyst) from the Java side.  The goal is to use a UDT defined in Scala (MyDenseVector) in Java, but the Java user needs to be able to convert the Scala UDT to a Java UDT.  It is hard to write a (public) conversion method in Java since it needs to take a Scala UDT as an argument (and it does not recognize the Scala UserDefinedType alias from package.scala).
    
    Proposal: Write a conversion method in Scala, and have Java users call it.
    
    Thoughts?  Thanks!
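
One possible shape for such a conversion method, sketched with hypothetical stand-ins only: `ScalaSideUDT`, `JavaSideUDT`, `UDTConversions`, and `asJavaUDT` are illustrative names and are not classes from this PR or from Spark. The sketch relies on the Scala compiler normally emitting static forwarders for methods of a top-level `object`, so Java code could call `UDTConversions.asJavaUDT(udt)` directly.

```scala
// Hypothetical stand-ins for the Scala-side and Java-side UDT base classes.
abstract class ScalaSideUDT[T] extends Serializable {
  def typeName: String
}

abstract class JavaSideUDT[T] extends Serializable {
  def getTypeName: String
}

object UDTConversions {
  /** Wrap a Scala-side UDT in a Java-side view that delegates to it. */
  def asJavaUDT[T](udt: ScalaSideUDT[T]): JavaSideUDT[T] =
    new JavaSideUDT[T] {
      def getTypeName: String = udt.typeName
    }
}
```

Because the method signature names a concrete class rather than a type alias from a package object, the Java side only needs to see ordinary classes.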




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61696361
  
    Yes, I'll close it.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61397092
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22747/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley closed the pull request at:

    https://github.com/apache/spark/pull/2919




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60981280
  
    @etrain  Thanks for your thoughts!  This sounds like a discussion that would fit better on the [Dataset JIRA](https://issues.apache.org/jira/browse/SPARK-3573).  Could we please move it there?  This PR is meant to give a standard SQL UDT implementation; I am OK with removing the MLlib Dataset example if that needs to be discussed more.  I'll post some thoughts on the JIRA once you move the comment there (to keep a record).  Thanks!




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61327173
  
      [Test build #22628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22628/consoleFull) for   PR 2919 at commit [`1b8cda3`](https://github.com/apache/spark/commit/1b8cda3af82559636a370ceb14ec75791686d2f5).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `        //       in some cases, such as when a class is enclosed in an object (in which case`
      * `abstract class UserDefinedType[UserType] extends DataType with Serializable `
      * `public abstract class UserDefinedType<UserType> extends DataType implements Serializable `





[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by sryza <gi...@git.apache.org>.
Github user sryza commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2919#discussion_r19373051
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -492,3 +494,24 @@ case class MapType(
           ("valueType" -> valueType.jsonValue) ~
           ("valueContainsNull" -> valueContainsNull)
     }
    +
    +/**
    + * ::DeveloperApi::
    + * The data type for User Defined Types.
    + */
    +@DeveloperApi
    --- End diff --
    
    Can this have some extra documentation about what its purpose is and when a user might want to define one?




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61222170
  
      [Test build #22597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22597/consoleFull) for   PR 2919 at commit [`81ecfc3`](https://github.com/apache/spark/commit/81ecfc32cf7332fcc36da0d755fa66cfd1eb6da2).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-60441231
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22150/




[GitHub] spark pull request: [SPARK-3572] [sql] [mllib] User-Defined Types ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2919#issuecomment-61397680
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22756/

