You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2018/08/10 01:57:45 UTC

[GitHub] spark pull request #22063: [SPARK-25044][SQL] Address translation of LMF clo...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/22063

    [SPARK-25044][SQL] Address translation of LMF closure primitive args to Object in Scala 2.12

    ## What changes were proposed in this pull request?
    
    First attempt to resolve issue with inferring func types in 2.12 by instead using info captured when UDF is registered -- capturing which types are nullable (i.e. not primitive)
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-25044

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22063.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22063
    
----
commit 0242218cdd512cc6e23a96621852fd3e019b7fc3
Author: Sean Owen <sr...@...>
Date:   2018-08-10T01:56:54Z

    First attempt to resolve issue with inferring func types in 2.12 by instead using info captured when UDF is registered -- capturing which types are nullable (i.e. not primitive)

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212347965
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -57,8 +59,21 @@ case class ScalaUDF(
           children: Seq[Expression],
           inputTypes: Seq[DataType],
           udfName: Option[String]) = {
    -    this(
    -      function, dataType, children, inputTypes, udfName, nullable = true, udfDeterministic = true)
    +    this(function, dataType, children, inputTypes, udfName,
    +      nullable = true, udfDeterministic = true, nullableTypes = Nil)
    +  }
    +
    +  // Constructor from Spark 2.3
    --- End diff --
    
    I'm not sure we want to keep doing this. cc @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212488387
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
    @@ -164,19 +164,15 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
         var outputData = dataset
         var numColsOutput = 0
         if (getRawPredictionCol != "") {
    -      val predictRawUDF = udf { (features: Any) =>
    --- End diff --
    
    @skonto @lrytz this might be of interest. Don't think it's a Scala issue per se but just checking if that behavior change makes sense.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212393458
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -85,9 +85,8 @@ private[recommendation] trait ALSModelParams extends Params with HasPredictionCo
        * Attempts to safely cast a user/item id to an Int. Throws an exception if the value is
        * out of integer range or contains a fractional part.
        */
    -  protected[recommendation] val checkedCast = udf { (n: Any) =>
    +  protected[recommendation] val checkedCast = udf { n: AnyRef =>
    --- End diff --
    
    This doesn't even actually work, now that I dig further. Using `Number` doesn't work either. There are a number of UDFs that fail now in MLlib unfortunately.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2064/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    the idea LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95027/testReport)** for PR 22063 at commit [`f029f6d`](https://github.com/apache/spark/commit/f029f6d4d5ab6a39812be5beafb5c990205e208f).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212349025
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[DataType]],
    +    nullableTypes: Option[Seq[Boolean]] = None) {
    +
    +  // Constructor from Spark 2.3.0 for binary compatibility
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212348950
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -375,8 +375,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
         import org.apache.spark.sql.functions.{rand, udf}
         val c = Column(col)
         val r = rand(seed)
    -    val f = udf { (stratum: Any, x: Double) =>
    -      x < fractions.getOrElse(stratum.asInstanceOf[T], 0.0)
    +    // Hack to get around the fact that type T is Any and we can't use a UDF whose arg
    +    // is Any. Convert everything to a string rep.
    --- End diff --
    
    I'm not sure this is safe to do. Maybe `a == b` is different from `a.toString == b.toString`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #4288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4288/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #4289 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4289/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95031/testReport)** for PR 22063 at commit [`a5cc5ec`](https://github.com/apache/spark/commit/a5cc5ec49e4bf34c66457f5eec800bafbf3e4bd9).
     * This patch **fails from timeout after a configured wait of \`400m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212352751
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[DataType]],
    +    nullableTypes: Option[Seq[Boolean]] = None) {
    +
    +  // Constructor from Spark 2.3.0 for binary compatibility
    --- End diff --
    
    why add this? `UserDefinedFunction` is public but its constructor is not.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94567/testReport)** for PR 22063 at commit [`4a88da5`](https://github.com/apache/spark/commit/4a88da5013f43221e644a2f6a86e3a6475e98c6d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94592/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2446/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by adriaanm <gi...@git.apache.org>.
Github user adriaanm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r213593596
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
    @@ -164,19 +164,15 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
         var outputData = dataset
         var numColsOutput = 0
         if (getRawPredictionCol != "") {
    -      val predictRawUDF = udf { (features: Any) =>
    --- End diff --
    
    No idea, but in any case the new version seems nicer :-) Both 2.11 and 2.12 will happily generate a `typeTag` for `Any`, though, so that wouldn't immediately explain it.  To see what was actually inferred, you could compile with `-Xprint:typer` (ideally after a full compile and then just making this file recompile incrementally).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212504250
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[DataType]],
    +    nullableTypes: Option[Seq[Boolean]] = None) {
    --- End diff --
    
    +1 to all your comments. I'm overhauling this whole PR and will force push with a rebase once it seems to basically work.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r209426781
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
    @@ -211,9 +211,7 @@ abstract class PredictionModel[FeaturesType, M <: PredictionModel[FeaturesType,
       }
     
       protected def transformImpl(dataset: Dataset[_]): DataFrame = {
    -    val predictUDF = udf { (features: Any) =>
    --- End diff --
    
    I'm really surprised that this worked before...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2459/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94561/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95031/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94561/testReport)** for PR 22063 at commit [`a1850b0`](https://github.com/apache/spark/commit/a1850b0a0385cd6add6177b95c5fb57c6c173c8e).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212499322
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[DataType]],
    +    nullableTypes: Option[Seq[Boolean]] = None) {
    --- End diff --
    
    or `Option[Seq[ScalaReflection.Schema]]`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95234/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2048/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2024/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94569/testReport)** for PR 22063 at commit [`c3d1561`](https://github.com/apache/spark/commit/c3d15618e362fcbd0d26487540ba9761188bffab).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2047/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94592/testReport)** for PR 22063 at commit [`3c67d9d`](https://github.com/apache/spark/commit/3c67d9d30b2afd38ccf7a01d49c75d4c17f31445).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94535/testReport)** for PR 22063 at commit [`92598f0`](https://github.com/apache/spark/commit/92598f0bbf55b19fc833745c88e273ffaea2b139).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95121/testReport)** for PR 22063 at commit [`09c3a3b`](https://github.com/apache/spark/commit/09c3a3b5c45fb2a356b191d5699c6ba497694031).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94610/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94610/testReport)** for PR 22063 at commit [`c285e93`](https://github.com/apache/spark/commit/c285e93c976b8e04d6296d445a2c9d27f9fb01d8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212800597
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3713,7 +3726,7 @@ object functions {
           | */
           |def udf[$typeTags](f: Function$x[$types]): UserDefinedFunction = {
           |  val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[RT]
    -      |  val inputTypes = Try($inputTypes).toOption
    +      |  val inputTypes = Try($argSchema).toOption
    --- End diff --
    
    @cloud-fan might be worth another look now. So, after making this change, I realize, maybe the whole reason it started failing was that I had moved the schema inference outside the Try(). Now it's back inside. Maybe that makes the whole problem go back to being silent. Did you mean you preferred tackling the problem directly and not suppressing the failure to infer a schema? I added udfInternal above for that.
    
    But maybe this isn't the best approach as user UDFs could fail for the same reason. Maybe I need to back this whole thing out after all, now that I understand what's happening after your comments.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95031/testReport)** for PR 22063 at commit [`a5cc5ec`](https://github.com/apache/spark/commit/a5cc5ec49e4bf34c66457f5eec800bafbf3e4bd9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94567/testReport)** for PR 22063 at commit [`4a88da5`](https://github.com/apache/spark/commit/4a88da5013f43221e644a2f6a86e3a6475e98c6d).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212393072
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[DataType]],
    +    nullableTypes: Option[Seq[Boolean]] = None) {
    +
    +  // Constructor from Spark 2.3.0 for binary compatibility
    --- End diff --
    
    It was just for MiMa, but I could also suppress the warning


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94567/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212345138
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
    @@ -164,19 +164,15 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
         var outputData = dataset
         var numColsOutput = 0
         if (getRawPredictionCol != "") {
    -      val predictRawUDF = udf { (features: Any) =>
    --- End diff --
    
    I looked into this, and now I understand why it worked before.
    
    Scala 2.11 somehow can generate type tag for `Any`, then Spark gets the input schema from type tag `Try(ScalaReflection.schemaFor(typeTag[A1]).dataType :: Nil).toOption`. It will fail and input schema will be None, so no type check will be applied later.
    
    I think it makes more sense to specify the type and ask Spark to do type check.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r213696131
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
    @@ -164,19 +164,15 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
         var outputData = dataset
         var numColsOutput = 0
         if (getRawPredictionCol != "") {
    -      val predictRawUDF = udf { (features: Any) =>
    --- End diff --
    
    I apologize, that's my mistake. In the end it isn't related to TypeTags for Any and that is not a difference. Thanks for your input, I think we are close. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212445066
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -57,8 +59,21 @@ case class ScalaUDF(
           children: Seq[Expression],
           inputTypes: Seq[DataType],
           udfName: Option[String]) = {
    -    this(
    -      function, dataType, children, inputTypes, udfName, nullable = true, udfDeterministic = true)
    +    this(function, dataType, children, inputTypes, udfName,
    +      nullable = true, udfDeterministic = true, nullableTypes = Nil)
    +  }
    +
    +  // Constructor from Spark 2.3
    --- End diff --
    
    Yeah, messy. The constructor and class are public so felt it was worth erring on the side of compatibility.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2043/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #4293 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4293/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r209127367
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -39,6 +39,7 @@ import org.apache.spark.sql.types.DataType
      * @param nullable  True if the UDF can return null value.
      * @param udfDeterministic  True if the UDF is deterministic. Deterministic UDF returns same result
      *                          each time it is invoked with a particular input.
    + * @param nullableTypes which of the inputTypes are nullable (i.e. not primitive)
    --- End diff --
    
    The approach here is to capture at registration time whether the arg types are primitive, or nullable. Not a great way to record this, but might be the least hack for now


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95234/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #4289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4289/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94610/testReport)** for PR 22063 at commit [`c285e93`](https://github.com/apache/spark/commit/c285e93c976b8e04d6296d445a2c9d27f9fb01d8).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by adriaanm <gi...@git.apache.org>.
Github user adriaanm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r213702881
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
    @@ -164,19 +164,15 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
         var outputData = dataset
         var numColsOutput = 0
         if (getRawPredictionCol != "") {
    -      val predictRawUDF = udf { (features: Any) =>
    --- End diff --
    
    No problem! Happy to help with the 2.12 upgrade.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95106/testReport)** for PR 22063 at commit [`e7abb67`](https://github.com/apache/spark/commit/e7abb67bcb66b21a41818e435b8ec848df62edd8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95027/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r214560630
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +41,7 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[ScalaReflection.Schema]]) {
    --- End diff --
    
    ah i see, it's a case class, so we would need to keep the `def inputTypes(): Option[Seq[DataType]]`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94569/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r209136524
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
    @@ -114,6 +114,7 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
           val types = (1 to x).foldRight("RT")((i, s) => {s"A$i, $s"})
           val typeTags = (1 to x).map(i => s"A$i: TypeTag").foldLeft("RT: TypeTag")(_ + ", " + _)
           val inputTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].dataType :: $s"})
    +      val nullableTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].nullable :: $s"})
    --- End diff --
    
    Yeah that can be optimized. I'll fix the MiMa issue too by restoring a constructor.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2381/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen closed the pull request at:

    https://github.com/apache/spark/pull/22063


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212499164
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -57,8 +59,21 @@ case class ScalaUDF(
           children: Seq[Expression],
           inputTypes: Seq[DataType],
           udfName: Option[String]) = {
    -    this(
    -      function, dataType, children, inputTypes, udfName, nullable = true, udfDeterministic = true)
    +    this(function, dataType, children, inputTypes, udfName,
    +      nullable = true, udfDeterministic = true, nullableTypes = Nil)
    +  }
    +
    +  // Constructor from Spark 2.3
    --- End diff --
    
    By convention, everything under catalyst package is private, so compatibility is not a concern here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #4288 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4288/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #4293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4293/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95027/testReport)** for PR 22063 at commit [`f029f6d`](https://github.com/apache/spark/commit/f029f6d4d5ab6a39812be5beafb5c990205e208f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by skonto <gi...@git.apache.org>.
Github user skonto commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r213575267
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
    @@ -164,19 +164,15 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
         var outputData = dataset
         var numColsOutput = 0
         if (getRawPredictionCol != "") {
    -      val predictRawUDF = udf { (features: Any) =>
    --- End diff --
    
    @adriaanm any more info?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2046/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95234/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r214567945
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +41,7 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[ScalaReflection.Schema]]) {
    --- End diff --
    
    (BTW it was PR https://github.com/apache/spark/pull/22259 that was merged)
    
    We can add back accessors, constructors, if it would make life easier for callers. But if this is protected, who are the callers of this code we're accommodating? maybe some hacky but important integration?
    
    We'd have to rename `inputTypes` and then add back an accessor with the old type.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94535/testReport)** for PR 22063 at commit [`92598f0`](https://github.com/apache/spark/commit/92598f0bbf55b19fc833745c88e273ffaea2b139).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    See also https://github.com/apache/spark/pull/22259


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r209127319
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2149,28 +2149,29 @@ class Analyzer(
     
           case p => p transformExpressionsUp {
     
    -        case udf @ ScalaUDF(func, _, inputs, _, _, _, _) =>
    -          val parameterTypes = ScalaReflection.getParameterTypes(func)
    -          assert(parameterTypes.length == inputs.length)
    -
    -          // TODO: skip null handling for not-nullable primitive inputs after we can completely
    -          // trust the `nullable` information.
    -          // (cls, expr) => cls.isPrimitive && expr.nullable
    -          val needsNullCheck = (cls: Class[_], expr: Expression) =>
    -            cls.isPrimitive && !expr.isInstanceOf[KnownNotNull]
    -          val inputsNullCheck = parameterTypes.zip(inputs)
    -            .filter { case (cls, expr) => needsNullCheck(cls, expr) }
    -            .map { case (_, expr) => IsNull(expr) }
    -            .reduceLeftOption[Expression]((e1, e2) => Or(e1, e2))
    -          // Once we add an `If` check above the udf, it is safe to mark those checked inputs
    -          // as not nullable (i.e., wrap them with `KnownNotNull`), because the null-returning
    -          // branch of `If` will be called if any of these checked inputs is null. Thus we can
    -          // prevent this rule from being applied repeatedly.
    -          val newInputs = parameterTypes.zip(inputs).map{ case (cls, expr) =>
    -            if (needsNullCheck(cls, expr)) KnownNotNull(expr) else expr }
    -          inputsNullCheck
    -            .map(If(_, Literal.create(null, udf.dataType), udf.copy(children = newInputs)))
    -            .getOrElse(udf)
    +        case udf@ScalaUDF(func, _, inputs, _, _, _, _, nullableTypes) =>
    +          if (nullableTypes.isEmpty) {
    --- End diff --
    
    This is probably the weak point: unless there is nullability info, don't do anything to the UDF plan, but, that's probably wrong in some cases


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95121/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94561/testReport)** for PR 22063 at commit [`a1850b0`](https://github.com/apache/spark/commit/a1850b0a0385cd6add6177b95c5fb57c6c173c8e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94592/testReport)** for PR 22063 at commit [`3c67d9d`](https://github.com/apache/spark/commit/3c67d9d30b2afd38ccf7a01d49c75d4c17f31445).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r214560313
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +41,7 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[ScalaReflection.Schema]]) {
    --- End diff --
    
    This is a stable API. Are we able to make this change in 2.4 instead of 3.0?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94571/testReport)** for PR 22063 at commit [`a803869`](https://github.com/apache/spark/commit/a803869775250f366ef357ccf06fed397a3f1cfd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94571/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2545/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212392363
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala ---
    @@ -164,19 +164,15 @@ abstract class ClassificationModel[FeaturesType, M <: ClassificationModel[Featur
         var outputData = dataset
         var numColsOutput = 0
         if (getRawPredictionCol != "") {
    -      val predictRawUDF = udf { (features: Any) =>
    --- End diff --
    
    Thanks for your review @cloud-fan , I could really use your input here. That's a good find. It _may_ be that we want to explicitly support UDFs where a schema isn't available -- see below. But I agree I'd rather not. It gets kind of messy though.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94571/testReport)** for PR 22063 at commit [`a803869`](https://github.com/apache/spark/commit/a803869775250f366ef357ccf06fed397a3f1cfd).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95121/testReport)** for PR 22063 at commit [`09c3a3b`](https://github.com/apache/spark/commit/09c3a3b5c45fb2a356b191d5699c6ba497694031).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2078/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212353000
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[DataType]],
    +    nullableTypes: Option[Seq[Boolean]] = None) {
    --- End diff --
    
    again, can we combine it with the data types?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r209135836
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
    @@ -114,6 +114,7 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
           val types = (1 to x).foldRight("RT")((i, s) => {s"A$i, $s"})
           val typeTags = (1 to x).map(i => s"A$i: TypeTag").foldLeft("RT: TypeTag")(_ + ", " + _)
           val inputTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].dataType :: $s"})
    +      val nullableTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].nullable :: $s"})
    --- End diff --
    
    instead of having 2 list, shall we just keep a `Seq[ScalaReflection.Schema]` or `Seq[(DataType, Boolean)]`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95106/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r214560519
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +41,7 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[ScalaReflection.Schema]]) {
    --- End diff --
    
    But the constructor is `protected[sql]`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212354787
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -375,8 +375,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
         import org.apache.spark.sql.functions.{rand, udf}
         val c = Column(col)
         val r = rand(seed)
    -    val f = udf { (stratum: Any, x: Double) =>
    -      x < fractions.getOrElse(stratum.asInstanceOf[T], 0.0)
    +    // Hack to get around the fact that type T is Any and we can't use a UDF whose arg
    +    // is Any. Convert everything to a string rep.
    --- End diff --
    
    I feel udf using `Any` as input type is rare but a valid use case. Sometimes they just want to accept any input type.
    
    How about we create a few `udfInternal` methods that takes `Any` as inputs? e.g.
    ```
    def udfInternal[R: TypeTag](f: Function1[Any, R]): UserDefinedFunction = {
      val ScalaReflection.Schema(dataType, nullable) = ScalaReflection.schemaFor[R]
      val udf = UserDefinedFunction(f, dataType, Nil)
      if (nullable) udf else udf.asNonNullable()
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212393002
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala ---
    @@ -375,8 +375,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
         import org.apache.spark.sql.functions.{rand, udf}
         val c = Column(col)
         val r = rand(seed)
    -    val f = udf { (stratum: Any, x: Double) =>
    -      x < fractions.getOrElse(stratum.asInstanceOf[T], 0.0)
    +    // Hack to get around the fact that type T is Any and we can't use a UDF whose arg
    +    // is Any. Convert everything to a string rep.
    --- End diff --
    
    You are right that the change I made here is not bulletproof. Unfortunately there are several more problems like this. Anywhere there's a UDF on `Row` it now fails, and workarounds are ugly.
    
    I like your idea, let me work on that. Because the alternative I've been working on is driving me nuts.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212346762
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -85,9 +85,8 @@ private[recommendation] trait ALSModelParams extends Params with HasPredictionCo
        * Attempts to safely cast a user/item id to an Int. Throws an exception if the value is
        * out of integer range or contains a fractional part.
        */
    -  protected[recommendation] val checkedCast = udf { (n: Any) =>
    +  protected[recommendation] val checkedCast = udf { n: AnyRef =>
    --- End diff --
    
    I'm fine with this workaround, but I think we should create an expression for this `checkedCast`. The current UDF doesn't work well with inputs of different types.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94535/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2378/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #95106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95106/testReport)** for PR 22063 at commit [`e7abb67`](https://github.com/apache/spark/commit/e7abb67bcb66b21a41818e435b8ec848df62edd8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    **[Test build #94569 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94569/testReport)** for PR 22063 at commit [`c3d1561`](https://github.com/apache/spark/commit/c3d15618e362fcbd0d26487540ba9761188bffab).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r212393299
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
     case class UserDefinedFunction protected[sql] (
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]]) {
    +    inputTypes: Option[Seq[DataType]],
    +    nullableTypes: Option[Seq[Boolean]] = None) {
    --- End diff --
    
    I was hoping to minimize the change in the signature, but yeah it could be `Option[Seq[(DataType, Boolean)]]`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22063
  
    For those following along, the problem right now is this: functions like `udf()` have always declared that their type params have `TypeTag`s, meaning the caller has to have runtime info about types. This supports schema inference.
    
    Although the internals of the `udf()` methods changed in this PR, the signature did not. `SchemaReflection.schemaFor` was called before. However, with this change, suddenly it seems like the `TypeTag`s are needed by the compiler, and many instances of `udf()` fail now because the call site does not have one.
    
    I worked around a few, but am having more trouble with others.
    I am still not clear why it is that only with this change do the `TypeTag`s seem to matter; they were always required?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

Posted by TomaszGaweda <gi...@git.apache.org>.
Github user TomaszGaweda commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22063#discussion_r209713521
  
    --- Diff: project/MimaExcludes.scala ---
    @@ -36,6 +36,11 @@ object MimaExcludes {
     
       // Exclude rules for 2.4.x
       lazy val v24excludes = v23excludes ++ Seq(
    +    // [SPARK-25044] Address translation of LMF closure primitive args to Object in Scala 2.1
    --- End diff --
    
    Nit: Wrong Scala version


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org