You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2015/12/11 06:45:57 UTC

[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/10266

    [SPARK-12258] [SQL] passing null into ScalaUDF (follow-up)

    This is a follow-up PR for #10259 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark null_udf2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10266
    
----
commit c0f85bb676443cd0d6d73f8beb4139fb062fef8c
Author: Davies Liu <da...@databricks.com>
Date:   2015-12-11T05:44:23Z

    fix passing null into ScalaUDF

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-164009410
  
    **[Test build #2204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2204/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `class ExecutorClassLoader(`\n  * `case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855851
  
    LGTM pending tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163868667
  
    **[Test build #47574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47574/consoleFull)** for PR 10266 at commit [`c96b512`](https://github.com/apache/spark/commit/c96b512a8a30575b24e8e9dbba24e0ac7ae16121).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163868756
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-164022075
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163986683
  
    **[Test build #47584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47584/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163903666
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47578/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163852763
  
    I changed `int` to `Integer` and tried again ,the result is the same. And I also tried `Integer i = (Integer) -1;` which also failed to compile. I think the problem is when we use negative literal with explicit type cast, the `-` are mistakenly parsed and we need to wrap it with `()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163861470
  
    @davies The exact same UDF worked fine in 1.5.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163879143
  
    **[Test build #47578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47578/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-164020361
  
    **[Test build #2205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2205/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `class ExecutorClassLoader(`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47389510
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    It's `Integer b = null; int a = (Integer) b;` , then NPE


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163847683
  
    **[Test build #47572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47572/consoleFull)** for PR 10266 at commit [`c0f85bb`](https://github.com/apache/spark/commit/c0f85bb676443cd0d6d73f8beb4139fb062fef8c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163874386
  
    hi @markhamstra , can you add some log in your UDF, to see if the NPE occurred before run into your UDF code or after?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10266


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163876482
  
    @cloud-fan @markhamstra They should be all fixed (handling null in arguments and results).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163903552
  
    **[Test build #47578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47578/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163868758
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47574/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47391422
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    My understanding is that he was trying to explain the NPE reported by @markhamstra.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163858860
  
    Still doesn't work for me.  Now it ends up in a different place, but a NPE:
    ```
    ...
    2015-12-11 06:48:09,285 INFO org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection: Code generated in 145.67804 ms
    2015-12-11 06:48:09,297 INFO org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection: Code generated in 4.438909 ms
    2015-12-11 06:48:09,305 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    java.lang.NullPointerException
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
    	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:42)
    	at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:261)
    	at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:259)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    	at org.apache.spark.scheduler.Task.run(Task.scala:89)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    2015-12-11 06:48:09,325 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NullPointerException
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:51)
    	at org.apache.spark.sql.execution.Project$$anonfun$1$$anonfun$apply$1.apply(basicOperators.scala:49)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:42)
    	at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:261)
    	at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:259)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$22.apply(RDD.scala:745)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    	at org.apache.spark.scheduler.Task.run(Task.scala:89)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    ...
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855481
  
    @davies This results in a slightly different failure from the one I previously reported:
    
    Everything looks the same as the prior post except now:
    ```
    failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 78, Column 193: Expression "java.lang.Integer" is not an rvalue
    .
    .
    .
    /* 078 */     Long result16 = (Long)catalystConverter15.apply(udf20.apply(converter17.apply(isNull21 ? (UTF8String) null : (UTF8String) primitive22),converter18.apply(false ? (Integer) null : (Integer) -1),converter19.apply(isNull26 ? (Long) null : (Long) primitive27)));
    .
    .
    .
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47392510
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    uh, the value of `${ctx.defaultValue(dataType)}` is not `null` but `-1` when data type is `Integer`. I do not have more questions. Thanks! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163996602
  
    **[Test build #47584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47584/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163862856
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855323
  
    It's not a Janino bug, `(Integer)-1` does not work in Java, faint :-(


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163997902
  
    **[Test build #2205 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2205/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47374514
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    Seems we are fine because we check `if (!${ev.isNull})` first?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47391109
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    I can understand your fix, but I am trying to see what @cloud-fan said above. It sounds like he found another issue?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47428568
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    sorry for being vague, I was trying to explain why the NPE happened and @davies has fixed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163998492
  
    **[Test build #47585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47585/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163860387
  
    @markhamstra I think it's because of your UDF did not handle null correctly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163851041
  
    Wenchen, should the type of i be Integer?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-164022078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47585/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163903664
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-164021742
  
    **[Test build #47585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47585/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855563
  
    @markhamstra Sorry, just pushed a commit to fix it now, added a regression test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163996672
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47584/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163862544
  
    **[Test build #47572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47572/consoleFull)** for PR 10266 at commit [`c0f85bb`](https://github.com/apache/spark/commit/c0f85bb676443cd0d6d73f8beb4139fb062fef8c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163997363
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855980
  
    **[Test build #47574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47574/consoleFull)** for PR 10266 at commit [`c96b512`](https://github.com/apache/spark/commit/c96b512a8a30575b24e8e9dbba24e0ac7ae16121).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163848082
  
    Could you try to add `(Integer)` before `null`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163847721
  
    I tried it locally, here is my findings:
    
    * `int i = false ? null : (Integer) -1;`  **doesn't compile**
    * `int i = false ? null : (Integer) 1;` compiles
    * `int i = false ? null : (Integer) t;` compiles
    * `int i = false ? null : (Integer) (-1);` **compiles**
    
    So I think a simple fix is just adding `()` around `${eval.value}`, but I can't think of a test case to reproduce it...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163984799
  
    **[Test build #2204 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2204/consoleFull)** for PR 10266 at commit [`2125a1b`](https://github.com/apache/spark/commit/2125a1bac7edd75c1602806089cee6eff2e11660).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855637
  
    No problem; I'll cherry-pick another.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163867183
  
    **[Test build #2202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2202/consoleFull)** for PR 10266 at commit [`c96b512`](https://github.com/apache/spark/commit/c96b512a8a30575b24e8e9dbba24e0ac7ae16121).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:\n  * `class ExecutorClassLoader(`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47391269
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    @gatorsmile At line 1049, we are using the default value. For primitive types, it will not be null.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163862860
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47572/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855762
  
    @markhamstra Once it works, I will merge this to unblock RC2.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163983570
  
    test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47388587
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    It will not cause NPE, but a compilation error? 
    
    For example, if `dataType` is `Integer`, line 1049 will be ```int ev.value = null``` It will trigger a compilation error "incompatible types". Right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10266#discussion_r47351058
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -1029,24 +1029,27 @@ case class ScalaUDF(
         // such as IntegerType, its javaType is `int` and the returned type of user-defined
         // function is Object. Trying to convert an Object to `int` will cause casting exception.
         val evalCode = evals.map(_.code).mkString
    -    val funcArguments = converterTerms.zipWithIndex.map {
    -      case (converter, i) =>
    -        val eval = evals(i)
    -        val dt = children(i).dataType
    -        s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
    -    }.mkString(",")
    -    val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
    -      s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
    -        s".apply($funcTerm.apply($funcArguments));"
    +    val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
    +      val eval = evals(i)
    +      val argTerm = ctx.freshName("arg")
    +      val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
    +      (convert, argTerm)
    +    }.unzip
     
    -    evalCode + s"""
    -      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    -      Boolean ${ev.isNull};
    +    val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
    +      s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
    +        s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"
     
    +    s"""
    +      $evalCode
    +      ${converters.mkString("\n")}
           $callFunc
     
    -      ${ev.value} = $resultTerm;
    -      ${ev.isNull} = $resultTerm == null;
    +      boolean ${ev.isNull} = $resultTerm == null;
    +      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
    +      if (!${ev.isNull}) {
    +        ${ev.value} = $resultTerm;
    --- End diff --
    
    ah that's it, the result type may be primitive and we should not assign null value to in, or NPE will happen.
    
    Should we create a JIRA for it? I think it's a different bug comparing to the one you fixed in #10259


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163972430
  
    Works for me.  Thanks, guys!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163855486
  
    **[Test build #2202 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2202/consoleFull)** for PR 10266 at commit [`c96b512`](https://github.com/apache/spark/commit/c96b512a8a30575b24e8e9dbba24e0ac7ae16121).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12258] [SQL] passing null into ScalaUDF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10266#issuecomment-163996670
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org