You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2015/04/07 06:34:02 UTC

[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/5383

    [SPARK-6734] [SQL] Add UDTF.close support in Generate

    Some third-party UDTF extensions generate additional rows in the "GenericUDTF.close()" method, which is supported / documented by Hive.
    https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
    However, Spark SQL ignores the "GenericUDTF.close()", and it causes bug while porting job from Hive to Spark SQL.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark udtf_close

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5383.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5383
    
----
commit 94ce8d062a427e219a8f89b9de4541dafa1b66ca
Author: Cheng Hao <ha...@intel.com>
Date:   2015-04-07T04:22:23Z

    add UDTF.close support

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92520820
  
      [Test build #30202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30202/consoleFull) for   PR 5383 at commit [`d719983`](https://github.com/apache/spark/commit/d7199830ac16e521474c427e478b65d39fee6d74).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-94633301
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30620/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-90356260
  
      [Test build #29778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29778/consoleFull) for   PR 5383 at commit [`94ce8d0`](https://github.com/apache/spark/commit/94ce8d062a427e219a8f89b9de4541dafa1b66ca).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r27897916
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -74,10 +84,15 @@ case class Generate(
               } else {
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
    +        } ++ LazyIterator(() => boundGenerator.terminate()).map { row =>
    +          joinedRow(nullOriginalRow, row)
             }
           }
         } else {
    -      child.execute().mapPartitions(iter => iter.flatMap(row => boundGenerator.eval(row)))
    +      child.execute().mapPartitions(iter =>
    +        iter.flatMap(row => boundGenerator.eval(row)) ++
    +          LazyIterator(() => boundGenerator.terminate())
    --- End diff --
    
    Yes, I that's OK by called in each partition.
    
    See 
    https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java#L278
    https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecMapper.java#L192
    The `Operator.close()` is called in `MapReduceBase.close`, which mean they are supposed to run once within each Mapper/Reducer.
    
    And 
    https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java#L144
    https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L616
    shows the `genericUDTF.close()` is called within the `Operator.close`.
    
    Sorry, please correct me if my understanding is wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-96516549
  
    @liancheng @marmbrus  Any more comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92543213
  
      [Test build #30202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30202/consoleFull) for   PR 5383 at commit [`d719983`](https://github.com/apache/spark/commit/d7199830ac16e521474c427e478b65d39fee6d74).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92544533
  
    Thank you @liancheng I've updated the code and it passed the unit test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28305846
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -71,6 +71,12 @@ abstract class Generator extends Expression {
         copy._output = _output
         copy
       }
    +
    +  /**
    +   * Notifies that there are no more rows to process, clean up the code, and additional
    --- End diff --
    
    clean up the code => clean up code


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-91317211
  
      [Test build #29954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29954/consoleFull) for   PR 5383 at commit [`c45faf0`](https://github.com/apache/spark/commit/c45faf038afcd354ed085b24711e61097beeeabc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28212494
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -75,9 +78,15 @@ case class Generate(
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
             }
    +      }.mapPartitions { iter =>
    +        val nullOriginalRow = Row(Seq.fill(generator.output.size)(Literal(null)): _*)
    +        val joinedRow = new JoinedRow
    +        iter ++ boundGenerator.terminate().map(joinedRow(nullOriginalRow, _))
           }
         } else {
    -      child.execute().mapPartitions(iter => iter.flatMap(row => boundGenerator.eval(row)))
    -    }
    +      child.execute().mapPartitions(iter =>
    +        iter.flatMap(row => boundGenerator.eval(row))
    +      )
    +    }.mapPartitions(_ ++ boundGenerator.terminate())
    --- End diff --
    
    Oh, yeah, that's a bug, thank you @liancheng I will update the code soon


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28205711
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -75,9 +78,15 @@ case class Generate(
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
             }
    +      }.mapPartitions { iter =>
    +        val nullOriginalRow = Row(Seq.fill(generator.output.size)(Literal(null)): _*)
    --- End diff --
    
    Why the `: _*`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-95765126
  
      [Test build #30890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30890/consoleFull) for   PR 5383 at commit [`1799ba5`](https://github.com/apache/spark/commit/1799ba55ba91198c9488b787100d21522f1ca669).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-91342733
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29954/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92097125
  
    Please also add a test case to test the case where `Generator.join` is true.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28306055
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -56,28 +65,33 @@ case class Generate(
       val boundGenerator = BindReferences.bindReference(generator, child.output)
     
       override def execute(): RDD[Row] = {
    +    // boundGenerator.terminate() should be triggered after all of the rows in the partition
         if (join) {
           child.execute().mapPartitions { iter =>
    -        val nullValues = Seq.fill(generator.output.size)(Literal(null))
    -        // Used to produce rows with no matches when outer = true.
    -        val outerProjection =
    -          newProjection(child.output ++ nullValues, child.output)
    -
    -        val joinProjection =
    -          newProjection(child.output ++ generatorOutput, child.output ++ generatorOutput)
    +        val generatorNullRow = Row.fromSeq(Seq.fill[Any](generator.output.size)(null))
             val joinedRow = new JoinedRow
     
             iter.flatMap {row =>
    +          // we should always set the left (child output)
    +          joinedRow.withLeft(row)
               val outputRows = boundGenerator.eval(row)
               if (outer && outputRows.isEmpty) {
    -            outerProjection(row) :: Nil
    +            joinedRow.withRight(generatorNullRow) :: Nil
               } else {
    -            outputRows.map(or => joinProjection(joinedRow(row, or)))
    +            outputRows.map(or => joinedRow.withRight(or))
               }
    +        } ++ LazyIterator(() => boundGenerator.terminate()).map { row =>
    --- End diff --
    
    I think we can just use `org.apache.spark.util.CompletionIterator` to ensure `terminate()` is called at the very last.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r29021010
  
    --- Diff: sql/hive/src/test/resources/golden/Test UDTF.close in Lateral Views-0-ac5c96224a534f07b49462ad76620678 ---
    @@ -0,0 +1,2 @@
    +97	500
    +97	500
    --- End diff --
    
    Yes, moved into `resources/`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92501504
  
      [Test build #30195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30195/consoleFull) for   PR 5383 at commit [`63c88cc`](https://github.com/apache/spark/commit/63c88cc7afca5cab3b67f57eda334add81b562f9).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  protected class UDTFCollector extends Collector with Serializable `
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r27861265
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -74,10 +84,15 @@ case class Generate(
               } else {
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
    +        } ++ LazyIterator(() => boundGenerator.terminate()).map { row =>
    +          joinedRow(nullOriginalRow, row)
             }
           }
         } else {
    -      child.execute().mapPartitions(iter => iter.flatMap(row => boundGenerator.eval(row)))
    +      child.execute().mapPartitions(iter =>
    +        iter.flatMap(row => boundGenerator.eval(row)) ++
    +          LazyIterator(() => boundGenerator.terminate())
    --- End diff --
    
    Looks like you are calling `terminate` on each partition. Is that not the same as how Hive does? In Hive it seems that `close` is called after all rows are processed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28214577
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -75,9 +78,15 @@ case class Generate(
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
             }
    +      }.mapPartitions { iter =>
    +        val nullOriginalRow = Row(Seq.fill(generator.output.size)(Literal(null)): _*)
    --- End diff --
    
    Yes, you're right, we should use the `child.output.size`, not the `generator.output.size`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28310316
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -56,28 +65,33 @@ case class Generate(
       val boundGenerator = BindReferences.bindReference(generator, child.output)
     
       override def execute(): RDD[Row] = {
    +    // boundGenerator.terminate() should be triggered after all of the rows in the partition
         if (join) {
           child.execute().mapPartitions { iter =>
    -        val nullValues = Seq.fill(generator.output.size)(Literal(null))
    -        // Used to produce rows with no matches when outer = true.
    -        val outerProjection =
    -          newProjection(child.output ++ nullValues, child.output)
    -
    -        val joinProjection =
    -          newProjection(child.output ++ generatorOutput, child.output ++ generatorOutput)
    +        val generatorNullRow = Row.fromSeq(Seq.fill[Any](generator.output.size)(null))
             val joinedRow = new JoinedRow
     
             iter.flatMap {row =>
    --- End diff --
    
    BTW, please help adding a space after the `{`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28745931
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -21,6 +21,15 @@ import org.apache.spark.annotation.DeveloperApi
     import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.catalyst.expressions._
     
    +// for lazy computing, be sure the generator.terminate() called in the very last
    +private[execution] sealed case class LazyIterator(func: () => TraversableOnce[Row])
    +  extends Iterator[Row] {
    +
    +  lazy val results = func().toIterator
    +  override def hasNext: Boolean = results.hasNext
    +  override def next(): Row = results.next()
    +}
    --- End diff --
    
    @liancheng I just confirmed, the `CompletionIterator` provides a callback machinism once finish iterating the iterator, however, all we need here is to append an new iterator, which produced by the `terminate()` method.
    
    But you're right, probably we need to update the `CompletionIterator` a little bit, I leave a `TODO` here, and will file another PR for this improvement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28205769
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -75,9 +78,15 @@ case class Generate(
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
             }
    +      }.mapPartitions { iter =>
    +        val nullOriginalRow = Row(Seq.fill(generator.output.size)(Literal(null)): _*)
    +        val joinedRow = new JoinedRow
    +        iter ++ boundGenerator.terminate().map(joinedRow(nullOriginalRow, _))
           }
         } else {
    -      child.execute().mapPartitions(iter => iter.flatMap(row => boundGenerator.eval(row)))
    -    }
    +      child.execute().mapPartitions(iter =>
    +        iter.flatMap(row => boundGenerator.eval(row))
    +      )
    +    }.mapPartitions(_ ++ boundGenerator.terminate())
    --- End diff --
    
    This should only be added when `join` is false. Move it to the above line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-94617325
  
      [Test build #30620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30620/consoleFull) for   PR 5383 at commit [`e1635b4`](https://github.com/apache/spark/commit/e1635b4f85f4b6f4dc927c105e1d3c432288359e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28305660
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -52,14 +56,32 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
         TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
         // Add Locale setting
         Locale.setDefault(Locale.US)
    +    sql(s"ADD JAR ${TestHive.getHiveFile("data/files/TestUDTF.jar").getCanonicalPath()}")
    +    // The function source code can be found at:
    +    // https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide+UDTF
    +    sql(
    +      """
    +        |CREATE TEMPORARY FUNCTION udtf_count2 
    +        |AS 'org.apache.spark.sql.hive.execution.GenericUDTFCount2'
    +      """.stripMargin)
       }
     
       override def afterAll() {
         TestHive.cacheTables = false
         TimeZone.setDefault(originalTimeZone)
         Locale.setDefault(originalLocale)
    +    sql("DROP TEMPORARY FUNCTION udtf_count2")
       }
     
    +  createQueryTest("Test UDTF.close in Lateral Views",
    +     """
    +       | SELECT key, cc
    +       |   FROM src LATERAL VIEW udtf_count2(value) dd AS cc
    --- End diff --
    
    Please remove spaces at the beginning of these two lines.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-95628553
  
      [Test build #30845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30845/consoleFull) for   PR 5383 at commit [`8953be3`](https://github.com/apache/spark/commit/8953be3eda4b120966f6cc7fb1e02f48c632a90f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r27995414
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -74,10 +84,15 @@ case class Generate(
               } else {
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
    +        } ++ LazyIterator(() => boundGenerator.terminate()).map { row =>
    +          joinedRow(nullOriginalRow, row)
             }
           }
         } else {
    -      child.execute().mapPartitions(iter => iter.flatMap(row => boundGenerator.eval(row)))
    +      child.execute().mapPartitions(iter =>
    +        iter.flatMap(row => boundGenerator.eval(row)) ++
    +          LazyIterator(() => boundGenerator.terminate())
    --- End diff --
    
    Ok, I'll try and let you know the result.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-90379100
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29778/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92543221
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30202/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28829143
  
    --- Diff: sql/hive/src/test/resources/golden/Test UDTF.close in Lateral Views-0-ac5c96224a534f07b49462ad76620678 ---
    @@ -0,0 +1,2 @@
    +97	500
    +97	500
    --- End diff --
    
    Lets not put the test jar in `resources/data/files/`, as that is a directory that we copied from hive, and might replace in the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-95782798
  
    cc @liancheng @marmbrus 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-95779198
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30890/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-94633296
  
      [Test build #30620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30620/consoleFull) for   PR 5383 at commit [`e1635b4`](https://github.com/apache/spark/commit/e1635b4f85f4b6f4dc927c105e1d3c432288359e).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-91316943
  
    @maropu thanks for the comments, I've updated the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28205574
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -75,9 +78,15 @@ case class Generate(
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
             }
    +      }.mapPartitions { iter =>
    +        val nullOriginalRow = Row(Seq.fill(generator.output.size)(Literal(null)): _*)
    +        val joinedRow = new JoinedRow
    +        iter ++ boundGenerator.terminate().map(joinedRow(nullOriginalRow, _))
    --- End diff --
    
    Can we merge this `.mapPartitions` into the above one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r27982781
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -74,10 +84,15 @@ case class Generate(
               } else {
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
    +        } ++ LazyIterator(() => boundGenerator.terminate()).map { row =>
    +          joinedRow(nullOriginalRow, row)
             }
           }
         } else {
    -      child.execute().mapPartitions(iter => iter.flatMap(row => boundGenerator.eval(row)))
    +      child.execute().mapPartitions(iter =>
    +        iter.flatMap(row => boundGenerator.eval(row)) ++
    +          LazyIterator(() => boundGenerator.terminate())
    --- End diff --
    
    @maropu thanks for the suggestion, I'd like to try that locally. BTW can you confirm if this PR work for you, as we talked offline? Just to double confirm the concern of @viirya.
    Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-90379070
  
      [Test build #29778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29778/consoleFull) for   PR 5383 at commit [`94ce8d0`](https://github.com/apache/spark/commit/94ce8d062a427e219a8f89b9de4541dafa1b66ca).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    case class LazyIterator(func: () => TraversableOnce[Row]) extends Iterator[Row] `
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-95628566
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30845/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28205652
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -71,6 +71,12 @@ abstract class Generator extends Expression {
         copy._output = _output
         copy
       }
    +
    +  /**
    +   * Call to notify that there are no more rows to process, clean up the code or additional
    +   * rows can be made here.
    --- End diff --
    
    Please reword this comment:
    
    > Notifies that there are no more rows to process, clean up code and optional additional rows can be made here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28205580
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -75,9 +78,15 @@ case class Generate(
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
             }
    +      }.mapPartitions { iter =>
    +        val nullOriginalRow = Row(Seq.fill(generator.output.size)(Literal(null)): _*)
    +        val joinedRow = new JoinedRow
    +        iter ++ boundGenerator.terminate().map(joinedRow(nullOriginalRow, _))
           }
         } else {
    -      child.execute().mapPartitions(iter => iter.flatMap(row => boundGenerator.eval(row)))
    -    }
    +      child.execute().mapPartitions(iter =>
    +        iter.flatMap(row => boundGenerator.eval(row))
    +      )
    --- End diff --
    
    The original line doesn't exceeded 100 columns, please revert this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92501509
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30195/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28310315
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -21,6 +21,15 @@ import org.apache.spark.annotation.DeveloperApi
     import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.catalyst.expressions._
     
    +// for lazy computing, be sure the generator.terminate() called in the very last
    +private[execution] sealed case class LazyIterator(func: () => TraversableOnce[Row])
    +  extends Iterator[Row] {
    +
    +  lazy val results = func().toIterator
    +  override def hasNext: Boolean = results.hasNext
    +  override def next(): Row = results.next()
    +}
    --- End diff --
    
    This can be removed since `CompletionIterator` should be enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28205759
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -75,9 +78,15 @@ case class Generate(
                 outputRows.map(or => joinProjection(joinedRow(row, or)))
               }
             }
    +      }.mapPartitions { iter =>
    +        val nullOriginalRow = Row(Seq.fill(generator.output.size)(Literal(null)): _*)
    --- End diff --
    
    My mistake, I see you're building the `Row`. But here we should use `child.output.size` rather than `generator.output.size`. Also we can use `Row.fromSeq` instead of `Row.appy(values: Any*)` for clarity:
    
    ```scala
    val nullOriginalRow = Row.fromSeq(Seq.fill(child.output.size)(Literal(null))
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-92500634
  
      [Test build #30195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30195/consoleFull) for   PR 5383 at commit [`63c88cc`](https://github.com/apache/spark/commit/63c88cc7afca5cab3b67f57eda334add81b562f9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28305665
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
    @@ -36,6 +39,7 @@ import org.apache.spark.sql.hive.test.TestHive._
     
     case class TestData(a: Int, b: String)
     
    +
    --- End diff --
    
    Remove this new line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-91401585
  
    @liancheng @yhuai


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-95610869
  
      [Test build #30845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30845/consoleFull) for   PR 5383 at commit [`8953be3`](https://github.com/apache/spark/commit/8953be3eda4b120966f6cc7fb1e02f48c632a90f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-91172886
  
    I found one issue; the current implementation of HiveGenericUdtf always calls `terminate()` though, it does not call `initialize()` in some cases because of lazy initialization.
    
    ```
     protected lazy val outputInspector = function.initialize(inputInspectors.toArray)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-95779193
  
      [Test build #30890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30890/consoleFull) for   PR 5383 at commit [`1799ba5`](https://github.com/apache/spark/commit/1799ba55ba91198c9488b787100d21522f1ca669).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5383#discussion_r28828949
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -21,6 +21,16 @@ import org.apache.spark.annotation.DeveloperApi
     import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.catalyst.expressions._
     
    +// for lazy computing, be sure the generator.terminate() called in the very last
    +// TODO reusing the CompletionIterator?
    --- End diff --
    
    Use ScalaDoc style for class comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5383#issuecomment-91342725
  
      [Test build #29954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29954/consoleFull) for   PR 5383 at commit [`c45faf0`](https://github.com/apache/spark/commit/c45faf038afcd354ed085b24711e61097beeeabc).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  protected class UDTFCollector extends Collector with Serializable `
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org