Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2015/02/14 04:50:55 UTC

[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/4602

    [SPARK-5817] [SQL] Fix bug of udtf with column names

    Column names are not set properly for UDTFs; the following case throws an exception:
    ```
    createQueryTest("insert table with generator with column name",
        """
          CREATE TABLE gen_tmp (key Int);
          INSERT OVERWRITE TABLE gen_tmp
            SELECT explode(array(1,2,3)) AS val FROM src LIMIT 3;
       """.stripMargin)
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark explode_bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4602
    
----
commit 5ddab7e779df663c70dcd60fb49f0b5eddd39dcd
Author: Cheng Hao <ha...@intel.com>
Date:   2015-02-14T03:34:14Z

    Fix bug of udtf with column names

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r27710379
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -441,10 +441,23 @@ class Analyzer(catalog: Catalog,
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    +        Generate(g, join = false, outer = false, child, None, name :: Nil)
    +      case Project(Seq(MultiAlias(g: Generator, names)), child) =>
    +        Generate(g, join = false, outer = false, child, None, names)
         }
       }
    +
    +  object ResolveGenerate extends Rule[LogicalPlan] {
    --- End diff --
    
    Can you add some Scaladoc here?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-93936975
  
      [Test build #30468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30468/consoleFull) for   PR 4602 at commit [`d2e8b43`](https://github.com/apache/spark/commit/d2e8b43968cf4e041e418bf73f0be29b04aaf38e).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28372494
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -42,47 +42,27 @@ abstract class Generator extends Expression {
     
       override type EvaluatedType = TraversableOnce[Row]
     
    -  override lazy val dataType =
    -    ArrayType(StructType(output.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata))))
    +  override def dataType: DataType = ???
    --- End diff --
    
    Use `elementTypes` to construct an `ArrayType(StructType)`.
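The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the code that was merged: the types in `GeneratorTypeSketch` are simplified stand-ins for Catalyst's `DataType` classes, `dataTypeOf` is a hypothetical helper, and the `_c0, _c1, ...` field names are an assumption borrowed from the Hive-style defaults mentioned elsewhere in this PR. The only thing taken from the diff is that `elementTypes` is a sequence of `(type, nullable)` pairs.

```scala
object GeneratorTypeSketch {
  // Simplified stand-ins for Catalyst's type classes (not the real Spark API).
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  final case class StructField(name: String, dataType: DataType, nullable: Boolean)
  final case class StructType(fields: Seq[StructField]) extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // Hypothetical helper: derive the generator's data type from its element
  // types, naming the struct fields _c0, _c1, ... (an assumption).
  def dataTypeOf(elementTypes: Seq[(DataType, Boolean)]): DataType =
    ArrayType(StructType(elementTypes.zipWithIndex.map {
      case ((t, nullable), i) => StructField(s"_c$i", t, nullable)
    }))
}
```

With this shape, `dataType` no longer depends on the generator's named output attributes, only on the element types it produces.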




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90695907
  
      [Test build #29801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29801/consoleFull) for   PR 4602 at commit [`607f4fb`](https://github.com/apache/spark/commit/607f4fb5b6513de332e83c0820b5b836aec11459).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74628619
  
    Thank you @yhuai , I've updated the description and rebased the code.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r27710417
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---
    @@ -101,6 +101,7 @@ case class Alias(child: Expression, name: String)
       extends NamedExpression with trees.UnaryNode[Expression] {
     
       override type EvaluatedType = Any
    +  override lazy val resolved = childrenResolved && !child.isInstanceOf[Generator]
    --- End diff --
    
    Can you add a comment to this effect?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74360989
  
      [Test build #27473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27473/consoleFull) for   PR 4602 at commit [`7738ca6`](https://github.com/apache/spark/commit/7738ca6406814501018e2a968d78d0833adc9d36).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74362744
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27473/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74400196
  
      [Test build #27499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27499/consoleFull) for   PR 4602 at commit [`9656e51`](https://github.com/apache/spark/commit/9656e51349c9d3680c131a4fb787ecbde5ef4834).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74629893
  
      [Test build #27620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27620/consoleFull) for   PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94065766
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30493/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-83266773
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28846/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25394294
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,24 +35,12 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    output: Seq[Attribute],
    --- End diff --
    
    I agree that the code deleted at L42-L54 duplicates `logical.Generate`. I would like to keep it unchanged for this PR, which aims to fix the bug.
    We definitely need to refactor `Generator` and `Generate`, but we can leave that until after the 1.3 release. What do you think?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25318693
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,24 +35,12 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    output: Seq[Attribute],
    --- End diff --
    
    The output of `Generate` is not always identical to that of the `Generator`, e.g. when `join` is `true`.
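A minimal sketch of that point, mirroring the `if (join) child.output ++ generatorOutput else generatorOutput` expression from the logical `Generate` diff in this PR. `Attr` and `generateOutput` are hypothetical stand-ins, not Catalyst classes.

```scala
object GenerateOutputSketch {
  // Simplified stand-in for a Catalyst attribute (hypothetical).
  final case class Attr(name: String)

  // With join = true the operator emits the child's columns followed by the
  // generator's columns; otherwise it emits only the generator's columns.
  def generateOutput(
      join: Boolean,
      childOutput: Seq[Attr],
      generatorOutput: Seq[Attr]): Seq[Attr] =
    if (join) childOutput ++ generatorOutput else generatorOutput
}
```

For a child with a single `key` column and a generator producing `val`, the output would be `key, val` with `join = true` and only `val` otherwise, so the operator cannot always reuse the generator's own output.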




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94036310
  
      [Test build #30490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30490/consoleFull) for   PR 4602 at commit [`04ae500`](https://github.com/apache/spark/commit/04ae50003b35e84e7287a8e6f07d8737b468345a).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r27710490
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -700,12 +700,15 @@ class DataFrame private[sql](
        */
       def explode[A <: Product : TypeTag](input: Column*)(f: Row => TraversableOnce[A]): DataFrame = {
         val schema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType]
    -    val attributes = schema.toAttributes
    +    // TODO handle the metadata?
    --- End diff --
    
    I don't think there ever can be metadata, can there?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94045656
  
      [Test build #30489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30489/consoleFull) for   PR 4602 at commit [`5ee5d2c`](https://github.com/apache/spark/commit/5ee5d2c52d34fec40660f00a0135b72d0e5e581c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77818820
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28384/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-93945568
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30468/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r27710465
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -40,34 +40,69 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
      * output of each into a new stream of rows.  This operation is similar to a `flatMap` in functional
      * programming with one important additional feature, which allows the input rows to be joined with
      * their output.
    + * @param generator the generator expression
      * @param join  when true, each output row is implicitly joined with the input tuple that produced
      *              it.
      * @param outer when true, each input row will be output at least once, even if the output of the
      *              given `generator` is empty. `outer` has no effect when `join` is false.
    - * @param alias when set, this string is applied to the schema of the output of the transformation
    - *              as a qualifier.
    + * @param child Children logical plan node
    + * @param qualifier Qualifier for the attributes of generator(UDTF)
    + * @param attributeNames the column names for the generator(UDTF), will be _c0, _c1 .. _cN if
    + *                       leave as default (empty)
      */
     case class Generate(
         generator: Generator,
         join: Boolean,
         outer: Boolean,
    -    alias: Option[String],
    -    child: LogicalPlan)
    +    child: LogicalPlan,
    +    qualifier: Option[String] = None,
    +    attributeNames: Seq[String] = Nil)
       extends UnaryNode {
     
    -  protected def generatorOutput: Seq[Attribute] = {
    -    val output = alias
    -      .map(a => generator.output.map(_.withQualifiers(a :: Nil)))
    -      .getOrElse(generator.output)
    -    if (join && outer) {
    -      output.map(_.withNullability(true))
    -    } else {
    -      output
    +  override lazy val resolved: Boolean = {
    +    generator.resolved && childrenResolved && attributeNames.length > 0
    +  }
    +
    +  def generatorOutput(): Seq[Attribute] = {
    +    if (_generatorOutput == null) {
    +      val elementTypes = generator.elementTypes
    +
    +      val raw = if (attributeNames.size == elementTypes.size) {
    +        attributeNames.zip(elementTypes).map {
    +          case (n, (t, nullable)) => AttributeReference(n, t, nullable)()
    +        }
    +      } else {
    +        elementTypes.zipWithIndex.map {
    +          // keep the default column names as Hive does _c0, _c1, _cN
    +          case ((t, nullable), i) => AttributeReference(s"_c$i", t, nullable)()
    +        }
    +      }
    +
    +      _generatorOutput = qualifier.map(q => raw.map(_.withQualifiers(q :: Nil))).getOrElse(raw)
         }
    +
    +    _generatorOutput
       }
     
    -  override def output =
    -    if (join) child.output ++ generatorOutput else generatorOutput
    +  private var _generatorOutput: Seq[Attribute] = null
    --- End diff --
    
    I know that this is similar to how I implemented some of this originally, but I think my design was actually a really bad idea. It would be much better if we could make the `Seq[Attribute]` an argument to the constructor. That way we know it will be copied correctly when this node or its children are transformed.
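The concern can be illustrated with plain Scala case classes. This is a sketch under assumptions, not Catalyst code: `Attr`, `newAttr`, `CachedGenerate` and `ExplicitGenerate` are hypothetical stand-ins, and the id counter only imitates the way a freshly created attribute reference receives a new identifier.

```scala
import java.util.concurrent.atomic.AtomicLong

object GenerateCopySketch {
  private val nextId = new AtomicLong(0)

  // Stand-in for an attribute that receives a fresh id when it is created.
  final case class Attr(name: String, id: Long)
  def newAttr(name: String): Attr = Attr(name, nextId.getAndIncrement())

  // Variant A: the output is cached in a private var outside the constructor.
  // `copy` builds a new instance through the constructor, so the cache is lost
  // and the copied node creates attributes with different ids.
  final case class CachedGenerate(names: Seq[String], join: Boolean) {
    private var cached: Seq[Attr] = null
    def output: Seq[Attr] = {
      if (cached == null) cached = names.map(newAttr)
      cached
    }
  }

  // Variant B: the output is a constructor argument, so `copy` carries it over.
  final case class ExplicitGenerate(join: Boolean, output: Seq[Attr])
}
```

In variant A, `node.copy(join = true).output` differs from `node.output` because the ids are regenerated; in variant B the same attributes survive the copy, which is the property the comment above asks for.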




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90464337
  
      [Test build #29788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29788/consoleFull) for   PR 4602 at commit [`2a66aa8`](https://github.com/apache/spark/commit/2a66aa8216f0ff2e8c520e0cd01249c3ab3b2b54).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-93945559
  
      [Test build #30468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30468/consoleFull) for   PR 4602 at commit [`d2e8b43`](https://github.com/apache/spark/commit/d2e8b43968cf4e041e418bf73f0be29b04aaf38e).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77862925
  
      [Test build #28395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28395/consoleFull) for   PR 4602 at commit [`7fa6e0d`](https://github.com/apache/spark/commit/7fa6e0d3e3cf83072e4dcf37fe24a89bdf0f8da1).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25318212
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -144,6 +144,12 @@ class Analyzer(catalog: Catalog,
                   failAnalysis(
                     s"unresolved operator ${operator.simpleString}")
     
    +            case p @ Project(exprs, _) if exprs.length > 1 && exprs.flatMap(_.collect {
    +              case e: Generator => true
    +            }).length >= 1 =>
    +              failAnalysis(
    +                s"only a single generator allow in Projection ${operator.simpleString}")
    --- End diff --
    
    `Only a single table generating function is allowed in a SELECT clause, found: ${exprs.map(_.prettyString).mkString(",")}`
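A self-contained sketch of the check with the suggested wording, using the same condition as the diff (more than one projection expression, at least one of which contains a generator). The expression classes and the `pretty` method here are hypothetical stand-ins for Catalyst's `Expression`, `Generator` and `prettyString`; the real analyzer reports the problem via `failAnalysis` rather than returning an `Either`.

```scala
object SingleGeneratorCheckSketch {
  // Minimal stand-in expression tree (hypothetical, not Catalyst).
  sealed trait Expr { def pretty: String }
  final case class Column(name: String) extends Expr {
    def pretty: String = name
  }
  final case class Explode(child: Expr) extends Expr {
    def pretty: String = s"explode(${child.pretty})"
  }
  final case class Alias(child: Expr, name: String) extends Expr {
    def pretty: String = s"${child.pretty} AS $name"
  }

  // True if the expression is, or wraps, a generator.
  def containsGenerator(e: Expr): Boolean = e match {
    case _: Explode  => true
    case Alias(c, _) => containsGenerator(c)
    case _: Column   => false
  }

  // Reject projections that mix a generator with other expressions.
  def check(projectList: Seq[Expr]): Either[String, Unit] =
    if (projectList.length > 1 && projectList.exists(containsGenerator)) {
      val found = projectList.map(_.pretty).mkString(",")
      Left("Only a single table generating function is allowed in a SELECT clause, " +
        s"found: $found")
    } else {
      Right(())
    }
}
```

Listing the offending expressions in the message lets the user see which part of the SELECT clause triggered the failure.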




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94032555
  
      [Test build #30489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30489/consoleFull) for   PR 4602 at commit [`5ee5d2c`](https://github.com/apache/spark/commit/5ee5d2c52d34fec40660f00a0135b72d0e5e581c).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74360192
  
      [Test build #27472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27472/consoleFull) for   PR 4602 at commit [`5ddab7e`](https://github.com/apache/spark/commit/5ddab7e779df663c70dcd60fb49f0b5eddd39dcd).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74637702
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27620/




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28372352
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -42,47 +42,27 @@ abstract class Generator extends Expression {
     
       override type EvaluatedType = TraversableOnce[Row]
     
    -  override lazy val dataType =
    -    ArrayType(StructType(output.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata))))
    +  override def dataType: DataType = ???
    --- End diff --
    
    Throw a better exception.
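
One possible shape for this, sketched under assumptions: `ToyGenerator` and `ToyExplodeGenerator` are hypothetical stand-ins for the abstract `Generator` and a concrete subclass, `dataType` is modelled as a plain `String`, and the message wording is illustrative only. The point is that Scala's `???` throws a `scala.NotImplementedError` with a generic message, while an explicit exception can say what is unsupported.

```scala
// Hypothetical stand-in for the abstract Generator; not Spark's real class.
abstract class ToyGenerator {
  def nodeName: String

  // `???` would throw scala.NotImplementedError with a generic message.
  // An explicit exception tells the caller what is unsupported and why.
  def dataType: String =
    throw new UnsupportedOperationException(
      s"dataType is not defined for generator $nodeName; " +
        "its output attributes are tracked by the enclosing Generate operator")
}

object ToyExplodeGenerator extends ToyGenerator {
  def nodeName: String = "Explode"
}
```

Calling `ToyExplodeGenerator.dataType` then fails with a message that names the generator instead of a bare "an implementation is missing".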




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28575468
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -107,6 +113,12 @@ trait CheckAnalysis {
                 failAnalysis(
                   s"unresolved operator ${operator.simpleString}")
     
    +          case p @ Project(exprs, _) if containsMultipleGenerators(exprs) =>
    +            failAnalysis(
    +              s"""Only a single table generating function is allowed in a SELECT clause, found:
    +                 | ${exprs.map(_.prettyString).mkString(",")}""".stripMargin)
    --- End diff --
    
    Yes, I added this to the unit tests; see `HiveQuerySuite.scala`.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28575507
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -473,10 +473,47 @@ class Analyzer(
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, name :: Nil, Nil)
    +      case Project(Seq(MultiAlias(g: Generator, names)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, names, Nil)
         }
       }
    +
    +  object ResolveGenerate extends Rule[LogicalPlan] {
    +    // Construct the output attributes for the generator,
    +    // The output attribute names can be either specified or
    +    // auto generated.
    +    private def makeGeneratorOutput(
    +        generator: Generator,
    +        attributeNames: Seq[String],
    +        qualifier: Option[String]): Array[Attribute] = {
    +      val elementTypes = generator.elementTypes
    +
    +      val raw = if (attributeNames.size == elementTypes.size) {
    --- End diff --
    
    Hive does exactly the same as you listed.
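
The naming rule in the quoted `makeGeneratorOutput` can be sketched roughly as follows. This is a simplified model, not the patch itself: `ToyAttribute` and `GeneratorOutputNames` are hypothetical, data types are plain strings, and the fallback branch is an assumption based on the `_c0, _c1 .. _cN` default described in the `Generate` scaladoc elsewhere in this thread (the diff above is cut off before its `else` branch).

```scala
// Hypothetical, simplified attribute; Catalyst's Attribute carries more state.
case class ToyAttribute(
    name: String,
    dataType: String,
    nullable: Boolean,
    qualifier: Option[String])

object GeneratorOutputNames {
  // elementTypes: one (dataType, nullable) pair per output column of the generator.
  def makeOutput(
      elementTypes: Seq[(String, Boolean)],
      attributeNames: Seq[String],
      qualifier: Option[String]): Seq[ToyAttribute] = {
    val names: Seq[String] =
      if (attributeNames.size == elementTypes.size) {
        attributeNames // user-specified names, one per output column
      } else {
        // Assumed fallback: auto-generated names _c0, _c1, ...
        elementTypes.indices.map(i => s"_c$i")
      }
    names.zip(elementTypes).map { case (n, (t, nullable)) =>
      ToyAttribute(n, t, nullable, qualifier)
    }
  }
}
```

For example, `makeOutput(Seq(("int", false)), Seq("val"), None)` keeps the alias `val`, while passing `Nil` for the names yields `_c0`.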




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74600228
  
    I tried the following, which fails with the `AnalysisException` shown after it:
    ```
    val rdd = sc.parallelize((1 to 10).map(i => s"""{"a":$i, "b":"str${i}"}"""))
    sqlContext.jsonRDD(rdd).registerTempTable("jt")
    sqlContext.sql("CREATE TABLE gen_tmp (key Int)")
    sqlContext.sql("INSERT OVERWRITE TABLE gen_tmp SELECT explode(array(1,2,3)) AS val FROM jt LIMIT 1")
    ```
    
    ```
    org.apache.spark.sql.AnalysisException: invalid cast from array<struct<_c0:int>> to int;
    	at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.failAnalysis(Analyzer.scala:85)
    	at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$18$$anonfun$apply$2.applyOrElse(Analyzer.scala:98)
    	at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$apply$18$$anonfun$apply$2.applyOrElse(Analyzer.scala:92)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
    	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
    	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
    	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:263)
    ```




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25318420
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -501,8 +507,10 @@ class Analyzer(catalog: Catalog,
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    --- End diff --
    
    Is it possible to always put the names in the Generator itself instead of needing this rule?  I don't really remember all of the places where we construct these.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-92681545
  
      [Test build #30222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30222/consoleFull) for   PR 4602 at commit [`ca5e7f4`](https://github.com/apache/spark/commit/ca5e7f41996fc6ada7949ee714e26b25eba578e7).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90463818
  
      [Test build #29788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29788/consoleFull) for   PR 4602 at commit [`2a66aa8`](https://github.com/apache/spark/commit/2a66aa8216f0ff2e8c520e0cd01249c3ab3b2b54).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25318534
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -501,8 +507,10 @@ class Analyzer(catalog: Catalog,
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    --- End diff --
    
    Ah, makes sense.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r24712240
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---
    @@ -101,6 +101,7 @@ case class Alias(child: Expression, name: String)
       extends NamedExpression with trees.UnaryNode[Expression] {
     
       override type EvaluatedType = Any
    +  override lazy val resolved = childrenResolved && !child.isInstanceOf[Generator]
    --- End diff --
    
    Why this change?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77984222
  
      [Test build #28414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28414/consoleFull) for   PR 4602 at commit [`da11e12`](https://github.com/apache/spark/commit/da11e12c69bb5307ba9af7df99a3c04cd0c5a621).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28575756
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
    @@ -42,47 +42,27 @@ abstract class Generator extends Expression {
     
       override type EvaluatedType = TraversableOnce[Row]
     
    -  override lazy val dataType =
    -    ArrayType(StructType(output.map(a => StructField(a.name, a.dataType, a.nullable, a.metadata))))
    +  override def dataType: DataType = ???
    --- End diff --
    
    Since we moved the output field names from the `Generator` to `Generate`, it is probably no longer possible to construct the `StructType` within the `Generator`. Calling the `dataType` method of a `Generator` should also be rare, so how about we keep throwing an exception if people do that?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4602




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74637694
  
      [Test build #27620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27620/consoleFull) for   PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ShowTablesCommand(databaseName: Option[String]) extends RunnableCommand `





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74360225
  
      [Test build #27472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27472/consoleFull) for   PR 4602 at commit [`5ddab7e`](https://github.com/apache/spark/commit/5ddab7e779df663c70dcd60fb49f0b5eddd39dcd).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90657029
  
      [Test build #29801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29801/consoleFull) for   PR 4602 at commit [`607f4fb`](https://github.com/apache/spark/commit/607f4fb5b6513de332e83c0820b5b836aec11459).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r24718555
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -137,6 +137,11 @@ class Analyzer(catalog: Catalog,
                   failAnalysis(
                     s"unresolved operator ${operator.simpleString}")
     
    +            case p @ Project(exprs, _) if exprs.length > 1 && exprs.collect {
    --- End diff --
    
    Oh, it's a bug in my code, thanks for finding this. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25318498
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -501,8 +507,10 @@ class Analyzer(catalog: Catalog,
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    --- End diff --
    
    We can't tell whether it's a Generator until the function is resolved, particularly in `HiveQl.scala`.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74807564
  
    @chenghao-intel After another look at the code, I think it may be better to remove aliases from the `Generator`. Then `MultiAlias` can be used to assign names to the output fields of a generator. Because a `Generator` is just an `Expression`, it does not seem like a good idea to put names in it. Instead, using a `NamedExpression` (e.g. `MultiAlias`) to wrap a `Generator` looks like a better approach.
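
A minimal sketch of that idea, assuming a toy model rather than Catalyst's real classes: every `Toy*` type and `ImplicitGenerateSketch` below is hypothetical. The generator carries no names; a single-name or multi-name alias wraps it, and a rule in the spirit of `ImplicitGenerate` reads the names off the wrapper.

```scala
// Toy model: a generator has no names of its own; a named wrapper supplies them.
sealed trait ToyGen
case class ToyExplodeGen(column: String) extends ToyGen        // one output column
case class ToyJsonTupleGen(fields: Seq[String]) extends ToyGen // several output columns

sealed trait ToyNamed
case class ToySingleAlias(child: ToyGen, name: String) extends ToyNamed
case class ToyMultiAlias(child: ToyGen, names: Seq[String]) extends ToyNamed

// Hypothetical counterpart of the Generate operator: generator plus output names.
case class ToyGenerate(generator: ToyGen, attributeNames: Seq[String])

object ImplicitGenerateSketch {
  // A projection consisting of exactly one aliased generator becomes a Generate;
  // anything else is left alone (None).
  def toGenerate(projectList: Seq[ToyNamed]): Option[ToyGenerate] = projectList match {
    case Seq(ToySingleAlias(g, name)) => Some(ToyGenerate(g, name :: Nil))
    case Seq(ToyMultiAlias(g, names)) => Some(ToyGenerate(g, names))
    case _ => None
  }
}
```

Keeping names on the wrapper means the same generator expression can be reused under different aliases without being rebuilt.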




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77990620
  
      [Test build #28414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28414/consoleFull) for   PR 4602 at commit [`da11e12`](https://github.com/apache/spark/commit/da11e12c69bb5307ba9af7df99a3c04cd0c5a621).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77815033
  
      [Test build #28384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28384/consoleFull) for   PR 4602 at commit [`3500042`](https://github.com/apache/spark/commit/3500042ab8110bf386e502b937805b8783e30d3f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28371222
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -40,34 +40,41 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
      * output of each into a new stream of rows.  This operation is similar to a `flatMap` in functional
      * programming with one important additional feature, which allows the input rows to be joined with
      * their output.
    + * @param generator the generator expression
      * @param join  when true, each output row is implicitly joined with the input tuple that produced
      *              it.
      * @param outer when true, each input row will be output at least once, even if the output of the
      *              given `generator` is empty. `outer` has no effect when `join` is false.
    - * @param alias when set, this string is applied to the schema of the output of the transformation
    - *              as a qualifier.
    + * @param child Children logical plan node
    + * @param qualifier Qualifier for the attributes of generator(UDTF)
    + * @param attributeNames the column names for the generator(UDTF), will be _c0, _c1 .. _cN if
    + *                       leave as default (empty)
    + * @param gOutput The output of Generator.
      */
     case class Generate(
         generator: Generator,
         join: Boolean,
         outer: Boolean,
    -    alias: Option[String],
    -    child: LogicalPlan)
    +    child: LogicalPlan,
    +    qualifier: Option[String] = None,
    +    attributeNames: Seq[String] = Nil,
    +    gOutput: Seq[Attribute] = Nil)
    --- End diff --
    
    This could be just `output: Seq[Attribute]` without `attributeNames`.  When it's unresolved, `output = Seq(UnresolvedAttribute("name1"), ...)`, which resolution then turns into `Seq(AttributeReference(...))`.
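
A small sketch of how a single `output` field could carry both states, under stated assumptions: `ToyUnresolvedAttr`, `ToyAttrRef`, `ToyGenerateNode` and `ResolveGenerateSketch` are hypothetical simplifications of `UnresolvedAttribute`, `AttributeReference`, `Generate` and a resolution rule, and data types are plain strings.

```scala
sealed trait ToyOutAttr { def name: String }
case class ToyUnresolvedAttr(name: String) extends ToyOutAttr
case class ToyAttrRef(name: String, dataType: String, nullable: Boolean) extends ToyOutAttr

case class ToyGenerateNode(generatorName: String, output: Seq[ToyOutAttr]) {
  // The node counts as resolved once every output attribute carries a type.
  def resolved: Boolean = output.forall(_.isInstanceOf[ToyAttrRef])
}

object ResolveGenerateSketch {
  // Pairs each unresolved name with the generator's element type at the same position.
  // Leaves the node untouched if it is already resolved or the arity does not match.
  def resolve(node: ToyGenerateNode, elementTypes: Seq[(String, Boolean)]): ToyGenerateNode =
    if (node.resolved || node.output.size != elementTypes.size) {
      node
    } else {
      node.copy(output = node.output.zip(elementTypes).map {
        case (attr, (t, nullable)) => ToyAttrRef(attr.name, t, nullable)
      })
    }
}
```

With this shape the names live inside `output` from the start, so a separate `attributeNames` parameter is not needed in the sketch.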




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74402072
  
    @marmbrus any more comments on this?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77818815
  
      [Test build #28384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28384/consoleFull) for   PR 4602 at commit [`3500042`](https://github.com/apache/spark/commit/3500042ab8110bf386e502b937805b8783e30d3f).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74628623
  
      [Test build #27617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27617/consoleFull) for   PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25318192
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -144,6 +144,12 @@ class Analyzer(catalog: Catalog,
                   failAnalysis(
                     s"unresolved operator ${operator.simpleString}")
     
    +            case p @ Project(exprs, _) if exprs.length > 1 && exprs.flatMap(_.collect {
    --- End diff --
    
    Pull `containsMultipleGenerators` out into a function.
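
A rough sketch of what such a helper could look like, written against a toy expression tree rather than Catalyst. The types `ToyExpr`, `ToyLit`, `ToyExplode` and `ToyAlias` are hypothetical, and the ">1 generator" semantics is an assumption drawn from the helper's name; the condition in the actual patch is truncated in the diff above and may differ.

```scala
// Toy expression tree (hypothetical stand-ins, not Catalyst's Expression classes).
sealed trait ToyExpr { def children: Seq[ToyExpr] }
final case class ToyLit(value: Int) extends ToyExpr {
  val children: Seq[ToyExpr] = Nil
}
// Plays the role of a Generator such as Explode.
final case class ToyExplode(child: ToyExpr) extends ToyExpr {
  def children: Seq[ToyExpr] = Seq(child)
}
final case class ToyAlias(child: ToyExpr, name: String) extends ToyExpr {
  def children: Seq[ToyExpr] = Seq(child)
}

// Collect every generator found anywhere inside an expression (pre-order walk).
def collectGenerators(e: ToyExpr): Seq[ToyExpr] = {
  val here: Seq[ToyExpr] = e match {
    case g: ToyExplode => Seq(g)
    case _ => Nil
  }
  here ++ e.children.flatMap(collectGenerators)
}

// Assumed semantics: true when the projection list holds more than one generator.
def containsMultipleGenerators(exprs: Seq[ToyExpr]): Boolean =
  exprs.flatMap(collectGenerators).size > 1
```

Naming the predicate keeps the pattern-match guard in the analysis check down to a single readable call.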




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74801565
  
    /cc @marmbrus @yhuai  Any comment on this?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77994219
  
    cc @yhuai @marmbrus @liancheng can you review the code? I've finished the code refactoring and the bug fixing as we discussed above.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28199798
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---
    @@ -112,6 +112,8 @@ case class Alias(child: Expression, name: String)(
       extends NamedExpression with trees.UnaryNode[Expression] {
     
       override type EvaluatedType = Any
    +  // Alias(Generator, xx) need to be transformed into Generate(generator, ...)
    +  override lazy val resolved = childrenResolved && !child.isInstanceOf[Generator]
    --- End diff --
    
    Instead, could we just have generators be unresolved until they have aliases?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25462282
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,24 +35,12 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    output: Seq[Attribute],
    --- End diff --
    
    It's too late to merge this into 1.3 since it's not a regression.  So I don't see a lot of point in waiting for the correct solution.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25479434
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,24 +35,12 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    output: Seq[Attribute],
    --- End diff --
    
    Ok, I will do the code refactoring within this PR.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-75419540
  
    @yhuai please ignore my previous comment. I was thinking about some other possibilities.
    I agree with you that we can move the output column names into the logical plan node `Generate`, but one thing I am not sure about is whether we need to let the `generator` expression itself manage the default field names (when they are not specified).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74636202
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27617/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28372261
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ---
    @@ -107,6 +113,12 @@ trait CheckAnalysis {
                 failAnalysis(
                   s"unresolved operator ${operator.simpleString}")
     
    +          case p @ Project(exprs, _) if containsMultipleGenerators(exprs) =>
    +            failAnalysis(
    +              s"""Only a single table generating function is allowed in a SELECT clause, found:
    +                 | ${exprs.map(_.prettyString).mkString(",")}""".stripMargin)
    --- End diff --
    
    Do we have a test for this error?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28371985
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -473,10 +473,47 @@ class Analyzer(
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, name :: Nil, Nil)
    +      case Project(Seq(MultiAlias(g: Generator, names)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, names, Nil)
         }
       }
    +
    +  object ResolveGenerate extends Rule[LogicalPlan] {
    +    // Construct the output attributes for the generator,
    +    // The output attribute names can be either specified or
    +    // auto generated.
    +    private def makeGeneratorOutput(
    +        generator: Generator,
    +        attributeNames: Seq[String],
    +        qualifier: Option[String]): Array[Attribute] = {
    +      val elementTypes = generator.elementTypes
    +
    +      val raw = if (attributeNames.size == elementTypes.size) {
    +        attributeNames.zip(elementTypes).map {
    +          case (n, (t, nullable)) => AttributeReference(n, t, nullable)()
    +        }
    +      } else {
    +        elementTypes.zipWithIndex.map {
    +          // keep the default column names as Hive does _c0, _c1, _cN
    +          case ((t, nullable), i) => AttributeReference(s"_c$i", t, nullable)()
    +        }
    +      }
    +
    +      qualifier.map(q => raw.map(_.withQualifiers(q :: Nil))).getOrElse(raw).toArray[Attribute]
    --- End diff --
    
    Move this into the operator.
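
The naming rule in the `makeGeneratorOutput` hunk above can be sketched in isolation as follows. This is a simplified illustration only: `ToyCol` and `generatorOutput` are hypothetical names, data types are plain strings, and the qualifier is stored as a field instead of going through `withQualifiers`.

```scala
// Hypothetical stand-in for an output attribute; not Catalyst's AttributeReference.
final case class ToyCol(name: String, dataType: String, nullable: Boolean, qualifier: Option[String])

// Build the generator's output columns.
// If the caller supplied exactly one name per element type, use those names;
// otherwise fall back to Hive-style defaults _c0, _c1, ..., _cN.
def generatorOutput(
    elementTypes: Seq[(String, Boolean)],
    attributeNames: Seq[String],
    qualifier: Option[String]): Seq[ToyCol] = {
  val names: Seq[String] =
    if (attributeNames.size == elementTypes.size) attributeNames
    else elementTypes.indices.map(i => s"_c$i")
  names.zip(elementTypes).map {
    case (n, (t, nullable)) => ToyCol(n, t, nullable, qualifier)
  }
}
```

For example, `generatorOutput(Seq(("int", false)), Seq("val"), None)` keeps the name `val`, while passing `Nil` for the names yields a single column called `_c0`.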




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90464346
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29788/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94958058
  
    Thanks, merged to master.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28373360
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -31,40 +31,29 @@ import org.apache.spark.sql.catalyst.expressions._
      *              it.
      * @param outer when true, each input row will be output at least once, even if the output of the
      *              given `generator` is empty. `outer` has no effect when `join` is false.
    + * @param output the output attributes of this node, which constructed in analysis phase,
    + *               and we can not change it, as the parent node bound with it already.
      */
     @DeveloperApi
     case class Generate(
         generator: Generator,
         join: Boolean,
         outer: Boolean,
    -    child: SparkPlan)
    +    child: SparkPlan,
    +    output: Seq[Attribute])
    --- End diff --
    
    Make `child` the last argument and get rid of the default arguments above.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90241271
  
    I think that if you just make them arguments to the constructor then all of the work to keep them consistent will come for free.
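
A small, self-contained illustration of why constructor arguments stay consistent "for free" under `copy`, assuming ids are handed out by a global counter. `OutAttr`, `freshAttr`, `RecomputingNode` and `StableNode` are hypothetical toy names, not Spark classes.

```scala
import java.util.concurrent.atomic.AtomicLong

// Global id source, standing in for the allocation of fresh expression ids.
val nextId = new AtomicLong(0)

final case class OutAttr(name: String, exprId: Long)
def freshAttr(name: String): OutAttr = OutAttr(name, nextId.getAndIncrement())

// Variant 1: the output is derived inside the node, so every new instance
// (including one produced by `copy`) allocates new ids.
final case class RecomputingNode(names: Seq[String], child: String) {
  val output: Seq[OutAttr] = names.map(freshAttr)
}

// Variant 2: the output is a constructor argument, so `copy` carries the
// same attributes (and ids) over unless the caller replaces them explicitly.
final case class StableNode(output: Seq[OutAttr], child: String)
```

Copying a `RecomputingNode` with a different `child` yields attributes with new ids, which would leave a parent that was bound to the old ids dangling; copying a `StableNode` keeps them intact.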




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-89102449
  
    Sorry for the delay.  I did a quick pass and I think the biggest comment is the mutable state for the output of `Generate`.  My initial version had tons of bugs, so I'd like to avoid doing that again if at all possible.  If you have time to update I can try and give a more thorough review.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90749662
  
    @marmbrus updated, can you review it again?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r24713919
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---
    @@ -101,6 +101,7 @@ case class Alias(child: Expression, name: String)
       extends NamedExpression with trees.UnaryNode[Expression] {
     
       override type EvaluatedType = Any
    +  override lazy val resolved = childrenResolved && !child.isInstanceOf[Generator]
    --- End diff --
    
    `Alias(Generator)` is not like a normal expression: it will be transformed into `Generate(Generator, alias)`.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94054064
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30491/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94045209
  
      [Test build #30493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30493/consoleFull) for   PR 4602 at commit [`c2a5132`](https://github.com/apache/spark/commit/c2a51323138c42fd1fcc84d32b01cfa647bc7a67).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77879123
  
      [Test build #28395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28395/consoleFull) for   PR 4602 at commit [`7fa6e0d`](https://github.com/apache/spark/commit/7fa6e0d3e3cf83072e4dcf37fe24a89bdf0f8da1).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r24718498
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -137,6 +137,11 @@ class Analyzer(catalog: Catalog,
                   failAnalysis(
                     s"unresolved operator ${operator.simpleString}")
     
    +            case p @ Project(exprs, _) if exprs.length > 1 && exprs.collect {
    --- End diff --
    
    e.g. `Project(Alias(Generator1, name), Alias(Generator2, name2))`




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94049157
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30490/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28371348
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -473,10 +473,47 @@ class Analyzer(
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, name :: Nil, Nil)
    +      case Project(Seq(MultiAlias(g: Generator, names)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, names, Nil)
         }
       }
    +
    +  object ResolveGenerate extends Rule[LogicalPlan] {
    +    // Construct the output attributes for the generator,
    +    // The output attribute names can be either specified or
    +    // auto generated.
    --- End diff --
    
    Add Scaladoc for this object, and don't wrap lines early.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-89903531
  
    Yeah, that's quite a headache. All of the output attributes are created within `Generate` and are bound by its parent nodes later; the difficulty is keeping the `exprId` of the output attributes consistent during node transformation / copying.
    I will think about that more deeply.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90470127
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29789/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90470117
  
      [Test build #29789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29789/consoleFull) for   PR 4602 at commit [`76b820c`](https://github.com/apache/spark/commit/76b820c3d47f40937d6f7f7b8344b734fbc70881).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74629760
  
    retest this please.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94045673
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30489/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-75417555
  
    A `generator` is unlike a normal expression in that it can output multiple columns. In the current implementation, the logical plan node `Generate` serves that purpose instead of `Project`. I agree we need to improve that somehow; perhaps all we need is a more general logical plan node `Project`?




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r24715779
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -137,6 +137,11 @@ class Analyzer(catalog: Catalog,
                   failAnalysis(
                     s"unresolved operator ${operator.simpleString}")
     
    +            case p @ Project(exprs, _) if exprs.length > 1 && exprs.collect {
    --- End diff --
    
    perhaps `exprs.find(_.isInstanceOf[Generator]).isDefined`
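
    A minimal sketch of the check suggested above, using hypothetical stand-in types (`Expr`, `Col`, `Gen`) rather than the real Catalyst classes. It also shows `exists`, which should express the same predicate a little more directly.

    ```scala
    // Hypothetical stand-ins for Catalyst's Expression / Generator hierarchy.
    sealed trait Expr
    final case class Col(name: String) extends Expr
    final case class Gen(name: String) extends Expr

    object GeneratorCheck {
      // The form suggested in the review comment.
      def hasGeneratorFind(exprs: Seq[Expr]): Boolean =
        exprs.find(_.isInstanceOf[Gen]).isDefined

      // Equivalent predicate written with `exists`.
      def hasGeneratorExists(exprs: Seq[Expr]): Boolean =
        exprs.exists(_.isInstanceOf[Gen])
    }
    ```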




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-83238883
  
      [Test build #28846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28846/consoleFull) for   PR 4602 at commit [`95187fe`](https://github.com/apache/spark/commit/95187fea53de1549c030777f47c6184188ca2b24).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90695926
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29801/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28371774
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -40,34 +40,41 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
      * output of each into a new stream of rows.  This operation is similar to a `flatMap` in functional
      * programming with one important additional feature, which allows the input rows to be joined with
      * their output.
    + * @param generator the generator expression
      * @param join  when true, each output row is implicitly joined with the input tuple that produced
      *              it.
      * @param outer when true, each input row will be output at least once, even if the output of the
      *              given `generator` is empty. `outer` has no effect when `join` is false.
    - * @param alias when set, this string is applied to the schema of the output of the transformation
    - *              as a qualifier.
    + * @param child Children logical plan node
    + * @param qualifier Qualifier for the attributes of generator(UDTF)
    + * @param attributeNames the column names for the generator(UDTF), will be _c0, _c1 .. _cN if
    + *                       leave as default (empty)
    + * @param gOutput The output of Generator.
      */
     case class Generate(
         generator: Generator,
         join: Boolean,
         outer: Boolean,
    -    alias: Option[String],
    -    child: LogicalPlan)
    +    child: LogicalPlan,
    +    qualifier: Option[String] = None,
    +    attributeNames: Seq[String] = Nil,
    +    gOutput: Seq[Attribute] = Nil)
    --- End diff --
    
    Call this: `generatorOutput`




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28371864
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -40,34 +40,41 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
      * output of each into a new stream of rows.  This operation is similar to a `flatMap` in functional
      * programming with one important additional feature, which allows the input rows to be joined with
      * their output.
    + * @param generator the generator expression
      * @param join  when true, each output row is implicitly joined with the input tuple that produced
      *              it.
      * @param outer when true, each input row will be output at least once, even if the output of the
      *              given `generator` is empty. `outer` has no effect when `join` is false.
    - * @param alias when set, this string is applied to the schema of the output of the transformation
    - *              as a qualifier.
    + * @param child Children logical plan node
    + * @param qualifier Qualifier for the attributes of generator(UDTF)
    + * @param attributeNames the column names for the generator(UDTF), will be _c0, _c1 .. _cN if
    + *                       leave as default (empty)
    + * @param gOutput The output of Generator.
      */
     case class Generate(
         generator: Generator,
         join: Boolean,
         outer: Boolean,
    -    alias: Option[String],
    -    child: LogicalPlan)
    +    child: LogicalPlan,
    +    qualifier: Option[String] = None,
    +    attributeNames: Seq[String] = Nil,
    +    gOutput: Seq[Attribute] = Nil)
       extends UnaryNode {
     
    -  protected def generatorOutput: Seq[Attribute] = {
    -    val output = alias
    -      .map(a => generator.output.map(_.withQualifiers(a :: Nil)))
    -      .getOrElse(generator.output)
    -    if (join && outer) {
    -      output.map(_.withNullability(true))
    -    } else {
    -      output
    -    }
    +  override lazy val resolved: Boolean = {
    +    generator.resolved &&
    +      childrenResolved &&
    +      attributeNames.length > 0 &&
    +      gOutput.map(_.name) == attributeNames
       }
     
    -  override def output: Seq[Attribute] =
    -    if (join) child.output ++ generatorOutput else generatorOutput
    +  // we don't want the gOutput to be taken as part of the expressions
    +  // as that will cause exceptions like unresolved attributes etc.
    +  override def expressions: Seq[Expression] = generator :: Nil
    +
    +  def output: Seq[Attribute] = {
    +    if (join) child.output ++ gOutput else gOutput
    --- End diff --
    
    If you apply the qualifier here instead, then it's impossible for the rule writer to make a mistake:
    
    ```scala
    val withoutQualifier = if (join) child.output ++ gOutput else gOutput
    qualifier.map(q => withoutQualifier.map(_.withQualifiers(q :: Nil))).getOrElse(withoutQualifier)
    ```
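
    A self-contained sketch of the same idea, assuming a simplified, hypothetical `Attr` type in place of Catalyst's `Attribute`. Unlike the snippet above, this version qualifies only the generator's own attributes and leaves the child's output untouched; that is an assumption about the intent, not something settled in this thread.

    ```scala
    // Hypothetical, simplified attribute; the real Catalyst Attribute carries more state.
    final case class Attr(name: String, qualifiers: Seq[String] = Nil) {
      def withQualifiers(qs: Seq[String]): Attr = copy(qualifiers = qs)
    }

    // Hypothetical stand-in for the logical Generate node.
    final case class GenerateSketch(
        join: Boolean,
        childOutput: Seq[Attr],
        generatorOutput: Seq[Attr],
        qualifier: Option[String] = None) {

      // The qualifier is applied inside `output`, so a rule that builds the
      // node cannot forget to apply it.
      def output: Seq[Attr] = {
        val qualified = qualifier
          .map(q => generatorOutput.map(_.withQualifiers(q :: Nil)))
          .getOrElse(generatorOutput)
        if (join) childOutput ++ qualified else qualified
      }
    }
    ```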




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r24715815
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,17 +36,22 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    tblAlias: Option[String],
         join: Boolean,
         outer: Boolean,
         child: SparkPlan)
       extends UnaryNode {
     
       // This must be a val since the generator output expr ids are not preserved by serialization.
    --- End diff --
    
    I'm tempted to remove all this complicated logic and just make the attributes produced an argument to the `Generate` operator.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28372328
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
    @@ -284,12 +284,13 @@ package object dsl {
             seed: Int = (math.random * 1000).toInt): LogicalPlan =
           Sample(fraction, withReplacement, seed, logicalPlan)
     
    +    // TODO specify the output column names
         def generate(
             generator: Generator,
             join: Boolean = false,
             outer: Boolean = false,
    -        alias: Option[String] = None): LogicalPlan =
    -      Generate(generator, join, outer, None, logicalPlan)
    +        alias: Option[String] = None): Generate =
    --- End diff --
    
    keep as `LogicalPlan`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94054050
  
      [Test build #30491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30491/consoleFull) for   PR 4602 at commit [`556e982`](https://github.com/apache/spark/commit/556e982db1a11c800132d659c0ed495cf4f90de2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-92681556
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30222/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r24718543
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,17 +36,22 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    tblAlias: Option[String],
         join: Boolean,
         outer: Boolean,
         child: SparkPlan)
       extends UnaryNode {
     
       // This must be a val since the generator output expr ids are not preserved by serialization.
    --- End diff --
    
    Yea, that's a better idea. I will update the code.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74401862
  
      [Test build #27499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27499/consoleFull) for   PR 4602 at commit [`9656e51`](https://github.com/apache/spark/commit/9656e51349c9d3680c131a4fb787ecbde5ef4834).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94065756
  
      [Test build #30493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30493/consoleFull) for   PR 4602 at commit [`c2a5132`](https://github.com/apache/spark/commit/c2a51323138c42fd1fcc84d32b01cfa647bc7a67).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77990625
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28414/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25318272
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,24 +35,12 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    output: Seq[Attribute],
    --- End diff --
    
    It seems redundant to have the generator `Attribute`s in both the operator and the expression.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28372173
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -473,10 +473,47 @@ class Analyzer(
        */
       object ImplicitGenerate extends Rule[LogicalPlan] {
         def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    -      case Project(Seq(Alias(g: Generator, _)), child) =>
    -        Generate(g, join = false, outer = false, None, child)
    +      case Project(Seq(Alias(g: Generator, name)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, name :: Nil, Nil)
    +      case Project(Seq(MultiAlias(g: Generator, names)), child) =>
    +        Generate(g, join = false, outer = false, child, qualifier = None, names, Nil)
         }
       }
    +
    +  object ResolveGenerate extends Rule[LogicalPlan] {
    +    // Construct the output attributes for the generator,
    +    // The output attribute names can be either specified or
    +    // auto generated.
    +    private def makeGeneratorOutput(
    +        generator: Generator,
    +        attributeNames: Seq[String],
    +        qualifier: Option[String]): Array[Attribute] = {
    +      val elementTypes = generator.elementTypes
    +
    +      val raw = if (attributeNames.size == elementTypes.size) {
    --- End diff --
    
    Could this instead be:
     - attributeNames.size == 0 => auto generate
     - attributeNames.size == elementTypes.size => use names
     - otherwise throw error
    
    Let's see what Hive does.
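
    The three cases above could be sketched roughly as follows. `GeneratorNames.resolveNames` is a hypothetical helper, not code from the patch. The `_c0`, `_c1`, ... naming follows the scaladoc in the diff under review, and returning an `Either` instead of throwing is only a choice to keep the sketch easy to test.

    ```scala
    object GeneratorNames {
      // Returns the column names to use, or an error message when the number
      // of user-supplied names does not match the generator's output arity.
      def resolveNames(
          attributeNames: Seq[String],
          elementCount: Int): Either[String, Seq[String]] =
        if (attributeNames.isEmpty) {
          // No names given: auto-generate _c0, _c1, ...
          Right((0 until elementCount).map(i => s"_c$i"))
        } else if (attributeNames.size == elementCount) {
          // Exactly one name per output column: use them as given.
          Right(attributeNames)
        } else {
          Left(s"Expected $elementCount aliases but got ${attributeNames.size}")
        }
    }
    ```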


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94041648
  
      [Test build #30491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30491/consoleFull) for   PR 4602 at commit [`556e982`](https://github.com/apache/spark/commit/556e982db1a11c800132d659c0ed495cf4f90de2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-92644150
  
      [Test build #30222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30222/consoleFull) for   PR 4602 at commit [`ca5e7f4`](https://github.com/apache/spark/commit/ca5e7f41996fc6ada7949ee714e26b25eba578e7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-83266758
  
      [Test build #28846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28846/consoleFull) for   PR 4602 at commit [`95187fe`](https://github.com/apache/spark/commit/95187fea53de1549c030777f47c6184188ca2b24).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`





[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r28303497
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala ---
    @@ -112,6 +112,8 @@ case class Alias(child: Expression, name: String)(
       extends NamedExpression with trees.UnaryNode[Expression] {
     
       override type EvaluatedType = Any
    +  // Alias(Generator, xx) need to be transformed into Generate(generator, ...)
    +  override lazy val resolved = childrenResolved && !child.isInstanceOf[Generator]
    --- End diff --
    
    @marmbrus sorry, I am not sure what you mean. Most of the change I made is moving the `aliases` from `Generator` to `Generate`, which means `Generator` developers no longer need to care about the aliases.
    The `Alias(Generator, xx)` here will be transformed into `Generate(generator, aliases)`; that's why I marked the `Alias` as unresolved.
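
    A small sketch of the resolution rule described above, using hypothetical stand-in classes (`SketchExpr`, `Lit`, `GeneratorSketch`, `AliasSketch`) instead of the real Catalyst expressions. An alias that wraps a generator reports itself as unresolved, which in the real analyzer is what would leave it for a rule to rewrite into `Generate`.

    ```scala
    // Hypothetical stand-ins for Catalyst expressions.
    sealed trait SketchExpr { def resolved: Boolean }

    final case class Lit(value: Int) extends SketchExpr {
      val resolved: Boolean = true
    }

    final case class GeneratorSketch(child: SketchExpr) extends SketchExpr {
      def resolved: Boolean = child.resolved
    }

    final case class AliasSketch(child: SketchExpr, name: String) extends SketchExpr {
      // Mirrors `childrenResolved && !child.isInstanceOf[Generator]` from the diff.
      def resolved: Boolean =
        child.resolved && !child.isInstanceOf[GeneratorSketch]
    }
    ```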




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74362743
  
      [Test build #27473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27473/consoleFull) for   PR 4602 at commit [`7738ca6`](https://github.com/apache/spark/commit/7738ca6406814501018e2a968d78d0833adc9d36).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-77879139
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28395/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74360226
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27472/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4602#discussion_r25382904
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala ---
    @@ -34,24 +35,12 @@ import org.apache.spark.sql.catalyst.expressions._
     @DeveloperApi
     case class Generate(
         generator: Generator,
    +    output: Seq[Attribute],
    --- End diff --
    
    Okay, but that sounds like an argument for taking in only the produced attributes as a parameter and having `output` be a def that is calculated based on the value of `join`, etc.  I don't see why we should keep the unused, complicated, redundant `Attribute`s in the generator (actually they are used once in this file, but that seems like an easy way to allow these two things to get out of sync).
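    
    A rough sketch of the shape suggested here, assuming a simplified operator rather than the real `Generate` in `sql/core`: the `GenerateSketch` object, the toy `Attribute` class, and the `generatorOutput` and `childOutput` parameter names are hypothetical.
    
    ```scala
    // Toy model only: names and types are illustrative, not Spark's actual classes.
    object GenerateSketch {
      case class Attribute(name: String)
    
      case class Generate(
          generatorOutput: Seq[Attribute], // only the attributes the generator produces
          join: Boolean,                   // whether the input row is kept alongside them
          childOutput: Seq[Attribute]) {   // attributes of the input relation
    
        // Derived on demand, so it cannot drift out of sync with the parameters.
        def output: Seq[Attribute] =
          if (join) childOutput ++ generatorOutput else generatorOutput
      }
    }
    ```
    
    Because `output` is computed from `join` and the produced attributes, there is a single source of truth for the operator's schema.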




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-75884037
  
    @yhuai @marmbrus this is a bug fix; it would be great if you could give more comments on it. I agree with @yhuai that we need to refactor the UDTF expression implementation, but can I put that in the next PR? This is actually a blocking issue for our internal benchmark.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-94049145
  
      [Test build #30490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30490/consoleFull) for   PR 4602 at commit [`04ae500`](https://github.com/apache/spark/commit/04ae50003b35e84e7287a8e6f07d8737b468345a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Explode(child: Expression)`
    
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74401863
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27499/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-90469761
  
      [Test build #29789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29789/consoleFull) for   PR 4602 at commit [`76b820c`](https://github.com/apache/spark/commit/76b820c3d47f40937d6f7f7b8344b734fbc70881).




[GitHub] spark pull request: [SPARK-5817] [SQL] Fix bug of udtf with column...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4602#issuecomment-74636197
  
      [Test build #27617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27617/consoleFull) for   PR 4602 at commit [`f6907d2`](https://github.com/apache/spark/commit/f6907d2bb1c9aca1528e458a9a7fd9a3d58b9309).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ShowTablesCommand(databaseName: Option[String]) extends RunnableCommand `


