You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dreamquster <gi...@git.apache.org> on 2015/04/04 11:24:16 UTC

[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

GitHub user dreamquster opened a pull request:

    https://github.com/apache/spark/pull/5358

    [SQL] SPARK-6489: Optimize lateral view with explode to not unnecessary columns.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dreamquster/spark spark-explode-optimize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5358.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5358
    
----
commit 1b29835b2beaba53f6cec3c02680002ad89802f5
Author: dreamquster <dr...@163.com>
Date:   2015-04-04T08:42:47Z

    [SQL] SPARK-6489: Optimize lateral view with explode to not read unnecessary columns

commit 376d332462a3e2a21a28ecdeab14a3bd1f49ffbf
Author: dreamquster <dr...@163.com>
Date:   2015-04-04T09:14:04Z

    [SQL] SPARK-6489: adding test files

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93248199
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30322/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93230381
  
      [Test build #30311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30311/consoleFull) for   PR 5358 at commit [`8909a5d`](https://github.com/apache/spark/commit/8909a5d14dccc1933a451261bc0e56a2cc876897).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch **removes the following dependencies:**
       * `RoaringBitmap-0.4.5.jar`
       * `activation-1.1.jar`
       * `akka-actor_2.10-2.3.4-spark.jar`
       * `akka-remote_2.10-2.3.4-spark.jar`
       * `akka-slf4j_2.10-2.3.4-spark.jar`
       * `aopalliance-1.0.jar`
       * `arpack_combined_all-0.1.jar`
       * `avro-1.7.7.jar`
       * `breeze-macros_2.10-0.11.2.jar`
       * `breeze_2.10-0.11.2.jar`
       * `chill-java-0.5.0.jar`
       * `chill_2.10-0.5.0.jar`
       * `commons-beanutils-1.7.0.jar`
       * `commons-beanutils-core-1.8.0.jar`
       * `commons-cli-1.2.jar`
       * `commons-codec-1.10.jar`
       * `commons-collections-3.2.1.jar`
       * `commons-compress-1.4.1.jar`
       * `commons-configuration-1.6.jar`
       * `commons-digester-1.8.jar`
       * `commons-httpclient-3.1.jar`
       * `commons-io-2.1.jar`
       * `commons-lang-2.5.jar`
       * `commons-lang3-3.3.2.jar`
       * `commons-math-2.1.jar`
       * `commons-math3-3.1.1.jar`
       * `commons-net-2.2.jar`
       * `compress-lzf-1.0.0.jar`
       * `config-1.2.1.jar`
       * `core-1.1.2.jar`
       * `curator-client-2.4.0.jar`
       * `curator-framework-2.4.0.jar`
       * `curator-recipes-2.4.0.jar`
       * `gmbal-api-only-3.0.0-b023.jar`
       * `grizzly-framework-2.1.2.jar`
       * `grizzly-http-2.1.2.jar`
       * `grizzly-http-server-2.1.2.jar`
       * `grizzly-http-servlet-2.1.2.jar`
       * `grizzly-rcm-2.1.2.jar`
       * `groovy-all-2.3.7.jar`
       * `guava-14.0.1.jar`
       * `guice-3.0.jar`
       * `hadoop-annotations-2.2.0.jar`
       * `hadoop-auth-2.2.0.jar`
       * `hadoop-client-2.2.0.jar`
       * `hadoop-common-2.2.0.jar`
       * `hadoop-hdfs-2.2.0.jar`
       * `hadoop-mapreduce-client-app-2.2.0.jar`
       * `hadoop-mapreduce-client-common-2.2.0.jar`
       * `hadoop-mapreduce-client-core-2.2.0.jar`
       * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
       * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
       * `hadoop-yarn-api-2.2.0.jar`
       * `hadoop-yarn-client-2.2.0.jar`
       * `hadoop-yarn-common-2.2.0.jar`
       * `hadoop-yarn-server-common-2.2.0.jar`
       * `ivy-2.4.0.jar`
       * `jackson-annotations-2.4.0.jar`
       * `jackson-core-2.4.4.jar`
       * `jackson-core-asl-1.8.8.jar`
       * `jackson-databind-2.4.4.jar`
       * `jackson-jaxrs-1.8.8.jar`
       * `jackson-mapper-asl-1.8.8.jar`
       * `jackson-module-scala_2.10-2.4.4.jar`
       * `jackson-xc-1.8.8.jar`
       * `jansi-1.4.jar`
       * `javax.inject-1.jar`
       * `javax.servlet-3.0.0.v201112011016.jar`
       * `javax.servlet-3.1.jar`
       * `javax.servlet-api-3.0.1.jar`
       * `jaxb-api-2.2.2.jar`
       * `jaxb-impl-2.2.3-1.jar`
       * `jcl-over-slf4j-1.7.10.jar`
       * `jersey-client-1.9.jar`
       * `jersey-core-1.9.jar`
       * `jersey-grizzly2-1.9.jar`
       * `jersey-guice-1.9.jar`
       * `jersey-json-1.9.jar`
       * `jersey-server-1.9.jar`
       * `jersey-test-framework-core-1.9.jar`
       * `jersey-test-framework-grizzly2-1.9.jar`
       * `jets3t-0.7.1.jar`
       * `jettison-1.1.jar`
       * `jetty-util-6.1.26.jar`
       * `jline-0.9.94.jar`
       * `jline-2.10.4.jar`
       * `jodd-core-3.6.3.jar`
       * `json4s-ast_2.10-3.2.10.jar`
       * `json4s-core_2.10-3.2.10.jar`
       * `json4s-jackson_2.10-3.2.10.jar`
       * `jsr305-1.3.9.jar`
       * `jtransforms-2.4.0.jar`
       * `jul-to-slf4j-1.7.10.jar`
       * `kryo-2.21.jar`
       * `log4j-1.2.17.jar`
       * `lz4-1.2.0.jar`
       * `management-api-3.0.0-b012.jar`
       * `mesos-0.21.0-shaded-protobuf.jar`
       * `metrics-core-3.1.0.jar`
       * `metrics-graphite-3.1.0.jar`
       * `metrics-json-3.1.0.jar`
       * `metrics-jvm-3.1.0.jar`
       * `minlog-1.2.jar`
       * `netty-3.8.0.Final.jar`
       * `netty-all-4.0.23.Final.jar`
       * `objenesis-1.2.jar`
       * `opencsv-2.3.jar`
       * `oro-2.0.8.jar`
       * `paranamer-2.6.jar`
       * `parquet-column-1.6.0rc3.jar`
       * `parquet-common-1.6.0rc3.jar`
       * `parquet-encoding-1.6.0rc3.jar`
       * `parquet-format-2.2.0-rc1.jar`
       * `parquet-generator-1.6.0rc3.jar`
       * `parquet-hadoop-1.6.0rc3.jar`
       * `parquet-jackson-1.6.0rc3.jar`
       * `protobuf-java-2.4.1.jar`
       * `protobuf-java-2.5.0-spark.jar`
       * `py4j-0.8.2.1.jar`
       * `pyrolite-2.0.1.jar`
       * `quasiquotes_2.10-2.0.1.jar`
       * `reflectasm-1.07-shaded.jar`
       * `scala-compiler-2.10.4.jar`
       * `scala-library-2.10.4.jar`
       * `scala-reflect-2.10.4.jar`
       * `scalap-2.10.4.jar`
       * `scalatest_2.10-2.2.1.jar`
       * `slf4j-api-1.7.10.jar`
       * `slf4j-log4j12-1.7.10.jar`
       * `snappy-java-1.1.1.6.jar`
       * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
       * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
       * `spire-macros_2.10-0.7.4.jar`
       * `spire_2.10-0.7.4.jar`
       * `stax-api-1.0.1.jar`
       * `stream-2.7.0.jar`
       * `tachyon-0.5.0.jar`
       * `tachyon-client-0.5.0.jar`
       * `uncommons-maths-1.2.2a.jar`
       * `unused-1.0.0.jar`
       * `xmlenc-0.52.jar`
       * `xz-1.0.jar`
       * `zookeeper-3.4.5.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-90255476
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29754/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-168112187
  
    I'm going to close this pull request.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-90838159
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29844/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93208166
  
      [Test build #30311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30311/consoleFull) for   PR 5358 at commit [`8909a5d`](https://github.com/apache/spark/commit/8909a5d14dccc1933a451261bc0e56a2cc876897).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93222352
  
      [Test build #30322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30322/consoleFull) for   PR 5358 at commit [`6014acc`](https://github.com/apache/spark/commit/6014acc11e570c880657238dc4a444ba6335bc13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r27841427
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,8 +171,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Project(substitutedProjection, child)
     
    +    case gen@Generate(generator: Explode, isJoin, isOuter, alias, child) =>
    +      val allReferences = gen.references ++ gen.parentReferences
    +      val pruneProject = prunedChild(child, allReferences)
    +      Generate(generator, isJoin, isOuter, alias, pruneProject)
    --- End diff --
    
    Is that OK if we just do like
    ```scala
    case gen@Generate(_, _, _, _, child) 
      if child.outputSet -- allReferences.filter(c.outputSet.contains)).nonEmpty =>
      gen.copy(child = Project(a.references.toSeq, child))
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by dreamquster <gi...@git.apache.org>.
Github user dreamquster commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93274588
  
    What's wrong with  the code  in PruningSuite?
    
    ````scala
    createPruningTest("Column pruning - explode with aggregate",
        "SELECT name, sum(d) AS sumd FROM person LATERAL VIEW explode(data) d AS d GROUP BY name",
        Seq("name", "sumd"),
        Seq("name","data"),
        Seq.empty)
    
      createPruningTest("Column pruning - outer explode with limit",
        "SELECT name FROM person LATERAL VIEW OUTER explode(data) outd AS d" +
          " where  name < \"C\" limit 3",
        Seq("name"),
        Seq("name", "data"),
        Seq.empty)
    
      createPruningTest(s"Column pruning - select all without explode optimze - query test",
        "SELECT * FROM person LATERAL VIEW OUTER explode(data) outd AS d WHERE 20 < age",
        Seq("name", "age", "data", "d"),
        Seq("name", "age", "data"),
        Seq.empty)
    `````
    
    jenkins report -> Error: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r27841595
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,8 +171,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Project(substitutedProjection, child)
     
    +    case gen@Generate(generator: Explode, isJoin, isOuter, alias, child) =>
    +      val allReferences = gen.references ++ gen.parentReferences
    +      val pruneProject = prunedChild(child, allReferences)
    +      Generate(generator, isJoin, isOuter, alias, pruneProject)
    --- End diff --
    
    Then we don't need other code change except the unit test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r28011972
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -55,6 +55,8 @@ case class Generate(
         child: LogicalPlan)
       extends UnaryNode {
     
    +  var parentReferences:AttributeSet = AttributeSet(Seq.empty)
    +
    --- End diff --
    
    This mutable state is no longer needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93248189
  
      [Test build #30322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30322/consoleFull) for   PR 5358 at commit [`6014acc`](https://github.com/apache/spark/commit/6014acc11e570c880657238dc4a444ba6335bc13).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-90255460
  
      [Test build #29754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29754/consoleFull) for   PR 5358 at commit [`376d332`](https://github.com/apache/spark/commit/376d332462a3e2a21a28ecdeab14a3bd1f49ffbf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93270212
  
      [Test build #30328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30328/consoleFull) for   PR 5358 at commit [`54abc3a`](https://github.com/apache/spark/commit/54abc3a7fe80c8aee97d748613e265783ae839d2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by dreamquster <gi...@git.apache.org>.
Github user dreamquster commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-89980837
  
    What's wrong?Who can verify it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5358


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r28199784
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,8 +171,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Project(substitutedProjection, child)
     
    +    case gen@Generate(_: Explode, _, _, _, c) =>
    --- End diff --
    
    Oh, I see.  Sorry I did not look closely enough.  In general we do this by matching on a project that is above the node in question.  This works because projects will get pushed down or created in case where unneeded attributes exist.  There are other examples in this file where we do this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-90828950
  
      [Test build #29844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29844/consoleFull) for   PR 5358 at commit [`644f688`](https://github.com/apache/spark/commit/644f688a8ef679e2402a0d390e95a90e8f1803af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r28012085
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,8 +171,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Project(substitutedProjection, child)
     
    +    case gen@Generate(_: Explode, _, _, _, c) =>
    --- End diff --
    
    Is there any reason to only match on `Explode`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-90250644
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by dreamquster <gi...@git.apache.org>.
Github user dreamquster commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r27855034
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,8 +171,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Project(substitutedProjection, child)
     
    +    case gen@Generate(generator: Explode, isJoin, isOuter, alias, child) =>
    +      val allReferences = gen.references ++ gen.parentReferences
    +      val pruneProject = prunedChild(child, allReferences)
    +      Generate(generator, isJoin, isOuter, alias, pruneProject)
    --- End diff --
    
    Ok,thanks:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by dreamquster <gi...@git.apache.org>.
Github user dreamquster commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r28043649
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,8 +171,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Project(substitutedProjection, child)
     
    +    case gen@Generate(_: Explode, _, _, _, c) =>
    --- End diff --
    
    SPARK-6489:this issue means it pruned columns when explode lateral view clause.
    Does it need to extend to all generate clause? I want to mark generate node's parent reference in analysis phase. Is there another gracefully method to get  its parent references?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93230419
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30311/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93270246
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30328/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-89536841
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93183311
  
      [Test build #30300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30300/consoleFull) for   PR 5358 at commit [`9e7aaec`](https://github.com/apache/spark/commit/9e7aaecc5d28914a86a4d8b8da47504efd68bde6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5358#discussion_r28011985
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,8 +171,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
     
           Project(substitutedProjection, child)
     
    +    case gen@Generate(_: Explode, _, _, _, c) =>
    +      val allReferences = gen.references ++ gen.parentReferences
    +      if ((c.outputSet -- allReferences.filter(c.outputSet.contains)).nonEmpty) {
    +        gen.copy(child = Project(allReferences.filter(c.outputSet.contains).toSeq, c))
    +      } else {
    +        gen
    +      }
    +
         // Eliminate no-op Projects
         case Project(projectList, child) if child.output == projectList => child
    +
    +    case plan =>
    +      plan.children.foreach { c =>
    +        c match {
    +          case gen@Generate(generator: Explode, isJoin, isOuter, alias, child) =>
    +            gen.parentReferences = plan.references;
    +          case _ => // nothing
    +        }
    +      }
    +      plan
    --- End diff --
    
    This change is no longer needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93253237
  
      [Test build #30328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30328/consoleFull) for   PR 5358 at commit [`54abc3a`](https://github.com/apache/spark/commit/54abc3a7fe80c8aee97d748613e265783ae839d2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93177971
  
      [Test build #30300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30300/consoleFull) for   PR 5358 at commit [`9e7aaec`](https://github.com/apache/spark/commit/9e7aaecc5d28914a86a4d8b8da47504efd68bde6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-90251758
  
      [Test build #29754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29754/consoleFull) for   PR 5358 at commit [`376d332`](https://github.com/apache/spark/commit/376d332462a3e2a21a28ecdeab14a3bd1f49ffbf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-164882928
  
    Looks like #8286 already supersedes this. I think we can safely close this patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-90838150
  
      [Test build #29844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29844/consoleFull) for   PR 5358 at commit [`644f688`](https://github.com/apache/spark/commit/644f688a8ef679e2402a0d390e95a90e8f1803af).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] SPARK-6489: Optimize lateral view with e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5358#issuecomment-93183316
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30300/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org