Posted to reviews@spark.apache.org by chenghao-intel <gi...@git.apache.org> on 2015/02/10 15:15:01 UTC

[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

GitHub user chenghao-intel opened a pull request:

    https://github.com/apache/spark/pull/4506

    [SQL] [Minor] Deferred table resolving for DF API

    Eagerly resolving the table can cause side effects in some scenarios; let's keep the same behavior (deferred resolution) as the other DF APIs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark df_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4506
    
----
commit 3ee839f2f6b93c93a6fb381cf17ab17a5d19a25b
Author: Cheng Hao <ha...@intel.com>
Date:   2015-02-10T14:02:36Z

    using unresolved logical plan instead of resolved

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92188010
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30133/
    Test PASSed.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-73711302
  
      [Test build #27207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27207/consoleFull) for   PR 4506 at commit [`3ee839f`](https://github.com/apache/spark/commit/3ee839f2f6b93c93a6fb381cf17ab17a5d19a25b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-91948523
  
    Closing this since it only impacts a single Hive compatibility test.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74404814
  
    @marmbrus There seems to be a bug in `TestHive`: it loads the table during logical plan analysis. I've fixed the logic; let's see whether we need to regenerate the golden files.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92189980
  
      [Test build #30131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30131/consoleFull) for   PR 4506 at commit [`b463f8a`](https://github.com/apache/spark/commit/b463f8a2e9d6dce35ca63eae7cd3d1a6c5d2c292).
     * This patch **passes all tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92190000
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30131/
    Test PASSed.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-73997207
  
    Deferring table resolution should be harmless, and since most of the DF APIs yield unresolved logical plans (can I say that?), I think we'd better keep it the same for `def table(name)`.
    
    Ideally, eagerly resolving the table should produce the same result in the unit tests, but it seems not to. Something is probably wrong somewhere; I will keep investigating.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92172761
  
      [Test build #30133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30133/consoleFull) for   PR 4506 at commit [`0be05f7`](https://github.com/apache/spark/commit/0be05f7e83f6e9617b738f307428c12263718c8c).




[GitHub] spark pull request: [SPARK-5941] [SQL] Deferred table resolving fo...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-75884168
  
    @marmbrus @rxin @yhuai any more comments? Or should I close this PR?




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4506#discussion_r26002314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -783,7 +783,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
     
       /** Returns the specified table as a [[DataFrame]]. */
       def table(tableName: String): DataFrame =
    -    DataFrame(this, catalog.lookupRelation(Seq(tableName)))
    +    DataFrame(this, UnresolvedRelation(Seq(tableName)))
    --- End diff --
    
    My point here is that we should leave relation resolution to `SQLContext` or its extensions. Although in most cases relation resolution resorts to `catalog.lookupRelation`, we never know whether that logic will change, particularly in the extensions. In #4784, `catalog.lookupRelation` returns an unresolved relation, which requires further analysis. That may not be a strong reason to change the code here, but it is an example showing that `catalog.lookupRelation` does not always act as `eager analysis`.
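
One way to picture this point is a lookup interface with two implementations, one that resolves immediately and one that hands back a placeholder. This is a minimal sketch only: `Plan`, `Placeholder`, `Relation`, `RelationCatalog`, `EagerCatalog`, `LazyCatalog`, and `LookupSketch.describe` are hypothetical names invented for illustration, not Spark classes, and the sketch makes no claim about what #4784 actually implements.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark types.
sealed trait Plan { def resolved: Boolean }
final case class Placeholder(table: String) extends Plan { val resolved = false }
final case class Relation(table: String) extends Plan { val resolved = true }

trait RelationCatalog {
  def lookupRelation(table: String): Plan
}

// The behavior most callers assume: the lookup returns a resolved relation.
object EagerCatalog extends RelationCatalog {
  def lookupRelation(table: String): Plan = Relation(table)
}

// An extension is free to return something that still needs analysis.
object LazyCatalog extends RelationCatalog {
  def lookupRelation(table: String): Plan = Placeholder(table)
}

object LookupSketch {
  // A caller that wraps the lookup result cannot assume it is resolved.
  def describe(catalog: RelationCatalog, table: String): String = {
    val plan = catalog.lookupRelation(table)
    if (plan.resolved) s"$table: resolved at lookup"
    else s"$table: needs further analysis"
  }
}
```

In this toy setup the same call site behaves eagerly or lazily depending on which catalog it is given, which is the uncertainty the comment above describes.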




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92187993
  
      [Test build #30133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30133/consoleFull) for   PR 4506 at commit [`0be05f7`](https://github.com/apache/spark/commit/0be05f7e83f6e9617b738f307428c12263718c8c).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4506#discussion_r25398253
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
    @@ -248,12 +248,6 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
              |WITH SERDEPROPERTIES ('field.delim'='\\t')
            """.stripMargin.cmd,
           "INSERT OVERWRITE TABLE serdeins SELECT * FROM src".cmd),
    -    TestTable("sales",
    -      s"""CREATE TABLE IF NOT EXISTS sales (key STRING, value INT)
    -         |ROW FORMAT SERDE '${classOf[RegexSerDe].getCanonicalName}'
    -         |WITH SERDEPROPERTIES ("input.regex" = "([^ ]*)\t([^ ]*)")
    -       """.stripMargin.cmd,
    -      s"LOAD DATA LOCAL INPATH '${getHiveFile("data/files/sales.txt")}' INTO TABLE sales".cmd),
    --- End diff --
    
    Since `sales` is not preloaded in Hive's unit tests (https://github.com/apache/hive/blob/trunk/data/scripts/q_test_init.sql), it seems fine to remove it.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74404516
  
      [Test build #27505 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27505/consoleFull) for   PR 4506 at commit [`40ccd81`](https://github.com/apache/spark/commit/40ccd81b8349081998c4108b90f1b21106613dda).
     * This patch merges cleanly.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74405134
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27505/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4506#discussion_r26002268
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala ---
    @@ -25,17 +25,24 @@ import org.apache.spark.sql.hive.test.TestHive
      * A set of tests that validates support for Hive SerDe.
      */
     class HiveSerDeSuite extends HiveComparisonTest with BeforeAndAfterAll {
    -
       override def beforeAll() = {
    +    import TestHive._
    +    import org.apache.hadoop.hive.serde2.RegexSerDe
         TestHive.cacheTables = false
    +    sql(s"""CREATE TABLE IF NOT EXISTS sales (key STRING, value INT)
    +       |ROW FORMAT SERDE '${classOf[RegexSerDe].getCanonicalName}'
    +       |WITH SERDEPROPERTIES ("input.regex" = "([^ ]*)\t([^ ]*)")
    +       """.stripMargin)
    +    sql(s"LOAD DATA LOCAL INPATH '${getHiveFile("data/files/sales.txt")}' INTO TABLE sales")
    --- End diff --
    
    Thank you @yhuai for noticing this. Actually, `createQueryTest` cleans the table by default (its second argument, `reset`, defaults to `true`). That's also why I moved the test `Read with RegexSerDe` ahead.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74809284
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27672/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4506#discussion_r25398281
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala ---
    @@ -25,17 +25,24 @@ import org.apache.spark.sql.hive.test.TestHive
      * A set of tests that validates support for Hive SerDe.
      */
     class HiveSerDeSuite extends HiveComparisonTest with BeforeAndAfterAll {
    -
       override def beforeAll() = {
    +    import TestHive._
    +    import org.apache.hadoop.hive.serde2.RegexSerDe
         TestHive.cacheTables = false
    +    sql(s"""CREATE TABLE IF NOT EXISTS sales (key STRING, value INT)
    +       |ROW FORMAT SERDE '${classOf[RegexSerDe].getCanonicalName}'
    +       |WITH SERDEPROPERTIES ("input.regex" = "([^ ]*)\t([^ ]*)")
    +       """.stripMargin)
    +    sql(s"LOAD DATA LOCAL INPATH '${getHiveFile("data/files/sales.txt")}' INTO TABLE sales")
    --- End diff --
    
    drop it in `afterAll`?




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92172250
  
      [Test build #30132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30132/consoleFull) for   PR 4506 at commit [`dd0b3f6`](https://github.com/apache/spark/commit/dd0b3f61eb8606d1587be3cc11a9fa6708a32fc7).




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4506#discussion_r25398571
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -783,7 +783,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
     
       /** Returns the specified table as a [[DataFrame]]. */
       def table(tableName: String): DataFrame =
    -    DataFrame(this, catalog.lookupRelation(Seq(tableName)))
    +    DataFrame(this, UnresolvedRelation(Seq(tableName)))
    --- End diff --
    
    @rxin this line is the main change. @chenghao-intel thinks we should use `UnresolvedRelation` here. So, when eager analysis is off, `DataFrame.queryExecution.logical` will not return a resolved relation.
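
The difference between the two lines in the diff can be illustrated with a small toy model. The sketch below does not use Spark: `ToyCatalog`, `ToyContext`, `ResolvedRelation`, `tableEager`, `tableDeferred`, and `analyze` are hypothetical names introduced here, and only `UnresolvedRelation` and `lookupRelation` echo names from the diff. It assumes that analysis amounts to looking the placeholder up in the catalog, which is a simplification.

```scala
import scala.collection.mutable

// Toy model only: these are NOT Spark's classes.
sealed trait LogicalPlan
final case class UnresolvedRelation(tableIdentifier: Seq[String]) extends LogicalPlan
final case class ResolvedRelation(name: String, columns: Seq[String]) extends LogicalPlan

final class ToyCatalog {
  private val tables = mutable.Map.empty[String, Seq[String]]

  def registerTable(name: String, columns: Seq[String]): Unit =
    tables.update(name, columns)

  // Throws right away when the table is unknown.
  def lookupRelation(tableIdentifier: Seq[String]): LogicalPlan = {
    val name = tableIdentifier.mkString(".")
    val columns = tables.getOrElse(
      name, throw new NoSuchElementException(s"Table not found: $name"))
    ResolvedRelation(name, columns)
  }
}

final class ToyContext {
  val catalog = new ToyCatalog

  // Eager variant: the catalog is consulted while the plan is being built.
  def tableEager(name: String): LogicalPlan = catalog.lookupRelation(Seq(name))

  // Deferred variant: only a placeholder is recorded; no catalog access yet.
  def tableDeferred(name: String): LogicalPlan = UnresolvedRelation(Seq(name))

  // Simplified stand-in for an analysis phase that resolves placeholders.
  def analyze(plan: LogicalPlan): LogicalPlan = plan match {
    case UnresolvedRelation(id) => catalog.lookupRelation(id)
    case alreadyResolved        => alreadyResolved
  }
}
```

In this model, `tableEager("sales")` throws if `sales` has not been registered yet, while `tableDeferred("sales")` always succeeds and the catalog is only consulted once `analyze` runs. Any side effect of the lookup therefore moves from plan construction to analysis.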




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-73705668
  
      [Test build #27207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27207/consoleFull) for   PR 4506 at commit [`3ee839f`](https://github.com/apache/spark/commit/3ee839f2f6b93c93a6fb381cf17ab17a5d19a25b).
     * This patch merges cleanly.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74640117
  
      [Test build #27625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27625/consoleFull) for   PR 4506 at commit [`4c58fbc`](https://github.com/apache/spark/commit/4c58fbc7813a3d6200a8029001798291437635bd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74802654
  
      [Test build #27672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27672/consoleFull) for   PR 4506 at commit [`28dd6d5`](https://github.com/apache/spark/commit/28dd6d5e62d5d37a91d98d5d91c12fd1405e91d2).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel closed the pull request at:

    https://github.com/apache/spark/pull/4506




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74640124
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27625/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92187601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30132/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-89888015
  
      [Test build #29735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29735/consoleFull) for   PR 4506 at commit [`b463f8a`](https://github.com/apache/spark/commit/b463f8a2e9d6dce35ca63eae7cd3d1a6c5d2c292).




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74405131
  
      [Test build #27505 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27505/consoleFull) for   PR 4506 at commit [`40ccd81`](https://github.com/apache/spark/commit/40ccd81b8349081998c4108b90f1b21106613dda).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-73785605
  
    We have eagerly resolved the table since Spark 1.0 when Spark SQL was added.
    
    https://github.com/apache/spark/blob/branch-1.0/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L185
    
    Why do you think this is problematic?




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4506#discussion_r25408698
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -783,7 +783,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
     
       /** Returns the specified table as a [[DataFrame]]. */
       def table(tableName: String): DataFrame =
    -    DataFrame(this, catalog.lookupRelation(Seq(tableName)))
    +    DataFrame(this, UnresolvedRelation(Seq(tableName)))
    --- End diff --
    
    Thank you @yhuai for the explanation. I've created another PR (#4784) in which I'd like `lookupRelation` to return the `Unresolved` logical plan, which will be helpful for `describe extended cache|table|view|data source table`. It is probably better to review #4784 first and then come back to this PR.
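
To make the difference in the diff above concrete, here is a minimal, self-contained sketch of eager versus deferred table resolution. It is a toy model, not Spark code: `ToyCatalog`, `ToyFrame`, `tableEager`, and `tableDeferred` are hypothetical names, and only the idea (look the table up when the frame is built versus when it is analyzed) is taken from the discussion.

```scala
import scala.collection.mutable

// Toy logical plans: a name that still needs a lookup vs. a relation already found.
sealed trait Plan
case class Unresolved(name: String) extends Plan
case class Resolved(name: String, rows: Seq[Int]) extends Plan

// Hypothetical catalog; lookups are counted so the side effect is visible.
class ToyCatalog {
  private val tables = mutable.Map.empty[String, Seq[Int]]
  var lookups = 0
  def register(name: String, rows: Seq[Int]): Unit = tables.update(name, rows)
  def lookup(name: String): Plan = {
    lookups += 1
    Resolved(name, tables.getOrElse(name, sys.error(s"Table not found: $name")))
  }
}

// Hypothetical frame: analysis resolves a plan that is still unresolved.
case class ToyFrame(catalog: ToyCatalog, plan: Plan) {
  def analyzed: Plan = plan match {
    case Unresolved(n) => catalog.lookup(n)
    case resolved      => resolved
  }
}

object ResolutionDemo {
  // Eager: the catalog is consulted as soon as the frame is built.
  def tableEager(c: ToyCatalog, name: String): ToyFrame = ToyFrame(c, c.lookup(name))

  // Deferred: only the name is recorded; the lookup happens on analysis.
  def tableDeferred(c: ToyCatalog, name: String): ToyFrame = ToyFrame(c, Unresolved(name))

  def main(args: Array[String]): Unit = {
    val c = new ToyCatalog
    val df = tableDeferred(c, "sales") // no lookup yet, even though the table is missing
    println(c.lookups)
    c.register("sales", Seq(1, 2, 3))
    println(df.analyzed)               // the lookup happens here
  }
}
```

In this sketch, `tableEager` would fail immediately for a missing table, while `tableDeferred` only fails (or triggers any catalog side effect) once `analyzed` is called.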




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92171815
  
      [Test build #30131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30131/consoleFull) for   PR 4506 at commit [`b463f8a`](https://github.com/apache/spark/commit/b463f8a2e9d6dce35ca63eae7cd3d1a6c5d2c292).
     * This patch **does not merge cleanly**.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-89903861
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29735/




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4506




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74809281
  
      [Test build #27672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27672/consoleFull) for   PR 4506 at commit [`28dd6d5`](https://github.com/apache/spark/commit/28dd6d5e62d5d37a91d98d5d91c12fd1405e91d2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by chenghao-intel <gi...@git.apache.org>.
GitHub user chenghao-intel reopened a pull request:

    https://github.com/apache/spark/pull/4506

    [SPARK-5941] [SQL] Unit Test loads the table `src` twice for leftsemijoin.q

    In `leftsemijoin.q`, there is already a data loading command for the table `sales`, but `TestHive` also creates the table `sales`, which causes duplicate records to be inserted into `sales`.
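
The duplication described above can be illustrated with a small stand-alone sketch. It does not use Spark or Hive; `ToyWarehouse`, `loadData`, and the sample rows are hypothetical, and the only assumption is that a load appends rows, so running the same load from two places doubles the table contents.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a test warehouse: each load appends rows to the table.
class ToyWarehouse {
  private val tables = mutable.Map.empty[String, Vector[String]]

  def loadData(table: String, rows: Seq[String]): Unit =
    tables.update(table, tables.getOrElse(table, Vector.empty[String]) ++ rows)

  def count(table: String): Int = tables.getOrElse(table, Vector.empty[String]).size
}

object DuplicateLoadDemo {
  // Made-up sample rows, for illustration only.
  val salesRows: Seq[String] = Seq("Hank,2", "Joe,2")

  def main(args: Array[String]): Unit = {
    val w = new ToyWarehouse
    w.loadData("sales", salesRows) // load performed by the test harness
    w.loadData("sales", salesRows) // same load repeated by the query file
    println(w.count("sales"))      // twice the number of source rows
  }
}
```

Dropping either of the two loads leaves the table with one copy of each row, which is the effect of removing the `sales` setup from one of the two places.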

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chenghao-intel/spark df_table

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4506
    
----
commit b463f8a2e9d6dce35ca63eae7cd3d1a6c5d2c292
Author: Cheng Hao <ha...@intel.com>
Date:   2015-04-06T01:49:15Z

    Remove the table `sales` creating from TestHive

----




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92527942
  
    Thanks!  Merged to master.




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-75915426
  
    Sorry for the confusion. I've updated the title and description.
    
    





[GitHub] spark pull request: [SPARK-5941] [SQL] Deferred table resolving fo...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-75912942
  
    If I remember correctly, the issue is that some test tables are loaded twice. @chenghao-intel Can you change the title?




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92173101
  
    @marmbrus I've reopened this PR, just in case people run into this bug while unit testing.




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by chenghao-intel <gi...@git.apache.org>.
Github user chenghao-intel commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74803371
  
    /cc @marmbrus @rxin @yhuai I noticed that we have added `dataFrameEagerAnalysis` to `SQLConf`; we probably also need to update the `def table` code to use `UnresolvedRelation` instead of eagerly calling `lookupRelation`.
    However, some of the unit tests (left semi join) will fail if we do that because of a bug in TestHive, which this PR also fixes. More details about the unit test failure can be found in the description.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-92187592
  
      [Test build #30132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30132/consoleFull) for   PR 4506 at commit [`dd0b3f6`](https://github.com/apache/spark/commit/dd0b3f61eb8606d1587be3cc11a9fa6708a32fc7).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5941] [SQL] Unit Test loads the table `...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-89903845
  
      [Test build #29735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29735/consoleFull) for   PR 4506 at commit [`b463f8a`](https://github.com/apache/spark/commit/b463f8a2e9d6dce35ca63eae7cd3d1a6c5d2c292).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SPARK-5941] [DataFrame] Postpone the table re...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4506#discussion_r25398649
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -783,7 +783,7 @@ class SQLContext(@transient val sparkContext: SparkContext)
     
       /** Returns the specified table as a [[DataFrame]]. */
       def table(tableName: String): DataFrame =
    -    DataFrame(this, catalog.lookupRelation(Seq(tableName)))
    +    DataFrame(this, UnresolvedRelation(Seq(tableName)))
    --- End diff --
    
    I don't see why.  Eager analysis is only for debugging complex failures in the analyzer.  Missing a table doesn't really fall into that category.




[GitHub] spark pull request: [SPARK-5941] [SQL] Deferred table resolving fo...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-75912390
  
    @marmbrus can you take a look? not 100% sure what's happening




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-73711308
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27207/




[GitHub] spark pull request: [SQL] [Minor] Deferred table resolving for DF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4506#issuecomment-74635723
  
      [Test build #27625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27625/consoleFull) for   PR 4506 at commit [`4c58fbc`](https://github.com/apache/spark/commit/4c58fbc7813a3d6200a8029001798291437635bd).
     * This patch merges cleanly.

