Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/06/25 23:11:30 UTC

[GitHub] spark pull request #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data S...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/13907

    [SPARK-16209] [SQL] Convert Hive Tables to Data Source Tables for CREATE TABLE AS SELECT

    #### What changes were proposed in this pull request?
    Currently, the following statement creates a Hive table.
    ```SQL
    CREATE TABLE t STORED AS parquet SELECT 1 as a, 1 as b
    ```
    When users run CREATE TABLE AS SELECT with `STORED AS` or `ROW FORMAT`, we do not convert the result to a data source table, even when `spark.sql.hive.convertCTAS` is set to `true`. For the parquet and orc formats, however, we can still convert it to a data source table even if the users specify `STORED AS` or `ROW FORMAT`.
    
    #### How was this patch tested?
    Added test cases for both ORC and PARQUET
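
As a rough illustration of the scenario in the description above (a sketch, not taken from the patch), the statements below could be run in a Hive-enabled Spark SQL session. The table name `t` follows the example in the description; using `DESCRIBE FORMATTED` to check the result is an assumption about how one might inspect the table, and its output varies by Spark version.

```SQL
-- Ask Spark to convert Hive CTAS tables to data source tables where it can.
SET spark.sql.hive.convertCTAS=true;

-- CTAS with an explicit STORED AS clause. Per the description, this is
-- currently created as a Hive table despite the setting above.
CREATE TABLE t STORED AS parquet AS SELECT 1 AS a, 1 AS b;

-- Inspect the table metadata to see how the table was actually created.
DESCRIBE FORMATTED t;
```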

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark storedAsParquet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13907
    
----
commit 06e115ca886809a7b1fcd16e96bd1e9f493add79
Author: gatorsmile <ga...@gmail.com>
Date:   2016-06-25T23:05:19Z

    fix

commit 2cf107d69a7a16500f17d03d034be43b3ac8cab3
Author: gatorsmile <ga...@gmail.com>
Date:   2016-06-25T23:06:42Z

    Merge remote-tracking branch 'upstream/master' into storedAsParquet

commit c4bde0217a5e6a31da15cc29dc552a198ed6ef21
Author: gatorsmile <ga...@gmail.com>
Date:   2016-06-25T23:08:00Z

    clean

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    **[Test build #61253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61253/consoleFull)** for PR 13907 at commit [`a9ce0d8`](https://github.com/apache/spark/commit/a9ce0d8342a2c3768823b4dd120fda0997b1c313).




[GitHub] spark pull request #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUE...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/13907




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    Nope. If users do not specify the input and output formats, we use the default `INPUTFORMAT`, which is `org.apache.hadoop.mapred.TextInputFormat`, and the default `OUTPUTFORMAT`, which is `org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat`. These differ from the standard input and output formats for `ORC`: `org.apache.hadoop.hive.ql.io.orc.OrcInputFormat` and `org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat`.
    
    I am not sure whether we should still convert the table in this case. Please let me know if you think we should. Thanks!
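
To make the distinction in the comment above concrete, here is a sketch of the two forms being compared. The table names `t_serde_only` and `t_explicit_formats` are hypothetical; the class names are the ones quoted in the comment. Whether the first statement succeeds, and what it writes, may depend on the Spark and Hive versions in use.

```SQL
-- Only the SerDe is given. Per the comment, the input and output formats
-- then fall back to the text defaults (TextInputFormat and
-- HiveIgnoreKeyTextOutputFormat), not the ORC ones.
CREATE TABLE t_serde_only
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
AS SELECT 1 AS a, 1 AS b;

-- The SerDe plus the standard ORC input and output formats, spelled out.
CREATE TABLE t_explicit_formats
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
AS SELECT 1 AS a, 1 AS b;
```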





[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    I don't think it's a very useful feature, and we may surprise users, since they deliberately used Hive syntax to specify the row format.
    
    Advanced users can easily use `USING xxx` to explicitly create a data source table for better performance.
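
For reference, the `USING` form mentioned above might look like the following sketch. The table name `t_datasource` is hypothetical, and `parquet` stands in for the `xxx` placeholder in the comment.

```SQL
-- Data source table created explicitly with USING; no Hive STORED AS
-- or ROW FORMAT clause is involved.
CREATE TABLE t_datasource USING parquet AS SELECT 1 AS a, 1 AS b;
```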




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    **[Test build #61243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61243/consoleFull)** for PR 13907 at commit [`c4bde02`](https://github.com/apache/spark/commit/c4bde0217a5e6a31da15cc29dc552a198ed6ef21).




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61253/
    Test PASSed.




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    This is not contained in https://github.com/apache/spark/pull/14482. Should I leave it open? Or should I fix the conflict after https://github.com/apache/spark/pull/14482 is merged?




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61243/
    Test FAILed.




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables to Data Source T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    **[Test build #61243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61243/consoleFull)** for PR 13907 at commit [`c4bde02`](https://github.com/apache/spark/commit/c4bde0217a5e6a31da15cc29dc552a198ed6ef21).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    With your PR, if users specify `ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'`, will we convert?




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    I see. Let me close it.




[GitHub] spark issue #13907: [SPARK-16209] [SQL] Convert Hive Tables in PARQUET/ORC t...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13907
  
    **[Test build #61253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61253/consoleFull)** for PR 13907 at commit [`a9ce0d8`](https://github.com/apache/spark/commit/a9ce0d8342a2c3768823b4dd120fda0997b1c313).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

