Posted to commits@spark.apache.org by do...@apache.org on 2019/03/05 20:08:26 UTC
[spark] branch branch-2.4 updated: [SPARK-26932][DOC] Add a warning for Hive 2.1.1 ORC reader issue

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new b583bfe  [SPARK-26932][DOC] Add a warning for Hive 2.1.1 ORC reader issue
b583bfe is described below

commit b583bfe1a41d69593161b45722e53378d6636efd
Author: Bo Hai <ha...@163.com>
AuthorDate: Tue Mar 5 11:57:04 2019 -0800

    [SPARK-26932][DOC] Add a warning for Hive 2.1.1 ORC reader issue
    
    By default, Hive 2.1.1 cannot read ORC tables created by Spark 2.4.0. This commit adds that information to sql-migration-guide-upgrade.md. For details, see [SPARK-26932](https://issues.apache.org/jira/browse/SPARK-26932).
    
    doc build
    
    Closes #23944 from haiboself/SPARK-26932.
    
    Authored-by: Bo Hai <ha...@163.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
    (cherry picked from commit c27caead43423d1f994f42502496d57ea8389dc0)
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 docs/sql-migration-guide-upgrade.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sql-migration-guide-upgrade.md b/docs/sql-migration-guide-upgrade.md
index 05c33f6..8dae1b4 100644
--- a/docs/sql-migration-guide-upgrade.md
+++ b/docs/sql-migration-guide-upgrade.md
@@ -77,7 +77,7 @@ displayTitle: Spark SQL Upgrading Guide
 
   - Since Spark 2.4, Spark will display table description column Last Access value as UNKNOWN when the value was Jan 01 1970.
 
-  - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively.
+  - Since Spark 2.4, Spark maximizes the usage of a vectorized ORC reader for ORC files by default. To do that, `spark.sql.orc.impl` and `spark.sql.orc.filterPushdown` change their default values to `native` and `true` respectively. ORC files created by native ORC writer cannot be read by some old Apache Hive releases. Use `spark.sql.orc.impl=hive` to create the files shared with Hive 2.1.1 and older.
 
   - In PySpark, when Arrow optimization is enabled, previously `toPandas` just failed when Arrow optimization is unable to be used whereas `createDataFrame` from Pandas DataFrame allowed the fallback to non-optimization. Now, both `toPandas` and `createDataFrame` from Pandas DataFrame allow the fallback by default, which can be switched off by `spark.sql.execution.arrow.fallback.enabled`.
 


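The migration note added in the diff above amounts to a small decision rule: files that must be shared with Hive 2.1.1 or older should be written with `spark.sql.orc.impl=hive`. The sketch below is one way to express that rule, not part of the commit. The helper names `parse_version` and `orc_write_conf` are hypothetical, version strings are assumed to be plain dotted integers, and returning `native` for newer releases simply keeps the Spark 2.4 default; the note itself does not state which newer Hive releases can read those files.

```python
from typing import Dict, Tuple

# Threshold taken from the migration note: "Hive 2.1.1 and older".
LAST_AFFECTED_HIVE: Tuple[int, ...] = (2, 1, 1)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "2.1.1" into (2, 1, 1) for comparison.

    Assumes purely numeric components; suffixes like "-cdh" are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def orc_write_conf(hive_version: str) -> Dict[str, str]:
    """Hypothetical helper: choose spark.sql.orc.impl for files shared with Hive."""
    if parse_version(hive_version) <= LAST_AFFECTED_HIVE:
        # Hive 2.1.1 and older: fall back to the Hive ORC implementation.
        impl = "hive"
    else:
        # Otherwise keep the Spark 2.4 default (an assumption, see lead-in).
        impl = "native"
    return {"spark.sql.orc.impl": impl}


if __name__ == "__main__":
    print(orc_write_conf("2.1.1"))  # {'spark.sql.orc.impl': 'hive'}
```

The resulting key and value could then be passed to a session, for example through `SparkSession.builder.config(key, value)`, before writing the ORC files.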