Posted to issues@spark.apache.org by "Bang Xiao (JIRA)" <ji...@apache.org> on 2018/12/11 08:08:01 UTC

[jira] [Updated] (SPARK-26332) Spark sql write orc table on viewFS throws exception

     [ https://issues.apache.org/jira/browse/SPARK-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bang Xiao updated SPARK-26332:
------------------------------
    Affects Version/s: 2.2.0

> Spark sql write orc table on viewFS throws exception
> ----------------------------------------------------
>
>                 Key: SPARK-26332
>                 URL: https://issues.apache.org/jira/browse/SPARK-26332
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.1
>            Reporter: Bang Xiao
>            Priority: Major
>
> Using Spark SQL to write an ORC table stored on ViewFs causes the following exception:
> {code:java}
> Task failed while writing rows.
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
> at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:634)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:2103)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2120)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:352)
> at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
> at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2413)
> at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
> at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:392)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
> at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1414)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
> ... 8 more
> Suppressed: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
> at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:634)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:2103)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2120)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2425)
> at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
> at org.apache.spark.sql.hive.execution.HiveOutputWriter.close(HiveFileFormat.scala:154)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:405)
> at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$1.apply$mcV$sp(FileFormatWriter.scala:275)
> at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1423)
> ... 9 more{code}
> This exception can be reproduced with the following SQL statements:
> {code:java}
> spark-sql> CREATE EXTERNAL TABLE test_orc(test_id INT, test_age INT, test_rank INT) STORED AS ORC LOCATION 'viewfs://nsX/user/hive/warehouse/ultraman_tmp.db/test_orc';
> spark-sql> CREATE TABLE source(id INT, age INT, rank INT);
> spark-sql> INSERT INTO source VALUES(1,1,1);
> spark-sql> INSERT OVERWRITE TABLE test_orc SELECT * FROM source;
> {code}
> This is related to https://issues.apache.org/jira/browse/HIVE-10790, which is resolved as of hive-2.0.0, while Spark SQL still depends on hive-1.2.1-Spark2.
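
The stack trace suggests that the ORC writer calls the path-less getDefaultReplication(), which a ViewFs mount table cannot resolve because there is no path to map to a mount point. The sketch below is a toy Java model of that behaviour, not the Hadoop or Hive code. The class MountTableSketch, its methods, the "/user" mount prefix and the replication value 3 are all hypothetical and chosen only for illustration. It assumes that passing the file path to the lookup, as the path-aware Hadoop variant allows, is what lets the mount point be resolved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a client-side mount table. Hypothetical; NOT the Hadoop ViewFileSystem API. */
class MountTableSketch {
    // Mount prefix -> default replication of the backing namespace (illustrative values).
    private final Map<String, Short> mounts = new LinkedHashMap<>();

    void addMount(String prefix, short replication) {
        mounts.put(prefix, replication);
    }

    /** Path-less lookup: with no path there is no mount point to resolve, so it always fails. */
    short getDefaultReplication() {
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    /** Path-aware lookup: resolves the mount point that contains the path, then answers. */
    short getDefaultReplication(String path) {
        for (Map.Entry<String, Short> e : mounts.entrySet()) {
            String prefix = e.getKey();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return e.getValue();
            }
        }
        throw new IllegalStateException("not in mount point: " + path);
    }

    public static void main(String[] args) {
        MountTableSketch fs = new MountTableSketch();
        fs.addMount("/user", (short) 3); // assumed mount entry, for illustration only

        String target = "/user/hive/warehouse/ultraman_tmp.db/test_orc";
        System.out.println("path-aware lookup: " + fs.getDefaultReplication(target));

        try {
            fs.getDefaultReplication(); // mirrors the failing call shape in the trace
        } catch (IllegalStateException ex) {
            System.out.println("path-less lookup failed: " + ex.getMessage());
        }
    }
}
```

In this model only the call that carries the output path can succeed, which is consistent with the failure disappearing once the writer passes the file path to the filesystem.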


