Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/20 03:13:18 UTC

[GitHub] beliefer opened a new pull request #23841: [SPARK-26936][SQL] Fix bug of insert overwrite local dir and inconsistent behavior with Hive

URL: https://github.com/apache/spark/pull/23841
 
 
   ## What changes were proposed in this pull request?
   The 'insert overwrite local directory' feature behaves inconsistently with Hive and also has a bug.
   
   ### First, let me introduce the inconsistent behavior.
   
   Suppose the local path '/home/spark/' exists on the driver node but does not contain a child directory 'result'.
   I want to save the data of Hive table A into '/home/spark/result/A/', so I use the following SQL:
   `insert overwrite local directory '/home/spark/result/A/' select * from A;`
   When I execute this SQL, Hive creates the parent directory 'result' and the child directory 'A', and finally moves the data into '/home/spark/result/A/'.
   Spark SQL does not create these directories.
   This PR uses LocalFileSystem to create the path if it does not exist.
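
The "create the missing parent and child directories" behavior can be sketched as follows. This is only an illustration, not the code in this PR: it uses `java.nio.file` from the JDK as a stand-in for Hadoop's LocalFileSystem, and the class name `EnsureLocalDir`, the method `ensureDir`, and the temporary base directory are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class EnsureLocalDir {
    // Creates dir together with any missing parent directories.
    // Calling it again on an existing directory is a no-op.
    static Path ensureDir(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for '/home/spark/' (assumption).
        Path base = Files.createTempDirectory("spark-home");
        Path target = base.resolve("result").resolve("A");
        ensureDir(target); // creates both 'result' and 'A'
        System.out.println(Files.isDirectory(target));
    }
}
```

The point of the sketch is that the target of the insert does not need to exist beforehand: both 'result' and 'A' are created in one call.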
   ### Second, let me introduce the bug in 'insert overwrite local directory'.
   
   If I execute the SQL above, a HiveException is thrown:
   `Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Mkdirs failed to create file:/home/xitong/hive/stagingdir_hive_2019-02-19_17-31-00_678_1816816774691551856-1/-ext-10000/_temporary/0/_temporary/attempt_20190219173233_0002_m_000000_3 (exists=false, cwd=file:/data10/yarn/nm-local-dir/usercache/xitong/appcache/application_1543893582405_6126857/container_e124_1543893582405_6126857_01_000011)
   at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)`
   Spark SQL currently generates a temporary path inside the local staging directory. The scheme of this temporary path is `file`, so the HiveException above is thrown.
   This PR changes the local temporary path to an HDFS temporary path and uses a DistributedFileSystem instance to copy the data from the HDFS temporary path to the local directory.
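
The "write to a staging location first, then copy into the local directory" flow can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: a plain local directory stands in for the HDFS temporary path, `java.nio.file` stands in for the Hadoop FileSystem API, and the names `StagedCopy`, `copyStaged`, and `part-00000` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

class StagedCopy {
    // Copies every regular file directly under staging into target.
    // The target directory (and its parents) is created first, and
    // files with the same name in target are replaced.
    static int copyStaged(Path staging, Path target) throws IOException {
        Files.createDirectories(target);
        int copied = 0;
        try (Stream<Path> entries = Files.list(staging)) {
            for (Path src : (Iterable<Path>) entries::iterator) {
                if (Files.isRegularFile(src)) {
                    Files.copy(src, target.resolve(src.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HDFS temporary path (assumption).
        Path staging = Files.createTempDirectory("staging");
        Files.write(staging.resolve("part-00000"), "a,1\n".getBytes());
        // Stand-in for '/home/spark/result/A/' (assumption).
        Path target = Files.createTempDirectory("spark-home")
                           .resolve("result").resolve("A");
        System.out.println(copyStaged(staging, target));
    }
}
```

In Hadoop itself, `FileSystem.copyToLocalFile` is one existing call for copying from a remote file system to the local disk; whether this PR uses that call or another one is not stated in the description.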
   ## How was this patch tested?
   
   Using the existing JUnit tests and suites.
   
   Please review http://spark.apache.org/contributing.html before opening a pull request.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org