Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2017/03/15 06:31:41 UTC

[jira] [Commented] (SPARK-6384) saveAsParquet doesn't clean up attempt_* folders

    [ https://issues.apache.org/jira/browse/SPARK-6384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925617#comment-15925617 ] 

Takeshi Yamamuro commented on SPARK-6384:
-----------------------------------------

Since this ticket has been mostly inactive and the related code has changed completely (at the very least, SchemaRDD is gone), I'll close this. If you still hit this problem, feel free to update the description and reopen it. Thanks!

> saveAsParquet doesn't clean up attempt_* folders
> ------------------------------------------------
>
>                 Key: SPARK-6384
>                 URL: https://issues.apache.org/jira/browse/SPARK-6384
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.1
>            Reporter: Rex Xiong
>
> After calling SchemaRDD.saveAsParquet, the job runs fine and generates the *.parquet, _SUCCESS, _common_metadata, and _metadata files successfully.
> But sometimes there are also attempt_* folders (e.g. attempt_201503170229_0006_r_000006_736, attempt_201503170229_0006_r_000404_416) under the same output folder; each contains one parquet file and appears to be a leftover working temp folder.
> This happens even though the _SUCCESS file has been created.
> In this situation, Spark SQL (Hive table) throws an exception when loading this parquet folder:
> Error: java.io.FileNotFoundException: Path is not a file: ............../attempt_201503170229_0006_r_000006_736
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:69)
>         at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:55)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1728)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1671)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1651)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1625)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:503)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:322)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) (state=,code=0)
> I'm not sure whether it's a Spark bug or a Parquet bug.
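A possible workaround, until the root cause is fixed, is to prune the leftover attempt_* directories from the output path before pointing Hive / Spark SQL at it. Below is a minimal, untested sketch using the Hadoop FileSystem API; the output path and the object name are illustrative, not taken from the report above.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Hypothetical cleanup helper: delete leftover attempt_* temp folders
    // from a Parquet output directory once the job has written _SUCCESS.
    object AttemptFolderCleanup {
      def main(args: Array[String]): Unit = {
        val outputPath = new Path("hdfs:///tmp/example-output.parquet") // illustrative path
        val fs = FileSystem.get(outputPath.toUri, new Configuration())

        // Only clean up if the job itself reported success.
        if (fs.exists(new Path(outputPath, "_SUCCESS"))) {
          fs.listStatus(outputPath)
            .filter(s => s.isDirectory && s.getPath.getName.startsWith("attempt_"))
            .foreach(s => fs.delete(s.getPath, true)) // recursive delete of the temp folder
        }
      }
    }

After such a cleanup the directory contains only regular files, so the FileNotFoundException ("Path is not a file") above should no longer be triggered when the table is loaded.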



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
