Posted to issues@spark.apache.org by "Frederick Reiss (JIRA)" <ji...@apache.org> on 2016/09/09 17:27:21 UTC

[jira] [Created] (SPARK-17475) HDFSMetadataLog should not leak CRC files

Frederick Reiss created SPARK-17475:
---------------------------------------

             Summary: HDFSMetadataLog should not leak CRC files
                 Key: SPARK-17475
                 URL: https://issues.apache.org/jira/browse/SPARK-17475
             Project: Spark
          Issue Type: Sub-task
          Components: Streaming
            Reporter: Frederick Reiss


When HDFSMetadataLog uses a log directory on a filesystem other than HDFS (e.g. NFS or the driver node's local filesystem), the class leaves orphan checksum (CRC) files in the log directory. The files have names that follow the pattern "..[long UUID hex string].tmp.crc". These files exist because HDFSMetadataLog renames the temporary files without renaming the corresponding checksum files. There is one CRC file per batch, so the directory fills up quite quickly.

I'm not certain, but this problem might also occur with some versions of the HDFS APIs.
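
The leak described above can be reproduced in miniature without Spark or Hadoop. The following is only a sketch under stated assumptions, not the HDFSMetadataLog code: it assumes each file "name" gets a hidden checksum sibling ".name.crc" (inferred from the "..[UUID].tmp.crc" pattern in this report), and every function name in it (crc_sibling, rename_with_checksum, write_batch, orphan_crc_files, demo) is hypothetical. It writes each batch to a hidden temp file, fakes the checksum file, and compares a plain rename with one that also moves the checksum sibling.

```python
import os
import tempfile
import uuid


def crc_sibling(path):
    """Return the assumed checksum path beside `path`: '.<name>.crc'."""
    directory, name = os.path.split(path)
    return os.path.join(directory, "." + name + ".crc")


def rename_with_checksum(src, dst):
    """Rename src to dst and move its checksum sibling along, if one exists."""
    os.rename(src, dst)
    src_crc = crc_sibling(src)
    if os.path.exists(src_crc):
        os.rename(src_crc, crc_sibling(dst))


def write_batch(log_dir, batch_id, payload, rename):
    """Write a batch to a hidden temp file, then rename it into place."""
    tmp = os.path.join(log_dir, "." + uuid.uuid4().hex + ".tmp")
    with open(tmp, "wb") as f:
        f.write(payload)
    # Stand-in for the checksum file a checksumming filesystem would create.
    with open(crc_sibling(tmp), "wb") as f:
        f.write(b"crc")
    rename(tmp, os.path.join(log_dir, str(batch_id)))


def orphan_crc_files(log_dir):
    """List leaked checksum files matching the '..<uuid>.tmp.crc' pattern."""
    return sorted(
        name for name in os.listdir(log_dir)
        if name.startswith("..") and name.endswith(".tmp.crc")
    )


def demo(rename, batches=3):
    """Write `batches` batches with the given rename; count leaked CRC files."""
    with tempfile.TemporaryDirectory() as log_dir:
        for batch_id in range(batches):
            write_batch(log_dir, batch_id, b"metadata", rename)
        return len(orphan_crc_files(log_dir))
```

Calling demo(os.rename) leaves one orphan per batch, matching the growth described in this report, while demo(rename_with_checksum) leaves none. Deleting the temp file's checksum sibling after the rename would be another way to avoid the leak; which approach suits HDFSMetadataLog is left open here.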


