Posted to issues@spark.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/12/10 15:18:00 UTC
[jira] [Created] (SPARK-33739) Magic committer files don't have the count of bytes written collected by spark
Steve Loughran created SPARK-33739:
--------------------------------------
Summary: Magic committer files don't have the count of bytes written collected by spark
Key: SPARK-33739
URL: https://issues.apache.org/jira/browse/SPARK-33739
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.1
Reporter: Steve Loughran
Spark's statistics tracking doesn't correctly assess the size of files uploaded through the magic committer, because it only calls getFileStatus on the zero-byte marker objects, not on the yet-to-manifest files. Given that those files don't exist yet, probing them directly isn't easy to do.
HADOOP-17414 will attach the final length as a custom header to the marker object, and implement getXAttr in the S3A FS to probe for it.
BasicWriteStatsTracker can probe for this custom XAttr if the size of the generated file is 0 bytes; if the attribute is found and parseable, it can use that value as the declared length of the output.
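
A minimal sketch of the fallback described above, written in plain Java so it runs without Hadoop on the classpath. It is not the Spark or Hadoop implementation. The class name, method names, and the attribute name "example.declared-length" are hypothetical; the real attribute name is whatever HADOOP-17414 defines. The map stands in for the attributes a filesystem would return for the marker object (for example from Hadoop's FileSystem.getXAttrs, which returns a Map<String, byte[]>).

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.OptionalLong;

public class DeclaredLengthProbe {

    /**
     * Parse an attribute value as a non-negative decimal length.
     * Returns empty if the value is missing, empty, negative, or not a number.
     */
    static OptionalLong parseDeclaredLength(byte[] raw) {
        if (raw == null || raw.length == 0) {
            return OptionalLong.empty();
        }
        try {
            long value = Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
            return value >= 0 ? OptionalLong.of(value) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Decide which length to report for a written file.
     *
     * @param statusLength length reported by getFileStatus
     * @param xattrs       extended attributes of the object (hypothetical stand-in)
     * @param lengthAttr   name of the attribute carrying the final length (assumption)
     */
    static long effectiveLength(long statusLength, Map<String, byte[]> xattrs, String lengthAttr) {
        // Only probe the attribute for zero-byte files; a non-empty file's
        // status length is trusted as-is.
        if (statusLength != 0 || xattrs == null) {
            return statusLength;
        }
        return parseDeclaredLength(xattrs.get(lengthAttr)).orElse(statusLength);
    }
}
```

Restricting the probe to zero-byte files means ordinary outputs cost no extra filesystem call, and an unparseable or missing attribute simply leaves the reported length at 0.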