Posted to common-issues@hadoop.apache.org by "Luca M (Jira)" <ji...@apache.org> on 2022/05/12 21:36:00 UTC

[jira] [Comment Edited] (HADOOP-17847) S3AInstrumentation Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics

    [ https://issues.apache.org/jira/browse/HADOOP-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536329#comment-17536329 ] 

Luca M edited comment on HADOOP-17847 at 5/12/22 9:35 PM:
----------------------------------------------------------

Hi,
I am seeing a similar issue in our project and am not sure whether to create a separate JIRA or comment on this one.
The S3 uploads seem to complete successfully, but the statistics are logging warnings about pending data. We are using hadoop=3.3.2, aws-java-sdk-bundle=1.11.1026.
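For context, the DEBUG lines in the snippet come from turning up the S3A logger; a log4j.properties fragment along these lines would surface them (the surrounding appender setup is assumed, not shown):

```properties
# Assumed log4j.properties fragment: raise S3A connector logging to DEBUG
# so that S3AFileSystem / S3ABlockOutputStream upload progress is visible.
log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
```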

Here is a log snippet with the relevant information:
{code}
[09-May-2022 22:13:05.013 UTC] DEBUG org.apache.hadoop.fs.s3a.S3AFileSystem                       [] - PUT completed success=true; 48819 bytes
*****************
[09-May-2022 22:13:05.013 UTC] DEBUG org.apache.hadoop.fs.s3a.S3AFileSystem                       [] - Finished write to PATH_WITH_FILENAME, len 48819. etag **, version **
***************
[09-May-2022 22:13:05.012 UTC] DEBUG org.apache.hadoop.fs.s3a.S3ABlockOutputStream                [] - Upload complete to PATH_WITH_FILENAME by WriteOperationHelper {bucket=BUCKETNAME}
*****************
[09-May-2022 22:13:05.014 UTC] WARN org.apache.hadoop.fs.s3a.S3AInstrumentation                  [] - Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics
******
[09-May-2022 22:13:05.014 UTC] WARN org.apache.hadoop.fs.s3a.S3AInstrumentation                  [] - Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics{counters=((stream_write_queue_duration=0) (action_executor_acquired.failures=0) (op_abort.failures=0) (stream_write_bytes=48819) (op_abort=0) (action_executor_acquired=0) (multipart_upload_completed.failures=0) (object_multipart_aborted.failures=0) (op_hsync=0) (op_hflush=0) (stream_write_exceptions_completing_upload=0) (object_multipart_aborted=0) (stream_write_total_data=40960) (stream_write_block_uploads=1) (stream_write_exceptions=0) (stream_write_total_time=0) (multipart_upload_completed=0));
gauges=((stream_write_block_uploads_data_pending=7859) (stream_write_block_uploads_pending=1));
minimums=((action_executor_acquired.min=-1) (multipart_upload_completed.min=-1) (object_multipart_aborted.min=-1) (op_abort.failures.min=-1) (multipart_upload_completed.failures.min=-1) (action_executor_acquired.failures.min=-1) (object_multipart_aborted.failures.min=-1) (op_abort.min=-1));
maximums=((object_multipart_aborted.max=-1) (multipart_upload_completed.failures.max=-1) (action_executor_acquired.max=-1) (op_abort.max=-1) (multipart_upload_completed.max=-1) (op_abort.failures.max=-1) (action_executor_acquired.failures.max=-1) (object_multipart_aborted.failures.max=-1));
means=((action_executor_acquired.mean=(samples=0, sum=0, mean=0.0000)) (object_multipart_aborted.mean=(samples=0, sum=0, mean=0.0000)) (op_abort.failures.mean=(samples=0, sum=0, mean=0.0000)) (multipart_upload_completed.mean=(samples=0, sum=0, mean=0.0000)) (object_multipart_aborted.failures.mean=(samples=0, sum=0, mean=0.0000)) (multipart_upload_completed.failures.mean=(samples=0, sum=0, mean=0.0000)) (action_executor_acquired.failures.mean=(samples=0, sum=0, mean=0.0000)) (op_abort.mean=(samples=0, sum=0, mean=0.0000)));

{code}
Note that stream_write_total_data=40960 + stream_write_block_uploads_data_pending=7859 = stream_write_bytes=48819. However, the preceding logs seem to indicate that the whole payload was uploaded correctly.
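To make the mismatch concrete, the counter arithmetic from the warning can be checked directly (a trivial sketch; the values are copied from the log above):

```python
# Counter values copied from the OutputStreamStatistics warning above.
stream_write_bytes = 48819                      # total bytes written to the stream
stream_write_total_data = 40960                 # bytes the uploaded-data counter recorded
stream_write_block_uploads_data_pending = 7859  # bytes still marked as pending

# The pending gauge accounts exactly for the bytes missing from the
# uploaded-data counter, even though the PUT itself reported success.
gap = stream_write_bytes - stream_write_total_data
print(gap == stream_write_block_uploads_data_pending)  # True
```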
Let me know if you would prefer that I create a separate JIRA, or if you need more info.

Thanks!



> S3AInstrumentation Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17847
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.1
>         Environment: hadoop: 3.2.1
> spark: 3.0.2
> k8s server version: 1.18
> aws.java.sdk.bundle.version:1.11.1033
>            Reporter: Li Rong
>            Priority: Major
>         Attachments: logs.txt
>
>
> When using the Hadoop S3A file upload for Spark event logs, the logs were queued up and not uploaded before the process was shut down:
> {code:java}
> 21/08/13 12:22:39 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
> 21/08/13 12:22:39 WARN S3AInstrumentation: Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics{blocksSubmitted=1, blocksInQueue=1, blocksActive=0, blockUploadsCompleted=0, blockUploadsFailed=0, bytesPendingUpload=106716, bytesUploaded=0, blocksAllocated=1, blocksReleased=1, blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, transferDuration=0 ms, queueDuration=0 ms, averageQueueTime=0 ms, totalUploadDuration=0 ms, effectiveBandwidth=0.0 bytes/s}{code}
> For details, see the attached logs.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org