Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2020/02/25 23:35:00 UTC

[jira] [Commented] (SPARK-29995) Structured Streaming file-sink log grow indefinitely

    [ https://issues.apache.org/jira/browse/SPARK-29995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045009#comment-17045009 ] 

Jungtaek Lim commented on SPARK-29995:
--------------------------------------

[~zhangliming]

Hi, if you're open to trying something out in your environment, could you please try SPARK-30946 and see how much it helps? You will need to back up your checkpoint and the "_spark_metadata" directory in the output directory first, because SPARK-30946 converts them to a V2 format that is still only a proposal (there is no guarantee whether, or when, it will be accepted).
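As a rough illustration of the backup step, here is a minimal Python sketch. It assumes both directories live on a local (or locally mounted) filesystem; on HDFS or an object store you would use that system's own copy tooling instead. The function name `backup_dir` and the example paths are hypothetical and not part of Spark or SPARK-30946.

```python
import shutil
import time
from pathlib import Path


def backup_dir(src, backup_root):
    """Copy the directory `src` into `backup_root` under a timestamped name.

    Returns the path of the copy. Hypothetical helper: it only handles paths
    that Python can see as a regular filesystem.
    """
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}.bak-{stamp}"
    # copytree creates `dest` (and missing parents) and fails if it already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(src, dest)
    return dest


# Example usage with made-up paths; stop the streaming query before copying:
# backup_dir("/data/checkpoints/my-query", "/data/backups")
# backup_dir("/data/output/my-table/_spark_metadata", "/data/backups")
```

Taking the copies while the query is stopped avoids capturing a half-written batch file.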

If you'd rather not try it out but are willing to share your metadata files, please upload them somewhere and let me know. The latest compact file alone would be OK, but it would be better if you could provide the set for one compact interval (XXXX9.compact to XXX(X+1)8, 9 files). If you would like to do it privately, please contact me by mail at kabhwan-opensource AT gmail.com
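To make it easier to pick out those files, here is a small Python sketch that lists the newest compact file and the batch files written after it. It assumes the "_spark_metadata" directory is readable as a local path and that entries are named by batch id, with compacted batches carrying a ".compact" suffix (for example "9.compact", then "10", "11", ...). The function name `latest_compact_interval` is hypothetical. If the query has not yet written every batch after the latest compaction, the result covers only part of an interval.

```python
import re
from pathlib import Path

# A batch file is named by its batch id; compacted batches end in ".compact".
BATCH_FILE = re.compile(r"([0-9]+)(\.compact)?")


def latest_compact_interval(metadata_dir):
    """Return the newest N.compact file plus all later batch files, ordered by batch id."""
    entries = []
    for path in Path(metadata_dir).iterdir():
        match = BATCH_FILE.fullmatch(path.name)
        # Skip anything that is not a batch file (e.g. hidden checksum files).
        if match and path.is_file():
            entries.append((int(match.group(1)), match.group(2) is not None, path))

    compact_ids = [batch_id for batch_id, is_compact, _ in entries if is_compact]
    if not compact_ids:
        return []  # no compaction has happened yet

    start = max(compact_ids)
    picked = [(batch_id, path) for batch_id, _, path in entries if batch_id >= start]
    return [path for _, path in sorted(picked, key=lambda item: item[0])]


# Example usage with a made-up path:
# for f in latest_compact_interval("/data/output/my-table/_spark_metadata"):
#     print(f.name, f.stat().st_size)
```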

Thanks!

> Structured Streaming file-sink log grow indefinitely
> ----------------------------------------------------
>
>                 Key: SPARK-29995
>                 URL: https://issues.apache.org/jira/browse/SPARK-29995
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: zhang liming
>            Priority: Major
>         Attachments: file.png, task.png
>
>
> When I use the Structured Streaming parquet sink, I've noticed that the file-sink log files keep getting bigger. They are in {$checkpoint/_spark_metadata/}. I don't think this is reasonable.
> When the files are merged, task batches also take longer to run, as in the screenshot below.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
