Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/06/10 03:18:00 UTC

[jira] [Commented] (SPARK-24791) Spark Structured Streaming randomly does not process batch

    [ https://issues.apache.org/jira/browse/SPARK-24791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859683#comment-16859683 ] 

Apache Spark commented on SPARK-24791:
--------------------------------------

User 'zhangmeng0426' has created a pull request for this issue:
https://github.com/apache/spark/pull/24791

> Spark Structured Streaming randomly does not process batch
> ----------------------------------------------------------
>
>                 Key: SPARK-24791
>                 URL: https://issues.apache.org/jira/browse/SPARK-24791
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.1
>            Reporter: Arvind Ramachandran
>            Priority: Major
>
> I have developed an application that writes small CSV files to a specific HDFS folder, and Spark Structured Streaming reads from that folder. Occasionally it does not process a CSV file. This happens only when the batch consists of a single CSV file, and even then it is random rather than consistent. I cannot guarantee that a batch will contain more than one file, because the requirement is low-latency processing and the volume is low.
> I can see that the commits, offset, and source folders contain the batch information, but when I look at the logs the CSV file is not processed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
