Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/09/28 10:41:04 UTC

[jira] [Resolved] (SPARK-10734) DirectKafkaInputDStream uses the OffsetRequest.LatestTime to find the latest offset, however using the batch time would be more desirable.

     [ https://issues.apache.org/jira/browse/SPARK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-10734.
-------------------------------
    Resolution: Won't Fix

> DirectKafkaInputDStream uses the OffsetRequest.LatestTime to find the latest offset, however using the batch time would be more desirable.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10734
>                 URL: https://issues.apache.org/jira/browse/SPARK-10734
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>            Reporter: Bijay Singh Bisht
>            Priority: Minor
>
> DirectKafkaInputDStream uses OffsetRequest.LatestTime to find the latest offset. However, because OffsetRequest.LatestTime is relative, the result depends on when the batch is scheduled. One would expect that, for a given input data set, the contents of each batch are predictable regardless of system conditions. Using the batch time instead means that the stream processing produces the same batches regardless of when processing was started and of the load on the system.
> Together with [SPARK-10732], this makes for nice regression scenarios.
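
The difference described above can be illustrated with a small toy model. This is only a sketch, not Spark or Kafka code: it assumes each record in a partition has a known arrival time, and the names until_offset, plan_batches and scheduling_delay are hypothetical helpers introduced for illustration. "Latest" is modelled as "everything that has arrived by the moment the batch is actually scheduled", while the proposed behaviour cuts each batch at its nominal batch time.

```python
from bisect import bisect_right

def until_offset(arrival_times, as_of):
    """Exclusive end offset: count of records that arrived at or before `as_of`.

    `arrival_times` is a sorted list; index i is the (assumed) arrival time
    of the record at offset i in a single toy partition.
    """
    return bisect_right(arrival_times, as_of)

def plan_batches(arrival_times, batch_times, scheduling_delay, use_batch_time):
    """Return (from_offset, until_offset) ranges, one per batch.

    use_batch_time=False mimics asking for the latest offset at the moment
    the batch is actually scheduled (batch time plus some delay).
    use_batch_time=True cuts each batch at its nominal batch time instead.
    """
    ranges = []
    start = 0
    for t in batch_times:
        as_of = t if use_batch_time else t + scheduling_delay(t)
        end = max(start, until_offset(arrival_times, as_of))
        ranges.append((start, end))
        start = end
    return ranges

# Toy data: eight records and three batches at t = 10, 20, 30.
arrivals = [1, 2, 3, 11, 12, 13, 21, 22]
batches = [10, 20, 30]

on_time = plan_batches(arrivals, batches, lambda t: 0, use_batch_time=False)
delayed = plan_batches(arrivals, batches, lambda t: 2, use_batch_time=False)
by_time = plan_batches(arrivals, batches, lambda t: 2, use_batch_time=True)

print(on_time)  # [(0, 3), (3, 6), (6, 8)]
print(delayed)  # [(0, 5), (5, 8), (8, 8)]
print(by_time)  # [(0, 3), (3, 6), (6, 8)]
```

In this model a scheduling delay of 2 time units changes the batch contents when the latest offset is used, but not when the batch time is used. Whether a real broker can resolve an offset for an arbitrary timestamp with this precision is a separate question that the sketch does not address.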


