Posted to issues@spark.apache.org by "Cody Koeninger (JIRA)" <ji...@apache.org> on 2015/09/22 20:01:04 UTC

[jira] [Commented] (SPARK-10734) DirectKafkaInputDStream uses the OffsetRequest.LatestTime to find the latest offset, however using the batch time would be more desirable.

    [ https://issues.apache.org/jira/browse/SPARK-10734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903106#comment-14903106 ] 

Cody Koeninger commented on SPARK-10734:
----------------------------------------

As I explained in SPARK-10732, Kafka's getOffsetsBefore API is limited to the timestamps on log file segments, so its granularity is quite poor and it doesn't really behave as one might expect.
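
To illustrate the granularity problem, here is a toy Python model, not Kafka's actual code. It assumes, as a simplification, that a time-based lookup can only return the base offset of a log segment whose file modification time is at or before the requested timestamp. Segment, offset_before, and the sample numbers are hypothetical, made up for this sketch.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    """Hypothetical stand-in for one log segment file of a partition."""
    base_offset: int       # first message offset stored in the segment
    last_modified_ms: int  # modification time of the segment file


def offset_before(segments: List[Segment], timestamp_ms: int) -> Optional[int]:
    """Return the base offset of the newest segment modified at or before
    timestamp_ms, or None if every segment is newer.

    Assumption: segments are sorted by last_modified_ms, ascending.
    The result is always a segment boundary, never an offset inside a segment.
    """
    mtimes = [s.last_modified_ms for s in segments]
    idx = bisect_right(mtimes, timestamp_ms)
    if idx == 0:
        return None
    return segments[idx - 1].base_offset


# Three segments of 500,000 messages each (illustrative values only).
segments = [
    Segment(base_offset=0, last_modified_ms=1_000),
    Segment(base_offset=500_000, last_modified_ms=2_000),
    Segment(base_offset=1_000_000, last_modified_ms=3_000),
]

# Two "batch times" almost a second apart map to the same offset.
early = offset_before(segments, 2_001)
late = offset_before(segments, 2_999)
```

In this model both lookups return 500000, and a timestamp older than every segment returns None, so offsets derived from batch times would jump in segment-sized steps rather than tracking the batch time.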


> DirectKafkaInputDStream uses the OffsetRequest.LatestTime to find the latest offset, however using the batch time would be more desirable.
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-10734
>                 URL: https://issues.apache.org/jira/browse/SPARK-10734
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>            Reporter: Bijay Singh Bisht
>
> DirectKafkaInputDStream uses OffsetRequest.LatestTime to find the latest offset. However, since OffsetRequest.LatestTime is relative, it depends on when the batch is scheduled. One would expect that, given an input data set, the data in the batches should be predictable, irrespective of the system conditions. Using the batch time implies that the stream processing will have the same batches irrespective of when the processing was started and of the load conditions on the system.
> This, along with [SPARK-10732], provides for nice regression scenarios.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
