Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:21:30 UTC

[jira] [Updated] (SPARK-11632) Filter out empty partition for KafkaRDD

     [ https://issues.apache.org/jira/browse/SPARK-11632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-11632:
---------------------------------
    Labels: bulk-closed  (was: )

> Filter out empty partition for KafkaRDD
> ---------------------------------------
>
>                 Key: SPARK-11632
>                 URL: https://issues.apache.org/jira/browse/SPARK-11632
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>            Reporter: Saisai Shao
>            Priority: Minor
>              Labels: bulk-closed
>
> For KafkaRDD, the number of messages each partition will process is calculated beforehand, so empty partitions could be filtered out to avoid submitting unnecessary tasks. This could reduce scheduling overhead when no data comes in, and it would also let dynamic allocation shrink resources effectively (no pending tasks).
> For other receiver-based DStreams, BlockRDD already supports the empty case, so if no data is injected during that time period, no tasks are submitted.
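
The filtering idea described in the issue can be sketched roughly as follows. This is a minimal, Spark-free illustration in Python, not the actual KafkaRDD code: the `OffsetRange` class and the `non_empty_ranges` helper are hypothetical stand-ins, assuming each partition is described by an inclusive start offset and an exclusive end offset that are known before any task is submitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for the offsets one Kafka partition covers in a batch."""
    topic: str
    partition: int
    from_offset: int   # inclusive (assumption for this sketch)
    until_offset: int  # exclusive (assumption for this sketch)

    def count(self) -> int:
        # Number of messages in this range, known before any task runs.
        return self.until_offset - self.from_offset


def non_empty_ranges(ranges: List[OffsetRange]) -> List[OffsetRange]:
    """Keep only the ranges that contain at least one message."""
    return [r for r in ranges if r.count() > 0]


# Example batch: partitions 0 and 2 received no new messages.
batch = [
    OffsetRange("events", 0, 100, 100),
    OffsetRange("events", 1, 250, 275),
    OffsetRange("events", 2, 40, 40),
]

tasks = non_empty_ranges(batch)
print(len(tasks), sum(r.count() for r in tasks))
```

In this sketch only one of the three partitions would become a task. If every range in a batch were empty, the filtered list would be empty as well, which corresponds to the "no pending tasks" situation the issue mentions for dynamic allocation.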



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
