Posted to issues@spark.apache.org by "Saisai Shao (JIRA)" <ji...@apache.org> on 2015/04/03 08:20:52 UTC

[jira] [Created] (SPARK-6691) Abstract and add a dynamic RateLimiter for Spark Streaming

Saisai Shao created SPARK-6691:
----------------------------------

             Summary: Abstract and add a dynamic RateLimiter for Spark Streaming
                 Key: SPARK-6691
                 URL: https://issues.apache.org/jira/browse/SPARK-6691
             Project: Spark
          Issue Type: New Feature
          Components: Streaming
    Affects Versions: 1.3.0
            Reporter: Saisai Shao


Flow control (or rate control) of input data is very important in a streaming system, especially for Spark Streaming to stay stable and keep up with the input. An unexpected flood of incoming data, or an ingestion rate beyond the computation power of the cluster, will make the system unstable and increase the processing delay. Because of Spark Streaming's job generation and processing pattern, this delay accumulates and eventually introduces unacceptable failures.

----

Currently, in Spark Streaming's receiver-based input streams, there is a RateLimiter in BlockGenerator that controls the ingestion rate of input data, but the current implementation has several limitations:

# The max ingestion rate is set by the user through configuration beforehand; the user may lack the experience needed to choose an appropriate value before the application is running (see the sketch after this list for how it is set today).
# This configuration is fixed for the lifetime of the application, which means the user has to assume the worst-case scenario to pick a reasonable value.
# Input streams like DirectKafkaInputStream need to maintain a separate solution to achieve the same functionality.
# The lack of slow-start control makes the whole system easily fall into large processing and scheduling delays at the very beginning.
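
For reference, a minimal sketch of what rate control looks like today: a single static cap set through spark.streaming.receiver.maxRate before the application starts. The application name, batch interval, socket source and values below are illustrative only.

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StaticRateLimitExample {
  def main(args: Array[String]): Unit = {
    // The only knob available today: a static cap on records per second per
    // receiver, chosen by the user before the application starts.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("StaticRateLimitExample")
      .set("spark.streaming.receiver.maxRate", "10000")

    val ssc = new StreamingContext(conf, Seconds(2))
    // Each receiver's BlockGenerator throttles ingestion to the configured
    // maxRate for the entire lifetime of the application.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}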

----

So here we propose a new dynamic RateLimiter, as well as a new interface for the RateLimiter, to improve the stability of the whole system. The goals are:


* Dynamically adjust the ingestion rate according to the processing rate of previously finished jobs.
* Offer a uniform solution not only for receiver-based input streams, but also for direct streams like DirectKafkaInputStream and new ones.
* Slow-start the ingestion rate to control network congestion when the job is first started.
* Provide a pluggable framework to make maintenance and extension easier (a rough sketch of what such an interface might look like follows this list).
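
To make the pluggable-framework goal more concrete, here is a rough, hypothetical sketch of what such an interface could look like. The names (StreamRateLimiter, SlowStartRateLimiter) and the adjustment heuristic are illustrative only and are not taken from the design doc linked below.

{code:scala}
// Hypothetical sketch only; names and heuristic are illustrative.
trait StreamRateLimiter {
  /** Currently permitted ingestion rate, in records per second. */
  def currentRate: Long

  /** Callback invoked after each batch so the limiter can adapt to feedback. */
  def onBatchCompleted(processedRecords: Long, processingTimeMs: Long): Unit
}

class SlowStartRateLimiter(initialRate: Long, maxRate: Long) extends StreamRateLimiter {
  private var rate: Long = initialRate

  override def currentRate: Long = rate

  override def onBatchCompleted(processedRecords: Long, processingTimeMs: Long): Unit = {
    // Records per second the cluster actually sustained on the last batch.
    val processingRate =
      if (processingTimeMs > 0) processedRecords * 1000 / processingTimeMs else rate
    rate =
      if (processingRate >= rate) {
        // The cluster kept up: slow-start ramp, doubling until the cap is hit.
        math.min(rate * 2, maxRate)
      } else {
        // The cluster fell behind: back off to the rate it actually sustained.
        math.max(1L, processingRate)
      }
  }
}
{code}

With a single interface like this, both a receiver's BlockGenerator and a direct stream such as DirectKafkaInputStream could query currentRate when deciding how much data to pull for the next block or batch, which is what would make the solution uniform across stream types.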

----

Here is the design doc (https://docs.google.com/document/d/1lqJDkOYDh_9hRLQRwqvBXcbLScWPmMa7MlG8J_TE93w/edit?usp=sharing) and working branch (https://github.com/jerryshao/apache-spark/tree/dynamic-rate-limiter).

Any comment would be greatly appreciated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org