Posted to user@spark.apache.org by Lin Zhao <li...@exabeam.com> on 2016/01/28 02:28:22 UTC

Spark streaming flow control and back pressure

I have an actor receiver that reads data and calls store() to save it to Spark. I was hoping spark.streaming.receiver.maxRate and spark.streaming.backpressure.enabled would block that call when needed to avoid overflowing the pipeline, but they don't: my actor pumps millions of lines into Spark even while backpressure and the rate limit are in effect. While the data trickles slowly into the input blocks, everything already created sits around and causes memory problems.

Is there a guideline for handling this? What's the best way for my actor to know it should slow down, so it doesn't keep creating millions of messages? A blocking store() call seems acceptable.

Thanks, Lin
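
For reference, both settings mentioned above go on the SparkConf. A minimal sketch, assuming Spark 1.6 (the app name, batch interval, and the 10000 records/sec cap are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Enable dynamic backpressure (available since Spark 1.5) and set a hard
// per-receiver ingestion cap, in records per second.
val conf = new SparkConf()
  .setAppName("StreamingFlowControl")
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.receiver.maxRate", "10000")

val ssc = new StreamingContext(conf, Seconds(10))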

Re: Spark streaming flow control and back pressure

Posted by Lin Zhao <li...@exabeam.com>.
One solution would be to read the scheduling delay so my actor can go to sleep when needed. Is this possible?
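
Reading the delay through a StreamingListener looks feasible. A minimal sketch of the idea (the gauge, listener, and threshold names below are made up for illustration):

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Publish the last observed scheduling delay so the producing actor can poll it.
object DelayGauge {
  val lastSchedulingDelayMs = new AtomicLong(0L)
}

class DelayListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    batch.batchInfo.schedulingDelay.foreach(DelayGauge.lastSchedulingDelayMs.set)
  }
}

// Registered once: ssc.addStreamingListener(new DelayListener)
// In the actor's read loop (thresholds are made up):
//   if (DelayGauge.lastSchedulingDelayMs.get() > 5000L) Thread.sleep(1000L)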


Re: Spark streaming flow control and back pressure

Posted by Lin Zhao <li...@exabeam.com>.
I'm using branch-1.6 built for Scala 2.11 yesterday. Below is the part of my actor receiver that stores data. The log reports millions of stored messages while the job is apparently back-pressured according to the UI (i.e. about 2000 records per 10s batch).


// Inside the actor's receive handler: store one record, log every 100,000.
store((key, msg))
if (storeCount.incrementAndGet() % 100000 == 0) {
  logger.info(s"Stored ${storeCount.get()} messages to spark")
}
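
For context, this snippet presumably sits inside an actor mixed in with Spark 1.6's ActorHelper and wired up via actorStream. A sketch of that surrounding wiring, under that assumption (the message shape and names are placeholders, not Lin's actual code):

import java.util.concurrent.atomic.AtomicLong
import akka.actor.Actor
import org.apache.spark.streaming.receiver.ActorHelper

class IngestActor extends Actor with ActorHelper {
  private val storeCount = new AtomicLong(0L)

  def receive = {
    case (key: String, msg: String) =>
      store((key, msg)) // hand one record to Spark via the receiver supervisor
      if (storeCount.incrementAndGet() % 100000 == 0) {
        println(s"Stored ${storeCount.get()} messages to spark")
      }
  }
}

// Hooked up with something like:
//   val stream = ssc.actorStream[(String, String)](Props[IngestActor], "ingest")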


Re: Spark streaming flow control and back pressure

Posted by Iulian Dragoș <iu...@typesafe.com>.
Calling `store` should get you there. What version of Spark are you using?
Can you share your code?

iulian
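
If the actor path does not end up throttling the caller, one alternative that should is a plain custom Receiver: its single-item store goes through the block generator's rate limiter, so the read loop blocks in place when the limit kicks in (my reading of branch-1.6, not confirmed in this thread; readNextLine is a placeholder for the real source):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class LineReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {
  override def onStart(): Unit = {
    new Thread("line-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store(readNextLine()) // blocks while maxRate/backpressure throttles
        }
      }
    }.start()
  }

  override def onStop(): Unit = {} // the reading thread exits via isStopped()

  private def readNextLine(): String = ??? // placeholder for the real source
}

// Used as: val lines = ssc.receiverStream(new LineReceiver)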

--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com