Posted to dev@spark.apache.org by aravasai <ar...@gmail.com> on 2017/01/29 23:44:29 UTC

Maximum limit for akka.frame.size be greater than 500 MB ?

I have a Spark job running on 2 terabytes of data which creates more than
30,000 partitions. As a result, the job fails with the error
"Map output statuses were 170415722 bytes which exceeds spark.akka.frameSize
52428800 bytes" (for 1 TB of data).
However, when I increase spark.akka.frameSize to around 500 MB, the job hangs
with no further progress.

So, what is the ideal or maximum value that I can assign to spark.akka.frameSize
so that I do not get the error about map output statuses exceeding the limit on
large volumes of data?
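
For reference, a minimal sketch of how this setting can be applied, assuming
Spark 1.6.x, where spark.akka.frameSize is specified in MB (the application
name and the 512 MB value are only illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.akka.frameSize is interpreted in MB on Spark 1.6.x; it can also be
    // passed on the command line as --conf spark.akka.frameSize=512.
    val conf = new SparkConf()
      .setAppName("user-stats-job")           // hypothetical application name
      .set("spark.akka.frameSize", "512")     // roughly the 500 MB value tried above
    val sc = new SparkContext(conf)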

Is coalescing the data into a smaller number of partitions the only solution
to this problem? Is there any better way than coalescing many intermediate
RDDs in the program?
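
As a rough sketch of the coalescing approach in question (the input path and
the target partition count are made up for illustration):

    // coalesce() only merges existing partitions and avoids a full shuffle;
    // repartition() would rebalance the data evenly, but at full shuffle cost.
    val records = sc.textFile("hdfs:///path/to/behavior-data")  // hypothetical path
    val fewerPartitions = records.coalesce(5000)                // down from 30,000+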

My driver memory: 10G
Executor memory: 36G
Executor memory overhead: 3G


Re: Maximum limit for akka.frame.size be greater than 500 MB ?

Posted by aravasai <ar...@gmail.com>.
Currently, I am using version 1.6.1. I continue to use it because my current
code relies heavily on RDDs rather than DataFrames, and because 1.6.1 is more
stable than newer versions.


The input is user behavior data with 20 fields and about 1 billion records
(~1.5 TB). I am trying to group by user ID and compute some per-user
statistics, but I suspect the number of map tasks is too high, resulting in
the spark.akka.frameSize error.
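
Just to make the workload concrete, a minimal sketch of the kind of per-user
aggregation described above, assuming the records can be keyed by user ID (the
path, field layout, and metric are hypothetical):

    // Parse each record into (userId, metric); field positions are illustrative.
    val events = sc.textFile("hdfs:///path/to/behavior-data")   // hypothetical path
    val perUser = events
      .map { line =>
        val fields = line.split('\t')
        (fields(0), fields(1).toDouble)
      }
      // aggregateByKey combines on the map side (unlike groupByKey), so only a
      // small (count, sum) pair per user is shuffled.
      .aggregateByKey((0L, 0.0))(
        (acc, v) => (acc._1 + 1, acc._2 + v),
        (a, b) => (a._1 + b._1, a._2 + b._2)
      )
    val avgPerUser = perUser.mapValues { case (n, sum) => sum / n }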

1) Does spark.akka.frameSize have to be increased in proportion to the size of
the data, which indirectly affects the number of partitions?
2) Or does the huge number of map tasks in the job (which may not be
avoidable) cause the frame size error?

On Sun, Jan 29, 2017 at 11:15 PM, Jörn Franke wrote:

> Which Spark version are you using? What are you trying to do exactly and
> what is the input data? As far as I know, akka has been dropped in recent
> Spark versions.

Re: Maximum limit for akka.frame.size be greater than 500 MB ?

Posted by Jörn Franke <jo...@gmail.com>.
Which Spark version are you using? What are you trying to do exactly and what is the input data? As far as I know, akka has been dropped in recent Spark versions.
