Posted to user@spark.apache.org by Julien CHAMP <jc...@tellmeplus.com> on 2017/07/26 08:33:40 UTC

Cache problem ? Too much Stages ? Need help :(

Hi there,

In a Spark job I have the following problem:
I have a val xyz of type List[List[MyObject]].

I map over xyz and then reduce the results with union to combine
everything.
It seems to work with small datasets, but with larger ones I'm
experiencing a strange problem.

As you can see in the Spark UI, I run the same kind of operation several
times (jobs 4 to 23), and it just unions the resulting DataFrames.
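
For readers following the thread, the pattern described above might look roughly like the sketch below. This is not the actual job code, which was not posted. The fields of MyObject, the helper toDataFrame, and the object name UnionPatternSketch are all hypothetical stand-ins, and the real per-list transformation is unknown.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical stand-in for MyObject; the real class was not shown in the thread.
case class MyObject(id: Int, value: Double)

object UnionPatternSketch {
  // Hypothetical per-list step; the real transformation is unknown.
  def toDataFrame(spark: SparkSession, group: List[MyObject]): DataFrame = {
    import spark.implicits._
    group.toDF()
  }

  // map: build one DataFrame per inner list; reduce: union them pairwise into one.
  // Note that reduce throws on an empty outer list.
  def unionAll(spark: SparkSession, xyz: List[List[MyObject]]): DataFrame =
    xyz.map(toDataFrame(spark, _)).reduce(_ union _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-pattern-sketch")
      .getOrCreate()

    val xyz: List[List[MyObject]] = List(
      List(MyObject(1, 1.0), MyObject(2, 2.0)),
      List(MyObject(3, 3.0)),
      List(MyObject(4, 4.0), MyObject(5, 5.0))
    )

    val result = unionAll(spark, xyz)
    // Five input objects in total, so the unioned DataFrame has five rows.
    assert(result.count() == 5)

    spark.stop()
  }
}
```

With a pattern like this, each union extends the plan of the accumulated DataFrame, so the final result carries the plans of all the pieces it was built from.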

But starting at job 15, each job takes more and more time, and I really
don't know why!
Are there too many stages?
Too many skipped tasks?

Do you have any idea how to resolve this? My job currently doesn't scale
at all...

Thx



[image: spark.png]
-- 


Julien CHAMP — Data Scientist



Re: Cache problem ? Too much Stages ? Need help :(

Posted by Xiayun Sun <xi...@gmail.com>.
It's a bit difficult to tell from this information.

Can you go into each job's detail page (click on the job row, "show at
<console>:xx") and compare things like the DAG execution plan and shuffle
read/write?
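
Part of this comparison can also be done from code rather than from the UI. The sketch below is only an illustration under assumptions: the object PlanInspectionSketch, the helper describe, and the small stand-in DataFrames are hypothetical, since the actual job code was not posted. It prints the query plans and the RDD lineage after each union so they can be compared step by step. Shuffle read/write figures are still only available from the UI.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PlanInspectionSketch {
  // Prints plan and lineage details for a DataFrame and returns its partition count.
  def describe(label: String, df: DataFrame): Int = {
    println(s"=== $label ===")
    df.explain(true)               // parsed, analyzed, optimized and physical plans
    println(df.rdd.toDebugString)  // RDD lineage behind the DataFrame
    val partitions = df.rdd.getNumPartitions
    println(s"partitions: $partitions")
    partitions
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("plan-inspection-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for the per-job DataFrames.
    val parts: List[DataFrame] = (1 to 4).toList.map(i => Seq(i, i + 1).toDF("n"))

    // Describe the running union after each step, to see how the plan changes.
    parts.tail.zipWithIndex.foldLeft(parts.head) { case (acc, (next, i)) =>
      val unioned = acc.union(next)
      describe(s"after union ${i + 1}", unioned)
      unioned
    }

    spark.stop()
  }
}
```

If the printed plan or lineage keeps growing from one step to the next, that would be one concrete thing to compare between the fast early jobs and the slow later ones.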

Or, if you don't mind, can you give more detail about your code? A snippet
would be pretty helpful.

To me it looks like something is being accumulated across jobs...

-- 
Xiayun Sun
