Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2016/06/01 08:40:59 UTC

[jira] [Commented] (PIG-4893) Task deserialization time is too long for spark on yarn mode

    [ https://issues.apache.org/jira/browse/PIG-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309940#comment-15309940 ] 

liyunzhang_intel commented on PIG-4893:
---------------------------------------

Here is a summary of why task deserialization takes so long:
 We add all dependency jars under $PIG_HOME/lib/ and $PIG_HOME/lib/spark/ to $SPARK_JARS, and Spark ships all of these jars to the Hadoop distributed cache. Each YARN container then downloads all of these jars when deserializing a job (see [org.apache.spark.executor.Executor#updateDependencies|https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/core/src/main/scala/org/apache/spark/executor/Executor.scala#L424]).
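
To see which jars dominate what gets shipped, something like the following can help. It is only a rough sketch, not part of Pig or Spark: the helper name jar_sizes is made up for illustration, and it assumes the jars sit directly under the two directories named above.

```python
import os

def jar_sizes(lib_dirs):
    """Return (size_in_bytes, path) for every .jar directly under lib_dirs, largest first.

    Hypothetical helper for illustration only; it is not a Pig or Spark API.
    """
    found = []
    for lib_dir in lib_dirs:
        if not os.path.isdir(lib_dir):
            continue  # skip directories that do not exist on this machine
        for name in os.listdir(lib_dir):
            path = os.path.join(lib_dir, name)
            if name.endswith(".jar") and os.path.isfile(path):
                found.append((os.path.getsize(path), path))
    return sorted(found, reverse=True)

if __name__ == "__main__":
    # Falls back to the current directory if PIG_HOME is not set.
    pig_home = os.environ.get("PIG_HOME", ".")
    dirs = [os.path.join(pig_home, "lib"), os.path.join(pig_home, "lib", "spark")]
    jars = jar_sizes(dirs)
    for size, path in jars:
        print("%10.1f MB  %s" % (size / 1e6, path))
    print("total: %.1f MB in %d jars" % (sum(s for s, _ in jars) / 1e6, len(jars)))
```

The total is only a proxy for how much each container has to download; it does not measure the deserialization time itself.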

After removing some large dependencies from $PIG_HOME/lib/ (such as jython-standalone-2.5.3.jar and jruby-complete-1.6.7.jar, among others, which are not needed to run a simple Pig script), the deserialization time dropped from 12s to 4s. So do we need to ship all the jars under $PIG_HOME/lib/* every time, even though some of them are not actually needed?
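
One possible shape for shipping only what is needed is to filter the jar list by file-name prefix before it is handed to Spark. The sketch below is an assumption for discussion, not how Pig currently builds its jar list: select_jars and the prefix list are hypothetical, and deciding whether a given script really needs Jython or JRuby (for example for scripting UDFs) is left to the caller.

```python
import os

def select_jars(jar_paths, optional_prefixes):
    """Split jar_paths into (keep, skipped) by matching file-name prefixes.

    Hypothetical helper: jars whose file name starts with one of
    optional_prefixes are treated as not needed for the current script.
    """
    prefixes = tuple(optional_prefixes)
    keep, skipped = [], []
    for path in jar_paths:
        name = os.path.basename(path)
        if name.startswith(prefixes):
            skipped.append(path)
        else:
            keep.append(path)
    return keep, skipped

if __name__ == "__main__":
    # The first two names come from the comment above; the third is a placeholder.
    jars = [
        "lib/jython-standalone-2.5.3.jar",
        "lib/jruby-complete-1.6.7.jar",
        "lib/some-required-dep.jar",
    ]
    keep, skipped = select_jars(jars, ["jython-standalone", "jruby-complete"])
    print("ship:", keep)
    print("skip:", skipped)
```

Prefix matching is the simplest choice here; a real change would more likely derive the list from what the script actually uses.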



> Task deserialization time is too long for spark on yarn mode
> ------------------------------------------------------------
>
>                 Key: PIG-4893
>                 URL: https://issues.apache.org/jira/browse/PIG-4893
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: time.PNG
>
>
> I found that the task deserialization time is a bit long when I run any of the PigMix scripts in Spark on YARN mode. See the attached picture: the task duration is 3s, but the task deserialization takes 13s.
> My environment is Hadoop 2.6 + Spark 1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)