Posted to dev@pig.apache.org by "Koji Noguchi (Jira)" <ji...@apache.org> on 2021/10/08 19:06:00 UTC

[jira] [Updated] (PIG-5413) [spark] TestStreaming.testInputCacheSpecs failing with "File script1.pl was already registered"

     [ https://issues.apache.org/jira/browse/PIG-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-5413:
------------------------------
    Attachment: pig-5413-v01.patch

This issue will be fixed once
    "PIG-5241: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java"
is fixed, since the underlying problem is that SparkLauncher.cacheFiles creates a unique tmp file on every call, which prevents the Spark/Hadoop layer from skipping the redundant paths.

I took a quick look at PIG-5241 but couldn't figure out how Spark uses Hadoop's distributed cache, especially with "#" symlinks. For now, I'm adding another layer of hack on top of the existing hack to avoid registering the same files more than once (when multiple jobs are submitted).
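
As a rough illustration of the idea only (this is not the code in pig-5413-v01.patch), a guard like the one below could remember which file names have already been registered, so a second job does not register the same name again with a fresh tmp path. The class and method names (CacheFileRegistry, shouldRegister, registeredPath) are hypothetical, and the sketch assumes that files are identified by their base name, as the "already registered" error suggests.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper, for illustration only; not taken from the patch.
 * Tracks which cache file names have already been registered so that
 * later jobs can skip registering the same name under a new tmp path.
 */
final class CacheFileRegistry {
    // file name (e.g. "script1.pl") -> first local path registered for it
    private final Map<String, String> registered = new ConcurrentHashMap<>();

    /**
     * Returns true only the first time a given file name is seen.
     * A caller would register the file (for example via
     * SparkContext.addFile) only when this returns true.
     */
    boolean shouldRegister(String fileName, String localPath) {
        return registered.putIfAbsent(fileName, localPath) == null;
    }

    /** Path recorded for the file name, or null if it was never registered. */
    String registeredPath(String fileName) {
        return registered.get(fileName);
    }
}
```

With such a guard, a second call for script1.pl with a different tmp path would be skipped instead of reaching addFile and tripping the "already registered with a different path" check. The sketch deliberately does not compare file contents, so it would also skip a genuinely different file that happens to share the same name.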


> [spark] TestStreaming.testInputCacheSpecs failing with "File script1.pl was already registered"
> -----------------------------------------------------------------------------------------------
>
>                 Key: PIG-5413
>                 URL: https://issues.apache.org/jira/browse/PIG-5413
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5413-v01.patch
>
>
>  {noformat}
> Caused by: java.lang.IllegalArgumentException: requirement failed: File script1.pl was already registered with a different path (old path = /tmp/yarn-local/usercache/knoguchi/appcache/application_1628754354801_523406/container_e07_1628754354801_523406_01_000061/tmp/pig_junit_tmp1798933174/cache7028476439694979845/script1.pl, new path = /tmp/yarn-local/usercache/knoguchi/appcache/application_1628754354801_523406/container_e07_1628754354801_523406_01_000061/tmp/pig_junit_tmp1798933174/cache4167672945345635171/script1.pl
> at scala.Predef$.require(Predef.scala:224)
> at org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:70)
> at org.apache.spark.SparkContext.addFile(SparkContext.scala:1559)
> ...
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)