Posted to dev@pig.apache.org by "Koji Noguchi (Jira)" <ji...@apache.org> on 2021/10/08 19:06:00 UTC
[jira] [Updated] (PIG-5413) [spark] TestStreaming.testInputCacheSpecs failing with "File script1.pl was already registered"
[ https://issues.apache.org/jira/browse/PIG-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Koji Noguchi updated PIG-5413:
------------------------------
Attachment: pig-5413-v01.patch
This issue will be fixed once
"PIG-5241: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java"
is fixed, since the underlying problem here is that SparkLauncher.cacheFiles creates a unique tmp file on every call, which prevents the Spark/Hadoop layer from recognizing and skipping the redundant paths.
I took a quick look at PIG-5241 but couldn't figure out how Spark uses Hadoop's distributed cache, especially with "#" symlinks. For now, I'm adding another layer of hack on top of the existing hack to avoid registering the same file more than once (when multiple jobs are submitted).
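
To illustrate the general idea of "register each file only once", here is a minimal sketch. It is not the contents of pig-5413-v01.patch. The class name CacheFileRegistry and the method localCopyFor are hypothetical and do not exist in Pig. The sketch assumes the fix amounts to remembering which local tmp path was handed out for a given source file, so that a second job asking for the same file gets the same path instead of a fresh one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of Pig): remembers the local tmp path chosen
// for each source file so that repeated requests reuse it.
final class CacheFileRegistry {
    // Key: the source file spec as given by the script (assumed to be stable across jobs).
    // Value: the local path that was chosen the first time this source was seen.
    private final Map<String, Path> registered = new ConcurrentHashMap<>();

    // Returns the local path for the given source, creating a new unique tmp
    // directory only on the first request for that source.
    Path localCopyFor(String source, String fileName) {
        return registered.computeIfAbsent(source, key -> {
            try {
                Path dir = Files.createTempDirectory("cache");
                return dir.resolve(fileName);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Number of distinct sources seen so far.
    int size() {
        return registered.size();
    }
}
```

With something like this, two jobs asking for the same script1.pl would pass the same local path to SparkContext.addFile, so the "already registered with a different path" check should not fire. The sketch only picks the path; actually copying the file there is left out.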
> [spark] TestStreaming.testInputCacheSpecs failing with "File script1.pl was already registered"
> -----------------------------------------------------------------------------------------------
>
> Key: PIG-5413
> URL: https://issues.apache.org/jira/browse/PIG-5413
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-5413-v01.patch
>
>
> {noformat}
> Caused by: java.lang.IllegalArgumentException: requirement failed: File script1.pl was already registered with a different path (old path = /tmp/yarn-local/usercache/knoguchi/appcache/application_1628754354801_523406/container_e07_1628754354801_523406_01_000061/tmp/pig_junit_tmp1798933174/cache7028476439694979845/script1.pl, new path = /tmp/yarn-local/usercache/knoguchi/appcache/application_1628754354801_523406/container_e07_1628754354801_523406_01_000061/tmp/pig_junit_tmp1798933174/cache4167672945345635171/script1.pl
> at scala.Predef$.require(Predef.scala:224)
> at org.apache.spark.rpc.netty.NettyStreamManager.addFile(NettyStreamManager.scala:70)
> at org.apache.spark.SparkContext.addFile(SparkContext.scala:1559)
> ...
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)