Posted to yarn-issues@hadoop.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2018/06/14 21:47:00 UTC

[jira] [Moved] (YARN-8430) Some zip files passed with spark-submit --archives causing "invalid CEN header" error

     [ https://issues.apache.org/jira/browse/YARN-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin moved SPARK-24559 to YARN-8430:
----------------------------------------------

    Affects Version/s:     (was: 2.2.0)
          Component/s:     (was: Spark Submit)
             Workflow: no-reopen-closed, patch-avail  (was: no-reopen-closed)
                  Key: YARN-8430  (was: SPARK-24559)
              Project: Hadoop YARN  (was: Spark)

> Some zip files passed with spark-submit --archives causing "invalid CEN header" error
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-8430
>                 URL: https://issues.apache.org/jira/browse/YARN-8430
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: James Porritt
>            Priority: Major
>
> I'm encountering an error when submitting some zip files to spark-submit with --archives that are over 2 GB and have the zip64 flag set.
> {{PYSPARK_PYTHON=./ROOT/myspark/bin/python /usr/hdp/current/spark2-client/bin/spark-submit \}}
> {{ --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./ROOT/myspark/bin/python \}}
> {{ --master=yarn \}}
> {{ --deploy-mode=cluster \}}
> {{ --driver-memory=4g \}}
> {{ --archives=myspark.zip#ROOT \}}
> {{ --num-executors=32 \}}
> {{ --packages com.databricks:spark-avro_2.11:4.0.0 \}}
> {{ foo.py}}
> (As a bit of background, I'm trying to prepare files using the trick of zipping a conda environment and passing the zip file via --archives, as per: https://community.hortonworks.com/articles/58418/running-pyspark-with-conda-env.html)
> myspark.zip is a zipped conda environment. It was created using Python with the zipfile package. The files are stored without deflation and with the zip64 flag set. foo.py is my application code. This normally works, but if myspark.zip is greater than 2 GB and has the zip64 flag set I get:
> java.util.zip.ZipException: invalid CEN header (bad signature)
> There seems to be much written on this subject, and using the java.util.zip library I was able to write Java code that both does and does not encounter this error for one of the problematic zip files.
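The two java.util.zip read paths alluded to above can be sketched roughly as follows. This is an illustrative reconstruction, not the reporter's actual code: the class and method names are mine, and the small in-memory archive merely stands in for the real >2 GB zip64 file, which cannot be reproduced here. ZipInputStream reads the local file headers sequentially from the front of the archive, while ZipFile parses the central directory (CEN) at the end, which is where the "invalid CEN header (bad signature)" message originates.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipReadPaths {

    // Build a tiny zip with STORED (uncompressed) entries, mirroring how the
    // reporter's archive was written. A stand-in for the real >2 GB zip64 file.
    static byte[] buildSampleZip(int entries) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                for (int i = 0; i < entries; i++) {
                    byte[] data = ("entry-" + i).getBytes(StandardCharsets.UTF_8);
                    ZipEntry e = new ZipEntry("file" + i + ".txt");
                    e.setMethod(ZipEntry.STORED);   // stored entries need size + CRC up front
                    e.setSize(data.length);
                    CRC32 crc = new CRC32();
                    crc.update(data);
                    e.setCrc(crc.getValue());
                    zos.putNextEntry(e);
                    zos.write(data);
                    zos.closeEntry();
                }
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streaming path: ZipInputStream walks the local file headers from the
    // front of the archive and never reads the central directory (CEN).
    static int countViaStream(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            int n = 0;
            while (zis.getNextEntry() != null) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Random-access path: ZipFile locates and parses the central directory at
    // the end of the archive; "invalid CEN header (bad signature)" comes from
    // this CEN-parsing path.
    static int countViaZipFile(byte[] zip) {
        try {
            Path tmp = Files.createTempFile("zipdemo", ".zip");
            try {
                Files.write(tmp, zip);
                try (ZipFile zf = new ZipFile(tmp.toFile())) {
                    return zf.size();   // entry count taken from the CEN
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] zip = buildSampleZip(3);
        System.out.println("stream=" + countViaStream(zip)
                + " zipfile=" + countViaZipFile(zip));
    }
}
```

On a well-formed small archive both paths agree; the reported behaviour suggests they diverge only for the large zip64 archives, with the streaming path succeeding where the CEN path fails.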
> Spark compile info:
> {{Welcome to}}
> {{      ____              __}}
> {{     / __/__  ___ _____/ /__}}
> {{    _\ \/ _ \/ _ `/ __/ '_/}}
> {{   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.2.6.4.0-91}}
> {{      /_/}}
> {{Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_112}}
> {{Branch HEAD}}
> {{Compiled by user jenkins on 2018-01-04T10:41:05Z}}
> {{Revision a24017869f5450397136ee8b11be818e7cd3facb}}
> {{Url git@github.com:hortonworks/spark2.git}}
> {{Type --help for more information.}}
> Console output from YARN after running the above command. I've tried both --deploy-mode=cluster and --deploy-mode=client.
> {{18/06/13 16:00:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable}}
> {{18/06/13 16:00:23 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.}}
> {{18/06/13 16:00:23 INFO RMProxy: Connecting to ResourceManager at myhost2.myfirm.com/10.87.11.17:8050}}
> {{18/06/13 16:00:23 INFO Client: Requesting a new application from cluster with 6 NodeManagers}}
> {{18/06/13 16:00:23 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (221184 MB per container)}}
> {{18/06/13 16:00:23 INFO Client: Will allocate AM container, with 18022 MB memory including 1638 MB overhead}}
> {{18/06/13 16:00:23 INFO Client: Setting up container launch context for our AM}}
> {{18/06/13 16:00:23 INFO Client: Setting up the launch environment for our AM container}}
> {{18/06/13 16:00:23 INFO Client: Preparing resources for our AM container}}
> {{18/06/13 16:00:24 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://myhost.myfirm.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz}}
> {{18/06/13 16:00:24 INFO Client: Source and destination file systems are the same. Not copying hdfs://myhost.myfirm.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/com.databricks_spark-avro_2.11-4.0.0.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.slf4j_slf4j-api-1.7.5.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.apache.avro_avro-1.7.6.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.apache.avro_avro-1.7.6.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.codehaus.jackson_jackson-core-asl-1.9.13.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/com.thoughtworks.paranamer_paranamer-2.3.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.xerial.snappy_snappy-java-1.0.5.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.apache.commons_commons-compress-1.4.1.jar}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/home/myuser/.ivy2/jars/org.tukaani_xz-1.0.jar -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/org.tukaani_xz-1.0.jar}}
> {{18/06/13 16:00:26 INFO Client: Source and destination file systems are the same. Not copying hdfs:/user/myuser/release/alphagenspark.zip#ROOT}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/my/script/dir/spark/alphagen/foo.py -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/foo.py}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/pyspark.zip}}
> {{18/06/13 16:00:26 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/py4j-0.10.4-src.zip}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/com.databricks_spark-avro_2.11-4.0.0.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.slf4j_slf4j-api-1.7.5.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.apache.avro_avro-1.7.6.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-core-asl-1.9.13.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.codehaus.jackson_jackson-mapper-asl-1.9.13.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/com.thoughtworks.paranamer_paranamer-2.3.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.xerial.snappy_snappy-java-1.0.5.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.apache.commons_commons-compress-1.4.1.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:26 WARN Client: Same path resource file:/home/myuser/.ivy2/jars/org.tukaani_xz-1.0.jar added multiple times to distributed cache.}}
> {{18/06/13 16:00:27 INFO Client: Uploading resource file:/tmp/spark-6c26ae3b-7248-488f-bc33-9766251474bb/__spark_conf__4405623606341803690.zip -> hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019/__spark_conf__.zip}}
> {{18/06/13 16:00:27 INFO SecurityManager: Changing view acls to: myuser}}
> {{18/06/13 16:00:27 INFO SecurityManager: Changing modify acls to: myuser}}
> {{18/06/13 16:00:27 INFO SecurityManager: Changing view acls groups to:}}
> {{18/06/13 16:00:27 INFO SecurityManager: Changing modify acls groups to:}}
> {{18/06/13 16:00:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(myuser); groups with view permissions: Set(); users with modify permissions: Set(myuser); groups with modify permissions: Set()}}
> {{18/06/13 16:00:27 INFO Client: Submitting application application_1528901858967_0019 to ResourceManager}}
> {{18/06/13 16:00:27 INFO YarnClientImpl: Submitted application application_1528901858967_0019}}
> {{18/06/13 16:00:28 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:28 INFO Client:}}
> {{ client token: N/A}}
> {{ diagnostics: AM container is launched, waiting for AM container to Register with RM}}
> {{ ApplicationMaster host: N/A}}
> {{ ApplicationMaster RPC port: -1}}
> {{ queue: default}}
> {{ start time: 1528923627110}}
> {{ final status: UNDEFINED}}
> {{ tracking URL: http://myhost2.myfirm.com:8088/proxy/application_1528901858967_0019/}}
> {{ user: myuser}}
> {{18/06/13 16:00:29 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:30 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:31 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:32 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:33 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:34 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:35 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:36 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:37 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:38 INFO Client: Application report for application_1528901858967_0019 (state: ACCEPTED)}}
> {{18/06/13 16:00:39 INFO Client: Application report for application_1528901858967_0019 (state: FAILED)}}
> {{18/06/13 16:00:39 INFO Client:}}
> {{ client token: N/A}}
> {{ diagnostics: Application application_1528901858967_0019 failed 2 times due to AM Container for appattempt_1528901858967_0019_000002 exited with exitCode: -1000}}
> {{For more detailed output, check the application tracking page: http://myhost2.myfirm.com:8088/cluster/app/application_1528901858967_0019 Then click on links to logs of each attempt.}}
> {{Diagnostics: java.util.zip.ZipException: invalid CEN header (bad signature)}}
> {{Failing this attempt. Failing the application.}}
> {{ ApplicationMaster host: N/A}}
> {{ ApplicationMaster RPC port: -1}}
> {{ queue: default}}
> {{ start time: 1528923627110}}
> {{ final status: FAILED}}
> {{ tracking URL: http://myhost2.myfirm.com:8088/cluster/app/application_1528901858967_0019}}
> {{ user: myuser}}
> {{18/06/13 16:00:39 INFO Client: Deleted staging directory hdfs://myhost.myfirm.com:8020/user/myuser/.sparkStaging/application_1528901858967_0019}}
> {{Exception in thread "main" org.apache.spark.SparkException: Application application_1528901858967_0019 finished with failed status}}
> {{ at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)}}
> {{ at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)}}
> {{ at org.apache.spark.deploy.yarn.Client.main(Client.scala)}}
> {{ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{ at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{ at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)}}
> {{ at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)}}
> {{ at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)}}
> {{ at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)}}
> {{ at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)}}
> {{18/06/13 16:00:39 INFO ShutdownHookManager: Shutdown hook called}}
> {{18/06/13 16:00:39 INFO ShutdownHookManager: Deleting directory /tmp/spark-6c26ae3b-7248-488f-bc33-9766251474bb}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org