Posted to issues@spark.apache.org by "holdenk (Jira)" <ji...@apache.org> on 2019/09/23 21:27:00 UTC

[jira] [Commented] (SPARK-28517) pyspark with --conf spark.jars.packages causes duplicate jars to be uploaded

    [ https://issues.apache.org/jira/browse/SPARK-28517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936201#comment-16936201 ] 

holdenk commented on SPARK-28517:
---------------------------------

cc [~bryanc] / [~ifilonenko]

> pyspark with --conf spark.jars.packages causes duplicate jars to be uploaded
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-28517
>                 URL: https://issues.apache.org/jira/browse/SPARK-28517
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, YARN
>    Affects Versions: 2.4.3
>         Environment: spark 2.4.3_2.12 without hadoop
> yarn 2.6
> python 2.7.16
> centos 7
>            Reporter: Barry
>            Priority: Major
>              Labels: ivy, pyspark, yarn
>
> h2. Steps to reproduce:
> {{spark-submit --master yarn --conf "spark.jars.packages=org.apache.spark:spark-avro_2.12:2.4.3" ${SPARK_HOME}/examples/src/main/python/pi.py 100}}
> h2. Undesirable behavior:
> Warnings are printed saying that the package jars have been added to the distributed cache multiple times:
> {{19/07/25 23:25:07 WARN Client: Same path resource file:///home/barryl/.ivy2/jars/org.apache.spark_spark-avro_2.12-2.4.3.jar added multiple times to distributed cache.}}
> {{19/07/25 23:25:07 WARN Client: Same path resource file:///home/barryl/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar added multiple times to distributed cache.}}
> This does not happen for Scala jobs, only for PySpark jobs.
>  
> h2. Full output of example run.
> {{[barryl@hostname ~]$ /opt/spark2/bin/spark-submit --master yarn --conf "spark.jars.packages=org.apache.spark:spark-avro_2.12:2.4.3" /opt/spark2/examples/src/main/python/pi.py 100}}
> {{Ivy Default Cache set to: /home/barryl/.ivy2/cache}}
> {{The jars for the packages stored in: /home/barryl/.ivy2/jars}}
> {{:: loading settings :: url = jar:file:/opt/spark-2.4.3-bin-without-hadoop-scala-2.12/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml}}
> {{org.apache.spark#spark-avro_2.12 added as a dependency}}
> {{:: resolving dependencies :: org.apache.spark#spark-submit-parent-2c34ecff-b060-4af9-9b9f-83867672748c;1.0}}
> {{    confs: [default]}}
> {{    found org.apache.spark#spark-avro_2.12;2.4.3 in central}}
> {{    found org.spark-project.spark#unused;1.0.0 in central}}
> {{:: resolution report :: resolve 457ms :: artifacts dl 5ms}}
> {{    :: modules in use:}}
> {{    org.apache.spark#spark-avro_2.12;2.4.3 from central in [default]}}
> {{    org.spark-project.spark#unused;1.0.0 from central in [default]}}
> {{    ---------------------------------------------------------------------}}
> {{    |                  |            modules            ||   artifacts   |}}
> {{    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|}}
> {{    ---------------------------------------------------------------------}}
> {{    |      default     |   2   |   0   |   0   |   0   ||   2   |   0   |}}
> {{    ---------------------------------------------------------------------}}
> {{:: retrieving :: org.apache.spark#spark-submit-parent-2c34ecff-b060-4af9-9b9f-83867672748c}}
> {{    confs: [default]}}
> {{    0 artifacts copied, 2 already retrieved (0kB/7ms)}}
> {{19/07/25 23:25:03 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.}}
> {{19/07/25 23:25:07 WARN Client: Same path resource file:///home/barryl/.ivy2/jars/org.apache.spark_spark-avro_2.12-2.4.3.jar added multiple times to distributed cache.}}
> {{19/07/25 23:25:07 WARN Client: Same path resource file:///home/barryl/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar added multiple times to distributed cache.}}
> {{19/07/25 23:25:28 WARN TaskSetManager: Stage 0 contains a task of very large size (365 KB). The maximum recommended task size is 100 KB.}}
> {{Pi is roughly 3.142308}}
>  
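
To check a captured spark-submit log for this symptom, a minimal sketch along the following lines may help. It is an illustration only, not part of Spark: the function name find_duplicate_resources and the regular expression are assumptions, derived solely from the warning text quoted in the report above.

```python
import re
from collections import Counter

# Assumed pattern, based only on the WARN Client lines quoted in this report.
DUPLICATE_WARNING = re.compile(
    r"WARN Client: Same path resource (\S+) added multiple times to distributed cache"
)


def find_duplicate_resources(log_text):
    """Count the duplicate-upload warnings per resource path in a log.

    Hypothetical helper: it scans plain text and does not call any Spark API.
    """
    return Counter(match.group(1) for match in DUPLICATE_WARNING.finditer(log_text))


if __name__ == "__main__":
    # Two warning lines copied from the example run in the report.
    sample = (
        "19/07/25 23:25:07 WARN Client: Same path resource "
        "file:///home/barryl/.ivy2/jars/org.apache.spark_spark-avro_2.12-2.4.3.jar "
        "added multiple times to distributed cache.\n"
        "19/07/25 23:25:07 WARN Client: Same path resource "
        "file:///home/barryl/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar "
        "added multiple times to distributed cache.\n"
    )
    for path, count in find_duplicate_resources(sample).items():
        print(path, count)
```

An empty result for a log would suggest the warning did not occur in that run; it says nothing about why the jars were added twice.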
