Posted to user@spark.apache.org by Atheer Alabdullatif <a....@lean.sa> on 2021/11/24 11:06:08 UTC

[issue] not able to add external libs to pyspark job while using spark-submit

Dear Spark team,
I hope this email finds you well.



I am using PySpark 3.0 and facing an issue adding an external library [configparser] while running the job using [spark-submit] & [yarn].

issue:


import configparser
ImportError: No module named configparser
21/11/24 08:54:38 INFO util.ShutdownHookManager: Shutdown hook called

solutions I tried:

1- installing the library src files and adding them to the session using [addPyFile]:

  *   file structure:

-- main dir
   -- subdir
      -- libs
         -- configparser-5.1.0
            -- src
               -- configparser.py
         -- configparser.zip
      -- sparkjob.py

1.a zip file:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName(jobname + '_' + table)
             .config("spark.mongodb.input.uri", uri + "." + table)
             .config("spark.mongodb.input.sampleSize", 9900000)
             .getOrCreate())

spark.sparkContext.addPyFile('/maindir/subdir/libs/configparser.zip')
df = spark.read.format("mongo").load()

1.b Python file:

    spark = (SparkSession.builder
             .appName(jobname + '_' + table)
             .config("spark.mongodb.input.uri", uri + "." + table)
             .config("spark.mongodb.input.sampleSize", 9900000)
             .getOrCreate())

spark.sparkContext.addPyFile('maindir/subdir/libs/configparser-5.1.0/src/configparser.py')
df = spark.read.format("mongo").load()


2- using the os library

import os

def install_libs():
    '''
    this function is used to install external Python libs on YARN
    '''
    os.system("pip3 install configparser")

if __name__ == "__main__":

    # install libs
    install_libs()


We value your support.

best,

Atheer Alabdullatif




*Confidentiality & Disclaimer Notice*
This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information or otherwise protected by law. If you are not the intended recipient, please immediately notify the sender, delete the e-mail, and do not retain any copies of it. It is prohibited to use, disseminate or distribute the content of this e-mail, directly or indirectly, without prior written consent. Lean accepts no liability for damage caused by any virus that may be transmitted by this Email.



Re: [issue] not able to add external libs to pyspark job while using spark-submit

Posted by Mich Talebzadeh <mi...@gmail.com>.
I am not sure about that. However, with Kubernetes and a Docker image for
PySpark, I build the packages into the image itself, as below in the
Dockerfile:

RUN pip install pyyaml numpy cx_Oracle

and that adds those packages so you can reference them in your py script:

import yaml
import cx_Oracle
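
As a rough illustration, a minimal Dockerfile along those lines might look like this (the base image, tag, package list and paths below are placeholders, not the exact image I use):

# hypothetical example: extend a PySpark base image and bake the Python packages in
FROM apache/spark-py:v3.2.1

USER root
# install the extra libraries the job imports at runtime
RUN pip install --no-cache-dir pyyaml numpy cx_Oracle

# copy the job code into the image and switch back to the spark user
COPY sparkjob.py /opt/spark/work-dir/sparkjob.py
USER 185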

HTH







   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 24 Nov 2021 at 17:44, Bode, Meikel, NMA-CFD <
Meikel.Bode@bertelsmann.de> wrote:

> Can we add Python dependencies the way we add mvn coordinates, so that
> we run something like pip install <dep> or download from the PyPI index?

RE: [issue] not able to add external libs to pyspark job while using spark-submit

Posted by "Bode, Meikel, NMA-CFD" <Me...@Bertelsmann.de>.
Can we add Python dependencies the way we add mvn coordinates, so that we run something like pip install <dep> or download from the PyPI index?




Re: [issue] not able to add external libs to pyspark job while using spark-submit

Posted by Mich Talebzadeh <mi...@gmail.com>.
The easiest way to set this up is to create a dependencies.zip file.

Assuming that you already have a virtual environment set up, with a
directory called site-packages, go to that directory and create a minimal
shell script, say package_and_zip_dependencies.sh, to do it for you.

Example:

cat package_and_zip_dependencies.sh

#!/bin/bash
# https://blog.danielcorin.com/posts/2015-11-09-pyspark/
zip -r ../dependencies.zip .
ls -l ../dependencies.zip
exit 0

Once created, create an environment variable called DEPENDENCIES:

export DEPENDENCIES="export
DEPENDENCIES="/usr/src/Python-3.7.3/airflow_virtualenv/lib/python3.7/dependencies.zip"

Then in spark-submit you can do this

spark-submit --master yarn --deploy-mode client --driver-memory xG
--executor-memory yG --num-executors m --executor-cores n --py-files
$DEPENDENCIES --jars $HOME/jars/spark-sql-kafka-0-10_2.12-3.1.0.jar
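
As a quick sanity check, here is a minimal sketch (a hypothetical job file; the app name and probe function are only illustrative) of what the job can rely on once dependencies.zip is shipped with --py-files, assuming the zip was built from inside site-packages so each package sits at the top level of the archive:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("deps_zip_check").getOrCreate()

# --py-files puts dependencies.zip on the PYTHONPATH of the driver and the executors,
# so pure-Python packages zipped from site-packages import normally
import configparser

def probe(_):
    import configparser   # importable inside executor tasks as well
    return configparser.__name__

print(spark.sparkContext.parallelize([1]).map(probe).collect())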

Also check this link:
https://blog.danielcorin.com/posts/2015-11-09-pyspark/

HTH




Re: [issue] not able to add external libs to pyspark job while using spark-submit

Posted by Atheer Alabdullatif <a....@lean.sa>.
Hello Owen,
Thank you for your prompt reply!
We will check it out.

best,
Atheer Alabdullatif
________________________________
From: Sean Owen <sr...@gmail.com>
Sent: Wednesday, November 24, 2021 5:06 PM
To: Atheer Alabdullatif <a....@lean.sa>
Cc: user@spark.apache.org <us...@spark.apache.org>; Data Engineering <Da...@lean.sa>
Subject: Re: [issue] not able to add external libs to pyspark job while using spark-submit



Re: [issue] not able to add external libs to pyspark job while using spark-submit

Posted by Sean Owen <sr...@gmail.com>.
That's not how you add a library. From the docs:
https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html
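
For completeness, a short sketch of one approach from that page, packing a conda environment and shipping it to YARN with --archives (the environment name and package list below mirror the docs' example and are placeholders for the job's real dependencies):

# build and pack a conda environment containing the needed libraries
conda create -y -n pyspark_conda_env -c conda-forge pyarrow pandas conda-pack
conda activate pyspark_conda_env
conda pack -f -o pyspark_conda_env.tar.gz

# ship the packed environment and point the Python workers at it
export PYSPARK_DRIVER_PYTHON=python            # do not set this in cluster mode
export PYSPARK_PYTHON=./environment/bin/python
spark-submit --master yarn --deploy-mode client \
  --archives pyspark_conda_env.tar.gz#environment \
  sparkjob.py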
