Posted to dev@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2021/12/25 16:24:42 UTC

Re: Time to start publishing Spark Docker Images?

Season's greetings to all.

A while back we discussed publishing docker images, mainly for Kubernetes.

An increasing number of people are using Spark on Kubernetes.

Following our previous discussions, what matters is the tag, which is the
detailed identifier of the image used. These images are normally loaded to
container/artefact registries in the cloud.

For example, with SPARK_VERSION, SCALA_VERSION, DOCKERIMAGETAG, BASE_OS and
the DOCKERFILE used:


export PROJECT_ID=$(gcloud info --format='value(config.project)')
export GCP_CR=eu.gcr.io/${PROJECT_ID}

BASE_OS="buster"
SPARK_VERSION="3.1.1"
SCALA_VERSION="scala_2.12"
DOCKERFILE="java8PlusPackages"
DOCKERIMAGETAG="8-jre-slim"

# Build the Docker image from the provided Dockerfile
cd $SPARK_HOME
/opt/spark/bin/docker-image-tool.sh \
    -r $GCP_CR \
    -t ${SPARK_VERSION}-${SCALA_VERSION}-${DOCKERIMAGETAG}-${BASE_OS}-${DOCKERFILE} \
    -b java_image_tag=${DOCKERIMAGETAG} \
    -p ./kubernetes/dockerfiles/spark/bindings/python/${DOCKERFILE} \
    build

This results in a Docker image with the following tag:

IMAGEDRIVER="eu.gcr.io/<PROJECT_ID>/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-java8PlusPackages"

which is then referenced in spark-submit with:

--conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
--conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
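
As an illustration, a minimal spark-submit sketch using these settings (the
API server address, namespace and application file below are placeholders,
not taken from an actual deployment):

spark-submit \
    --master k8s://https://<K8S_API_SERVER>:443 \
    --deploy-mode cluster \
    --name pyspark-example \
    --conf spark.kubernetes.namespace=spark \
    --conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
    --conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
    local:///opt/spark/examples/src/main/python/pi.py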

The question is: do we need anything else in the tag itself, or is enough
information provided already?

Cheers


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 21 Aug 2021 at 15:50, Ankit Gupta <in...@gmail.com> wrote:

> Hey All
>
> Just a suggestion, or maybe a future enhancement: we should also try to
> use different base OSs like buster, alpine, slim, stretch, etc. and add
> that to the tag as well. This will help users choose the images
> according to their requirements.
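>
> For instance, hypothetical tag variants distinguishing the base OS might
> look like this (illustrative names only):
>
> spark-py:3.1.2-scala_2.12-11-jre-slim-buster
> spark-py:3.1.2-scala_2.12-11-jre-alpine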
>
> Thanks and Regards.
>
> Ankit Prakash Gupta
> info.ankitp@gmail.com
> LinkedIn : https://www.linkedin.com/in/infoankitp/
> Medium: https://medium.com/@info.ankitp
>
>
> On Thu, Aug 19, 2021 at 4:13 AM Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> We have both base images now:
>>
>> REPOSITORY   TAG           IMAGE ID       CREATED        SIZE
>> openjdk      8-jre-slim    0d0a85fdf642   40 hours ago   187MB
>> openjdk      11-jre-slim   eb77da2ec13c   3 weeks ago    221MB
>>
>>
>> The only difference is the Java version. For 11-jre-slim we have:
>>
>> ARG java_image_tag=11-jre-slim
>> FROM openjdk:${java_image_tag}
>>
>> and for 8-jre-slim:
>>
>> ARG java_image_tag=8-jre-slim
>> FROM openjdk:${java_image_tag}
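>>
>> As a sketch, the base can also be switched at build time without editing
>> the Dockerfile, since -b passes a build argument through
>> docker-image-tool.sh (the repository and tag below are placeholders):
>>
>> ./bin/docker-image-tool.sh -r <repo> -t 3.1.1-java8 \
>>     -b java_image_tag=8-jre-slim build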
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, 18 Aug 2021 at 21:27, Holden Karau <ho...@pigscanfly.ca> wrote:
>>
>>> So the default image we use right now for the 3.2 line is 11-jre-slim;
>>> in 3.0 we used 8-jre-slim. I think these are OK bases for us to build from,
>>> unless someone has a good reason otherwise?
>>>
>>> On Wed, Aug 18, 2021 at 2:10 AM Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> A rather related point
>>>>
>>>> The docker image comes with the following java
>>>>
>>>> root@73a798cc3303:/opt/spark/work-dir# java -version
>>>> openjdk version "11.0.12" 2021-07-20
>>>> OpenJDK Runtime Environment 18.9 (build 11.0.12+7)
>>>> OpenJDK 64-Bit Server VM 18.9 (build 11.0.12+7, mixed mode, sharing)
>>>>
>>>> For Java 8, I believe Debian buster does not ship a Java 8 package, so
>>>> it will have to be added to the docker image.
>>>>
>>>> Is there any particular Java 8 build we should go for?
>>>>
>>>> For now I am using jdk1.8.0_201, which is Oracle Java. Current Debian
>>>> images built in GCP use:
>>>>
>>>> openjdk version "1.8.0_292"
>>>>
>>>> Shall we choose and adopt one Java 8 version for docker images? This
>>>> will be in addition to the Java 11 already installed with the base.
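>>>>
>>>> A sketch of one way to add OpenJDK 8 on buster, assuming the
>>>> AdoptOpenJDK apt repository (the repository URL and package name are
>>>> from memory and should be verified):
>>>>
>>>> RUN apt-get update && apt-get install -y wget gnupg && \
>>>>     wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - && \
>>>>     echo "deb https://adoptopenjdk.jfrog.io/adoptopenjdk/deb buster main" \
>>>>       > /etc/apt/sources.list.d/adoptopenjdk.list && \
>>>>     apt-get update && apt-get install -y adoptopenjdk-8-hotspot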
>>>>
>>>> HTH
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, 17 Aug 2021 at 23:52, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Well, we need to decide what packages need to be installed with
>>>>> spark-py. PySpark is not one of them, true.
>>>>>
>>>>> The docker build itself takes care of PySpark by copying it from the
>>>>> $SPARK_HOME directory:
>>>>>
>>>>> COPY python/pyspark ${SPARK_HOME}/python/pyspark
>>>>> COPY python/lib ${SPARK_HOME}/python/lib
>>>>>
>>>>> Please review the docker file for python in
>>>>> $SPARK_HOME/kubernetes/dockerfiles/spark/bindings/python/Dockerfile
>>>>> and make the changes needed.
>>>>>
>>>>> ARG base_img
>>>>> FROM $base_img
>>>>> WORKDIR /
>>>>> # Reset to root to run installation tasks
>>>>> USER 0
>>>>> RUN mkdir ${SPARK_HOME}/python
>>>>> RUN apt-get update && \
>>>>>     apt install -y python3 python3-pip && \
>>>>>     pip3 install --upgrade pip setuptools && \
>>>>>     # Removed the .cache to save space
>>>>>     rm -r /root/.cache && rm -rf /var/cache/apt/*
>>>>>
>>>>> COPY python/pyspark ${SPARK_HOME}/python/pyspark
>>>>> COPY python/lib ${SPARK_HOME}/python/lib
>>>>>
>>>>> WORKDIR /opt/spark/work-dir
>>>>> ENTRYPOINT [ "/opt/entrypoint.sh" ]
>>>>>
>>>>> # Specify the User that the actual main process will run as
>>>>> ARG spark_uid=185
>>>>> USER ${spark_uid}
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 17 Aug 2021 at 23:26, Holden Karau <ho...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> pip installing pyspark like that probably isn't a great idea, since
>>>>>> there isn't a version pinned to it. Probably better to install from the
>>>>>> local files copied in than potentially from PyPI. It might be possible to
>>>>>> install in -e mode, where it'll use symlinks to save space; I'm not sure.
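>>>>>>
>>>>>> For illustration, two hedged variants of what that could look like in
>>>>>> the Dockerfile (neither is taken from the Spark repo):
>>>>>>
>>>>>> # Pin the version to match the image rather than taking latest from PyPI:
>>>>>> RUN pip3 install --no-cache-dir pyspark==3.1.1
>>>>>>
>>>>>> # Or install from the copied-in sources (assumes setup.py is also
>>>>>> # copied, which the stock Dockerfile does not do):
>>>>>> RUN cd ${SPARK_HOME}/python && pip3 install --no-cache-dir .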
>>>>>>
>>>>>> On Tue, Aug 17, 2021 at 3:12 PM Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Andrew, that was helpful.
>>>>>>>
>>>>>>> Step 10/23 : RUN pip install pyyaml numpy cx_Oracle pyspark
>>>>>>> --no-cache-dir
>>>>>>>
>>>>>>> And the reduction in size is considerable: 1.75GB vs 2.19GB. Note
>>>>>>> that the original image has now been invalidated (shown as <none> below):
>>>>>>>
>>>>>>> REPOSITORY       TAG                                  IMAGE ID       CREATED                  SIZE
>>>>>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8   ecef8bd15731   Less than a second ago   1.75GB
>>>>>>> <none>           <none>                               ba3c17bc9337   10 hours ago             2.19GB
>>>>>>>
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 17 Aug 2021 at 20:44, Andrew Melo <an...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Mich,
>>>>>>>>
>>>>>>>> By default, pip caches downloaded packages somewhere like
>>>>>>>> $HOME/.cache/pip. So after doing any "pip install", you'll want to either
>>>>>>>> delete that directory, or pass the "--no-cache-dir" option to pip to
>>>>>>>> prevent the downloaded packages from being added to the image.
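>>>>>>>>
>>>>>>>> In Dockerfile terms, either of these keeps the cache out of the
>>>>>>>> layer (the package names are just examples):
>>>>>>>>
>>>>>>>> RUN pip install --no-cache-dir pyyaml numpy
>>>>>>>> # or:
>>>>>>>> RUN pip install pyyaml numpy && rm -rf /root/.cache/pip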
>>>>>>>>
>>>>>>>> HTH
>>>>>>>> Andrew
>>>>>>>>
>>>>>>>> On Tue, Aug 17, 2021 at 2:29 PM Mich Talebzadeh <
>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Andrew,
>>>>>>>>>
>>>>>>>>> Can you please elaborate on blowing away the pip cache before
>>>>>>>>> committing the layer?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Mich
>>>>>>>>>
>>>>>>>>> On Tue, 17 Aug 2021 at 16:57, Andrew Melo <an...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Silly Q, did you blow away the pip cache before committing the
>>>>>>>>>> layer? That always trips me up.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 17, 2021 at 10:56 Mich Talebzadeh <
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> With no additional python packages etc. we get 1.41GB, compared
>>>>>>>>>>> to 2.19GB before:
>>>>>>>>>>>
>>>>>>>>>>> REPOSITORY       TAG                                      IMAGE ID       CREATED                  SIZE
>>>>>>>>>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8only   faee4dbb95dd   Less than a second ago   1.41GB
>>>>>>>>>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8       ba3c17bc9337   4 hours ago              2.19GB
>>>>>>>>>>>
>>>>>>>>>>> root@233a81199b43:/opt/spark/work-dir# pip list
>>>>>>>>>>> Package       Version
>>>>>>>>>>> ------------- -------
>>>>>>>>>>> asn1crypto    0.24.0
>>>>>>>>>>> cryptography  2.6.1
>>>>>>>>>>> entrypoints   0.3
>>>>>>>>>>> keyring       17.1.1
>>>>>>>>>>> keyrings.alt  3.1.1
>>>>>>>>>>> pip           21.2.4
>>>>>>>>>>> pycrypto      2.6.1
>>>>>>>>>>> PyGObject     3.30.4
>>>>>>>>>>> pyxdg         0.25
>>>>>>>>>>> SecretStorage 2.3.1
>>>>>>>>>>> setuptools    57.4.0
>>>>>>>>>>> six           1.12.0
>>>>>>>>>>> wheel         0.32.3
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 17 Aug 2021 at 16:24, Mich Talebzadeh <
>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes, I will double check. It includes Java 8 in addition to the
>>>>>>>>>>>> base Java 11.
>>>>>>>>>>>>
>>>>>>>>>>>> In addition, it has these Python packages (added for my own
>>>>>>>>>>>> needs for now):
>>>>>>>>>>>>
>>>>>>>>>>>> root@ce6773017a14:/opt/spark/work-dir# pip list
>>>>>>>>>>>> Package       Version
>>>>>>>>>>>> ------------- -------
>>>>>>>>>>>> asn1crypto    0.24.0
>>>>>>>>>>>> cryptography  2.6.1
>>>>>>>>>>>> cx-Oracle     8.2.1
>>>>>>>>>>>> entrypoints   0.3
>>>>>>>>>>>> keyring       17.1.1
>>>>>>>>>>>> keyrings.alt  3.1.1
>>>>>>>>>>>> numpy         1.21.2
>>>>>>>>>>>> pip           21.2.4
>>>>>>>>>>>> py4j          0.10.9
>>>>>>>>>>>> pycrypto      2.6.1
>>>>>>>>>>>> PyGObject     3.30.4
>>>>>>>>>>>> pyspark       3.1.2
>>>>>>>>>>>> pyxdg         0.25
>>>>>>>>>>>> PyYAML        5.4.1
>>>>>>>>>>>> SecretStorage 2.3.1
>>>>>>>>>>>> setuptools    57.4.0
>>>>>>>>>>>> six           1.12.0
>>>>>>>>>>>> wheel         0.32.3
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> HTH
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 17 Aug 2021 at 16:17, Maciej <ms...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Quick question ‒ is this actual output? If so, do we know what
>>>>>>>>>>>>> accounts for the 1.5GB overhead of the PySpark image? Even without
>>>>>>>>>>>>> --no-install-recommends this seems like a lot (if I recall
>>>>>>>>>>>>> correctly it was around 400MB for existing images).
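>>>>>>>>>>>>>
>>>>>>>>>>>>> (One way to see where the space goes is to inspect the layer
>>>>>>>>>>>>> sizes, e.g.:)
>>>>>>>>>>>>>
>>>>>>>>>>>>> docker history spark/spark-py:3.1.1_sparkpy_3.7-scala_2.12-java8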
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 8/17/21 2:24 PM, Mich Talebzadeh wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Examples:
>>>>>>>>>>>>>
>>>>>>>>>>>>> docker images
>>>>>>>>>>>>>
>>>>>>>>>>>>> REPOSITORY       TAG                                  IMAGE ID       CREATED          SIZE
>>>>>>>>>>>>> spark/spark-py   3.1.1_sparkpy_3.7-scala_2.12-java8   ba3c17bc9337   2 minutes ago    2.19GB
>>>>>>>>>>>>> spark            3.1.1-scala_2.12-java11              4595c4e78879   18 minutes ago   635MB
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 17 Aug 2021 at 10:31, Mich Talebzadeh <
>>>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3.1.2_sparkpy_3.7-scala_2.12-java11
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3.1.2_sparkR_3.6-scala_2.12-java11
>>>>>>>>>>>>>> Yes, let us go with that, and remember that we can change the
>>>>>>>>>>>>>> tags anytime. The accompanying release note should detail what is inside
>>>>>>>>>>>>>> the downloaded image.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for me
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 17 Aug 2021 at 09:51, Maciej <ms...@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 8/17/21 4:04 AM, Holden Karau wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These are some really good points all around.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think, in the interest of simplicity, we'll start with just
>>>>>>>>>>>>>>> the 3 current Dockerfiles in the Spark repo, but for the next release (3.3)
>>>>>>>>>>>>>>> we should explore adding some more Dockerfiles/build options.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sounds good.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, I'd consider adding the guest language version to
>>>>>>>>>>>>>>> the tag names, e.g.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3.1.2_sparkpy_3.7-scala_2.12-java11
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3.1.2_sparkR_3.6-scala_2.12-java11
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and some basic safeguards in the layers, to make sure that
>>>>>>>>>>>>>>> these are really the versions we use.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Aug 16, 2021 at 10:46 AM Maciej <
>>>>>>>>>>>>>>> mszymkiewicz@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have a few concerns regarding PySpark and SparkR images.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> First of all, how do we plan to handle interpreter
>>>>>>>>>>>>>>>> versions? Ideally, we should provide images for all supported variants, but
>>>>>>>>>>>>>>>> based on the preceding discussion and the proposed naming convention, I
>>>>>>>>>>>>>>>> assume it is not going to happen. If that's the case, it would be great if
>>>>>>>>>>>>>>>> we could fix interpreter versions based on some support criteria (lowest
>>>>>>>>>>>>>>>> supported, lowest non-deprecated, highest supported at the time of release,
>>>>>>>>>>>>>>>> etc.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Currently, we use the following:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - for R, we use the buster-cran35 Debian repositories, which
>>>>>>>>>>>>>>>>    install R 3.6 (the provided version has already changed in the past and
>>>>>>>>>>>>>>>>    broke the image build ‒ SPARK-28606).
>>>>>>>>>>>>>>>>    - for Python, we depend on the system-provided python3
>>>>>>>>>>>>>>>>    packages, which currently provide Python 3.7.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> which don't guarantee stability over time and might be hard
>>>>>>>>>>>>>>>> to synchronize with our support matrix.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Secondly, omitting libraries which are required for the
>>>>>>>>>>>>>>>> full functionality and performance, specifically
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - Numpy, Pandas and Arrow for PySpark
>>>>>>>>>>>>>>>>    - Arrow for SparkR
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is likely to severely limit the usability of the images (of
>>>>>>>>>>>>>>>> these, Arrow is probably the hardest to manage, especially when you already
>>>>>>>>>>>>>>>> depend on system packages to provide the R or Python interpreter).
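>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (For PySpark, that could be as simple as one pinned layer in
>>>>>>>>>>>>>>>> the Dockerfile; the version pins below are purely illustrative:)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> RUN pip3 install --no-cache-dir "numpy==1.21.*" "pandas==1.2.*" "pyarrow==4.0.*"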
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 8/14/21 12:43 AM, Mich Talebzadeh wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We can cater for multiple types (spark, spark-py and
>>>>>>>>>>>>>>>> spark-r) and spark versions (assuming they are downloaded and available).
>>>>>>>>>>>>>>>> The challenge is that these docker images, once built, are snapshots. They
>>>>>>>>>>>>>>>> cannot be amended later, and if you change anything by going inside the
>>>>>>>>>>>>>>>> container, whatever you did is discarded as soon as you log out.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For example, I want to add tensorflow to my docker image.
>>>>>>>>>>>>>>>> These are my images:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> REPOSITORY                             TAG           IMAGE ID       CREATED      SIZE
>>>>>>>>>>>>>>>> eu.gcr.io/axial-glow-224522/spark-py   java8_3.1.1   cfbb0e69f204   5 days ago   2.37GB
>>>>>>>>>>>>>>>> eu.gcr.io/axial-glow-224522/spark      3.1.1         8d1bf8e7e47d   5 days ago   805MB
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Using the image ID, I try to log in to the image as root:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> docker run -u0 -it cfbb0e69f204 bash
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# pip install keras
>>>>>>>>>>>>>>>> Collecting keras
>>>>>>>>>>>>>>>>   Downloading keras-2.6.0-py2.py3-none-any.whl (1.3 MB)
>>>>>>>>>>>>>>>>      |████████████████████████████████| 1.3 MB 1.1 MB/s
>>>>>>>>>>>>>>>> Installing collected packages: keras
>>>>>>>>>>>>>>>> Successfully installed keras-2.6.0
>>>>>>>>>>>>>>>> WARNING: Running pip as the 'root' user can result in
>>>>>>>>>>>>>>>> broken permissions and conflicting behaviour with the system package
>>>>>>>>>>>>>>>> manager. It is recommended to use a virtual environment instead:
>>>>>>>>>>>>>>>> https://pip.pypa.io/warnings/venv
>>>>>>>>>>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# pip list
>>>>>>>>>>>>>>>> Package       Version
>>>>>>>>>>>>>>>> ------------- -------
>>>>>>>>>>>>>>>> asn1crypto    0.24.0
>>>>>>>>>>>>>>>> cryptography  2.6.1
>>>>>>>>>>>>>>>> cx-Oracle     8.2.1
>>>>>>>>>>>>>>>> entrypoints   0.3
>>>>>>>>>>>>>>>> keras         2.6.0      <--- it is here
>>>>>>>>>>>>>>>> keyring       17.1.1
>>>>>>>>>>>>>>>> keyrings.alt  3.1.1
>>>>>>>>>>>>>>>> numpy         1.21.1
>>>>>>>>>>>>>>>> pip           21.2.3
>>>>>>>>>>>>>>>> py4j          0.10.9
>>>>>>>>>>>>>>>> pycrypto      2.6.1
>>>>>>>>>>>>>>>> PyGObject     3.30.4
>>>>>>>>>>>>>>>> pyspark       3.1.2
>>>>>>>>>>>>>>>> pyxdg         0.25
>>>>>>>>>>>>>>>> PyYAML        5.4.1
>>>>>>>>>>>>>>>> SecretStorage 2.3.1
>>>>>>>>>>>>>>>> setuptools    57.4.0
>>>>>>>>>>>>>>>> six           1.12.0
>>>>>>>>>>>>>>>> wheel         0.32.3
>>>>>>>>>>>>>>>> root@b542b0f1483d:/opt/spark/work-dir# exit
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now I exit from the container and log in again:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (pyspark_venv) hduser@rhes76: /home/hduser/dba/bin/build> docker run -u0 -it cfbb0e69f204 bash
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> root@5231ee95aa83:/opt/spark/work-dir# pip list
>>>>>>>>>>>>>>>> Package       Version
>>>>>>>>>>>>>>>> ------------- -------
>>>>>>>>>>>>>>>> asn1crypto    0.24.0
>>>>>>>>>>>>>>>> cryptography  2.6.1
>>>>>>>>>>>>>>>> cx-Oracle     8.2.1
>>>>>>>>>>>>>>>> entrypoints   0.3
>>>>>>>>>>>>>>>> keyring       17.1.1
>>>>>>>>>>>>>>>> keyrings.alt  3.1.1
>>>>>>>>>>>>>>>> numpy         1.21.1
>>>>>>>>>>>>>>>> pip           21.2.3
>>>>>>>>>>>>>>>> py4j          0.10.9
>>>>>>>>>>>>>>>> pycrypto      2.6.1
>>>>>>>>>>>>>>>> PyGObject     3.30.4
>>>>>>>>>>>>>>>> pyspark       3.1.2
>>>>>>>>>>>>>>>> pyxdg         0.25
>>>>>>>>>>>>>>>> PyYAML        5.4.1
>>>>>>>>>>>>>>>> SecretStorage 2.3.1
>>>>>>>>>>>>>>>> setuptools    57.4.0
>>>>>>>>>>>>>>>> six           1.12.0
>>>>>>>>>>>>>>>> wheel         0.32.3
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hm, that keras is not there. The docker image cannot be
>>>>>>>>>>>>>>>> altered after the build! So once the docker image is created, it is just a
>>>>>>>>>>>>>>>> snapshot. However, it will still have tons of useful stuff for most
>>>>>>>>>>>>>>>> users/organisations. My suggestion is to create, for a given type (spark,
>>>>>>>>>>>>>>>> spark-py etc):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    1. One vanilla flavour for everyday use with few useful
>>>>>>>>>>>>>>>>    packages
>>>>>>>>>>>>>>>>    2. One for medium use with most common packages for
>>>>>>>>>>>>>>>>    ETL/ELT stuff
>>>>>>>>>>>>>>>>    3. One specialist for ML etc with keras, tensorflow and
>>>>>>>>>>>>>>>>    anything else needed
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> These images should be maintained as we currently maintain
>>>>>>>>>>>>>>>> spark releases, with accompanying documentation. Any reason why we cannot
>>>>>>>>>>>>>>>> maintain them ourselves?
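>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (A side note on the keras demonstration above: changes made inside
>>>>>>>>>>>>>>>> a running container can be persisted by committing that container
>>>>>>>>>>>>>>>> to a new image, although baking packages in at build time is more
>>>>>>>>>>>>>>>> reproducible. A sketch, with an illustrative container ID and tag:)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> docker commit b542b0f1483d spark/spark-py:java8_3.1.1_keras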
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, 13 Aug 2021 at 17:26, Holden Karau <
>>>>>>>>>>>>>>>> holden@pigscanfly.ca> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So we actually do have a script that does the build already;
>>>>>>>>>>>>>>>>> it's more a matter of publishing the results for easier use.
>>>>>>>>>>>>>>>>> Currently the script produces three images spark, spark-py, and spark-r. I
>>>>>>>>>>>>>>>>> can certainly see a solid reason to publish like with a jdk11 & jdk8 suffix
>>>>>>>>>>>>>>>>> as well if there is interest in the community. If we want to have a say
>>>>>>>>>>>>>>>>> spark-py-pandas for a Spark container image with everything necessary for
>>>>>>>>>>>>>>>>> the Koalas stuff to work then I think that could be a great PR from someone
>>>>>>>>>>>>>>>>> to add :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Aug 13, 2021 at 1:00 AM Mich Talebzadeh <
>>>>>>>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> should read PySpark
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, 13 Aug 2021 at 08:51, Mich Talebzadeh <
>>>>>>>>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have already built a few of the latest for Spark and PySpark
>>>>>>>>>>>>>>>>>>> on 3.1.1 with Java 8, as I found out that Java 11 does not work with Google
>>>>>>>>>>>>>>>>>>> BigQuery data warehouse. However, how to hack the Dockerfile is something
>>>>>>>>>>>>>>>>>>> one finds out the hard way.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> For example, how to add additional Python libraries like
>>>>>>>>>>>>>>>>>>> tensorflow. Loading these libraries through Kubernetes is not practical,
>>>>>>>>>>>>>>>>>>> as unzipping and installing them through --py-files etc. will
>>>>>>>>>>>>>>>>>>> take considerable time, so they need to be added to the Dockerfile at
>>>>>>>>>>>>>>>>>>> build time, in the directory for Python under Kubernetes:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> /opt/spark/kubernetes/dockerfiles/spark/bindings/python
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> RUN pip install pyyaml numpy cx_Oracle tensorflow ....
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also, you will need curl to test ports from inside the
>>>>>>>>>>>>>>>>>>> container:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> RUN apt-get update && apt-get install -y curl
>>>>>>>>>>>>>>>>>>> RUN ["apt-get","install","-y","vim"]
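>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (Once curl is in the image, a port can be probed from inside the
>>>>>>>>>>>>>>>>>>> container; as an illustration, 4040 is the default Spark UI port:)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> curl -s http://localhost:4040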
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As I said, I am happy to build these specific dockerfiles
>>>>>>>>>>>>>>>>>>> plus the complete documentation for them. I have already built one for
>>>>>>>>>>>>>>>>>>> Google (GCP). The difference between the Spark and PySpark versions is that
>>>>>>>>>>>>>>>>>>> in Spark/Scala a fat jar file will contain everything needed. That is not
>>>>>>>>>>>>>>>>>>> the case with Python, I am afraid.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, 13 Aug 2021 at 08:13, Bode, Meikel, NMA-CFD <
>>>>>>>>>>>>>>>>>>> Meikel.Bode@bertelsmann.de> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am Meikel Bode, just an interested reader of the dev
>>>>>>>>>>>>>>>>>>>> and user lists. Anyway, I would appreciate having official docker images
>>>>>>>>>>>>>>>>>>>> available.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Maybe one could get inspiration from the Jupyter docker
>>>>>>>>>>>>>>>>>>>> stacks and provide a hierarchy of different images like this:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#image-relationships
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Having a core image supporting only Java, extended ones
>>>>>>>>>>>>>>>>>>>> supporting Python and/or R, etc.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Looking forward to the discussion.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Meikel
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> From: Mich Talebzadeh <mi...@gmail.com>
>>>>>>>>>>>>>>>>>>>> Sent: Friday, 13 August 2021 08:45
>>>>>>>>>>>>>>>>>>>> Cc: dev <de...@spark.apache.org>
>>>>>>>>>>>>>>>>>>>> Subject: Re: Time to start publishing Spark Docker Images?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I concur this is a good idea and certainly worth
>>>>>>>>>>>>>>>>>>>> exploring.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In practice, preparing deployable docker images will
>>>>>>>>>>>>>>>>>>>> throw up some challenges, because a docker image for Spark is not really a
>>>>>>>>>>>>>>>>>>>> singular modular unit the way, say, a docker image for Jenkins is. It
>>>>>>>>>>>>>>>>>>>> involves different versions and different images for Spark and PySpark,
>>>>>>>>>>>>>>>>>>>> and will most likely end up as part of a Kubernetes deployment.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Individuals and organisations will deploy it as the
>>>>>>>>>>>>>>>>>>>> first cut. Great, but I equally feel that good documentation on how to
>>>>>>>>>>>>>>>>>>>> build a consumable, deployable image will be more valuable. From my own
>>>>>>>>>>>>>>>>>>>> experience, the current documentation should be enhanced, for example on
>>>>>>>>>>>>>>>>>>>> how to deploy working directories, add additional Python packages, and
>>>>>>>>>>>>>>>>>>>> build with different Java versions (version 8 or version 11).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, 13 Aug 2021 at 01:54, Holden Karau <
>>>>>>>>>>>>>>>>>>>> holden@pigscanfly.ca> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Awesome, I've filed an INFRA ticket to get the ball
>>>>>>>>>>>>>>>>>>>> rolling.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 12, 2021 at 5:48 PM John Zhuge <
>>>>>>>>>>>>>>>>>>>> jzhuge@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Aug 12, 2021 at 5:44 PM Hyukjin Kwon <
>>>>>>>>>>>>>>>>>>>> gurwls223@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +1, I think we generally agreed upon having it. Thanks
>>>>>>>>>>>>>>>>>>>> Holden for headsup and driving this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +@Dongjoon Hyun <do...@apache.org> FYI
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, 22 Jul 2021 at 12:22, Kent Yao <yaooqinn@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Bests,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Kent Yao
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> a spark enthusiast
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> kyuubi <https://github.com/yaooqinn/kyuubi> is a unified
>>>>>>>>>>>>>>>>>>>> multi-tenant JDBC interface for large-scale data processing and
>>>>>>>>>>>>>>>>>>>> analytics, built on top of Apache Spark <http://spark.apache.org/>.
>>>>>>>>>>>>>>>>>>>> spark-authorizer <https://github.com/yaooqinn/spark-authorizer> is a
>>>>>>>>>>>>>>>>>>>> Spark SQL extension which provides SQL Standard Authorization for
>>>>>>>>>>>>>>>>>>>> Apache Spark
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>> It's dark in this basement.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>
>>>>>
>>>
>>>
>>