Posted to dev@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2023/01/16 15:05:22 UTC

Running Google Dataproc on Google Kubernetes Engine (GKE) with Spark

Dataproc on GKE seems to be a halfway point for existing Google Dataproc
<https://codelabs.developers.google.com/dataproc-cluster-gce#0> users
(basically those who migrated from Spark on YARN on on-premises Hadoop to the
equivalent managed cluster service on Google Cloud). They keep submitting
Dataproc jobs but get the look and feel of Google Kubernetes Engine (GKE).
This is a good solution for most legacy applications that rely on
monolithic solutions and want to take advantage of containers. I was
wondering whether anyone here has experience with it. Unlike Spark on GKE,
where we build the Docker image ourselves, with Dataproc on GKE you need to
specify a custom container image, and that image must use one of the
Dataproc on GKE base Spark images
<https://cloud.google.com/dataproc/docs/guides/dpgke/dataproc-gke-custom-images#base_spark_images>.
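
To make the custom image point concrete, here is a minimal sketch (not taken from my actual setup) of how a job submission might name the image. It assumes the image is passed through the standard Spark on Kubernetes property spark.kubernetes.container.image via the --properties flag of gcloud dataproc jobs submit pyspark. The helper name build_submit_command and every cluster, region, image and bucket value are hypothetical placeholders.

```python
import shlex


def build_submit_command(cluster, region, image, main_py):
    """Return the argv list for a PySpark job submission that names a custom image.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    if not image:
        raise ValueError("a custom container image is required")
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", main_py,
        f"--cluster={cluster}",
        f"--region={region}",
        # Assumption: the custom image (built on a Dataproc on GKE base
        # Spark image) is selected with the Spark on Kubernetes image property.
        f"--properties=spark.kubernetes.container.image={image}",
    ]


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    cmd = build_submit_command(
        cluster="my-dp-gke-cluster",
        region="europe-west2",
        image="eu.gcr.io/my-project/my-spark:latest",
        main_py="gs://my-bucket/job.py",
    )
    print(shlex.join(cmd))
```

Building the argument list instead of a single string keeps each flag separate, so it can be passed to subprocess.run without shell quoting issues once the values are checked against your own project.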


Still, it works fine to start with a base Spark image and build your own
bespoke image on top of it. I tested this and would like to share my
experience with those interested. Please find attached a diagram of how I
think this works.


[image: gk3.png]

I am not sure whether other cloud vendors provide this feature as well.


   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: Running Google Dataproc on Google Kubernetes Engine (GKE) with Spark

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

I have now published an article on this topic. Hopefully, those interested
may find it useful.

https://www.linkedin.com/pulse/running-google-dataproc-kubernetes-engine-gke-spark-mich



