Posted to dev@spark.apache.org by Dennis Suhari <d....@icloud.com.INVALID> on 2019/07/19 06:53:54 UTC

Spark and Oozie

Dear experts,

I am using Spark to process data from HDFS (Hadoop). These Spark applications are data pipelines, data wrangling and machine learning applications. Spark submits its jobs via YARN, and this works well.
For scheduling I am now trying to use Apache Oozie, but I am seeing a performance impact: a Spark job that took 44 seconds when submitted via the CLI now takes nearly 3 minutes.

Have you had similar experiences using Oozie to schedule Spark application jobs? What alternative workflow tools are you using to schedule Spark jobs on Hadoop?


Br,

Dennis

Sent from my iPhone

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org