Posted to dev@amaterasu.apache.org by Arun Manivannan <ar...@arunma.com> on 2018/04/21 15:45:51 UTC

YARN deployment

Hi Yaniv and Eyal,

Sorry about the hiatus. My day job has been hectic over the last couple of
months.

I am really glad that we now have full-blown YARN support. Thanks a lot!

Is there a place where I could find a rough document on how to submit
jobs on YARN? If you could respond on this thread, I would be more than
happy to contribute to the docs.

I would like to do a POC of sorts for one of my projects at work. A really
dumbed-down version of the application is at:

https://github.com/arunma/ama_datapopulator
https://github.com/arunma/ama_reconciler

The first Spark job populates the data in a bunch of Hive tables.
The second Spark job runs pre-configured queries against these tables and
compares the results against data in another Hive table (the reconciliation
table).


For now, we can safely assume that there's no data shared between these
dataframes.

I would greatly appreciate your response on YARN job submission.

Cheers,
Arun

Re: YARN deployment

Posted by Yaniv Rodenski <ya...@shinto.io>.

Hi Arun,

Once we release v0.2.0-incubating, our next milestone will be releasing the
documentation; hopefully both will be out soon.
In order to run Amaterasu on YARN, I suggest the following:

   1. Download the binaries from
   https://dist.apache.org/repos/dist/dev/incubator/amaterasu/0.2.0rc2/
   2. Extract the tarball on a node in your YARN cluster where Spark is
   installed.
   3. Configure the following in
   apache-amaterasu-0.2.0-incubating-rc2/amaterasu.properties:
      1. *zk*: should point to your ZooKeeper (comma-delimited in case of
      an ensemble)
      2. *mode*=yarn (this is the default value)
      3. *spark.home*: should point to your Spark 2 home
      4. *yarn.hadoop.home.dir*: should point to your HADOOP_HOME
   4. Submit your job using ama-start-yarn.sh.
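
For illustration, the settings from step 3 might look roughly like the
fragment below. The key names are the ones listed above; every value
(ZooKeeper hostnames, install paths) is a hypothetical placeholder, and
whether the zk entries need a port is not stated in this thread.

```properties
# amaterasu.properties (sketch; all values are placeholders, not defaults)

# ZooKeeper host, or a comma-delimited list for an ensemble (hypothetical hostnames)
zk=zk1.example.com,zk2.example.com,zk3.example.com

# Run on YARN (described above as the default value)
mode=yarn

# Spark 2 installation directory (hypothetical path)
spark.home=/opt/spark2

# Hadoop home, i.e. the value of HADOOP_HOME on this node (hypothetical path)
yarn.hadoop.home.dir=/opt/hadoop
```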

This should be everything you need for this release, but let us know how it
goes.

Cheers,
Yaniv

-- 
Yaniv Rodenski

+61 477 778 405
yaniv@shinto.io