Posted to dev@gobblin.apache.org by Jay Sen <ja...@apache.org> on 2019/02/28 02:16:15 UTC

Gobblin on Yarn ?

Hi,

Is anybody running Gobblin in YARN mode in production, or even in a dev
environment? Can you please share your experience?

I'm looking for some data points on how it would be beneficial over standalone.

Thanks
Jay

Re: Gobblin on Yarn ?

Posted by Jay Sen <ja...@apache.org>.
Hi Sudarshan

MR mode will have a dependency on a Hadoop cluster; I am thinking of having an
independent Gobblin cluster for all the data movement jobs.
Also, I have tried Hive-Distcp
<https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/> in
cluster mode and managed to run it (a lot of configs are missing, and I was
only able to figure them out from the code base).

Is there any difference between MR and cluster mode in terms of performance or
feature set?

btw, Regarding GOBBLIN-714, I have lost the log, but this couldnt very edge
case, but for GOBBLIN-711
<https://issues.apache.org/jira/browse/GOBBLIN-711> I have captured all the
logs.

Thanks
Jay




On Tue, Apr 2, 2019 at 9:20 PM Sudarshan Vasudevan <su...@linkedin.com>
wrote:

> Hi Jay,
> For your immediate use case, will the MR mode work? If that is the case,
> you can take a look at Hive Distcp:
> https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/
>
> For GOBBLIN-714, can you attach any relevant stacktraces that you see in
> the cluster logs that indicate the failure of the jobs? It is interesting
> that the Job execution state for most of the jobs is shown as COMMITTED as
> opposed to SUCCESSFUL.
>
> Thanks,
> Sudarshan
>

Re: Gobblin on Yarn ?

Posted by Sudarshan Vasudevan <su...@linkedin.com>.
Hi Jay,
For your immediate use case, will the MR mode work? If that is the case, you can take a look at Hive Distcp:
https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/

For GOBBLIN-714, can you attach any relevant stacktraces that you see in the cluster logs that indicate the failure of the jobs? It is interesting that the Job execution state for most of the jobs is shown as COMMITTED as opposed to SUCCESSFUL.

Thanks,
Sudarshan


________________________________
From: Jay Sen <ja...@apache.org>
Sent: Tuesday, April 2, 2019 8:02 PM
To: Sudarshan Vasudevan; dev@gobblin.incubator.apache.org
Subject: Re: Gobblin on Yarn ?

Thanks Sudarshan for sharing the info.

I started playing around with Gobblin cluster (master/worker) mode and came across some weird issues (GOBBLIN-714 <https://issues.apache.org/jira/browse/GOBBLIN-714> & GOBBLIN-711 <https://issues.apache.org/jira/browse/GOBBLIN-711>).

I assume the standalone mode is limited to a single node (maybe multi-process), so I really need a cluster environment capable of tolerating node failures, etc.

The immediate use case I am looking at is Hive to Hive, with 10 TB a day overall.

Please let me know your thoughts.

Thanks
Jay


Re: Gobblin on Yarn ?

Posted by Jay Sen <ja...@apache.org>.
Thanks Sudarshan for sharing the info.

I started playing around with Gobblin cluster (master/worker) mode and came
across some weird issues (GOBBLIN-714
<https://issues.apache.org/jira/browse/GOBBLIN-714> & GOBBLIN-711
<https://issues.apache.org/jira/browse/GOBBLIN-711>).

I assume the standalone mode is limited to a single node (maybe
multi-process), so I really need a cluster environment capable of tolerating
node failures, etc.

The immediate use case I am looking at is Hive to Hive, with 10 TB a day
overall.

Please let me know your thoughts.

Thanks
Jay

On Sun, Mar 31, 2019 at 8:29 PM Sudarshan Vasudevan <
suvasudevan@linkedin.com> wrote:

> Hi Jay,
> We run both Gobblin Cluster and Gobblin Standalone in production, which
> are both fairly stable. We also run Gobblin pipelines in MapReduce mode in
> production.
>
> There is some recent interest in reviving Gobblin-on-Yarn for a few internal
> use cases. We will hopefully have something to share on that front. So stay
> tuned!
>
> If you share more details about your use case (e.g. details about the
> source/sink, volume of data to be moved), that will help us point you in
> the right direction.
>
> Best,
> Sudarshan

Re: Gobblin on Yarn ?

Posted by Sudarshan Vasudevan <su...@linkedin.com>.
Hi Jay,
We run both Gobblin Cluster and Gobblin Standalone in production, which are both fairly stable. We also run Gobblin pipelines in MapReduce mode in production.

There is some recent interest in reviving Gobblin-on-Yarn for a few internal use cases. We will hopefully have something to share on that front. So stay tuned!

If you share more details about your use case (e.g. details about the source/sink, volume of data to be moved), that will help us point you in the right direction.

Best,
Sudarshan
________________________________
From: Jay Sen <ja...@apache.org>
Sent: Sunday, March 31, 2019 7:07 PM
To: dev@gobblin.incubator.apache.org
Subject: Re: Gobblin on Yarn ?

Hi All,

What would be the most stable mode in Gobblin to run in production:
cluster (master + worker), standalone, or something else?

Which mode are you running in prod? Can you please share?

Thanks
Jay



Re: Gobblin on Yarn ?

Posted by Jay Sen <ja...@apache.org>.
Hi All,

What would be the most stable mode in Gobblin to run in production:
cluster (master + worker), standalone, or something else?

Which mode are you running in prod? Can you please share?

Thanks
Jay


On Wed, Feb 27, 2019 at 6:16 PM Jay Sen <ja...@apache.org> wrote:

> Hi,
>
> Is anybody running Gobblin in YARN mode in production, or even in a dev
> environment? Can you please share your experience?
>
> I'm looking for some data points on how it would be beneficial over standalone.
>
> Thanks
> Jay
>