Posted to users@kafka.apache.org by Jun Rao <ju...@confluent.io> on 2015/02/03 06:46:42 UTC

Re: Kafka ETL Camus Question

You can probably ask the Camus mailing list.

Thanks,

Jun

On Thu, Jan 29, 2015 at 1:59 PM, Bhavesh Mistry <mi...@gmail.com>
wrote:

> Hi Kafka Team or LinkedIn Team,
>
> I would like to know whether you run the Camus ETL job with speculative
> execution set to true or false.  Does it make sense to set it to false?
> With it set to true, it creates additional load on the brokers for each map
> task (a second map task is created to pull the same partition twice).  Is
> there any advantage to having it on vs. off?
>
> mapred.map.tasks.speculative.execution
>
> Thanks,
>
> Bhavesh
>

Re: Kafka ETL Camus Question

Posted by Bhavesh Mistry <mi...@gmail.com>.
Hi All,

Thanks for the input. I think I have enough information now, and
https://groups.google.com/forum/#!topic/camus_etl/1FcpqCnC5M4 gave me more
info about this.

Thank you all for entertaining my question.  I am in luck on both forums :)

Thanks,

Bhavesh


On Tue, Feb 3, 2015 at 12:56 PM, Joel Koshy <jj...@gmail.com> wrote:

> There was some confusion here; it turns out that they do turn it on. I added
> Tu to this thread, and here is his response:
>
> <quote>
> We have speculative execution set to true by default.  With these settings,
> we are seeing about 5-7% of the tasks have speculative tasks launched; the
> other 90% finished within the standard-deviation difference, and thus
> speculative tasks were never launched.  This ensures that if we have a slow
> datanode, our job will not be impacted.
>
> Camus is set up to consume 10 minutes' worth of offsets per topic per run.
> If a topic has more than 10 minutes of offsets to be consumed, speculation
> will also be active for that topic.  We haven't played much with this
> setting.  However, if we ever get into a situation where we have to catch
> up, it's good to have this setting disabled.
>
> mapreduce.job.speculative.slownodethreshold     1.0
> mapreduce.job.speculative.speculativecap        0.1
>
> mapreduce.map.speculative       true
> </quote>
>
> On Tue, Feb 03, 2015 at 05:14:02PM +0000, Aditya Auradkar wrote:
> > Hi Bhavesh,
> >
> > I just checked with one of the devs on the Camus team. We run the Camus
> > job with speculative execution disabled.
> >
> > Aditya
> >
> > ________________________________________
> > From: Pradeep Gollakota [pradeepg26@gmail.com]
> > Sent: Monday, February 02, 2015 11:15 PM
> > To: users@kafka.apache.org
> > Subject: Re: Kafka ETL Camus Question
> >
> > Hi Bhavesh,
> >
> > At Lithium, we don't run Camus in our pipelines yet, though we plan to. But
> > I just wanted to comment regarding speculative execution. We have it
> > disabled at the cluster level and typically don't need it for most of our
> > jobs. Especially with something like Camus, I don't see any need to run
> > parallel copies of the same task.
> >
> > On Mon, Feb 2, 2015 at 10:36 PM, Bhavesh Mistry <mistry.p.bhavesh@gmail.com>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the info.  I did not get an answer to my question there, so I
> > > thought I would try my luck here :)
> > >
> > > Thanks,
> > >
> > > Bhavesh
> > >
>
>

Re: Kafka ETL Camus Question

Posted by Joel Koshy <jj...@gmail.com>.
There was some confusion here; it turns out that they do turn it on. I added Tu
to this thread, and here is his response:

<quote>
We have speculative execution set to true by default.  With these settings, we
are seeing about 5-7% of the tasks have speculative tasks launched; the other
90% finished within the standard-deviation difference, and thus speculative
tasks were never launched.  This ensures that if we have a slow datanode,
our job will not be impacted.

Camus is set up to consume 10 minutes' worth of offsets per topic per run. If a
topic has more than 10 minutes of offsets to be consumed, speculation will also
be active for that topic.  We haven't played much with this setting.
However, if we ever get into a situation where we have to catch up, it's
good to have this setting disabled.

mapreduce.job.speculative.slownodethreshold     1.0
mapreduce.job.speculative.speculativecap        0.1

mapreduce.map.speculative       true
</quote>
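
As a rough illustration of the broker-load question raised in this thread, the
arithmetic behind the 5-7% figure can be sketched in Python. The task count of
200 is a made-up example, the function name is hypothetical, and the worst-case
assumption that every speculative attempt re-reads its task's full offset range
from the brokers is an assumption, not something stated in the thread.

```python
def speculation_overhead(num_map_tasks, speculated_fraction):
    """Estimate the cost of speculation for one run.

    Returns (duplicate_attempts, extra_read_fraction), where
    extra_read_fraction is an upper bound on additional broker reads
    relative to a run without speculation. Assumes equal-sized map tasks
    and that each duplicate attempt re-reads its whole assignment.
    """
    duplicate_attempts = round(num_map_tasks * speculated_fraction)
    extra_read_fraction = duplicate_attempts / num_map_tasks
    return duplicate_attempts, extra_read_fraction


NUM_MAP_TASKS = 200  # hypothetical job size, not from the thread

for fraction in (0.05, 0.07):  # the 5-7% range reported above
    attempts, overhead = speculation_overhead(NUM_MAP_TASKS, fraction)
    print(f"{fraction:.0%} speculated: about {attempts} duplicate attempts, "
          f"at most {overhead:.0%} extra broker reads")
```

Under these assumptions the extra load stays in the single-digit percent range
during normal runs, which fits the observation that the setting matters mostly
in catch-up situations where tasks run long.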



RE: Kafka ETL Camus Question

Posted by Aditya Auradkar <aa...@linkedin.com.INVALID>.
Hi Bhavesh,

I just checked with one of the devs on the Camus team. We run the Camus job with speculative execution disabled.

Aditya


Re: Kafka ETL Camus Question

Posted by Pradeep Gollakota <pr...@gmail.com>.
Hi Bhavesh,

At Lithium, we don't run Camus in our pipelines yet, though we plan to. But
I just wanted to comment regarding speculative execution. We have it
disabled at the cluster level and typically don't need it for most of our
jobs. Especially with something like Camus, I don't see any need to run
parallel copies of the same task.
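
For readers who want to try a per-job setting rather than a cluster-wide one,
here is a minimal Python sketch that builds the command-line overrides. The
property in the original question, mapred.map.tasks.speculative.execution, is
the older MRv1 name; Hadoop 2 uses mapreduce.map.speculative. The sketch
assumes the job is launched through Hadoop's ToolRunner so that generic -D
options are honored, and the helper name speculation_overrides is hypothetical.

```python
# Older (MRv1) property names mapped to their Hadoop 2 equivalents.
LEGACY_TO_CURRENT = {
    "mapred.map.tasks.speculative.execution": "mapreduce.map.speculative",
    "mapred.reduce.tasks.speculative.execution": "mapreduce.reduce.speculative",
}


def speculation_overrides(enabled, include_legacy=False):
    """Return generic-option arguments that switch speculation on or off.

    With include_legacy=True the older names are emitted as well, which may
    help on clusters that still read the MRv1 properties.
    """
    value = "true" if enabled else "false"
    names = list(LEGACY_TO_CURRENT.values())
    if include_legacy:
        names = list(LEGACY_TO_CURRENT.keys()) + names
    args = []
    for name in names:
        args += ["-D", f"{name}={value}"]
    return args


# Arguments to append to the job's launch command to disable speculation.
print(" ".join(speculation_overrides(enabled=False)))
```

Setting the property per job leaves the cluster default untouched, so other
jobs that benefit from speculation are not affected.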


Re: Kafka ETL Camus Question

Posted by Bhavesh Mistry <mi...@gmail.com>.
Hi Jun,

Thanks for the info.  I did not get an answer to my question there, so I
thought I would try my luck here :)

Thanks,

Bhavesh
