Posted to dev@kudu.apache.org by Mike Percy <mp...@cloudera.com> on 2016/05/28 00:50:35 UTC

LocalConsensus

I would like to delete LocalConsensus from Kudu before 1.0. Anyone opposed
to this?

Thanks,
Mike

--
Mike Percy
Software Engineer, Cloudera

Re: LocalConsensus

Posted by Mike Percy <mp...@cloudera.com>.
Yeah, basically all I did was use the create-demo-table tool to create a
"twitter" table and pound on it for about a minute with the
insert-loadgen tool from kudu-examples. Then I killed the processes,
swapped in the Raft-only binaries, and restarted the load
generator. Everything seemed fine.

If no one has concerns I'll wrap up the patches for this and try a longer
run on a cluster with a more thorough verification step.

Mike

--
Mike Percy
Software Engineer, Cloudera



Re: LocalConsensus

Posted by David Alves <da...@gmail.com>.
I think the slowdown is a reasonable tradeoff for the simplicity and code
cleanup, so I'd be +1 on merging despite it.
How "quick" was the migration test? Was it a cluster that we hammered
with writes under LocalConsensus and then booted with Raft? Ideally we'd do
this a few times on a reasonably sized cluster.

-david


Re: LocalConsensus

Posted by Mike Percy <mp...@apache.org>.
I did some more benchmarking yesterday and today and got the following
results:

# 4 runs each of the YCSB Workload A "load" job, in QPS (inserts only)
local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]

Min slowdown:  3.9200241327 %
Max slowdown:  10.0560772192 %
Average slowdown:  7.66171972498 %

So it looks like a 4-10% write slowdown on tables with replication disabled
if we remove LocalConsensus.
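
For the record, the Min/Max/Average slowdown figures can be reproduced from
the two QPS lists. The pairing below (best Raft run vs. worst local run, and
vice versa) is inferred from the reported numbers, not stated in the thread:

```python
# Reproducing the reported slowdown percentages from the raw YCSB QPS runs.
# The best-vs-worst / worst-vs-best / mean-vs-mean pairing is inferred
# from the figures, not stated explicitly in the thread.
from statistics import mean

local_nums = [72031.838, 73134.16462, 69772.715379, 72666.4971115]
raft_nums = [67037.6080981, 66876.2121313, 65779.7365522, 65876.1528327]

# Best case: fastest Raft run vs. slowest LocalConsensus run.
min_slowdown = (1 - max(raft_nums) / min(local_nums)) * 100
# Worst case: slowest Raft run vs. fastest LocalConsensus run.
max_slowdown = (1 - min(raft_nums) / max(local_nums)) * 100
# Average of each set of runs.
avg_slowdown = (1 - mean(raft_nums) / mean(local_nums)) * 100

print(round(min_slowdown, 2))  # → 3.92
print(round(max_slowdown, 2))  # → 10.06
print(round(avg_slowdown, 2))  # → 7.66
```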

FWIW, this is a pure insert workload. When comparing performance on
YCSB runs with a mixed read/write workload there is essentially no
difference.

Worth mentioning the settings used. Same hardware as before, with the
following flags:

  ycsb_opts:
    recordcount:    4000000
    operationcount: 1000000
    threads:        16
    max_execution_time: 1800
    load_sync: true
  ts_flags:
    cfile_do_on_finish: "flush"
    flush_threshold_mb: "1000"
    maintenance_manager_num_threads: "2"

(I also tuned election timeouts to be near zero to make leader election
instantaneous)
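
For reference, the ts_flags entries above are tablet server gflags. A
hypothetical invocation might look like the following sketch; the
--fs_wal_dir/--fs_data_dirs paths are illustrative placeholders, not from
this thread:

```shell
# Sketch: starting a tablet server with the flags listed above.
# Directory paths are placeholders; only the last three flags come
# from the ts_flags block in this message.
kudu-tserver \
  --fs_wal_dir=/data/kudu/tserver/wal \
  --fs_data_dirs=/data/kudu/tserver/data \
  --cfile_do_on_finish=flush \
  --flush_threshold_mb=1000 \
  --maintenance_manager_num_threads=2
```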

I did a quick test of migrating from a version of Kudu with support for
LocalConsensus to a version without it, and it worked.

What do you guys think? Is this too large of a hit to take to remove our
"fake" version of Consensus?

As mentioned previously, the drawback to keeping LocalConsensus is that
there is currently no way to add nodes to a system running with it. It's
currently the default choice for people who set replication factor to 1 on
a table.

Mike



Re: LocalConsensus

Posted by Mike Percy <mp...@apache.org>.
To spare you the wall of text let me quickly summarize the scale factor 10
results:

Insert: local avg 279 sec, raft avg 282 sec (raft has a ~1% slowdown), but
there's quite a bit of variance in there
Query: local avg 13.99 sec, raft avg 13.53 sec (raft has a ~3% speedup), but
again, there's a bit of variance

Doesn't really look any different to me.

Mike


Re: LocalConsensus

Posted by Mike Percy <mp...@apache.org>.
I still have to test migration (pretty sure it's a no-op though). However,
I got all tests passing with LocalConsensus disabled in TabletPeer.

To test performance, I ran TPC-H Q1 on a single node (via MiniCluster)
using the tpch.sh default settings (except for scale factor).
The summary is that the perf looks pretty similar between the two Consensus
implementations. I don't really see a major difference.

Machine specs:

CPU(s): 48 (4x6 core w/ HT)
RAM: 96 GB
OS: CentOS 6.6 (Final)
Kernel: Linux 2.6.32-504.30.3.el6.x86_64 #1 SMP Wed Jul 15 10:13:09 UTC
2015 x86_64 x86_64 GNU/Linux

The numbers:

*INSERT*

Consensus  Scale factor  Time (sec)  Avg (sec)  Std. dev (sec)  Ratio of averages (local/raft)
local      1             26.557      26.557     -
raft       1             25.843      25.843     -               1.027628371
local      10            271.410
local      10            282.738
local      10            283.580     279.243    6.79634029
raft       10            281.986
raft       10            281.551
raft       10            283.049     282.195    0.7706272337    0.9895367984

*QUERY*

Consensus  Scale factor  Time (sec)  Avg (sec)  Std. dev (sec)  Ratio of averages (local/raft)
local      1             1.281
local      1             1.325
local      1             1.340
local      1             1.280       1.31       0.03
raft       1             1.304
raft       1             1.334
raft       1             1.293
raft       1             1.331       1.32       0.02            0.9931584949
local      10            14.879
local      10            14.333
local      10            14.397
local      10            14.040
local      10            13.573
local      10            13.216
local      10            13.597
local      10            13.858      13.99      0.54
raft       10            12.455
raft       10            13.998
raft       10            13.367
raft       10            13.759
raft       10            14.301
raft       10            13.919
raft       10            13.036
raft       10            13.410      13.53      0.59            1.033701326
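
The summary columns can be recomputed from the individual runs. A sketch for
the scale factor 10 INSERT group; the n-1 (sample) standard deviation and
the local/raft ratio-of-averages convention are inferred from the table
values, not stated:

```python
# Recomputing the Avg, Std. dev, and Ratio columns for the scale factor 10
# INSERT group. The table appears to use the sample standard deviation
# (n - 1 denominator) and a local/raft ratio of averages.
from statistics import mean, stdev

local_sf10 = [271.410, 282.738, 283.580]
raft_sf10 = [281.986, 281.551, 283.049]

print(round(mean(local_sf10), 3))                    # → 279.243 (Avg)
print(round(stdev(local_sf10), 3))                   # → 6.796 (Std. dev)
print(round(mean(local_sf10) / mean(raft_sf10), 4))  # → 0.9895 (Ratio)
```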

Is there some other measurement I should take or does this seem sufficient
from a performance perspective?

Thanks,
Mike




Re: LocalConsensus

Posted by Mike Percy <mp...@apache.org>.
I don't think we want to take much of a perf hit. I'll check it out.

Another reason to have one version of Consensus is that it's currently not
possible to go from 1 node to 3.

Mike


Re: LocalConsensus

Posted by Todd Lipcon <to...@cloudera.com>.
I'm curious also what kind of perf impact we are willing to take for the
un-replicated case. I think single-node Kudu performing well is actually
nice from an adoption standpoint (many people have workloads which fit on a
single machine). Would be good to have some simple verification that the
write perf of single-node raft isn't substantially worse.

-Todd




-- 
Todd Lipcon
Software Engineer, Cloudera

Re: LocalConsensus

Posted by Mike Percy <mp...@apache.org>.
On Wed, Jun 1, 2016 at 11:20 AM, David Alves <da...@gmail.com> wrote:

> My (and I suspect Todd's) fear here is that we _think_ it's ok but we're
> not totally sure it works in all cases.
>

Yep, I'm in the same boat. I haven't seen recent evidence that it doesn't
work, though.


> Regarding the tests, I guess just flip it and see what happens on ctest?
>

Yeah, it fails of course but mostly for silly reasons related to test
setup. Working on that.


> Regarding the upgrade path, I think we'd need to test this at some scale,
> i.e. fill up a cluster using the current version, with local consensus, and
> then replace the binaries with the new version, without it.
>

+1 SGTM. I don't mind doing that.

Mike

Re: LocalConsensus

Posted by David Alves <da...@gmail.com>.
My (and I suspect Todd's) fear here is that we _think_ it's ok but we're
not totally sure it works in all cases.
Regarding the tests, I guess just flip it and see what happens on ctest?
Regarding the upgrade path, I think we'd need to test this at some scale,
i.e. fill up a cluster using the current version, with local consensus, and
then replace the binaries with the new version, without it.

-david

On Wed, Jun 1, 2016 at 11:14 AM, Mike Percy <mp...@apache.org> wrote:

> You're saying test coverage for single node operation? There is a little
> bit, just not a lot. But if we flip single node to RaftConsensus, I imagine
> we will immediately get a bunch of test coverage.
>
> I think the upgrade path is pretty simple -- the only thing missing AFAIK
> is the last known RPC address of the single member in the cluster. I'm
> investigating whether that's something that we can fill in implicitly
> (don't see a reason why not, since it's the local process).
>
> Mike
>
> On Fri, May 27, 2016 at 8:28 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
> > The idea is nice, but I also am not sure about the test coverage, etc.
> > Also, we need to make sure there is an upgrade path from LocalConsensus
> to
> > a single-node RaftConsensus.
> >
> > On Fri, May 27, 2016 at 7:59 PM, Mike Percy <mp...@cloudera.com> wrote:
> >
> > > What was missing? Do you remember?
> > >
> > > Thanks,
> > > Mike
> > >
> > > > On May 27, 2016, at 6:18 PM, David Alves <da...@gmail.com>
> > wrote:
> > > >
> > > > Hi Mike
> > > >
> > > > I'd be for it if we have the regular consensus implementation working
> > > (as in tested to perform reasonably) for single node.
> > > > Are we there, or close?
> > > >
> > > > Best
> > > > David
> > > >
> > > > On Fri, May 27, 2016 at 5:51 PM, Mike Percy <mpercy@cloudera.com> wrote:
> > > > I would like to delete LocalConsensus from Kudu before 1.0. Anyone
> > > opposed
> > > > to this?
> > > >
> > > > Thanks,
> > > > Mike
> > > >
> > > > --
> > > > Mike Percy
> > > > Software Engineer, Cloudera
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>

Re: LocalConsensus

Posted by Mike Percy <mp...@apache.org>.
You're saying test coverage for single node operation? There is a little
bit, just not a lot. But if we flip single node to RaftConsensus, I imagine
we will immediately get a bunch of test coverage.

I think the upgrade path is pretty simple -- the only thing missing AFAIK
is the last known RPC address of the single member in the cluster. I'm
investigating whether that's something that we can fill in implicitly
(don't see a reason why not, since it's the local process).
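
To make the idea concrete, here is a minimal sketch of what "filling it in
implicitly" could look like. This is hypothetical Python, not actual Kudu
code; the config shape and the field name "last_known_addr" are assumptions
for illustration, loosely modeled on Raft consensus metadata:

```python
# Hypothetical sketch: when migrating a single-node LocalConsensus config
# to RaftConsensus, the lone peer's last-known RPC address can be derived
# from the server's own bound address, since the peer IS the local process.

def fill_local_peer_addr(config, local_host, local_port):
    """Fill in last_known_addr for the single local peer if it is missing."""
    peers = config["peers"]
    assert len(peers) == 1, "expected a single-node config"
    peer = peers[0]
    if "last_known_addr" not in peer:
        # The only member is this process, so its own RPC address is correct.
        peer["last_known_addr"] = {"host": local_host, "port": local_port}
    return config

config = {"peers": [{"uuid": "abc123"}]}
fill_local_peer_addr(config, "127.0.0.1", 7050)
print(config["peers"][0]["last_known_addr"])
```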

Mike

On Fri, May 27, 2016 at 8:28 PM, Todd Lipcon <to...@cloudera.com> wrote:

> The idea is nice, but I also am not sure about the test coverage, etc.
> Also, we need to make sure there is an upgrade path from LocalConsensus to
> a single-node RaftConsensus.
>
> On Fri, May 27, 2016 at 7:59 PM, Mike Percy <mp...@cloudera.com> wrote:
>
> > What was missing? Do you remember?
> >
> > Thanks,
> > Mike
> >
> > > On May 27, 2016, at 6:18 PM, David Alves <da...@gmail.com>
> wrote:
> > >
> > > Hi Mike
> > >
> > > I'd be for it if we have the regular consensus implementation working
> > (as in tested to perform reasonably) for single node.
> > > Are we there, or close?
> > >
> > > Best
> > > David
> > >
> > > On Fri, May 27, 2016 at 5:51 PM, Mike Percy <mpercy@cloudera.com> wrote:
> > > I would like to delete LocalConsensus from Kudu before 1.0. Anyone
> > opposed
> > > to this?
> > >
> > > Thanks,
> > > Mike
> > >
> > > --
> > > Mike Percy
> > > Software Engineer, Cloudera
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: LocalConsensus

Posted by Todd Lipcon <to...@cloudera.com>.
The idea is nice, but I also am not sure about the test coverage, etc.
Also, we need to make sure there is an upgrade path from LocalConsensus to
a single-node RaftConsensus.

On Fri, May 27, 2016 at 7:59 PM, Mike Percy <mp...@cloudera.com> wrote:

> What was missing? Do you remember?
>
> Thanks,
> Mike
>
> > On May 27, 2016, at 6:18 PM, David Alves <da...@gmail.com> wrote:
> >
> > Hi Mike
> >
> > I'd be for it if we have the regular consensus implementation working
> (as in tested to perform reasonably) for single node.
> > Are we there, or close?
> >
> > Best
> > David
> >
> > On Fri, May 27, 2016 at 5:51 PM, Mike Percy <mpercy@cloudera.com> wrote:
> > I would like to delete LocalConsensus from Kudu before 1.0. Anyone
> opposed
> > to this?
> >
> > Thanks,
> > Mike
> >
> > --
> > Mike Percy
> > Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: LocalConsensus

Posted by Mike Percy <mp...@cloudera.com>.
What was missing? Do you remember?

Thanks,
Mike

> On May 27, 2016, at 6:18 PM, David Alves <da...@gmail.com> wrote:
> 
> Hi Mike
> 
> I'd be for it if we have the regular consensus implementation working (as in tested to perform reasonably) for single node.
> Are we there, or close?
> 
> Best
> David
> 
> On Fri, May 27, 2016 at 5:51 PM, Mike Percy <mpercy@cloudera.com> wrote:
> I would like to delete LocalConsensus from Kudu before 1.0. Anyone opposed
> to this?
> 
> Thanks,
> Mike
> 
> --
> Mike Percy
> Software Engineer, Cloudera

Re: LocalConsensus

Posted by David Alves <da...@gmail.com>.
Hi Mike

I'd be for it if we have the regular consensus implementation working (as
in, tested to perform reasonably) for single node.
Are we there, or close?

Best
David

On Fri, May 27, 2016 at 5:51 PM, Mike Percy <mpercy@cloudera.com> wrote:
I would like to delete LocalConsensus from Kudu before 1.0. Anyone opposed
to this?

Thanks,
Mike

--
Mike Percy
Software Engineer, Cloudera