Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2015/07/15 00:18:06 UTC

DISCUSSION: lets do a developer workshop on near-term work

Matteo and I were thinking it is time devs got together for a pow-wow. There
is a bunch of stuff in flight at the moment (see the list below) and it would
be good to meet and whiteboard, surface good ideas that have gone dormant
in JIRA, or revisit designs/proposals out in JIRA-attached Google docs that
need socializing.

You can only come if you are wearing your bullshit hat.

Topics we'd go over could include:

+ Our filesystem layout will not work with 1M regions (Matteo/Stack)
+ Current state of the offheaping of read path and alternate KeyValue
implementation (Anoop/Ram)
+ Append rejigger (Elliott)
+ A Pv2-based Assign (Matteo/Steven)
+ Splitting meta/1M regions
+ The revived Backup (Vladimir)
+ Time (Enis)
+ The overloaded SequenceId (Stack)
+ Upstreaming IT testing (Dima/Sean)
+ hbase-2.0.0

I put names by folks I know could talk to the topic. If you want to take
over a topic or put your name by one, just say. Suggest that each discussion
lead off with a 5-10 minute summary of the current state of
thought/design/implementation.

What do others think?

What date would suit folks?

Anyone want to host?

Thanks,
Matteo and St.Ack

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <ap...@apache.org>.

On the time topic, let's look at Enis' HLC proposal.

Doc:
https://docs.google.com/document/d/1LL2GAodiYi0waBz5ODGL4LDT4e_bXy8P9h6kWC05Bhw/edit#
JIRA: https://issues.apache.org/jira/browse/HBASE-14070

It covers exactly making space for local bits in the timestamps, but in order
to implement HLC, not something 'user serviceable'.
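
As a rough illustration of what "making space for bits in the timestamps" can
mean for a hybrid logical clock, here is a minimal sketch. It is not taken
from the proposal doc or from HBase: the class name, the 20-bit logical
counter and the method names are all assumptions picked for the example.

```java
/** Illustrative only: packs a hybrid logical clock reading into one 64-bit long. */
final class HlcTimestampSketch {
    // Assumed split (not from the proposal): the low 20 bits hold a logical
    // counter, the remaining high bits hold physical time in milliseconds.
    static final int LOGICAL_BITS = 20;
    static final long LOGICAL_MASK = (1L << LOGICAL_BITS) - 1;
    static final long MAX_PHYSICAL = Long.MAX_VALUE >>> LOGICAL_BITS;

    private HlcTimestampSketch() {}

    /** Combines physical milliseconds and a logical counter into one long. */
    static long pack(long physicalMillis, long logical) {
        if (physicalMillis < 0 || physicalMillis > MAX_PHYSICAL) {
            throw new IllegalArgumentException("physical time out of range");
        }
        if (logical < 0 || logical > LOGICAL_MASK) {
            throw new IllegalArgumentException("logical counter out of range");
        }
        return (physicalMillis << LOGICAL_BITS) | logical;
    }

    static long physicalMillis(long timestamp) {
        return timestamp >>> LOGICAL_BITS;
    }

    static long logical(long timestamp) {
        return timestamp & LOGICAL_MASK;
    }

    /**
     * Next timestamp for a local event: take the wall clock if it has moved
     * past the last issued physical time, otherwise keep the physical part
     * and bump the logical counter so timestamps stay strictly increasing.
     */
    static long next(long lastIssued, long wallClockMillis) {
        long lastPhysical = physicalMillis(lastIssued);
        if (wallClockMillis > lastPhysical) {
            return pack(wallClockMillis, 0);
        }
        return pack(lastPhysical, logical(lastIssued) + 1);
    }
}
```

Because the physical part sits in the high bits, plain long comparison still
orders timestamps by time first and by the logical counter second.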


On Wed, Jul 15, 2015 at 8:58 AM, lars hofhansl <la...@apache.org> wrote:

> Works for me. I'll be back in the Bay Area the week of August 9th.
> We have done a _lot_ of work on backups as well - ours are more
> complicated as we wanted fast per-tenant restores, so data is "grouped" by
> tenant. Would like to sync up on that (hopefully some of the folks who
> wrote most of the code will be in town, I'll check).
>
> Also interested in the "Time" and "offheap" parts (although you folks
> usually do not like what I think about the offheap efforts :) ).
> Would like to add the following topics:
>
>
> - "Timestamp Resolution". Or making space for more bits in the timestamps
> (happy to cover that, unless it's part of the "Time" topic)
>
>
> - "Replication". We found that replication cannot keep up with high write
> loads, because replication is strictly single threaded per
> regionserver (even though we have multiple region servers on the sink side)
>
>
> - "Spark integration" (Ted Malaska?)
>
>
> OK... Out now to make a "bullshit hat".
>
> -- Lars
>
> ________________________________
> From: Sean Busbey <bu...@cloudera.com>
> To: dev <de...@hbase.apache.org>
> Sent: Tuesday, July 14, 2015 7:11 PM
> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>
>
> I'm planning to be in the Bay area the week of the 24th of August.
>
> --
> Sean
>
>
>
> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>
> > I can be up in your area in August.
> >
> > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> > wrote:
> > >
> > > > > Sounds good. It has been a while since we did the talk-a-thon.
> > > > >
> > > > > I'll be off starting the 25th of July, so I prefer something next week if
> > > > > possible.
> > > >
> > > > You ever coming back? If so, when? I'm back on 10th of August
> (Mikhail
> > on
> > > the 20th).
> > > St.Ack
> > >
> > >
> > >
> > >
> > > > Enis
> > > >
> > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > Matteo and I were thinking it time devs got together for a pow-wow.
> > > There
> > > > > is a bunch of stuff in flight at the moment (see below list) and it
> > > would
> > > > > be good to meet and whiteboard, surface goodo ideas that have gone
> > > > dormant
> > > > > in JIRA, or revisit designs/proposals out in JIRA-attached google
> doc
> > > > that
> > > > > need socializing.
> > > > >
> > > > > You can only come if you are wearing your bullshit hat.
> > > > >
> > > > > Topics we'd go over could include:
> > > > >
> > > > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > > > + Current state of the offheaping of read path and alternate
> KeyValue
> > > > > implementation (Anoop/Ram)
> > > > > + Append rejigger (Elliott)
> > > > > + A Pv2-based Assign (Matteo/Steven)
> > > > > + Splitting meta/1M regions
> > > > > + The revived Backup (Vladimir)
> > > > > + Time (Enis)
> > > > > + The overloaded SequenceId (Stack)
> > > > > + Upstreaming IT testing (Dima/Sean)
> > > > > + hbase-2.0.0
> > > > >
> > > > > I put names by folks I know could talk to the topic. If you want to
> > > take
> > > > > over a topic or put your name by one, just say.  Suggest that
> > > discussion
> > > > > lead off with a 5-10minute on current state of
> > > > > thought/design/implementation.
> > > > >
> > > > > What do others think?
> > > > >
> > > > > What date would suit folks?
> > > > >
> > > > > Anyone want to host?
> > > > >
> > > > > Thanks,
> > > > > Matteo and St.Ack
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Would definitely like to attend the meeting if the timing is suitable for IST.
The performance numbers in HBASE-11425 are from cluster testing done using the
basic perf test tools that ship with HBase. We plan to test with a bigger data
set using tools like YCSB, and maybe we will specifically look at the n/w
bandwidth impact on the client side.

Our initial tests were done on the server side, basically to see how much
these changes help.

We would definitely like to take up your concern. BTW, just want to say
that we are like 60 to 70% done. Some major JIRAs like HBASE-12295 are
pending and in their final stage. Apart from that, we have already started
working on some minor sub-tasks which would make the entire flow work
with offheap.

Regards
Ram


On Sat, Jul 18, 2015 at 11:32 PM, Anoop John <an...@gmail.com> wrote:

> No Andy. 11425 having doc attached to it. At the end of it, we have added
> perf numbers in a cluster testing.  This was done using PE get and scan
> tests with filtering all cells at server (to not consider n/w bandwidth
> constraints)
>
> -Anoop-
>
> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <an...@gmail.com>
> wrote:
>
> > > We have some microbenchmarks, not evidence of differences seen from a
> > > client application. I'm not saying that microbenchmarks are not totally
> > > necessary and a great start - they are - but that they don't measure an
> > > end goal. Furthermore unless I've missed one somewhere we don't have a
> > > JIRA or design doc that states a clear end goal metric like the
> > > strawman I threw together in my previous mail. A measurable system
> > > level goal and some data from full cluster testing would go a lot
> > > further toward letting all of us evaluate the potential and payoff of
> > > the work. In the meantime we should probably be assembling these
> > > changes on a branch instead of in trunk, for as long as the goal is not
> > > clearly defined and the payoff and potential for perf regressions is
> > > untested and unknown.
> >
> >
> > > On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com> wrote:
> > >
> > > > Thanks Andy and Lars.  The parent jira has a doc attached which
> > > > contains some perf gain numbers.  We will be doing more tests in the
> > > > next 2 weeks (before end of this month) and will publish them.  Yes it
> > > > will be great if it is a more IST friendly time :-)
> > >
> > > -Anoop-
> > >
> > > > On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell
> > > > <andrew.purtell@gmail.com> wrote:
> > > >
> > > >>> I can represent your side Ram (and Anoop). I've been known to always
> > > >>> argue both sides of a discussion and to never take sides easily
> > > >>> (drives some folks crazy).
> > > >>
> > > >> I can vouch for this (smile)
> > > >>
> > > >> I also can offer support for off heaping there. At the same time we do
> > > >> have a gap where we can't point to a timeline of improvements (yet,
> > > >> anyway) with benchmarks showing gains where your goals need them. For
> > > >> example, stock HBase in one JVM can address max N GB for response time
> > > >> distribution D; dev version of HBase in off heap branch can address
> > > >> max N' GB for distribution D', where N' > N and D > D' (distribution
> > > >> D' statistically shows better/lower response times).
> > >>
> > >>
> > >>
> > >>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
> > >>>
> > > >>> I'm in favor of anything that improves performance (and preferably
> > > >>> doesn't set us back into a world that's worse than C due to the lack
> > > >>> of pointers in Java). Never said "I don't like it", it's just that
> > > >>> I'm perhaps asking for more numbers and justification in weighing
> > > >>> the pros and cons.
> > > >>> I can represent your side Ram (and Anoop). I've been known to always
> > > >>> argue both sides of a discussion and to never take sides easily
> > > >>> (drives some folks crazy). And Stack's there too, he'll yell at me
> > > >>> where needed :)
> > > >>>
> > > >>> Perhaps we can do it a bit later in the evening so there is a
> > > >>> fighting chance that folks on IST can participate. (I know that some
> > > >>> of our folks on IST would love to participate in the backup
> > > >>> discussion.)
> > > >>>
> > > >>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
> > > >>> need an approx. number of folks.
> > >>>
> > >>> -- Lars
> > >>>
> > >>>     From: ramkrishna vasudevan <ra...@gmail.com>
> > >>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
> > >> larsh@apache.org>
> > >>> Sent: Wednesday, July 15, 2015 10:10 AM
> > >>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
> work
> > >>>
> > > >>> Hi
> > > >>> What time will it be on August 26th?
> > > >>> @Lars Ya. I know that you are not generally in favour of this
> > > >>> offheaping stuff.  Maybe if we (from India) can attend this meeting
> > > >>> remotely, your thoughts can be discussed and also the current state
> > > >>> of this work.
> > > >>> Regards
> > > >>> Ram
> > >>>
> > >>>
> > >>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org>
> > wrote:
> > >>>
> > >>> Works for me. I'll be back in the Bay Area the week of August 9th.
> > >>> We have done a _lot_ of work on backups as well - ours are more
> > >> complicated as we wanted fast per-tenant restores, so data is
> "grouped"
> > by
> > >> tenant. Would like to sync up on that (hopefully some of the folks who
> > >> wrote most of the code will be in town, I'll check).
> > >>>
> > >>> Also interested in the "Time" and "offheap" parts (although you folks
> > >> usually do not like what I think about the offheap efforts :) ).
> > >>> Would like to add the following topics:
> > >>>
> > >>>
> > >>> - "Timestamp Resolution". Or making space for more bits in the
> > >> timestamps (happy to cover that, unless it's part of the "Time" topic)
> > >>>
> > >>>
> > >>> - "Replication". We found that replication cannot keep up with high
> > >> write loads, due to the fact that replicated is strictly single
> threaded
> > >> per regionserver (even though we have multiple region servers on the
> > sink
> > >> side)
> > >>>
> > >>>
> > >>> - "Spark integration" (Ted Malaska?)
> > >>>
> > >>>
> > >>> OK... Out now to make a "bullshit hat".
> > >>>
> > >>> -- Lars
> > >>>
> > >>> ________________________________
> > >>> From: Sean Busbey <bu...@cloudera.com>
> > >>> To: dev <de...@hbase.apache.org>
> > >>> Sent: Tuesday, July 14, 2015 7:11 PM
> > >>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
> work
> > >>>
> > >>>
> > >>> I'm planning to be in the Bay area the week of the 24th of August.
> > >>>
> > >>> --
> > >>> Sean
> > >>>
> > >>>
> > >>>
> > >>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
> > wrote:
> > >>>>
> > >>>> I can be up in your area in August.
> > >>>>
> > >>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> > >>>>>>
> > >>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> enis.soz@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Sounds good. It has been a while we did the talk-aton.
> > >>>>>>
> > >>>>>> I'll be off starting 25 of July, so I prefer something next week
> if
> > >>>>>> possible.
> > >>>>>>
> > >>>>>> You ever coming back? If so, when? I'm back on 10th of August
> > (Mikhail
> > >>>> on
> > >>>>> the 20th).
> > >>>>> St.Ack
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> Enis
> > >>>>>>
> > >>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > >>>>>>>
> > >>>>>>> Matteo and I were thinking it time devs got together for a
> pow-wow.
> > >>>>> There
> > >>>>>>> is a bunch of stuff in flight at the moment (see below list) and
> it
> > >>>>> would
> > >>>>>>> be good to meet and whiteboard, surface goodo ideas that have
> gone
> > >>>>>> dormant
> > >>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google
> > doc
> > >>>>>> that
> > >>>>>>> need socializing.
> > >>>>>>>
> > >>>>>>> You can only come if you are wearing your bullshit hat.
> > >>>>>>>
> > >>>>>>> Topics we'd go over could include:
> > >>>>>>>
> > >>>>>>> + Our filesystem layout will not work if 1M regions
> (Matteo/Stack)
> > >>>>>>> + Current state of the offheaping of read path and alternate
> > KeyValue
> > >>>>>>> implementation (Anoop/Ram)
> > >>>>>>> + Append rejigger (Elliott)
> > >>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > >>>>>>> + Splitting meta/1M regions
> > >>>>>>> + The revived Backup (Vladimir)
> > >>>>>>> + Time (Enis)
> > >>>>>>> + The overloaded SequenceId (Stack)
> > >>>>>>> + Upstreaming IT testing (Dima/Sean)
> > >>>>>>> + hbase-2.0.0
> > >>>>>>>
> > >>>>>>> I put names by folks I know could talk to the topic. If you want
> to
> > >>>>> take
> > >>>>>>> over a topic or put your name by one, just say.  Suggest that
> > >>>>> discussion
> > >>>>>>> lead off with a 5-10minute on current state of
> > >>>>>>> thought/design/implementation.
> > >>>>>>>
> > >>>>>>> What do others think?
> > >>>>>>>
> > >>>>>>> What date would suit folks?
> > >>>>>>>
> > >>>>>>> Anyone want to host?
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Matteo and St.Ack
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>>
> > >>>>    - Andy
> > >>>>
> > >>>> Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > >>>> (via Tom White)
> > >>
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

And what is the goal or target? What are the criteria for success?


> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com> wrote:
> 
> No Andy. 11425 having doc attached to it. At the end of it, we have added
> perf numbers in a cluster testing.  This was done using PE get and scan
> tests with filtering all cells at server (to not consider n/w bandwidth
> constraints)
> 
> -Anoop-

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

Returning all cells to a client is the other extreme and I don't think that would be a great test either. 

Personally I think for testing big change sets well we need a range of workloads. The extreme cases (filter all, filter none) are useful data points but not great if measured in isolation. I think YCSB is a reasonable option for that these days now that it is maintained. It comes with 6 or so canned workloads. Not a bad start.
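
To make "a range of workloads" concrete, here is a minimal sketch of how
per-workload latency samples from a baseline build and a candidate build
might be compared at a few percentiles. It is an illustration under stated
assumptions, not part of YCSB or HBase: the class and method names are
hypothetical, and the nearest-rank percentile is just one reasonable choice.

```java
import java.util.Arrays;

/** Illustrative only: compares two sets of latency samples by percentile. */
final class LatencyComparisonSketch {
    private LatencyComparisonSketch() {}

    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    /**
     * True if the candidate's latency is no higher than the baseline's at
     * every requested percentile. Run this once per workload (filter all,
     * filter none, and mixes in between) rather than on a single extreme.
     */
    static boolean noWorseAt(long[] baseline, long[] candidate, double... percentiles) {
        for (double p : percentiles) {
            if (percentile(candidate, p) > percentile(baseline, p)) {
                return false;
            }
        }
        return true;
    }
}
```

A call such as noWorseAt(baselineMicros, candidateMicros, 50, 99) then gives
a simple pass/fail per workload, which is easier to argue about than a
single average.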


> On Jul 20, 2015, at 6:01 AM, lars hofhansl <la...@apache.org> wrote:
> 
> Personally, I think that is a reasonable way to test the internal friction of the server. I've been doing a lot of tests like that and found a lot of inefficiencies in HBase that way. For cases where we return all Cells back to a (remote) client, improving the server by 10 or 20% would mostly go unnoticed.
> 
> Analytics (aggregates via Phoenix or direct coprocessors) will be more important going forward, so improving that part is important.
> I completely agree that end-to-end (by which I mean data shipped to the client) testing is important, it's just that I'd expect us to work on different areas there (put Protobufs on a diet, have a streaming protocol, etc).
> -- Lars
> 
>     From: Andrew Purtell <an...@gmail.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org> 
> Sent: Saturday, July 18, 2015 11:24 AM
> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> 
> That's not a realistic or useful test scenario, unless the goal is to accelerate queries where all cells are filtered at the server. 
> 
> 
> 
> 
> 
>> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com> wrote:
>> 
>> No Andy. 11425 having doc attached to it. At the end of it, we have added
>> perf numbers in a cluster testing.  This was done using PE get and scan
>> tests with filtering all cells at server (to not consider n/w bandwidth
>> constraints)
>> 
>> -Anoop-

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by lars hofhansl <la...@apache.org>.

Personally, I think that is a reasonable way to test the internal friction of the server. I've been doing a lot of tests like that and found a lot of inefficiencies in HBase that way. For cases where we return all Cells back to a (remote) client, improving the server by 10 or 20% would mostly go unnoticed.

Analytics (aggregates via Phoenix or direct coprocessors) will be more important going forward, so improving that part is important.
I completely agree that end-to-end testing (by which I mean data shipped to the client) is important; it's just that I'd expect us to work on different areas there (put Protobufs on a diet, have a streaming protocol, etc).
-- Lars

From: Andrew Purtell <an...@gmail.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>
Sent: Saturday, July 18, 2015 11:24 AM
Subject: Re: DISCUSSION: lets do a developer workshop on near-term work

That's not a realistic or useful test scenario, unless the goal is to accelerate queries where all cells are filtered at the server. 





> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com> wrote:
> 
> No Andy. 11425 having doc attached to it. At the end of it, we have added
> perf numbers in a cluster testing.  This was done using PE get and scan
> tests with filtering all cells at server (to not consider n/w bandwidth
> constraints)
> 
> -Anoop-
> 
> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <an...@gmail.com>
> wrote:
> 
>> We have some microbenchmarks, not evidence of differences seen from a
>> client application. I'm not saying that microbenchmarks are not totally
>> necessary and a great start - they are - but that they don't measure an end
>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
>> design doc that states a clear end goal metric like the strawman I threw
>> together in my previous mail. A measurable system level goal and some data
>> from full cluster testing would go a lot further toward letting all of us
>> evaluate the potential and payoff of the work. In the meantime we should
>> probably be assembling these changes on a branch instead of in trunk, for
>> as long as the goal is not clearly defined and the payoff and potential for
>> perf regressions is untested and unknown.
>> 
>> 
>>> On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com> wrote:
>>> 
>>> Thanks Andy and Lars.  The parent jira has doc attached which contains
>> some
>>> perf gain numbers..  We will be doing more tests in next 2 weeks (before
>>> end of this month) and will publish them.  Yes it will be great if it is
>>> more IST friendly time :-)
>>> 
>>> -Anoop-
>>> 
>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
>> andrew.purtell@gmail.com>
>>> wrote:
>>> 
>>>>> I can represent your side Ram (and Anoop). I've been known always argue
>>>> both side of a discussion and to never take sides easily (drives some
>> folks
>>>> crazy).
>>>> 
>>>> I can vouch for this (smile)
>>>> 
>>>> I also can offer support for off heaping there. At the same time we do
>>>> have a gap where we can't point to a timeline of improvements (yet,
>> anyway)
>>>> with benchmarks showing gains where your goals need them. For example,
>>>> stock HBase in one JVM can address max N GB for response time
>> distribution
>>>> D; dev version of HBase in off heap branch can address max N' GB for
>>>> distribution D', where N' > N and D > D' (distribution D' statistically
>>>> shows better/lower response times).
>>>> 
>>>> 
>>>> 
>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
>>>>> 
>>>>> I'm in favor of anything that improves performance (and preferably
>>>> doesn't set us back into a world that's worse than C due to the lack of
>>>> pointers in Java). Never said "I don't like it", it's just that I'm
>> perhaps
>>>> asking for more numbers and justification in weighing the pros and cons.
>>>>> I can represent your side Ram (and Anoop). I've been known always argue
>>>> both side of a discussion and to never take sides easily (drives some
>> folks
>>>> crazy). And Stack's there too, he yell at me where needed :)
>>>>> 
>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
>>>> chance that folks on IST can participate. I know that some of our folks
>> on
>>>> IST would love to participate in the backup discussion).
>>>>> 
>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
>>>> an approx. number of folks.
>>>>> 
>>>>> -- Lars
>>>>> 
>>>>>    From: ramkrishna vasudevan <ra...@gmail.com>
>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
>>>> larsh@apache.org>
>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>> 
>>>>> Hi
>>>>> What time will it be on August 26th?
>>>>> @LarsYa. I know that you are not generally in favour of this offheaping
>>>> stuff.  May be if we (from India) can attend this meeting remotely your
>>>> thoughts can be discussed and also the current state of this work.
>>>>> RegardsRam
>>>>> 
>>>>> 
>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org>
>> wrote:
>>>>> 
>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>> complicated as we wanted fast per-tenant restores, so data is "grouped"
>> by
>>>> tenant. Would like to sync up on that (hopefully some of the folks who
>>>> wrote most of the code will be in town, I'll check).
>>>>> 
>>>>> Also interested in the "Time" and "offheap" parts (although you folks
>>>> usually do not like what I think about the offheap efforts :) ).
>>>>> Would like to add the following topics:
>>>>> 
>>>>> 
>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
>>>>> 
>>>>> 
>>>>> - "Replication". We found that replication cannot keep up with high
>>>> write loads, due to the fact that replicated is strictly single threaded
>>>> per regionserver (even though we have multiple region servers on the
>> sink
>>>> side)
>>>>> 
>>>>> 
>>>>> - "Spark integration" (Ted Malaska?)
>>>>> 
>>>>> 
>>>>> OK... Out now to make a "bullshit hat".
>>>>> 
>>>>> -- Lars
>>>>> 
>>>>> ________________________________
>>>>> From: Sean Busbey <bu...@cloudera.com>
>>>>> To: dev <de...@hbase.apache.org>
>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>>>>> 
>>>>> 
>>>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>>> 
>>>>> --
>>>>> Sean
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
>> wrote:
>>>>>> 
>>>>>> I can be up in your area in August.
>>>>>> 
>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
>>>>>>>> 
>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Sounds good. It has been a while we did the talk-aton.
>>>>>>>> 
>>>>>>>> I'll be off starting 25 of July, so I prefer something next week if
>>>>>>>> possible.
>>>>>>>> 
>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
>> (Mikhail
>>>>>> on
>>>>>>> the 20th).
>>>>>>> St.Ack
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Enis
>>>>>>>> 
>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>> 
>>>>>>>>> Matteo and I were thinking it time devs got together for a pow-wow.
>>>>>>> There
>>>>>>>>> is a bunch of stuff in flight at the moment (see below list) and it
>>>>>>> would
>>>>>>>>> be good to meet and whiteboard, surface goodo ideas that have gone
>>>>>>>> dormant
>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google
>> doc
>>>>>>>> that
>>>>>>>>> need socializing.
>>>>>>>>> 
>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>> 
>>>>>>>>> Topics we'd go over could include:
>>>>>>>>> 
>>>>>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
>>>>>>>>> + Current state of the offheaping of read path and alternate
>> KeyValue
>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>> + Time (Enis)
>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>> + hbase-2.0.0
>>>>>>>>> 
>>>>>>>>> I put names by folks I know could talk to the topic. If you want to
>>>>>>> take
>>>>>>>>> over a topic or put your name by one, just say.  Suggest that
>>>>>>> discussion
>>>>>>>>> lead off with a 5-10minute on current state of
>>>>>>>>> thought/design/implementation.
>>>>>>>>> 
>>>>>>>>> What do others think?
>>>>>>>>> 
>>>>>>>>> What date would suit folks?
>>>>>>>>> 
>>>>>>>>> Anyone want to host?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Matteo and St.Ack
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Best regards,
>>>>>> 
>>>>>>  - Andy
>>>>>> 
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>> Hein
>>>>>> (via Tom White)
>>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Anoop John <an...@gmail.com>.

We will be doing some more large-data tests in the coming week, Andy, and will
report back. We will also do a write-up on the ways in which the work
might help us. As Sean said, we will continue in another thread if there is
anything further. Will write back soon with the test results. Thanks.

-Anoop-

On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <an...@gmail.com>
wrote:

> Cool, thanks.
>
> Is a 20% latency reduction the most we can expect or do you think there is
> room for more improvement? Just curious.
>
> Is latency reduction the only goal? Anything here about supporting larger
> heaps? Is there something we can measure in that regard?
>
> Hope you see my point and there's enough here to prime a goals and metrics
> discussion at the pow wow or on the relevant JIRAs.
>
> > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > Hi Andy
> >
> > Based on the POCs we have done, we expect around a 20% improvement in latency. For
> > scans it will be a little less than 20%.
> >
> > Regards
> > Ram
> >
> >
> > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <
> andrew.purtell@gmail.com>
> > wrote:
> >
> >> Hi Ram,
> >>
> >> Do you have any targets for what you are measuring? What are the goals
> you
> >> guys are working toward with the off heaping changes?
> >>
> >>
> >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
> >>> ramkrishna.s.vasudevan@gmail.com> wrote:
> >>>
> >>> Thanks Vladimir.
> >>> Yeah, the reports that were attached specifically captured the 95/99th
> >>> percentile.
> >>> The reason for checking the server side perf was to specifically see
> the
> >>> improvement in the server side and also the client was sending large
> >>> results in multiple threads. So wanted to avoid the n/w interference. I
> >>> think it was a general practice that we were following.
> >>> We will do some more tests and get some latest readings with bigger data
> >>> sets.
> >>> Sent from mobile.
> >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <an...@gmail.com>
> >> wrote:
> >>>>
> >>>> +1
> >>>>
> >>>> Yeah, something like that, with aspirational targets for improvement
> >> from
> >>>> current releases. Then what to measure, the tests to run, and criteria
> >> for
> >>>> evaluation are clear and organized and we're able to better assess how
> >> the
> >>>> work in progress is meeting its goals (or not)
> >>>>
> >>>>
> >>>>
> >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <
> vladrodionov@gmail.com
> >>>
> >>>> wrote:
> >>>>
> >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap
> >> backed
> >>>>> cache. In the entire read path, we can refer to this offheap buffer
> and
> >>>>> avoid onheap copying.
> >>>>>
> >>>>> I think, on a read path, the most important improvement we could
> >> imagine
> >>>> is
> >>>>> elimination or reducing of object creations (KVs, iterators etc).
> >>>>> object reuse, byte buffers reuse or offheap buffers reuse, API change
> >>>> etc.
> >>>>> If this is a part of this JIRA, then I would easily define a goal:
> >>>>> improving 95/99% latency of a read operations. Not performance, but
> >>>> latency
> >>>>> matters
> >>>>>
> >>>>> -Vlad
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
> >>>> andrew.purtell@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> That's not a realistic or useful test scenario, unless the goal is
> to
> >>>>>> accelerate queries where all cells are filtered at the server.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>> No Andy. 11425 having doc attached to it. At the end of it, we have
> >>>> added
> >>>>>>> perf numbers in a cluster testing.  This was done using PE get and
> >> scan
> >>>>>>> tests with filtering all cells at server (to not consider n/w
> >> bandwidth
> >>>>>>> constraints)
> >>>>>>>
> >>>>>>> -Anoop-
> >>>>>>>
> >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
> >>>>>> andrew.purtell@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> We have some microbenchmarks, not evidence of differences seen
> from
> >> a
> >>>>>>>> client application. I'm not saying that microbenchmarks are not
> >>>> totally
> >>>>>>>> necessary and a great start - they are - but that they don't
> measure
> >>>> an
> >>>>>> end
> >>>>>>>> goal. Furthermore unless I've missed one somewhere we don't have a
> >>>> JIRA
> >>>>>> or
> >>>>>>>> design doc that states a clear end goal metric like the strawman I
> >>>> threw
> >>>>>>>> together in my previous mail. A measurable system level goal and
> >> some
> >>>>>> data
> >>>>>>>> from full cluster testing would go a lot further toward letting
> all
> >> of
> >>>>>> us
> >>>>>>>> evaluate the potential and payoff of the work. In the meantime we
> >>>> should
> >>>>>>>> probably be assembling these changes on a branch instead of in
> >> trunk,
> >>>>>> for
> >>>>>>>> as long as the goal is not clearly defined and the payoff and
> >>>> potential
> >>>>>> for
> >>>>>>>> perf regressions is untested and unknown.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com>
> >>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thanks Andy and Lars.  The parent jira has doc attached which
> >>>> contains
> >>>>>>>> some
> >>>>>>>>> perf gain numbers..  We will be doing more tests in next 2 weeks
> >>>>>> (before
> >>>>>>>>> end of this month) and will publish them.   Yes it will be great
> if
> >>>> it
> >>>>>> is
> >>>>>>>>> more IST friendly time :-)
> >>>>>>>>>
> >>>>>>>>> -Anoop-
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
> >>>>>>>> andrew.purtell@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
> always
> >>>>>> argue
> >>>>>>>>>> both side of a discussion and to never take sides easily (drives
> >>>> some
> >>>>>>>> folks
> >>>>>>>>>> crazy).
> >>>>>>>>>>
> >>>>>>>>>> I can vouch for this (smile)
> >>>>>>>>>>
> >>>>>>>>>> I also can offer support for off heaping there. At the same time
> >> we
> >>>> do
> >>>>>>>>>> have a gap where we can't point to a timeline of improvements
> >> (yet,
> >>>>>>>> anyway)
> >>>>>>>>>> with benchmarks showing gains where your goals need them. For
> >>>> example,
> >>>>>>>>>> stock HBase in one JVM can address max N GB for response time
> >>>>>>>> distribution
> >>>>>>>>>> D; dev version of HBase in off heap branch can address max N' GB
> >> for
> >>>>>>>>>> distribution D', where N' > N and D > D' (distribution D'
> >>>>>> statistically
> >>>>>>>>>> shows better/lower response times).
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org>
> >>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I'm in favor of anything that improves performance (and
> >> preferably
> >>>>>>>>>> doesn't set us back into a world that's worse than C due to the
> >> lack
> >>>>>> of
> >>>>>>>>>> pointers in Java). Never said "I don't like it", it's just that
> I'm
> >>>>>>>> perhaps
> >>>>>>>>>> asking for more numbers and justification in weighing the pros
> and
> >>>>>> cons.
> >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
> always
> >>>>>> argue
> >>>>>>>>>> both side of a discussion and to never take sides easily (drives
> >>>> some
> >>>>>>>> folks
> >>>>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
> >>>>>>>>>>>
> >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a
> >>>>>> fighting
> >>>>>>>>>> chance that folks on IST can participate. I know that some of
> our
> >>>>>> folks
> >>>>>>>> on
> >>>>>>>>>> IST would love to participate in the backup discussion).
> >>>>>>>>>>>
> >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd
> just
> >>>>>> need
> >>>>>>>>>> an approx. number of folks.
> >>>>>>>>>>>
> >>>>>>>>>>> -- Lars
> >>>>>>>>>>>
> >>>>>>>>>>> From: ramkrishna vasudevan <ra...@gmail.com>
> >>>>>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars
> >> hofhansl <
> >>>>>>>>>> larsh@apache.org>
> >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
> >> near-term
> >>>>>> work
> >>>>>>>>>>>
> >>>>>>>>>>> Hi
> >>>>>>>>>>> What time will it be on August 26th?
> >>>>>>>>>>> @LarsYa. I know that you are not generally in favour of this
> >>>>>> offheaping
> >>>>>>>>>> stuff.  May be if we (from India) can attend this meeting
> remotely
> >>>>>> your
> >>>>>>>>>> thoughts can be discussed and also the current state of this
> work.
> >>>>>>>>>>> RegardsRam
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <
> larsh@apache.org
> >>>
> >>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August
> >> 9th.
> >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
> >>>>>>>>>> complicated as we wanted fast per-tenant restores, so data is
> >>>>>> "grouped"
> >>>>>>>> by
> >>>>>>>>>> tenant. Would like to sync up on that (hopefully some of the
> folks
> >>>> who
> >>>>>>>>>> wrote most of the code will be in town, I'll check).
> >>>>>>>>>>>
> >>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you
> >>>> folks
> >>>>>>>>>> usually do not like what I think about the offheap efforts :) ).
> >>>>>>>>>>> Would like to add the following topics:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
> >>>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time"
> >>>> topic)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> - "Replication". We found that replication cannot keep up with
> >> high
> >>>>>>>>>> write loads, due to the fact that replicated is strictly single
> >>>>>> threaded
> >>>>>>>>>> per regionserver (even though we have multiple region servers on
> >> the
> >>>>>>>> sink
> >>>>>>>>>> side)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> >>>>>>>>>>>
> >>>>>>>>>>> -- Lars
> >>>>>>>>>>>
> >>>>>>>>>>> ________________________________
> >>>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
> >>>>>>>>>>> To: dev <de...@hbase.apache.org>
> >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
> >> near-term
> >>>>>> work
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of
> >> August.
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Sean
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <
> apurtell@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> I can be up in your area in August.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net>
> >>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> >>>>>> enis.soz@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next
> >> week
> >>>>>> if
> >>>>>>>>>>>>>> possible.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of
> August
> >>>>>>>> (Mikhail
> >>>>>>>>>>>> on
> >>>>>>>>>>>>> the 20th).
> >>>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Enis
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net>
> >>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a
> >>>>>> pow-wow.
> >>>>>>>>>>>>> There
> >>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below
> list)
> >>>> and
> >>>>>> it
> >>>>>>>>>>>>> would
> >>>>>>>>>>>>>>> be good to meet and whiteboard, surface goodo ideas that
> have
> >>>>>> gone
> >>>>>>>>>>>>>> dormant
> >>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached
> >>>> google
> >>>>>>>> doc
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>> need socializing.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Topics we'd go over could include:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
> >>>>>> (Matteo/Stack)
> >>>>>>>>>>>>>>> + Current state of the offheaping of read path and
> alternate
> >>>>>>>> KeyValue
> >>>>>>>>>>>>>>> implementation (Anoop/Ram)
> >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >>>>>>>>>>>>>>> + Splitting meta/1M regions
> >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> >>>>>>>>>>>>>>> + Time (Enis)
> >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> >>>>>>>>>>>>>>> + hbase-2.0.0
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you
> >>>> want
> >>>>>> to
> >>>>>>>>>>>>> take
> >>>>>>>>>>>>>>> over a topic or put your name by one, just say.  Suggest
> that
> >>>>>>>>>>>>> discussion
> >>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
> >>>>>>>>>>>>>>> thought/design/implementation.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What do others think?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What date would suit folks?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Anyone want to host?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Matteo and St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Andy
> >>>>>>>>>>>>
> >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. -
> >>>> Piet
> >>>>>>>> Hein
> >>>>>>>>>>>> (via Tom White)
> >>
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

Cool, thanks. 

Is a 20% latency reduction the most we can expect or do you think there is room for more improvement? Just curious. 

Is latency reduction the only goal? Anything here about supporting larger heaps? Is there something we can measure in that regard?

Hope you see my point and there's enough here to prime a goals and metrics discussion at the pow wow or on the relevant JIRAs. 
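
To make the goals and metrics question concrete, here is a minimal sketch of how a target such as "20% lower 95/99th percentile read latency" could be checked against two sets of measurements. It is only an illustration: the function names, the nearest-rank percentile method, and the 20% default are assumptions made for this example, not anything taken from the JIRA, the attached doc, or the PE tool.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it.
    pct is an integer in 1..100 (integer math avoids float rounding)."""
    if not samples:
        raise ValueError("need at least one latency sample")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100)
    return ordered[rank - 1]


def latency_reduction(baseline, candidate, pct):
    """Fractional reduction of the pct-th percentile latency of the
    candidate build relative to the baseline (0.20 means 20% lower)."""
    base = percentile(baseline, pct)
    cand = percentile(candidate, pct)
    return (base - cand) / base


def meets_goal(baseline, candidate, target=0.20, pcts=(95, 99)):
    """True only if every listed percentile improves by at least target.
    The 20% default is a hypothetical goal, used here for illustration."""
    return all(latency_reduction(baseline, candidate, p) >= target
               for p in pcts)


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds, not real HBase measurements.
    stock = [float(ms) for ms in range(1, 101)]
    offheap = [ms * 0.75 for ms in stock]
    for p in (95, 99):
        print(p, percentile(stock, p), percentile(offheap, p))
    print("meets 20% goal:", meets_goal(stock, offheap))
```

The same comparison could be repeated per heap size to cover the larger-heaps question, with one baseline and one candidate sample set for each size.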

> On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <ra...@gmail.com> wrote:
> 
> Hi Andy
> 
> Based on the POCs we have done, we expect around a 20% improvement in latency. For
> scans it will be a little less than 20%.
> 
> Regards
> Ram
> 
> 
> On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <an...@gmail.com>
> wrote:
> 
>> Hi Ram,
>> 
>> Do you have any targets for what you are measuring? What are the goals you
>> guys are working toward with the off heaping changes?
>> 
>> 
>>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
>>> ramkrishna.s.vasudevan@gmail.com> wrote:
>>> 
>>> Thanks Vladimir.
>>> Yeah, the reports that were attached specifically captured the 95/99th
>>> percentile.
>>> The reason for checking the server side perf was to specifically see the
>>> improvement in the server side and also the client was sending large
>>> results in multiple threads. So wanted to avoid the n/w interference. I
>>> think it was a general practice that we were following.
>>> We will do some more tests and get some latest readings with bigger data
>>> sets.
>>> Sent from mobile.
>>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <an...@gmail.com>
>> wrote:
>>>> 
>>>> +1
>>>> 
>>>> Yeah, something like that, with aspirational targets for improvement
>> from
>>>> current releases. Then what to measure, the tests to run, and criteria
>> for
>>>> evaluation are clear and organized and we're able to better assess how
>> the
>>>> work in progress is meeting its goals (or not)
>>>> 
>>>> 
>>>> 
>>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodionov@gmail.com
>>> 
>>>> wrote:
>>>> 
>>>>>>> Umbrella jira to make sure we can have blocks cached in offheap
>> backed
>>>>> cache. In the entire read path, we can refer to this offheap buffer and
>>>>> avoid onheap copying.
>>>>> 
>>>>> I think, on a read path, the most important improvement we could
>> imagine
>>>> is
>>>>> elimination or reducing of object creations (KVs, iterators etc).
>>>>> object reuse, byte buffers reuse or offheap buffers reuse, API change
>>>> etc.
>>>>> If this is a part of this JIRA, then I would easily define a goal:
>>>>> improving 95/99% latency of a read operations. Not performance, but
>>>> latency
>>>>> matters
>>>>> 
>>>>> -Vlad
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
>>>> andrew.purtell@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> That's not a realistic or useful test scenario, unless the goal is to
>>>>>> accelerate queries where all cells are filtered at the server.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>> No Andy. 11425 having doc attached to it. At the end of it, we have
>>>> added
>>>>>>> perf numbers in a cluster testing.  This was done using PE get and
>> scan
>>>>>>> tests with filtering all cells at server (to not consider n/w
>> bandwidth
>>>>>>> constraints)
>>>>>>> 
>>>>>>> -Anoop-
>>>>>>> 
>>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
>>>>>> andrew.purtell@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> We have some microbenchmarks, not evidence of differences seen from
>> a
>>>>>>>> client application. I'm not saying that microbenchmarks are not
>>>> totally
>>>>>>>> necessary and a great start - they are - but that they don't measure
>>>> an
>>>>>> end
>>>>>>>> goal. Furthermore unless I've missed one somewhere we don't have a
>>>> JIRA
>>>>>> or
>>>>>>>> design doc that states a clear end goal metric like the strawman I
>>>> threw
>>>>>>>> together in my previous mail. A measurable system level goal and
>> some
>>>>>> data
>>>>>>>> from full cluster testing would go a lot further toward letting all
>> of
>>>>>> us
>>>>>>>> evaluate the potential and payoff of the work. In the meantime we
>>>> should
>>>>>>>> probably be assembling these changes on a branch instead of in
>> trunk,
>>>>>> for
>>>>>>>> as long as the goal is not clearly defined and the payoff and
>>>> potential
>>>>>> for
>>>>>>>> perf regressions is untested and unknown.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com>
>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Andy and Lars.  The parent jira has doc attached which
>>>> contains
>>>>>>>> some
>>>>>>>>> perf gain numbers..  We will be doing more tests in next 2 weeks
>>>>>> (before
>>>>>>>>> end of this month) and will publish them.   Yes it will be great if
>>>> it
>>>>>> is
>>>>>>>>> more IST friendly time :-)
>>>>>>>>> 
>>>>>>>>> -Anoop-
>>>>>>>>> 
>>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
>>>>>>>> andrew.purtell@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always
>>>>>> argue
>>>>>>>>>> both side of a discussion and to never take sides easily (drives
>>>> some
>>>>>>>> folks
>>>>>>>>>> crazy).
>>>>>>>>>> 
>>>>>>>>>> I can vouch for this (smile)
>>>>>>>>>> 
>>>>>>>>>> I also can offer support for off heaping there. At the same time
>> we
>>>> do
>>>>>>>>>> have a gap where we can't point to a timeline of improvements
>> (yet,
>>>>>>>> anyway)
>>>>>>>>>> with benchmarks showing gains where your goals need them. For
>>>> example,
>>>>>>>>>> stock HBase in one JVM can address max N GB for response time
>>>>>>>> distribution
>>>>>>>>>> D; dev version of HBase in off heap branch can address max N' GB
>> for
>>>>>>>>>> distribution D', where N' > N and D > D' (distribution D'
>>>>>> statistically
>>>>>>>>>> shows better/lower response times).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org>
>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I'm in favor of anything that improves performance (and
>> preferably
>>>>>>>>>> doesn't set us back into a world that's worse than C due to the
>> lack
>>>>>> of
>>>>>>>>>> pointers in Java). Never said "I don't like it", it's just that I'm
>>>>>>>> perhaps
>>>>>>>>>> asking for more numbers and justification in weighing the pros and
>>>>>> cons.
>>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always
>>>>>> argue
>>>>>>>>>> both side of a discussion and to never take sides easily (drives
>>>> some
>>>>>>>> folks
>>>>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
>>>>>>>>>>> 
>>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a
>>>>>> fighting
>>>>>>>>>> chance that folks on IST can participate. I know that some of our
>>>>>> folks
>>>>>>>> on
>>>>>>>>>> IST would love to participate in the backup discussion).
>>>>>>>>>>> 
>>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
>>>>>> need
>>>>>>>>>> an approx. number of folks.
>>>>>>>>>>> 
>>>>>>>>>>> -- Lars
>>>>>>>>>>> 
>>>>>>>>>>> From: ramkrishna vasudevan <ra...@gmail.com>
>>>>>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars
>> hofhansl <
>>>>>>>>>> larsh@apache.org>
>>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
>> near-term
>>>>>> work
>>>>>>>>>>> 
>>>>>>>>>>> Hi
>>>>>>>>>>> What time will it be on August 26th?
>>>>>>>>>>> @Lars Ya. I know that you are not generally in favour of this
>>>>>> offheaping
>>>>>>>>>> stuff.  Maybe if we (from India) can attend this meeting remotely
>>>>>> your
>>>>>>>>>> thoughts can be discussed and also the current state of this work.
>>>>>>>>>>> Regards
>>>>>>>>>>> Ram
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <larsh@apache.org
>>> 
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August
>> 9th.
>>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>>>>>>>> complicated as we wanted fast per-tenant restores, so data is
>>>>>> "grouped"
>>>>>>>> by
>>>>>>>>>> tenant. Would like to sync up on that (hopefully some of the folks
>>>> who
>>>>>>>>>> wrote most of the code will be in town, I'll check).
>>>>>>>>>>> 
>>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you
>>>> folks
>>>>>>>>>> usually do not like what I think about the offheap efforts :) ).
>>>>>>>>>>> Would like to add the following topics:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time"
>>>> topic)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> - "Replication". We found that replication cannot keep up with
>> high
>>>>>>>>>> write loads, due to the fact that replication is strictly single
>>>>>> threaded
>>>>>>>>>> per regionserver (even though we have multiple region servers on
>> the
>>>>>>>> sink
>>>>>>>>>> side)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> - "Spark integration" (Ted Malaska?)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> OK... Out now to make a "bullshit hat".
>>>>>>>>>>> 
>>>>>>>>>>> -- Lars
>>>>>>>>>>> 
>>>>>>>>>>> ________________________________
>>>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
>>>>>>>>>>> To: dev <de...@hbase.apache.org>
>>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
>> near-term
>>>>>> work
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of
>> August.
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Sean
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I can be up in your area in August.
>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net>
>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
>>>>>> enis.soz@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next
>> week
>>>>>> if
>>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
>>>>>>>> (Mikhail
>>>>>>>>>>>> on
>>>>>>>>>>>>> the 20th).
>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Enis
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net>
>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a
>>>>>> pow-wow.
>>>>>>>>>>>>> There
>>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list)
>>>> and
>>>>>> it
>>>>>>>>>>>>> would
>>>>>>>>>>>>>>> be good to meet and whiteboard, surface good ideas that have
>>>>>> gone
>>>>>>>>>>>>>> dormant
>>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached
>>>> google
>>>>>>>> doc
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> need socializing.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Topics we'd go over could include:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
>>>>>> (Matteo/Stack)
>>>>>>>>>>>>>>> + Current state of the offheaping of read path and alternate
>>>>>>>> KeyValue
>>>>>>>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>>>>>>>> + Time (Enis)
>>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>>>>>>>> + hbase-2.0.0
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you
>>>> want
>>>>>> to
>>>>>>>>>>>>> take
>>>>>>>>>>>>>>> over a topic or put your name by one, just say.  Suggest that
>>>>>>>>>>>>> discussion
>>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
>>>>>>>>>>>>>>> thought/design/implementation.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What do others think?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> What date would suit folks?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Anyone want to host?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Matteo and St.Ack
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> 
>>>>>>>>>>>> - Andy
>>>>>>>>>>>> 
>>>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. -
>>>> Piet
>>>>>>>> Hein
>>>>>>>>>>>> (via Tom White)
>>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Sean Busbey <bu...@cloudera.com>.

Can y'all move discussion of the off heaping work (or perf feature dev
generally) to a new thread?

-- 
Sean
On Jul 20, 2015 6:44 AM, "ramkrishna vasudevan" <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi Andy
>
> Based on the POCs we have done, we expect around a 20% improvement in
> latency.  For scans it will be a little less than 20%.
>
> Regards
> Ram

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Hi Andy

Based on the POCs we have done, we expect around a 20% improvement in
latency.  For scans it will be a little less than 20%.

Regards
Ram


On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <an...@gmail.com>
wrote:

> Hi Ram,
>
> Do you have any targets for what you are measuring? What are the goals you
> guys are working toward with the off heaping changes?

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

Hi Ram,

Do you have any targets for what you are measuring? What are the goals you guys are working toward with the off heaping changes? 


> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <ra...@gmail.com> wrote:
> 
> Thanks Vladimir.
> Yeah, the reports that were attached specifically captured the 95/99th
> percentile.
> The reason for checking the server side perf was to specifically see the
> improvement in the server side and also the client was sending large
> results in multiple threads. So wanted to avoid the n/w interference. I
> think it was a general practice that we were following.
> We will do some more tests and get some latest readings with bigger data
> sets.
> Sent from mobile.
>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <an...@gmail.com> wrote:
>> 
>> +1
>> 
>> Yeah, something like that, with aspirational targets for improvement from
>> current releases. Then what to measure, the tests to run, and criteria for
>> evaluation are clear and organized and we're able to better assess how the
>> work in progress is meeting its goals (or not)
>> 
>> 
>> 
>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vl...@gmail.com>
>> wrote:
>> 
>>>>> Umbrella jira to make sure we can have blocks cached in offheap backed
>>> cache. In the entire read path, we can refer to this offheap buffer and
>>> avoid onheap copying.
>>> 
>>> I think, on a read path, the most important improvement we could imagine
>> is
>>> elimination or reducing of object creations (KVs, iterators etc).
>>> object reuse, byte buffers reuse or offheap buffers reuse, API change
>> etc.
>>> If this is a part of this JIRA, then I would easily define a goal:
>>> improving 95/99% latency of a read operations. Not performance, but
>> latency
>>> matters
>>> 
>>> -Vlad
>>> 
>>> 
>>> 
>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
>> andrew.purtell@gmail.com>
>>> wrote:
>>> 
>>>> That's not a realistic or useful test scenario, unless the goal is to
>>>> accelerate queries where all cells are filtered at the server.
>>>> 
>>>> 
>>>> 
>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com>
>> wrote:
>>>>> 
>>>>> No Andy. 11425 having doc attached to it. At the end of it, we have
>> added
>>>>> perf numbers in a cluster testing.  This was done using PE get and scan
>>>>> tests with filtering all cells at server (to not consider n/w bandwidth
>>>>> constraints)
>>>>> 
>>>>> -Anoop-
>>>>> 
>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
>>>> andrew.purtell@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> We have some microbenchmarks, not evidence of differences seen from a
>>>>>> client application. I'm not saying that microbenchmarks are not
>> totally
>>>>>> necessary and a great start - they are - but that they don't measure
>> an
>>>> end
>>>>>> goal. Furthermore unless I've missed one somewhere we don't have a
>> JIRA
>>>> or
>>>>>> design doc that states a clear end goal metric like the strawman I
>> threw
>>>>>> together in my previous mail. A measurable system level goal and some
>>>> data
>>>>>> from full cluster testing would go a lot further toward letting all of
>>>> us
>>>>>> evaluate the potential and payoff of the work. In the meantime we
>> should
>>>>>> probably be assembling these changes on a branch instead of in trunk,
>>>> for
>>>>>> as long as the goal is not clearly defined and the payoff and
>> potential
>>>> for
>>>>>> perf regressions is untested and unknown.
>>>>>> 
>>>>>> 
>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com>
>> wrote:
>>>>>>> 
>>>>>>> Thanks Andy and Lars.  The parent jira has doc attached which
>> contains
>>>>>> some
>>>>>>> perf gain numbers..  We will be doing more tests in next 2 weeks
>>>> (before
>>>>>>> end of this month) and will publish them.   Yes it will be great if
>> it
>>>> is
>>>>>>> more IST friendly time :-)
>>>>>>> 
>>>>>>> -Anoop-
>>>>>>> 
>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
>>>>>> andrew.purtell@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always
>>>> argue
>>>>>>>> both side of a discussion and to never take sides easily (drives
>> some
>>>>>> folks
>>>>>>>> crazy).
>>>>>>>> 
>>>>>>>> I can vouch for this (smile)
>>>>>>>> 
>>>>>>>> I also can offer support for off heaping there. At the same time we
>> do
>>>>>>>> have a gap where we can't point to a timeline of improvements (yet,
>>>>>> anyway)
>>>>>>>> with benchmarks showing gains where your goals need them. For
>> example,
>>>>>>>> stock HBase in one JVM can address max N GB for response time
>>>>>> distribution
>>>>>>>> D; dev version of HBase in off heap branch can address max N' GB for
>>>>>>>> distribution D', where N' > N and D > D' (distribution D'
>>>> statistically
>>>>>>>> shows better/lower response times).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org>
>> wrote:
>>>>>>>>> 
>>>>>>>>> I'm in favor of anything that improves performance (and preferably
>>>>>>>> doesn't set us back into a world that's worse than C due to the lack
>>>> of
>>>>>>>> pointers in Java).Never said "I don't like it", it's just that I'm
>>>>>> perhaps
>>>>>>>> asking for more numbers and justification in weighing the pros and
>>>> cons.
>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always
>>>> argue
>>>>>>>> both side of a discussion and to never take sides easily (drives
>> some
>>>>>> folks
>>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
>>>>>>>>> 
>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a
>>>> fighting
>>>>>>>> chance that folks on IST can participate. I know that some of our
>>>> folks
>>>>>> on
>>>>>>>> IST would love to participate in the backup discussion).
>>>>>>>>> 
>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
>>>> need
>>>>>>>> an approx. number of folks.
>>>>>>>>> 
>>>>>>>>> -- Lars
>>>>>>>>> 
>>>>>>>>>  From: ramkrishna vasudevan <ra...@gmail.com>
>>>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
>>>>>>>> larsh@apache.org>
>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
>>>> work
>>>>>>>>> 
>>>>>>>>> Hi
>>>>>>>>> What time will it be on August 26th?
>>>>>>>>> @Lars
>>>>>>>>> Ya. I know that you are not generally in favour of this
>>>> offheaping
>>>>>>>> stuff.  May be if we (from India) can attend this meeting remotely
>>>> your
>>>>>>>> thoughts can be discussed and also the current state of this work.
>>>>>>>>> Regards
>>>>>>>>> Ram
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>>>>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>>>>>> complicated as we wanted fast per-tenant restores, so data is
>>>> "grouped"
>>>>>> by
>>>>>>>> tenant. Would like to sync up on that (hopefully some of the folks
>> who
>>>>>>>> wrote most of the code will be in town, I'll check).
>>>>>>>>> 
>>>>>>>>> Also interested in the "Time" and "offheap" parts (although you
>> folks
>>>>>>>> usually do not like what I think about the offheap efforts :) ).
>>>>>>>>> Would like to add the following topics:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>>>>>> timestamps (happy to cover that, unless it's part of the "Time"
>> topic)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - "Replication". We found that replication cannot keep up with high
>>>>>>>> write loads, due to the fact that replicated is strictly single
>>>> threaded
>>>>>>>> per regionserver (even though we have multiple region servers on the
>>>>>> sink
>>>>>>>> side)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - "Spark integration" (Ted Malaska?)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> OK... Out now to make a "bullshit hat".
>>>>>>>>> 
>>>>>>>>> -- Lars
>>>>>>>>> 
>>>>>>>>> ________________________________
>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
>>>>>>>>> To: dev <de...@hbase.apache.org>
>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
>>>> work
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Sean
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> I can be up in your area in August.
>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net>
>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
>>>> enis.soz@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
>>>>>>>>>>>> 
>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week
>>>> if
>>>>>>>>>>>> possible.
>>>>>>>>>>>> 
>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
>>>>>> (Mikhail
>>>>>>>>>> on
>>>>>>>>>>> the 20th).
>>>>>>>>>>> St.Ack
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Enis
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net>
>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Matteo and I were thinking it time devs got together for a
>>>> pow-wow.
>>>>>>>>>>> There
>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list)
>> and
>>>> it
>>>>>>>>>>> would
>>>>>>>>>>>>> be good to meet and whiteboard, surface goodo ideas that have
>>>> gone
>>>>>>>>>>>> dormant
>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached
>> google
>>>>>> doc
>>>>>>>>>>>> that
>>>>>>>>>>>>> need socializing.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Topics we'd go over could include:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
>>>> (Matteo/Stack)
>>>>>>>>>>>>> + Current state of the offheaping of read path and alternate
>>>>>> KeyValue
>>>>>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>>>>>> + Time (Enis)
>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>>>>>> + hbase-2.0.0
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If you
>> want
>>>> to
>>>>>>>>>>> take
>>>>>>>>>>>>> over a topic or put your name by one, just say.  Suggest that
>>>>>>>>>>> discussion
>>>>>>>>>>>>> lead off with a 5-10minute on current state of
>>>>>>>>>>>>> thought/design/implementation.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What do others think?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> What date would suit folks?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Anyone want to host?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Matteo and St.Ack
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> 
>>>>>>>>>> - Andy
>>>>>>>>>> 
>>>>>>>>>> Problems worthy of attack prove their worth by hitting back. -
>> Piet
>>>>>> Hein
>>>>>>>>>> (via Tom White)
>>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Thanks Vladimir.
Yeah, the reports that were attached specifically captured the 95/99th
percentile.
We checked server-side perf specifically to see the improvement on the
server side; also, the client was sending large results in multiple
threads, so we wanted to avoid network interference. I think that was the
general practice we were following.
We will do some more tests and get the latest readings with bigger data
sets.
Sent from mobile.
On Jul 19, 2015 1:05 AM, "Andrew Purtell" <an...@gmail.com> wrote:

> +1
>
> Yeah, something like that, with aspirational targets for improvement from
> current releases. Then what to measure, the tests to run, and criteria for
> evaluation are clear and organized and we're able to better assess how the
> work in progress is meeting its goals (or not)
>
>
>
> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vl...@gmail.com>
> wrote:
>
> >>> Umbrella jira to make sure we can have blocks cached in offheap backed
> > cache. In the entire read path, we can refer to this offheap buffer and
> > avoid onheap copying.
> >
> > I think, on a read path, the most important improvement we could imagine
> is
> > elimination or reducing of object creations (KVs, iterators etc).
> > object reuse, byte buffers reuse or offheap buffers reuse, API change
> etc.
> > If this is a part of this JIRA, then I would easily define a goal:
> > improving 95/99% latency of a read operations. Not performance, but
> latency
> > matters
> >
> > -Vlad
> >
> >
> >
> > On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
> andrew.purtell@gmail.com>
> > wrote:
> >
> >> That's not a realistic or useful test scenario, unless the goal is to
> >> accelerate queries where all cells are filtered at the server.
> >>
> >>
> >>
> >>> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com>
> wrote:
> >>>
> >>> No Andy. 11425 having doc attached to it. At the end of it, we have
> added
> >>> perf numbers in a cluster testing.  This was done using PE get and scan
> >>> tests with filtering all cells at server (to not consider n/w bandwidth
> >>> constraints)
> >>>
> >>> -Anoop-
> >>>
> >>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
> >> andrew.purtell@gmail.com>
> >>> wrote:
> >>>
> >>>> We have some microbenchmarks, not evidence of differences seen from a
> >>>> client application. I'm not saying that microbenchmarks are not
> totally
> >>>> necessary and a great start - they are - but that they don't measure
> an
> >> end
> >>>> goal. Furthermore unless I've missed one somewhere we don't have a
> JIRA
> >> or
> >>>> design doc that states a clear end goal metric like the strawman I
> threw
> >>>> together in my previous mail. A measurable system level goal and some
> >> data
> >>>> from full cluster testing would go a lot further toward letting all of
> >> us
> >>>> evaluate the potential and payoff of the work. In the meantime we
> should
> >>>> probably be assembling these changes on a branch instead of in trunk,
> >> for
> >>>> as long as the goal is not clearly defined and the payoff and
> potential
> >> for
> >>>> perf regressions is untested and unknown.
> >>>>
> >>>>
> >>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com>
> wrote:
> >>>>>
> >>>>> Thanks Andy and Lars.  The parent jira has doc attached which
> contains
> >>>> some
> >>>>> perf gain numbers..  We will be doing more tests in next 2 weeks
> >> (before
> >>>>> end of this month) and will publish them.   Yes it will be great if
> it
> >> is
> >>>>> more IST friendly time :-)
> >>>>>
> >>>>> -Anoop-
> >>>>>
> >>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
> >>>> andrew.purtell@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>>> I can represent your side Ram (and Anoop). I've been known always
> >> argue
> >>>>>> both side of a discussion and to never take sides easily (drives
> some
> >>>> folks
> >>>>>> crazy).
> >>>>>>
> >>>>>> I can vouch for this (smile)
> >>>>>>
> >>>>>> I also can offer support for off heaping there. At the same time we
> do
> >>>>>> have a gap where we can't point to a timeline of improvements (yet,
> >>>> anyway)
> >>>>>> with benchmarks showing gains where your goals need them. For
> example,
> >>>>>> stock HBase in one JVM can address max N GB for response time
> >>>> distribution
> >>>>>> D; dev version of HBase in off heap branch can address max N' GB for
> >>>>>> distribution D', where N' > N and D > D' (distribution D'
> >> statistically
> >>>>>> shows better/lower response times).
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org>
> wrote:
> >>>>>>>
> >>>>>>> I'm in favor of anything that improves performance (and preferably
> >>>>>> doesn't set us back into a world that's worse than C due to the lack
> >> of
> >>>>>> pointers in Java).Never said "I don't like it", it's just that I'm
> >>>> perhaps
> >>>>>> asking for more numbers and justification in weighing the pros and
> >> cons.
> >>>>>>> I can represent your side Ram (and Anoop). I've been known always
> >> argue
> >>>>>> both side of a discussion and to never take sides easily (drives
> some
> >>>> folks
> >>>>>> crazy). And Stack's there too, he yell at me where needed :)
> >>>>>>>
> >>>>>>> Perhaps we can do it a bit later in the evening so there is a
> >> fighting
> >>>>>> chance that folks on IST can participate. I know that some of our
> >> folks
> >>>> on
> >>>>>> IST would love to participate in the backup discussion).
> >>>>>>>
> >>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
> >> need
> >>>>>> an approx. number of folks.
> >>>>>>>
> >>>>>>> -- Lars
> >>>>>>>
> >>>>>>>   From: ramkrishna vasudevan <ra...@gmail.com>
> >>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
> >>>>>> larsh@apache.org>
> >>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> >>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
> >> work
> >>>>>>>
> >>>>>>> Hi
> >>>>>>> What time will it be on August 26th?
> >>>>>>> @Lars
> >>>>>>> Ya. I know that you are not generally in favour of this
> >> offheaping
> >>>>>> stuff.  May be if we (from India) can attend this meeting remotely
> >> your
> >>>>>> thoughts can be discussed and also the current state of this work.
> >>>>>>> Regards
> >>>>>>> Ram
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org>
> >>>> wrote:
> >>>>>>>
> >>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
> >>>>>>> We have done a _lot_ of work on backups as well - ours are more
> >>>>>> complicated as we wanted fast per-tenant restores, so data is
> >> "grouped"
> >>>> by
> >>>>>> tenant. Would like to sync up on that (hopefully some of the folks
> who
> >>>>>> wrote most of the code will be in town, I'll check).
> >>>>>>>
> >>>>>>> Also interested in the "Time" and "offheap" parts (although you
> folks
> >>>>>> usually do not like what I think about the offheap efforts :) ).
> >>>>>>> Would like to add the following topics:
> >>>>>>>
> >>>>>>>
> >>>>>>> - "Timestamp Resolution". Or making space for more bits in the
> >>>>>> timestamps (happy to cover that, unless it's part of the "Time"
> topic)
> >>>>>>>
> >>>>>>>
> >>>>>>> - "Replication". We found that replication cannot keep up with high
> >>>>>> write loads, due to the fact that replicated is strictly single
> >> threaded
> >>>>>> per regionserver (even though we have multiple region servers on the
> >>>> sink
> >>>>>> side)
> >>>>>>>
> >>>>>>>
> >>>>>>> - "Spark integration" (Ted Malaska?)
> >>>>>>>
> >>>>>>>
> >>>>>>> OK... Out now to make a "bullshit hat".
> >>>>>>>
> >>>>>>> -- Lars
> >>>>>>>
> >>>>>>> ________________________________
> >>>>>>> From: Sean Busbey <bu...@cloudera.com>
> >>>>>>> To: dev <de...@hbase.apache.org>
> >>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> >>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
> >> work
> >>>>>>>
> >>>>>>>
> >>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Sean
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>> I can be up in your area in August.
> >>>>>>>>
> >>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net>
> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> >> enis.soz@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
> >>>>>>>>>>
> >>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week
> >> if
> >>>>>>>>>> possible.
> >>>>>>>>>>
> >>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
> >>>> (Mikhail
> >>>>>>>> on
> >>>>>>>>> the 20th).
> >>>>>>>>> St.Ack
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Enis
> >>>>>>>>>>
> >>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net>
> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Matteo and I were thinking it time devs got together for a
> >> pow-wow.
> >>>>>>>>> There
> >>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list)
> and
> >> it
> >>>>>>>>> would
> >>>>>>>>>>> be good to meet and whiteboard, surface goodo ideas that have
> >> gone
> >>>>>>>>>> dormant
> >>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached
> google
> >>>> doc
> >>>>>>>>>> that
> >>>>>>>>>>> need socializing.
> >>>>>>>>>>>
> >>>>>>>>>>> You can only come if you are wearing your bullshit hat.
> >>>>>>>>>>>
> >>>>>>>>>>> Topics we'd go over could include:
> >>>>>>>>>>>
> >>>>>>>>>>> + Our filesystem layout will not work if 1M regions
> >> (Matteo/Stack)
> >>>>>>>>>>> + Current state of the offheaping of read path and alternate
> >>>> KeyValue
> >>>>>>>>>>> implementation (Anoop/Ram)
> >>>>>>>>>>> + Append rejigger (Elliott)
> >>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >>>>>>>>>>> + Splitting meta/1M regions
> >>>>>>>>>>> + The revived Backup (Vladimir)
> >>>>>>>>>>> + Time (Enis)
> >>>>>>>>>>> + The overloaded SequenceId (Stack)
> >>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> >>>>>>>>>>> + hbase-2.0.0
> >>>>>>>>>>>
> >>>>>>>>>>> I put names by folks I know could talk to the topic. If you
> want
> >> to
> >>>>>>>>> take
> >>>>>>>>>>> over a topic or put your name by one, just say.  Suggest that
> >>>>>>>>> discussion
> >>>>>>>>>>> lead off with a 5-10minute on current state of
> >>>>>>>>>>> thought/design/implementation.
> >>>>>>>>>>>
> >>>>>>>>>>> What do others think?
> >>>>>>>>>>>
> >>>>>>>>>>> What date would suit folks?
> >>>>>>>>>>>
> >>>>>>>>>>> Anyone want to host?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Matteo and St.Ack
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best regards,
> >>>>>>>>
> >>>>>>>>  - Andy
> >>>>>>>>
> >>>>>>>> Problems worthy of attack prove their worth by hitting back. -
> Piet
> >>>> Hein
> >>>>>>>> (via Tom White)
> >>
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

+1

Yeah, something like that, with aspirational targets for improvement from current releases. Then what to measure, the tests to run, and criteria for evaluation are clear and organized and we're able to better assess how the work in progress is meeting its goals (or not).
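
As one hypothetical way to write such targets down so they can be checked mechanically, here is a minimal Python sketch. The metric names, the numbers, and the 20% targets are all made up for illustration and do not come from any JIRA or test report; it also assumes lower-is-better metrics such as latency.

```python
def evaluate(baseline, candidate, targets):
    """Compare a candidate build against a baseline release.

    baseline, candidate: {metric_name: measured_value}, lower is better.
    targets: {metric_name: required fractional improvement, e.g. 0.2 for 20%}.
    Returns {metric_name: (improvement_fraction, target_met)}.
    """
    report = {}
    for metric, required in targets.items():
        before = baseline[metric]
        after = candidate[metric]
        improvement = (before - after) / before
        report[metric] = (improvement, improvement >= required)
    return report


# Illustrative numbers only, not measurements.
baseline = {"p95_ms": 20.0, "p99_ms": 50.0}
candidate = {"p95_ms": 15.0, "p99_ms": 45.0}
targets = {"p95_ms": 0.2, "p99_ms": 0.2}

report = evaluate(baseline, candidate, targets)
for metric, (improvement, met) in report.items():
    print(f"{metric}: {improvement:.0%} better, target met: {met}")
```

Stating the target as a fraction of the current release's number keeps the criteria for evaluation independent of the hardware the test happens to run on.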



On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vl...@gmail.com> wrote:

>>> Umbrella jira to make sure we can have blocks cached in offheap backed
> cache. In the entire read path, we can refer to this offheap buffer and
> avoid onheap copying.
> 
> I think, on a read path, the most important improvement we could imagine is
> elimination or reducing of object creations (KVs, iterators etc).
> object reuse, byte buffers reuse or offheap buffers reuse, API change etc.
> If this is a part of this JIRA, then I would easily define a goal:
> improving 95/99% latency of a read operations. Not performance, but latency
> matters
> 
> -Vlad
> 
> 
> 
> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <an...@gmail.com>
> wrote:
> 
>> That's not a realistic or useful test scenario, unless the goal is to
>> accelerate queries where all cells are filtered at the server.
>> 
>> 
>> 
>>> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com> wrote:
>>> 
>>> No Andy. 11425 having doc attached to it. At the end of it, we have added
>>> perf numbers in a cluster testing.  This was done using PE get and scan
>>> tests with filtering all cells at server (to not consider n/w bandwidth
>>> constraints)
>>> 
>>> -Anoop-
>>> 
>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
>> andrew.purtell@gmail.com>
>>> wrote:
>>> 
>>>> We have some microbenchmarks, not evidence of differences seen from a
>>>> client application. I'm not saying that microbenchmarks are not totally
>>>> necessary and a great start - they are - but that they don't measure an
>> end
>>>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA
>> or
>>>> design doc that states a clear end goal metric like the strawman I threw
>>>> together in my previous mail. A measurable system level goal and some
>> data
>>>> from full cluster testing would go a lot further toward letting all of
>> us
>>>> evaluate the potential and payoff of the work. In the meantime we should
>>>> probably be assembling these changes on a branch instead of in trunk,
>> for
>>>> as long as the goal is not clearly defined and the payoff and potential
>> for
>>>> perf regressions is untested and unknown.
>>>> 
>>>> 
>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com> wrote:
>>>>> 
>>>>> Thanks Andy and Lars.  The parent jira has doc attached which contains
>>>> some
>>>>> perf gain numbers..  We will be doing more tests in next 2 weeks
>> (before
>>>>> end of this month) and will publish them.   Yes it will be great if it
>> is
>>>>> more IST friendly time :-)
>>>>> 
>>>>> -Anoop-
>>>>> 
>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
>>>> andrew.purtell@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>>> I can represent your side Ram (and Anoop). I've been known always
>> argue
>>>>>> both side of a discussion and to never take sides easily (drives some
>>>> folks
>>>>>> crazy).
>>>>>> 
>>>>>> I can vouch for this (smile)
>>>>>> 
>>>>>> I also can offer support for off heaping there. At the same time we do
>>>>>> have a gap where we can't point to a timeline of improvements (yet,
>>>> anyway)
>>>>>> with benchmarks showing gains where your goals need them. For example,
>>>>>> stock HBase in one JVM can address max N GB for response time
>>>> distribution
>>>>>> D; dev version of HBase in off heap branch can address max N' GB for
>>>>>> distribution D', where N' > N and D > D' (distribution D'
>> statistically
>>>>>> shows better/lower response times).
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
>>>>>>> 
>>>>>>> I'm in favor of anything that improves performance (and preferably
>>>>>> doesn't set us back into a world that's worse than C due to the lack
>> of
>>>>>> pointers in Java).Never said "I don't like it", it's just that I'm
>>>> perhaps
>>>>>> asking for more numbers and justification in weighing the pros and
>> cons.
>>>>>>> I can represent your side Ram (and Anoop). I've been known always
>> argue
>>>>>> both side of a discussion and to never take sides easily (drives some
>>>> folks
>>>>>> crazy). And Stack's there too, he yell at me where needed :)
>>>>>>> 
>>>>>>> Perhaps we can do it a bit later in the evening so there is a
>> fighting
>>>>>> chance that folks on IST can participate. I know that some of our
>> folks
>>>> on
>>>>>> IST would love to participate in the backup discussion).
>>>>>>> 
>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just
>> need
>>>>>> an approx. number of folks.
>>>>>>> 
>>>>>>> -- Lars
>>>>>>> 
>>>>>>>   From: ramkrishna vasudevan <ra...@gmail.com>
>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
>>>>>> larsh@apache.org>
>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
>> work
>>>>>>> 
>>>>>>> Hi
>>>>>>> What time will it be on August 26th?
>>>>>>> @Lars
>>>>>>> Ya. I know that you are not generally in favour of this
>> offheaping
>>>>>> stuff.  May be if we (from India) can attend this meeting remotely
>> your
>>>>>> thoughts can be discussed and also the current state of this work.
>>>>>>> Regards
>>>>>>> Ram
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org>
>>>> wrote:
>>>>>>> 
>>>>>>> Works for me. I'll be back in the Bay Area the week of August 9th.
>>>>>>> We have done a _lot_ of work on backups as well - ours are more
>>>>>> complicated as we wanted fast per-tenant restores, so data is
>> "grouped"
>>>> by
>>>>>> tenant. Would like to sync up on that (hopefully some of the folks who
>>>>>> wrote most of the code will be in town, I'll check).
>>>>>>> 
>>>>>>> Also interested in the "Time" and "offheap" parts (although you folks
>>>>>> usually do not like what I think about the offheap efforts :) ).
>>>>>>> Would like to add the following topics:
>>>>>>> 
>>>>>>> 
>>>>>>> - "Timestamp Resolution". Or making space for more bits in the
>>>>>> timestamps (happy to cover that, unless it's part of the "Time" topic)
>>>>>>> 
>>>>>>> 
>>>>>>> - "Replication". We found that replication cannot keep up with high
>>>>>> write loads, due to the fact that replicated is strictly single
>> threaded
>>>>>> per regionserver (even though we have multiple region servers on the
>>>> sink
>>>>>> side)
>>>>>>> 
>>>>>>> 
>>>>>>> - "Spark integration" (Ted Malaska?)
>>>>>>> 
>>>>>>> 
>>>>>>> OK... Out now to make a "bullshit hat".
>>>>>>> 
>>>>>>> -- Lars
>>>>>>> 
>>>>>>> ________________________________
>>>>>>> From: Sean Busbey <bu...@cloudera.com>
>>>>>>> To: dev <de...@hbase.apache.org>
>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on near-term
>> work
>>>>>>> 
>>>>>>> 
>>>>>>> I'm planning to be in the Bay area the week of the 24th of August.
>>>>>>> 
>>>>>>> --
>>>>>>> Sean
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
>>>> wrote:
>>>>>>>> 
>>>>>>>> I can be up in your area in August.
>>>>>>>> 
>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
>> enis.soz@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
>>>>>>>>>> 
>>>>>>>>>> I'll be off starting 25 of July, so I prefer something next week
>> if
>>>>>>>>>> possible.
>>>>>>>>>> 
>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of August
>>>> (Mikhail
>>>>>>>> on
>>>>>>>>> the 20th).
>>>>>>>>> St.Ack
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> Enis
>>>>>>>>>> 
>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Matteo and I were thinking it time devs got together for a
>> pow-wow.
>>>>>>>>> There
>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below list) and
>> it
>>>>>>>>> would
>>>>>>>>>>> be good to meet and whiteboard, surface goodo ideas that have
>> gone
>>>>>>>>>> dormant
>>>>>>>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google
>>>> doc
>>>>>>>>>> that
>>>>>>>>>>> need socializing.
>>>>>>>>>>> 
>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>>>>>>>>>>> 
>>>>>>>>>>> Topics we'd go over could include:
>>>>>>>>>>> 
>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
>> (Matteo/Stack)
>>>>>>>>>>> + Current state of the offheaping of read path and alternate
>>>> KeyValue
>>>>>>>>>>> implementation (Anoop/Ram)
>>>>>>>>>>> + Append rejigger (Elliott)
>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>>>>>>>> + Splitting meta/1M regions
>>>>>>>>>>> + The revived Backup (Vladimir)
>>>>>>>>>>> + Time (Enis)
>>>>>>>>>>> + The overloaded SequenceId (Stack)
>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>>>>>>>> + hbase-2.0.0
>>>>>>>>>>> 
>>>>>>>>>>> I put names by folks I know could talk to the topic. If you want
>> to
>>>>>>>>> take
>>>>>>>>>>> over a topic or put your name by one, just say.  Suggest that
>>>>>>>>> discussion
>>>>>>>>>>> lead off with a 5-10minute on current state of
>>>>>>>>>>> thought/design/implementation.
>>>>>>>>>>> 
>>>>>>>>>>> What do others think?
>>>>>>>>>>> 
>>>>>>>>>>> What date would suit folks?
>>>>>>>>>>> 
>>>>>>>>>>> Anyone want to host?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Matteo and St.Ack
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> 
>>>>>>>>  - Andy
>>>>>>>> 
>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>> Hein
>>>>>>>> (via Tom White)
>>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Vladimir Rodionov <vl...@gmail.com>.

>> Umbrella jira to make sure we can have blocks cached in offheap backed
>> cache. In the entire read path, we can refer to this offheap buffer and
>> avoid onheap copying.

I think that, on the read path, the most important improvement we could make
is eliminating or reducing object creation (KVs, iterators, etc.) through
object reuse, byte buffer reuse, offheap buffer reuse, API changes, and so on.
If this is part of this JIRA, then I would define the goal simply: improve
the 95th/99th percentile latency of read operations. It is not performance
but latency that matters.
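
To make that goal concrete, here is a minimal Java sketch of how the 95th/99th
percentile could be computed from client-observed read latencies. It is only
an illustration under stated assumptions: the class and method names are
hypothetical (nothing here is an HBase API), the nearest-rank definition of a
percentile is one choice among several, and the sample data is made up.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: summarizes client-observed read latencies.
final class LatencyPercentiles {

    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it. Requires 0 < p <= 100.
    static long percentile(long[] latenciesMicros, double p) {
        if (latenciesMicros.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMicros.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up sample: 100 reads with latencies of 1..100 microseconds.
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i + 1;
        }
        System.out.println("p95=" + percentile(samples, 95.0));
        System.out.println("p99=" + percentile(samples, 99.0));
    }
}
```

Tracking these two numbers before and after a change would show whether the
tail of the read latency distribution actually moved, which an average alone
can hide.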

-Vlad



On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <an...@gmail.com>
wrote:

> That's not a realistic or useful test scenario, unless the goal is to
> accelerate queries where all cells are filtered at the server.

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

That's not a realistic or useful test scenario, unless the goal is to accelerate queries where all cells are filtered at the server. 



> On Jul 18, 2015, at 11:02 AM, Anoop John <an...@gmail.com> wrote:
> 
> No Andy. 11425 having doc attached to it. At the end of it, we have added
> perf numbers in a cluster testing.  This was done using PE get and scan
> tests with filtering all cells at server (to not consider n/w bandwidth
> constraints)
> 
> -Anoop-

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Anoop John <an...@gmail.com>.

No, Andy. 11425 has a doc attached to it. At the end of it, we have added
perf numbers from cluster testing. These were gathered using PE get and scan
tests with all cells filtered at the server (so as not to factor in network
bandwidth constraints).

-Anoop-

On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <an...@gmail.com>
wrote:

> We have some microbenchmarks, not evidence of differences seen from a
> client application. I'm not saying that microbenchmarks are not totally
> necessary and a great start - they are - but that they don't measure an end
> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
> design doc that states a clear end goal metric like the strawman I threw
> together in my previous mail. A measurable system level goal and some data
> from full cluster testing would go a lot further toward letting all of us
> evaluate the potential and payoff of the work. In the meantime we should
> probably be assembling these changes on a branch instead of in trunk, for
> as long as the goal is not clearly defined and the payoff and potential for
> perf regressions is untested and unknown.

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

We have some microbenchmarks, not evidence of differences seen from a client application. I'm not saying that microbenchmarks aren't necessary and a great start (they are), only that they don't measure an end goal. Furthermore, unless I've missed one somewhere, we don't have a JIRA or design doc that states a clear end-goal metric like the strawman I threw together in my previous mail. A measurable system-level goal and some data from full cluster testing would go a lot further toward letting all of us evaluate the potential and payoff of the work. In the meantime, we should probably be assembling these changes on a branch instead of in trunk, for as long as the goal is not clearly defined and the payoff and the potential for perf regressions are untested and unknown.
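
As a rough illustration of what a measurable system-level goal could look like, here is a minimal Java sketch of the strawman from the earlier mail: stock HBase addresses at most N GB at response time distribution D, the off-heap branch addresses N' GB at D', and the goal holds when N' > N and D' shows lower response times. Everything in it is an assumption for illustration only: the class and field names are hypothetical, the numbers are made up, and comparing distributions by three percentiles is a simplification, not a proper statistical test.

```java
// Hypothetical sketch, not an HBase API: every class, field, and number here is illustrative.
final class GoalCheck {

    // Summary of one full-cluster run, measured from a client application.
    static final class Run {
        final double addressableGb; // data one JVM can address (N in the strawman)
        final long p50Micros;       // response time percentiles standing in for D
        final long p95Micros;
        final long p99Micros;

        Run(double addressableGb, long p50Micros, long p95Micros, long p99Micros) {
            this.addressableGb = addressableGb;
            this.p50Micros = p50Micros;
            this.p95Micros = p95Micros;
            this.p99Micros = p99Micros;
        }
    }

    // The goal is met when the candidate addresses more data (N' > N) and is
    // no slower than the baseline at any tracked percentile.
    static boolean meetsGoal(Run baseline, Run candidate) {
        return candidate.addressableGb > baseline.addressableGb
            && candidate.p50Micros <= baseline.p50Micros
            && candidate.p95Micros <= baseline.p95Micros
            && candidate.p99Micros <= baseline.p99Micros;
    }

    public static void main(String[] args) {
        Run stock = new Run(32.0, 900, 4000, 12000);  // made-up numbers
        Run offheap = new Run(96.0, 850, 3100, 7000); // made-up numbers
        System.out.println("goal met: " + meetsGoal(stock, offheap));
    }
}
```

Writing the goal down in this form, in a JIRA or design doc, would give full cluster test results something concrete to be checked against.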


> On Jul 18, 2015, at 8:05 AM, Anoop John <an...@gmail.com> wrote:
> 
> Thanks Andy and Lars.  The parent jira has doc attached which contains some
> perf gain numbers..  We will be doing more tests in next 2 weeks (before
> end of this month) and will publish them.   Yes it will be great if it is
> more IST friendly time :-)
> 
> -Anoop-
Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Anoop John <an...@gmail.com>.

Thanks Andy and Lars.  The parent JIRA has a doc attached which contains some
perf gain numbers.  We will be doing more tests in the next 2 weeks (before the
end of this month) and will publish them.  Yes, it would be great if it is at a
more IST-friendly time :-)

-Anoop-

On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <an...@gmail.com>
wrote:

> > I can represent your side Ram (and Anoop). I've been known always argue
> both side of a discussion and to never take sides easily (drives some folks
> crazy).
>
> I can vouch for this (smile)
>
> I also can offer support for off heaping there. At the same time we do
> have a gap where we can't point to a timeline of improvements (yet, anyway)
> with benchmarks showing gains where your goals need them. For example,
> stock HBase in one JVM can address max N GB for response time distribution
> D; dev version of HBase in off heap branch can address max N' GB for
> distribution D', where N' > N and D > D' (distribution D' statistically
> shows better/lower response times).
>
>
>
> > On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
> >
> > I'm in favor of anything that improves performance (and preferably
> doesn't set us back into a world that's worse than C due to the lack of
> pointers in Java).Never said "I don't like it", it's just that I'm perhaps
> asking for more numbers and justification in weighing the pros and cons.
> > I can represent your side Ram (and Anoop). I've been known always argue
> both side of a discussion and to never take sides easily (drives some folks
> crazy). And Stack's there too, he yell at me where needed :)
> >
> > Perhaps we can do it a bit later in the evening so there is a fighting
> chance that folks on IST can participate. I know that some of our folks on
> IST would love to participate in the backup discussion).
> >
> > Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
> an approx. number of folks.
> >
> > -- Lars
> >
> >      From: ramkrishna vasudevan <ra...@gmail.com>
> > To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
> larsh@apache.org>
> > Sent: Wednesday, July 15, 2015 10:10 AM
> > Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >
> > Hi
> > What time will it be on August 26th?
> > @LarsYa. I know that you are not generally in favour of this offheaping
> stuff.  May be if we (from India) can attend this meeting remotely your
> thoughts can be discussed and also the current state of this work.
> > RegardsRam
> >
> >
> > On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org> wrote:
> >
> > Works for me. I'll be back in the Bay Area the week of August 9th.
> > We have done a _lot_ of work on backups as well - ours are more
> complicated as we wanted fast per-tenant restores, so data is "grouped" by
> tenant. Would like to sync up on that (hopefully some of the folks who
> wrote most of the code will be in town, I'll check).
> >
> > Also interested in the "Time" and "offheap" parts (although you folks
> usually do not like what I think about the offheap efforts :) ).
> > Would like to add the following topics:
> >
> >
> > - "Timestamp Resolution". Or making space for more bits in the
> timestamps (happy to cover that, unless it's part of the "Time" topic)
> >
> >
> > - "Replication". We found that replication cannot keep up with high
> write loads, due to the fact that replicated is strictly single threaded
> per regionserver (even though we have multiple region servers on the sink
> side)
> >
> >
> > - "Spark integration" (Ted Malaska?)
> >
> >
> > OK... Out now to make a "bullshit hat".
> >
> > -- Lars
> >
> > ________________________________
> > From: Sean Busbey <bu...@cloudera.com>
> > To: dev <de...@hbase.apache.org>
> > Sent: Tuesday, July 14, 2015 7:11 PM
> > Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> >
> >
> > I'm planning to be in the Bay area the week of the 24th of August.
> >
> > --
> > Sean
> >
> >
> >
> >> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
> >>
> >> I can be up in your area in August.
> >>
> >>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> >>>
> >>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> >>> wrote:
> >>>
> >>>> Sounds good. It has been a while we did the talk-aton.
> >>>>
> >>>> I'll be off starting 25 of July, so I prefer something next week if
> >>>> possible.
> >>>>
> >>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail
> >> on
> >>> the 20th).
> >>> St.Ack
> >>>
> >>>
> >>>
> >>>
> >>>> Enis
> >>>>
> >>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> >>>>>
> >>>>> Matteo and I were thinking it time devs got together for a pow-wow.
> >>> There
> >>>>> is a bunch of stuff in flight at the moment (see below list) and it
> >>> would
> >>>>> be good to meet and whiteboard, surface goodo ideas that have gone
> >>>> dormant
> >>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google doc
> >>>> that
> >>>>> need socializing.
> >>>>>
> >>>>> You can only come if you are wearing your bullshit hat.
> >>>>>
> >>>>> Topics we'd go over could include:
> >>>>>
> >>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> >>>>> + Current state of the offheaping of read path and alternate KeyValue
> >>>>> implementation (Anoop/Ram)
> >>>>> + Append rejigger (Elliott)
> >>>>> + A Pv2-based Assign (Matteo/Steven)
> >>>>> + Splitting meta/1M regions
> >>>>> + The revived Backup (Vladimir)
> >>>>> + Time (Enis)
> >>>>> + The overloaded SequenceId (Stack)
> >>>>> + Upstreaming IT testing (Dima/Sean)
> >>>>> + hbase-2.0.0
> >>>>>
> >>>>> I put names by folks I know could talk to the topic. If you want to
> >>> take
> >>>>> over a topic or put your name by one, just say.  Suggest that
> >>> discussion
> >>>>> lead off with a 5-10minute on current state of
> >>>>> thought/design/implementation.
> >>>>>
> >>>>> What do others think?
> >>>>>
> >>>>> What date would suit folks?
> >>>>>
> >>>>> Anyone want to host?
> >>>>>
> >>>>> Thanks,
> >>>>> Matteo and St.Ack
> >>
> >>
> >>
> >> --
> >> Best regards,
> >>
> >>     - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
> >
> >
> >
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <an...@gmail.com>.

> I can represent your side Ram (and Anoop). I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy).

I can vouch for this (smile)

I also can offer support for off heaping there. At the same time we do have a gap where we can't point to a timeline of improvements (yet, anyway) with benchmarks showing gains where your goals need them. For example, stock HBase in one JVM can address max N GB for response time distribution D; dev version of HBase in off heap branch can address max N' GB for distribution D', where N' > N and D > D' (distribution D' statistically shows better/lower response times). 



> On Jul 17, 2015, at 6:56 AM, lars hofhansl <la...@apache.org> wrote:
> 
> I'm in favor of anything that improves performance (and preferably doesn't set us back into a world that's worse than C due to the lack of pointers in Java).Never said "I don't like it", it's just that I'm perhaps asking for more numbers and justification in weighing the pros and cons.
> I can represent your side Ram (and Anoop). I've been known always argue both side of a discussion and to never take sides easily (drives some folks crazy). And Stack's there too, he yell at me where needed :)
> 
> Perhaps we can do it a bit later in the evening so there is a fighting chance that folks on IST can participate. I know that some of our folks on IST would love to participate in the backup discussion).
> 
> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need an approx. number of folks.
> 
> -- Lars
> 
>      From: ramkrishna vasudevan <ra...@gmail.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <la...@apache.org> 
> Sent: Wednesday, July 15, 2015 10:10 AM
> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> 
> Hi 
> What time will it be on August 26th?
> @LarsYa. I know that you are not generally in favour of this offheaping stuff.  May be if we (from India) can attend this meeting remotely your thoughts can be discussed and also the current state of this work.
> RegardsRam
> 
> 
> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org> wrote:
> 
> Works for me. I'll be back in the Bay Area the week of August 9th.
> We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is "grouped" by tenant. Would like to sync up on that (hopefully some of the folks who wrote most of the code will be in town, I'll check).
> 
> Also interested in the "Time" and "offheap" parts (although you folks usually do not like what I think about the offheap efforts :) ).
> Would like to add the following topics:
> 
> 
> - "Timestamp Resolution". Or making space for more bits in the timestamps (happy to cover that, unless it's part of the "Time" topic)
> 
> 
> - "Replication". We found that replication cannot keep up with high write loads, due to the fact that replicated is strictly single threaded per regionserver (even though we have multiple region servers on the sink side)
> 
> 
> - "Spark integration" (Ted Malaska?)
> 
> 
> OK... Out now to make a "bullshit hat".
> 
> -- Lars
> 
> ________________________________
> From: Sean Busbey <bu...@cloudera.com>
> To: dev <de...@hbase.apache.org>
> Sent: Tuesday, July 14, 2015 7:11 PM
> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
> 
> 
> I'm planning to be in the Bay area the week of the 24th of August.
> 
> --
> Sean
> 
> 
> 
>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>> 
>> I can be up in your area in August.
>> 
>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
>>> 
>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
>>> wrote:
>>> 
>>>> Sounds good. It has been a while we did the talk-aton.
>>>> 
>>>> I'll be off starting 25 of July, so I prefer something next week if
>>>> possible.
>>>> 
>>>> You ever coming back? If so, when? I'm back on 10th of August (Mikhail
>> on
>>> the 20th).
>>> St.Ack
>>> 
>>> 
>>> 
>>> 
>>>> Enis
>>>> 
>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
>>>>> 
>>>>> Matteo and I were thinking it time devs got together for a pow-wow.
>>> There
>>>>> is a bunch of stuff in flight at the moment (see below list) and it
>>> would
>>>>> be good to meet and whiteboard, surface goodo ideas that have gone
>>>> dormant
>>>>> in JIRA, or revisit designs/proposals out in JIRA-attached google doc
>>>> that
>>>>> need socializing.
>>>>> 
>>>>> You can only come if you are wearing your bullshit hat.
>>>>> 
>>>>> Topics we'd go over could include:
>>>>> 
>>>>> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
>>>>> + Current state of the offheaping of read path and alternate KeyValue
>>>>> implementation (Anoop/Ram)
>>>>> + Append rejigger (Elliott)
>>>>> + A Pv2-based Assign (Matteo/Steven)
>>>>> + Splitting meta/1M regions
>>>>> + The revived Backup (Vladimir)
>>>>> + Time (Enis)
>>>>> + The overloaded SequenceId (Stack)
>>>>> + Upstreaming IT testing (Dima/Sean)
>>>>> + hbase-2.0.0
>>>>> 
>>>>> I put names by folks I know could talk to the topic. If you want to
>>> take
>>>>> over a topic or put your name by one, just say.  Suggest that
>>> discussion
>>>>> lead off with a 5-10minute on current state of
>>>>> thought/design/implementation.
>>>>> 
>>>>> What do others think?
>>>>> 
>>>>> What date would suit folks?
>>>>> 
>>>>> Anyone want to host?
>>>>> 
>>>>> Thanks,
>>>>> Matteo and St.Ack
>> 
>> 
>> 
>> --
>> Best regards,
>> 
>>     - Andy
>> 
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
> 
> 
> 
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by lars hofhansl <la...@apache.org>.

I'm in favor of anything that improves performance (and preferably doesn't set us back into a world that's worse than C due to the lack of pointers in Java). Never said "I don't like it", it's just that I'm perhaps asking for more numbers and justification in weighing the pros and cons.
I can represent your side Ram (and Anoop). I've been known to always argue both sides of a discussion and to never take sides easily (drives some folks crazy). And Stack's there too, he'll yell at me where needed :)

Perhaps we can do it a bit later in the evening so there is a fighting chance that folks on IST can participate. I know that some of our folks on IST would love to participate in the backup discussion.

Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need an approx. number of folks.

-- Lars

      From: ramkrishna vasudevan <ra...@gmail.com>
 To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <la...@apache.org> 
 Sent: Wednesday, July 15, 2015 10:10 AM
 Subject: Re: DISCUSSION: lets do a developer workshop on near-term work

Hi 
What time will it be on August 26th?
@Lars
Ya. I know that you are not generally in favour of this offheaping stuff.  May be if we (from India) can attend this meeting remotely your thoughts can be discussed and also the current state of this work.
Regards
Ram

On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org> wrote:

Works for me. I'll be back in the Bay Area the week of August 9th.
We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is "grouped" by tenant. Would like to sync up on that (hopefully some of the folks who wrote most of the code will be in town, I'll check).

Also interested in the "Time" and "offheap" parts (although you folks usually do not like what I think about the offheap efforts :) ).
Would like to add the following topics:

- "Timestamp Resolution". Or making space for more bits in the timestamps (happy to cover that, unless it's part of the "Time" topic)

- "Replication". We found that replication cannot keep up with high write loads, due to the fact that replication is strictly single-threaded per regionserver (even though we have multiple region servers on the sink side)

- "Spark integration" (Ted Malaska?)

OK... Out now to make a "bullshit hat".

-- Lars

________________________________
From: Sean Busbey <bu...@cloudera.com>
To: dev <de...@hbase.apache.org>
Sent: Tuesday, July 14, 2015 7:11 PM
Subject: Re: DISCUSSION: lets do a developer workshop on near-term work

I'm planning to be in the Bay area the week of the 24th of August.

--
Sean

On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:

> I can be up in your area in August.
>
> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
>
> > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> wrote:
> >
> > > Sounds good. It has been a while we did the talk-aton.
> > >
> > > I'll be off starting 25 of July, so I prefer something next week if
> > > possible.
> > >
> > > You ever coming back? If so, when? I'm back on 10th of August (Mikhail
> on
> > the 20th).
> > St.Ack
> >
> >
> >
> >
> > > Enis
> > >
> > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > Matteo and I were thinking it time devs got together for a pow-wow.
> > There
> > > > is a bunch of stuff in flight at the moment (see below list) and it
> > would
> > > > be good to meet and whiteboard, surface goodo ideas that have gone
> > > dormant
> > > > in JIRA, or revisit designs/proposals out in JIRA-attached google doc
> > > that
> > > > need socializing.
> > > >
> > > > You can only come if you are wearing your bullshit hat.
> > > >
> > > > Topics we'd go over could include:
> > > >
> > > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > > + Current state of the offheaping of read path and alternate KeyValue
> > > > implementation (Anoop/Ram)
> > > > + Append rejigger (Elliott)
> > > > + A Pv2-based Assign (Matteo/Steven)
> > > > + Splitting meta/1M regions
> > > > + The revived Backup (Vladimir)
> > > > + Time (Enis)
> > > > + The overloaded SequenceId (Stack)
> > > > + Upstreaming IT testing (Dima/Sean)
> > > > + hbase-2.0.0
> > > >
> > > > I put names by folks I know could talk to the topic. If you want to
> > take
> > > > over a topic or put your name by one, just say.  Suggest that
> > discussion
> > > > lead off with a 5-10minute on current state of
> > > > thought/design/implementation.
> > > >
> > > > What do others think?
> > > >
> > > > What date would suit folks?
> > > >
> > > > Anyone want to host?
> > > >
> > > > Thanks,
> > > > Matteo and St.Ack
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Hi

What time will it be on August 26th?

@Lars
Ya. I know that you are not generally in favour of this offheaping stuff.
Maybe if we (from India) can attend this meeting remotely, your thoughts
can be discussed along with the current state of this work.

Regards
Ram

On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <la...@apache.org> wrote:

> Works for me. I'll be back in the Bay Area the week of August 9th.
> We have done a _lot_ of work on backups as well - ours are more
> complicated as we wanted fast per-tenant restores, so data is "grouped" by
> tenant. Would like to sync up on that (hopefully some of the folks who
> wrote most of the code will be in town, I'll check).
>
> Also interested in the "Time" and "offheap" parts (although you folks
> usually do not like what I think about the offheap efforts :) ).
> Would like to add the following topics:
>
>
> - "Timestamp Resolution". Or making space for more bits in the timestamps
> (happy to cover that, unless it's part of the "Time" topic)
>
>
> - "Replication". We found that replication cannot keep up with high write
> loads, due to the fact that replicated is strictly single threaded per
> regionserver (even though we have multiple region servers on the sink side)
>
>
> - "Spark integration" (Ted Malaska?)
>
>
> OK... Out now to make a "bullshit hat".
>
> -- Lars
>
> ________________________________
> From: Sean Busbey <bu...@cloudera.com>
> To: dev <de...@hbase.apache.org>
> Sent: Tuesday, July 14, 2015 7:11 PM
> Subject: Re: DISCUSSION: lets do a developer workshop on near-term work
>
>
> I'm planning to be in the Bay area the week of the 24th of August.
>
> --
> Sean
>
>
>
> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>
> > I can be up in your area in August.
> >
> > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> > wrote:
> > >
> > > > Sounds good. It has been a while we did the talk-aton.
> > > >
> > > > I'll be off starting 25 of July, so I prefer something next week if
> > > > possible.
> > > >
> > > > You ever coming back? If so, when? I'm back on 10th of August
> (Mikhail
> > on
> > > the 20th).
> > > St.Ack
> > >
> > >
> > >
> > >
> > > > Enis
> > > >
> > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > Matteo and I were thinking it time devs got together for a pow-wow.
> > > There
> > > > > is a bunch of stuff in flight at the moment (see below list) and it
> > > would
> > > > > be good to meet and whiteboard, surface goodo ideas that have gone
> > > > dormant
> > > > > in JIRA, or revisit designs/proposals out in JIRA-attached google
> doc
> > > > that
> > > > > need socializing.
> > > > >
> > > > > You can only come if you are wearing your bullshit hat.
> > > > >
> > > > > Topics we'd go over could include:
> > > > >
> > > > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > > > + Current state of the offheaping of read path and alternate
> KeyValue
> > > > > implementation (Anoop/Ram)
> > > > > + Append rejigger (Elliott)
> > > > > + A Pv2-based Assign (Matteo/Steven)
> > > > > + Splitting meta/1M regions
> > > > > + The revived Backup (Vladimir)
> > > > > + Time (Enis)
> > > > > + The overloaded SequenceId (Stack)
> > > > > + Upstreaming IT testing (Dima/Sean)
> > > > > + hbase-2.0.0
> > > > >
> > > > > I put names by folks I know could talk to the topic. If you want to
> > > take
> > > > > over a topic or put your name by one, just say.  Suggest that
> > > discussion
> > > > > lead off with a 5-10minute on current state of
> > > > > thought/design/implementation.
> > > > >
> > > > > What do others think?
> > > > >
> > > > > What date would suit folks?
> > > > >
> > > > > Anyone want to host?
> > > > >
> > > > > Thanks,
> > > > > Matteo and St.Ack
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by lars hofhansl <la...@apache.org>.

Works for me. I'll be back in the Bay Area the week of August 9th.
We have done a _lot_ of work on backups as well - ours are more complicated as we wanted fast per-tenant restores, so data is "grouped" by tenant. Would like to sync up on that (hopefully some of the folks who wrote most of the code will be in town, I'll check).

Also interested in the "Time" and "offheap" parts (although you folks usually do not like what I think about the offheap efforts :) ).
Would like to add the following topics:


- "Timestamp Resolution". Or making space for more bits in the timestamps (happy to cover that, unless it's part of the "Time" topic)


- "Replication". We found that replication cannot keep up with high write loads, due to the fact that replication is strictly single-threaded per regionserver (even though we have multiple region servers on the sink side)


- "Spark integration" (Ted Malaska?)


OK... Out now to make a "bullshit hat".

-- Lars

________________________________
From: Sean Busbey <bu...@cloudera.com>
To: dev <de...@hbase.apache.org> 
Sent: Tuesday, July 14, 2015 7:11 PM
Subject: Re: DISCUSSION: lets do a developer workshop on near-term work


I'm planning to be in the Bay area the week of the 24th of August.

-- 
Sean



On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:

> I can be up in your area in August.
>
> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
>
> > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> wrote:
> >
> > > Sounds good. It has been a while we did the talk-aton.
> > >
> > > I'll be off starting 25 of July, so I prefer something next week if
> > > possible.
> > >
> > > You ever coming back? If so, when? I'm back on 10th of August (Mikhail
> on
> > the 20th).
> > St.Ack
> >
> >
> >
> >
> > > Enis
> > >
> > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > Matteo and I were thinking it time devs got together for a pow-wow.
> > There
> > > > is a bunch of stuff in flight at the moment (see below list) and it
> > would
> > > > be good to meet and whiteboard, surface goodo ideas that have gone
> > > dormant
> > > > in JIRA, or revisit designs/proposals out in JIRA-attached google doc
> > > that
> > > > need socializing.
> > > >
> > > > You can only come if you are wearing your bullshit hat.
> > > >
> > > > Topics we'd go over could include:
> > > >
> > > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > > + Current state of the offheaping of read path and alternate KeyValue
> > > > implementation (Anoop/Ram)
> > > > + Append rejigger (Elliott)
> > > > + A Pv2-based Assign (Matteo/Steven)
> > > > + Splitting meta/1M regions
> > > > + The revived Backup (Vladimir)
> > > > + Time (Enis)
> > > > + The overloaded SequenceId (Stack)
> > > > + Upstreaming IT testing (Dima/Sean)
> > > > + hbase-2.0.0
> > > >
> > > > I put names by folks I know could talk to the topic. If you want to
> > take
> > > > over a topic or put your name by one, just say.  Suggest that
> > discussion
> > > > lead off with a 5-10minute on current state of
> > > > thought/design/implementation.
> > > >
> > > > What do others think?
> > > >
> > > > What date would suit folks?
> > > >
> > > > Anyone want to host?
> > > >
> > > > Thanks,
> > > > Matteo and St.Ack
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Enis Söztutar <en...@apache.org>.

BTW, we can look into hosting if there is no other volunteer.

Enis

On Wed, Jul 15, 2015 at 11:10 AM, Enis Söztutar <en...@gmail.com> wrote:

> Works for me as well. I'll be back by 15th of Aug.
>
> Enis
>
> On Tue, Jul 14, 2015 at 8:52 PM, Stack <st...@duboce.net> wrote:
>
>> On Tue, Jul 14, 2015 at 8:36 PM, Dima Spivak <ds...@cloudera.com>
>> wrote:
>>
>> > Works for me. I'd love to talk about some upstream integration testing
>> > stuff.
>> >
>> >
>> Chalk it down!
>> St.Ack
>>
>>
>>
>> > -Dima
>> >
>> > On Tue, Jul 14, 2015 at 10:11 PM, Stack <st...@duboce.net> wrote:
>> >
>> > > I suggest Weds, August 26th.
>> > > St.Ack
>> > >
>> > > On Tue, Jul 14, 2015 at 7:11 PM, Sean Busbey <bu...@cloudera.com>
>> > wrote:
>> > >
>> > > > I'm planning to be in the Bay area the week of the 24th of August.
>> > > >
>> > > > --
>> > > > Sean
>> > > > On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
>> wrote:
>> > > >
>> > > > > I can be up in your area in August.
>> > > > >
>> > > > > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
>> > > > >
>> > > > > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
>> enis.soz@gmail.com
>> > >
>> > > > > wrote:
>> > > > > >
>> > > > > > > Sounds good. It has been a while we did the talk-aton.
>> > > > > > >
>> > > > > > > I'll be off starting 25 of July, so I prefer something next
>> week
>> > if
>> > > > > > > possible.
>> > > > > > >
>> > > > > > > You ever coming back? If so, when? I'm back on 10th of August
>> > > > (Mikhail
>> > > > > on
>> > > > > > the 20th).
>> > > > > > St.Ack
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > > Enis
>> > > > > > >
>> > > > > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net>
>> wrote:
>> > > > > > >
>> > > > > > > > Matteo and I were thinking it time devs got together for a
>> > > pow-wow.
>> > > > > > There
>> > > > > > > > is a bunch of stuff in flight at the moment (see below list)
>> > and
>> > > it
>> > > > > > would
>> > > > > > > > be good to meet and whiteboard, surface goodo ideas that
>> have
>> > > gone
>> > > > > > > dormant
>> > > > > > > > in JIRA, or revisit designs/proposals out in JIRA-attached
>> > google
>> > > > doc
>> > > > > > > that
>> > > > > > > > need socializing.
>> > > > > > > >
>> > > > > > > > You can only come if you are wearing your bullshit hat.
>> > > > > > > >
>> > > > > > > > Topics we'd go over could include:
>> > > > > > > >
>> > > > > > > > + Our filesystem layout will not work if 1M regions
>> > > (Matteo/Stack)
>> > > > > > > > + Current state of the offheaping of read path and alternate
>> > > > KeyValue
>> > > > > > > > implementation (Anoop/Ram)
>> > > > > > > > + Append rejigger (Elliott)
>> > > > > > > > + A Pv2-based Assign (Matteo/Steven)
>> > > > > > > > + Splitting meta/1M regions
>> > > > > > > > + The revived Backup (Vladimir)
>> > > > > > > > + Time (Enis)
>> > > > > > > > + The overloaded SequenceId (Stack)
>> > > > > > > > + Upstreaming IT testing (Dima/Sean)
>> > > > > > > > + hbase-2.0.0
>> > > > > > > >
>> > > > > > > > I put names by folks I know could talk to the topic. If you
>> > want
>> > > to
>> > > > > > take
>> > > > > > > > over a topic or put your name by one, just say.  Suggest
>> that
>> > > > > > discussion
>> > > > > > > > lead off with a 5-10minute on current state of
>> > > > > > > > thought/design/implementation.
>> > > > > > > >
>> > > > > > > > What do others think?
>> > > > > > > >
>> > > > > > > > What date would suit folks?
>> > > > > > > >
>> > > > > > > > Anyone want to host?
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > Matteo and St.Ack
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Best regards,
>> > > > >
>> > > > >    - Andy
>> > > > >
>> > > > > Problems worthy of attack prove their worth by hitting back. -
>> Piet
>> > > Hein
>> > > > > (via Tom White)
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Enis Söztutar <en...@gmail.com>.

Works for me as well. I'll be back by 15th of Aug.

Enis

On Tue, Jul 14, 2015 at 8:52 PM, Stack <st...@duboce.net> wrote:

> On Tue, Jul 14, 2015 at 8:36 PM, Dima Spivak <ds...@cloudera.com> wrote:
>
> > Works for me. I'd love to talk about some upstream integration testing
> > stuff.
> >
> >
> Chalk it down!
> St.Ack
>
>
>
> > -Dima
> >
> > On Tue, Jul 14, 2015 at 10:11 PM, Stack <st...@duboce.net> wrote:
> >
> > > I suggest Weds, August 26th.
> > > St.Ack
> > >
> > > On Tue, Jul 14, 2015 at 7:11 PM, Sean Busbey <bu...@cloudera.com>
> > wrote:
> > >
> > > > I'm planning to be in the Bay area the week of the 24th of August.
> > > >
> > > > --
> > > > Sean
> > > > On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org>
> wrote:
> > > >
> > > > > I can be up in your area in August.
> > > > >
> > > > > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> > > > >
> > > > > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> enis.soz@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Sounds good. It has been a while we did the talk-aton.
> > > > > > >
> > > > > > > I'll be off starting 25 of July, so I prefer something next
> week
> > if
> > > > > > > possible.
> > > > > > >
> > > > > > > You ever coming back? If so, when? I'm back on 10th of August
> > > > (Mikhail
> > > > > on
> > > > > > the 20th).
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Enis
> > > > > > >
> > > > > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net>
> wrote:
> > > > > > >
> > > > > > > > Matteo and I were thinking it time devs got together for a
> > > pow-wow.
> > > > > > There
> > > > > > > > is a bunch of stuff in flight at the moment (see below list)
> > and
> > > it
> > > > > > would
> > > > > > > > be good to meet and whiteboard, surface goodo ideas that have
> > > gone
> > > > > > > dormant
> > > > > > > > in JIRA, or revisit designs/proposals out in JIRA-attached
> > google
> > > > doc
> > > > > > > that
> > > > > > > > need socializing.
> > > > > > > >
> > > > > > > > You can only come if you are wearing your bullshit hat.
> > > > > > > >
> > > > > > > > Topics we'd go over could include:
> > > > > > > >
> > > > > > > > + Our filesystem layout will not work if 1M regions
> > > (Matteo/Stack)
> > > > > > > > + Current state of the offheaping of read path and alternate
> > > > KeyValue
> > > > > > > > implementation (Anoop/Ram)
> > > > > > > > + Append rejigger (Elliott)
> > > > > > > > + A Pv2-based Assign (Matteo/Steven)
> > > > > > > > + Splitting meta/1M regions
> > > > > > > > + The revived Backup (Vladimir)
> > > > > > > > + Time (Enis)
> > > > > > > > + The overloaded SequenceId (Stack)
> > > > > > > > + Upstreaming IT testing (Dima/Sean)
> > > > > > > > + hbase-2.0.0
> > > > > > > >
> > > > > > > > I put names by folks I know could talk to the topic. If you
> > want
> > > to
> > > > > > take
> > > > > > > > over a topic or put your name by one, just say.  Suggest that
> > > > > > discussion
> > > > > > > > lead off with a 5-10minute on current state of
> > > > > > > > thought/design/implementation.
> > > > > > > >
> > > > > > > > What do others think?
> > > > > > > >
> > > > > > > > What date would suit folks?
> > > > > > > >
> > > > > > > > Anyone want to host?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Matteo and St.Ack
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >
> > > > >    - Andy
> > > > >
> > > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > > Hein
> > > > > (via Tom White)
> > > > >
> > > >
> > >
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Stack <st...@duboce.net>.

On Tue, Jul 14, 2015 at 8:36 PM, Dima Spivak <ds...@cloudera.com> wrote:

> Works for me. I'd love to talk about some upstream integration testing
> stuff.
>
>
Chalk it down!
St.Ack



> -Dima
>
> On Tue, Jul 14, 2015 at 10:11 PM, Stack <st...@duboce.net> wrote:
>
> > I suggest Weds, August 26th.
> > St.Ack
> >
> > On Tue, Jul 14, 2015 at 7:11 PM, Sean Busbey <bu...@cloudera.com>
> wrote:
> >
> > > I'm planning to be in the Bay area the week of the 24th of August.
> > >
> > > --
> > > Sean
> > > On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
> > >
> > > > I can be up in your area in August.
> > > >
> > > > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com> wrote:
> > > > >
> > > > > > Sounds good. It has been a while we did the talk-aton.
> > > > > >
> > > > > > I'll be off starting 25 of July, so I prefer something next week
> if
> > > > > > possible.
> > > > > >
> > > > > > You ever coming back? If so, when? I'm back on 10th of August
> > > (Mikhail
> > > > on
> > > > > the 20th).
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > Enis
> > > > > >
> > > > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > > > > >
> > > > > > > Matteo and I were thinking it time devs got together for a
> > pow-wow.
> > > > > There
> > > > > > > is a bunch of stuff in flight at the moment (see below list)
> and
> > it
> > > > > would
> > > > > > > > be good to meet and whiteboard, surface good ideas that have
> > gone
> > > > > > dormant
> > > > > > > in JIRA, or revisit designs/proposals out in JIRA-attached
> google
> > > doc
> > > > > > that
> > > > > > > need socializing.
> > > > > > >
> > > > > > > You can only come if you are wearing your bullshit hat.
> > > > > > >
> > > > > > > Topics we'd go over could include:
> > > > > > >
> > > > > > > + Our filesystem layout will not work if 1M regions
> > (Matteo/Stack)
> > > > > > > + Current state of the offheaping of read path and alternate
> > > KeyValue
> > > > > > > implementation (Anoop/Ram)
> > > > > > > + Append rejigger (Elliott)
> > > > > > > + A Pv2-based Assign (Matteo/Steven)
> > > > > > > + Splitting meta/1M regions
> > > > > > > + The revived Backup (Vladimir)
> > > > > > > + Time (Enis)
> > > > > > > + The overloaded SequenceId (Stack)
> > > > > > > + Upstreaming IT testing (Dima/Sean)
> > > > > > > + hbase-2.0.0
> > > > > > >
> > > > > > > I put names by folks I know could talk to the topic. If you
> want
> > to
> > > > > take
> > > > > > > over a topic or put your name by one, just say.  Suggest that
> > > > > discussion
> > > > > > > lead off with a 5-10minute on current state of
> > > > > > > thought/design/implementation.
> > > > > > >
> > > > > > > What do others think?
> > > > > > >
> > > > > > > What date would suit folks?
> > > > > > >
> > > > > > > Anyone want to host?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Matteo and St.Ack
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > > > (via Tom White)
> > > >
> > >
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Dima Spivak <ds...@cloudera.com>.

Works for me. I'd love to talk about some upstream integration testing
stuff.

-Dima

On Tue, Jul 14, 2015 at 10:11 PM, Stack <st...@duboce.net> wrote:

> I suggest Weds, August 26th.
> St.Ack
>
> On Tue, Jul 14, 2015 at 7:11 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
> > I'm planning to be in the Bay area the week of the 24th of August.
> >
> > --
> > Sean
> > On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
> >
> > > I can be up in your area in August.
> > >
> > > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> > > wrote:
> > > >
> > > > > Sounds good. It has been a while we did the talk-aton.
> > > > >
> > > > > I'll be off starting 25 of July, so I prefer something next week if
> > > > > possible.
> > > > >
> > > > > You ever coming back? If so, when? I'm back on 10th of August
> > (Mikhail
> > > on
> > > > the 20th).
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > > > Enis
> > > > >
> > > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > > > >
> > > > > > Matteo and I were thinking it time devs got together for a
> pow-wow.
> > > > There
> > > > > > is a bunch of stuff in flight at the moment (see below list) and
> it
> > > > would
> > > > > > > be good to meet and whiteboard, surface good ideas that have
> gone
> > > > > dormant
> > > > > > in JIRA, or revisit designs/proposals out in JIRA-attached google
> > doc
> > > > > that
> > > > > > need socializing.
> > > > > >
> > > > > > You can only come if you are wearing your bullshit hat.
> > > > > >
> > > > > > Topics we'd go over could include:
> > > > > >
> > > > > > + Our filesystem layout will not work if 1M regions
> (Matteo/Stack)
> > > > > > + Current state of the offheaping of read path and alternate
> > KeyValue
> > > > > > implementation (Anoop/Ram)
> > > > > > + Append rejigger (Elliott)
> > > > > > + A Pv2-based Assign (Matteo/Steven)
> > > > > > + Splitting meta/1M regions
> > > > > > + The revived Backup (Vladimir)
> > > > > > + Time (Enis)
> > > > > > + The overloaded SequenceId (Stack)
> > > > > > + Upstreaming IT testing (Dima/Sean)
> > > > > > + hbase-2.0.0
> > > > > >
> > > > > > I put names by folks I know could talk to the topic. If you want
> to
> > > > take
> > > > > > over a topic or put your name by one, just say.  Suggest that
> > > > discussion
> > > > > > lead off with a 5-10minute on current state of
> > > > > > thought/design/implementation.
> > > > > >
> > > > > > What do others think?
> > > > > >
> > > > > > What date would suit folks?
> > > > > >
> > > > > > Anyone want to host?
> > > > > >
> > > > > > Thanks,
> > > > > > Matteo and St.Ack
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Stack <st...@duboce.net>.

I suggest Weds, August 26th.
St.Ack

On Tue, Jul 14, 2015 at 7:11 PM, Sean Busbey <bu...@cloudera.com> wrote:

> I'm planning to be in the Bay area the week of the 24th of August.
>
> --
> Sean
> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>
> > I can be up in your area in August.
> >
> > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> > wrote:
> > >
> > > > Sounds good. It has been a while we did the talk-aton.
> > > >
> > > > I'll be off starting 25 of July, so I prefer something next week if
> > > > possible.
> > > >
> > > > You ever coming back? If so, when? I'm back on 10th of August
> (Mikhail
> > on
> > > the 20th).
> > > St.Ack
> > >
> > >
> > >
> > >
> > > > Enis
> > > >
> > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > Matteo and I were thinking it time devs got together for a pow-wow.
> > > There
> > > > > is a bunch of stuff in flight at the moment (see below list) and it
> > > would
> > > > > > be good to meet and whiteboard, surface good ideas that have gone
> > > > dormant
> > > > > in JIRA, or revisit designs/proposals out in JIRA-attached google
> doc
> > > > that
> > > > > need socializing.
> > > > >
> > > > > You can only come if you are wearing your bullshit hat.
> > > > >
> > > > > Topics we'd go over could include:
> > > > >
> > > > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > > > + Current state of the offheaping of read path and alternate
> KeyValue
> > > > > implementation (Anoop/Ram)
> > > > > + Append rejigger (Elliott)
> > > > > + A Pv2-based Assign (Matteo/Steven)
> > > > > + Splitting meta/1M regions
> > > > > + The revived Backup (Vladimir)
> > > > > + Time (Enis)
> > > > > + The overloaded SequenceId (Stack)
> > > > > + Upstreaming IT testing (Dima/Sean)
> > > > > + hbase-2.0.0
> > > > >
> > > > > I put names by folks I know could talk to the topic. If you want to
> > > take
> > > > > over a topic or put your name by one, just say.  Suggest that
> > > discussion
> > > > > lead off with a 5-10minute on current state of
> > > > > thought/design/implementation.
> > > > >
> > > > > What do others think?
> > > > >
> > > > > What date would suit folks?
> > > > >
> > > > > Anyone want to host?
> > > > >
> > > > > Thanks,
> > > > > Matteo and St.Ack
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Vladimir Rodionov <vl...@gmail.com>.

I am in Bay Area and have no plans to leave it (next two months), so any
date in July/Aug works for me.

-Vlad


On Tue, Jul 14, 2015 at 7:11 PM, Sean Busbey <bu...@cloudera.com> wrote:

> I'm planning to be in the Bay area the week of the 24th of August.
>
> --
> Sean
> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:
>
> > I can be up in your area in August.
> >
> > On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> > wrote:
> > >
> > > > Sounds good. It has been a while we did the talk-aton.
> > > >
> > > > I'll be off starting 25 of July, so I prefer something next week if
> > > > possible.
> > > >
> > > > You ever coming back? If so, when? I'm back on 10th of August
> (Mikhail
> > on
> > > the 20th).
> > > St.Ack
> > >
> > >
> > >
> > >
> > > > Enis
> > > >
> > > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > Matteo and I were thinking it time devs got together for a pow-wow.
> > > There
> > > > > is a bunch of stuff in flight at the moment (see below list) and it
> > > would
> > > > > > be good to meet and whiteboard, surface good ideas that have gone
> > > > dormant
> > > > > in JIRA, or revisit designs/proposals out in JIRA-attached google
> doc
> > > > that
> > > > > need socializing.
> > > > >
> > > > > You can only come if you are wearing your bullshit hat.
> > > > >
> > > > > Topics we'd go over could include:
> > > > >
> > > > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > > > + Current state of the offheaping of read path and alternate
> KeyValue
> > > > > implementation (Anoop/Ram)
> > > > > + Append rejigger (Elliott)
> > > > > + A Pv2-based Assign (Matteo/Steven)
> > > > > + Splitting meta/1M regions
> > > > > + The revived Backup (Vladimir)
> > > > > + Time (Enis)
> > > > > + The overloaded SequenceId (Stack)
> > > > > + Upstreaming IT testing (Dima/Sean)
> > > > > + hbase-2.0.0
> > > > >
> > > > > I put names by folks I know could talk to the topic. If you want to
> > > take
> > > > > over a topic or put your name by one, just say.  Suggest that
> > > discussion
> > > > > lead off with a 5-10minute on current state of
> > > > > thought/design/implementation.
> > > > >
> > > > > What do others think?
> > > > >
> > > > > What date would suit folks?
> > > > >
> > > > > Anyone want to host?
> > > > >
> > > > > Thanks,
> > > > > Matteo and St.Ack
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Sean Busbey <bu...@cloudera.com>.

I'm planning to be in the Bay area the week of the 24th of August.

-- 
Sean
On Jul 14, 2015 7:53 PM, "Andrew Purtell" <ap...@apache.org> wrote:

> I can be up in your area in August.
>
> On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:
>
> > On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com>
> wrote:
> >
> > > Sounds good. It has been a while we did the talk-aton.
> > >
> > > I'll be off starting 25 of July, so I prefer something next week if
> > > possible.
> > >
> > > You ever coming back? If so, when? I'm back on 10th of August (Mikhail
> on
> > the 20th).
> > St.Ack
> >
> >
> >
> >
> > > Enis
> > >
> > > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > Matteo and I were thinking it time devs got together for a pow-wow.
> > There
> > > > is a bunch of stuff in flight at the moment (see below list) and it
> > would
> > > > > be good to meet and whiteboard, surface good ideas that have gone
> > > dormant
> > > > in JIRA, or revisit designs/proposals out in JIRA-attached google doc
> > > that
> > > > need socializing.
> > > >
> > > > You can only come if you are wearing your bullshit hat.
> > > >
> > > > Topics we'd go over could include:
> > > >
> > > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > > + Current state of the offheaping of read path and alternate KeyValue
> > > > implementation (Anoop/Ram)
> > > > + Append rejigger (Elliott)
> > > > + A Pv2-based Assign (Matteo/Steven)
> > > > + Splitting meta/1M regions
> > > > + The revived Backup (Vladimir)
> > > > + Time (Enis)
> > > > + The overloaded SequenceId (Stack)
> > > > + Upstreaming IT testing (Dima/Sean)
> > > > + hbase-2.0.0
> > > >
> > > > I put names by folks I know could talk to the topic. If you want to
> > take
> > > > over a topic or put your name by one, just say.  Suggest that
> > discussion
> > > > lead off with a 5-10minute on current state of
> > > > thought/design/implementation.
> > > >
> > > > What do others think?
> > > >
> > > > What date would suit folks?
> > > >
> > > > Anyone want to host?
> > > >
> > > > Thanks,
> > > > Matteo and St.Ack
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Andrew Purtell <ap...@apache.org>.

I can be up in your area in August.

On Tue, Jul 14, 2015 at 5:31 PM, Stack <st...@duboce.net> wrote:

> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com> wrote:
>
> > Sounds good. It has been a while we did the talk-aton.
> >
> > I'll be off starting 25 of July, so I prefer something next week if
> > possible.
> >
> > You ever coming back? If so, when? I'm back on 10th of August (Mikhail on
> the 20th).
> St.Ack
>
>
>
>
> > Enis
> >
> > On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> >
> > > Matteo and I were thinking it time devs got together for a pow-wow.
> There
> > > is a bunch of stuff in flight at the moment (see below list) and it
> would
> > > > be good to meet and whiteboard, surface good ideas that have gone
> > dormant
> > > in JIRA, or revisit designs/proposals out in JIRA-attached google doc
> > that
> > > need socializing.
> > >
> > > You can only come if you are wearing your bullshit hat.
> > >
> > > Topics we'd go over could include:
> > >
> > > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > > + Current state of the offheaping of read path and alternate KeyValue
> > > implementation (Anoop/Ram)
> > > + Append rejigger (Elliott)
> > > + A Pv2-based Assign (Matteo/Steven)
> > > + Splitting meta/1M regions
> > > + The revived Backup (Vladimir)
> > > + Time (Enis)
> > > + The overloaded SequenceId (Stack)
> > > + Upstreaming IT testing (Dima/Sean)
> > > + hbase-2.0.0
> > >
> > > I put names by folks I know could talk to the topic. If you want to
> take
> > > over a topic or put your name by one, just say.  Suggest that
> discussion
> > > lead off with a 5-10minute on current state of
> > > thought/design/implementation.
> > >
> > > What do others think?
> > >
> > > What date would suit folks?
> > >
> > > Anyone want to host?
> > >
> > > Thanks,
> > > Matteo and St.Ack
> > >
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Stack <st...@duboce.net>.

On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <en...@gmail.com> wrote:

> Sounds good. It has been a while we did the talk-aton.
>
> I'll be off starting 25 of July, so I prefer something next week if
> possible.
>
You ever coming back? If so, when? I'm back on 10th of August (Mikhail on
the 20th).
St.Ack




> Enis
>
> On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
>
> > Matteo and I were thinking it time devs got together for a pow-wow. There
> > is a bunch of stuff in flight at the moment (see below list) and it would
> > be good to meet and whiteboard, surface good ideas that have gone
> dormant
> > in JIRA, or revisit designs/proposals out in JIRA-attached google doc
> that
> > need socializing.
> >
> > You can only come if you are wearing your bullshit hat.
> >
> > Topics we'd go over could include:
> >
> > + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> > + Current state of the offheaping of read path and alternate KeyValue
> > implementation (Anoop/Ram)
> > + Append rejigger (Elliott)
> > + A Pv2-based Assign (Matteo/Steven)
> > + Splitting meta/1M regions
> > + The revived Backup (Vladimir)
> > + Time (Enis)
> > + The overloaded SequenceId (Stack)
> > + Upstreaming IT testing (Dima/Sean)
> > + hbase-2.0.0
> >
> > I put names by folks I know could talk to the topic. If you want to take
> > over a topic or put your name by one, just say.  Suggest that discussion
> > lead off with a 5-10minute on current state of
> > thought/design/implementation.
> >
> > What do others think?
> >
> > What date would suit folks?
> >
> > Anyone want to host?
> >
> > Thanks,
> > Matteo and St.Ack
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Enis Söztutar <en...@gmail.com>.

Sounds good. It has been a while since we did the talk-a-thon.

I'll be off starting the 25th of July, so I'd prefer something next week if
possible.

Enis

On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:

> Matteo and I were thinking it time devs got together for a pow-wow. There
> is a bunch of stuff in flight at the moment (see below list) and it would
> be good to meet and whiteboard, surface good ideas that have gone dormant
> in JIRA, or revisit designs/proposals out in JIRA-attached google doc that
> need socializing.
>
> You can only come if you are wearing your bullshit hat.
>
> Topics we'd go over could include:
>
> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> + Current state of the offheaping of read path and alternate KeyValue
> implementation (Anoop/Ram)
> + Append rejigger (Elliott)
> + A Pv2-based Assign (Matteo/Steven)
> + Splitting meta/1M regions
> + The revived Backup (Vladimir)
> + Time (Enis)
> + The overloaded SequenceId (Stack)
> + Upstreaming IT testing (Dima/Sean)
> + hbase-2.0.0
>
> I put names by folks I know could talk to the topic. If you want to take
> over a topic or put your name by one, just say.  Suggest that discussion
> lead off with a 5-10minute on current state of
> thought/design/implementation.
>
> What do others think?
>
> What date would suit folks?
>
> Anyone want to host?
>
> Thanks,
> Matteo and St.Ack
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Mikhail Antonov <ol...@gmail.com>.

I'd definitely join if I'm in SFBA at the time it all takes place
(I will be out of the country from the 1st to the 20th of August).

I'd like to chat about several topics under the "scaling to 1M regions"
umbrella which are either in progress, in the investigation phase, or
in review, as well as the proposed 2.0 topologies.

-Mikhail

On Tue, Jul 14, 2015 at 3:18 PM, Stack <st...@duboce.net> wrote:
> Matteo and I were thinking it time devs got together for a pow-wow. There
> is a bunch of stuff in flight at the moment (see below list) and it would
> be good to meet and whiteboard, surface good ideas that have gone dormant
> in JIRA, or revisit designs/proposals out in JIRA-attached google doc that
> need socializing.
>
> You can only come if you are wearing your bullshit hat.
>
> Topics we'd go over could include:
>
> + Our filesystem layout will not work if 1M regions (Matteo/Stack)
> + Current state of the offheaping of read path and alternate KeyValue
> implementation (Anoop/Ram)
> + Append rejigger (Elliott)
> + A Pv2-based Assign (Matteo/Steven)
> + Splitting meta/1M regions
> + The revived Backup (Vladimir)
> + Time (Enis)
> + The overloaded SequenceId (Stack)
> + Upstreaming IT testing (Dima/Sean)
> + hbase-2.0.0
>
> I put names by folks I know could talk to the topic. If you want to take
> over a topic or put your name by one, just say.  Suggest that discussion
> lead off with a 5-10minute on current state of
> thought/design/implementation.
>
> What do others think?
>
> What date would suit folks?
>
> Anyone want to host?
>
> Thanks,
> Matteo and St.Ack



-- 
Thanks,
Michael Antonov