Posted to dev@hbase.apache.org by Stack <st...@duboce.net> on 2015/08/11 20:23:13 UTC

Re: DISCUSSION: lets do a developer workshop on near-term work

On Mon, Jul 20, 2015 at 1:04 PM, Stephen Jiang <sy...@gmail.com>
wrote:

> [Let us move back to the main topic - a meeting to talk about the next
> direction on HBASE development]
>
> Are we firm on the *August 26th* meeting date?
>
> Given the long list of topics from St.Ack, even a one day meeting might
> not cover all of them (in depth).  We need to either trim the topic list or
> limit the time to discuss a single topic (30 min for one topic enough?).
>
>
Thanks for bringing us back to topic Stephen.

Yes, let's do the 26th. Speak up if this does not suit. I will file a meetup
page in an hour or so. Where should we do it? Enis offered his nice place.
Could try and get space at ours too... in Palo Alto (less 'deep south', a
little easier for the SFers).

As to too many topics, in my experience, a bunch of smelly engineers all in
a room starts to fall apart after a couple of hours, especially when the
discussion ranges widely. Suggest we cut the time-per-topic and the list of
topics so we can do it in an afternoon. If some topics are too fat, we can do a
breakout or put them off to another day and a smaller, interested group.

St.Ack




> Thanks
> Stephen
>
>
> On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <an...@gmail.com> wrote:
>
>> We will be doing some more large data tests in coming week Andy..   Will
>> report back more.  Also will do a write up , in what all ways the work
>> might help us.  As Sean said, we will continue in another thread if any
>> thing further..  Will soon write back on the test result.  Thanks.
>>
>> -Anoop-
>>
>> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <andrew.purtell@gmail.com
>> >
>> wrote:
>>
>> > Cool, thanks.
>> >
>> > Is a 20% latency reduction the most we can expect or do you think there
>> is
>> > room for more improvement? Just curious.
>> >
>> > Is latency reduction the only goal? Anything here about supporting
>> larger
>> > heaps? Is there something we can measure in that regard?
>> >
>> > Hope you see my point and there's enough here to prime a goals and
>> metrics
>> > discussion at the pow wow or on the relevant JIRAs.
>> >
>> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <
>> > ramkrishna.s.vasudevan@gmail.com> wrote:
>> > >
>> > > Hi Andy
>> > >
>> > > Based on our POCs done, we expect around 20% improvement in latency.
>> For
>> > > scans it will be a little less than 20%.
>> > >
>> > > Regards
>> > > Ram
>> > >
>> > >
>> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <
>> > andrew.purtell@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi Ram,
>> > >>
>> > >> Do you have any targets for what you are measuring? What are the
>> goals
>> > you
>> > >> guys are working toward with the off heaping changes?
>> > >>
>> > >>
>> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
>> > >>> ramkrishna.s.vasudevan@gmail.com> wrote:
>> > >>>
>> > >>> Thanks Vladimir.
>> > >>> Yeah, the reports that were attached specifically captured the
>> 95/99th
>> > >>> percentile.
>> > >>> The reason for checking the server side perf was to specifically see
>> > the
>> > >>> improvement in the server side and also the client was sending large
>> > >>> results in multiple threads. So wanted to avoid the n/w
>> interference. I
>> > >>> think it was a general practice that we were following.
>> > >>> We will do some more tests and get some latest readings with bigger
>> data
>> > >>> sets.
>> > >>> Sent from mobile.
>> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <
>> andrew.purtell@gmail.com>
>> > >> wrote:
>> > >>>>
>> > >>>> +1
>> > >>>>
>> > >>>> Yeah, something like that, with aspirational targets for
>> improvement
>> > >> from
>> > >>>> current releases. Then what to measure, the tests to run, and
>> criteria
>> > >> for
>> > >>>> evaluation are clear and organized and we're able to better assess
>> how
>> > >> the
>> > >>>> work in progress is meeting its goals (or not)
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <
>> > vladrodionov@gmail.com
>> > >>>
>> > >>>> wrote:
>> > >>>>
>> > >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap
>> > >> backed
>> > >>>>> cache. In the entire read path, we can refer to this offheap
>> buffer
>> > and
>> > >>>>> avoid onheap copying.
>> > >>>>>
>> > >>>>> I think, on a read path, the most important improvement we could
>> > >> imagine
>> > >>>> is
>> > >>>>> elimination or reducing of object creations (KVs, iterators etc).
>> > >>>>> object reuse, byte buffers reuse or offheap buffers reuse, API
>> change
>> > >>>> etc.
>> > >>>>> If this is a part of this JIRA, then I would easily define a goal:
>> > >>>>> improving 95/99% latency of a read operations. Not performance,
>> but
>> > >>>> latency
>> > >>>>> matters
>> > >>>>>
>> > >>>>> -Vlad
>> > >>>>>
>> > >>>>>
>> > >>>>>
>> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
>> > >>>> andrew.purtell@gmail.com>
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> That's not a realistic or useful test scenario, unless the goal
>> is
>> > to
>> > >>>>>> accelerate queries where all cells are filtered at the server.
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hbase@gmail.com
>> >
>> > >>>> wrote:
>> > >>>>>>>
>> > >>>>>>> No Andy. 11425 having doc attached to it. At the end of it, we
>> have
>> > >>>> added
>> > >>>>>>> perf numbers in a cluster testing.  This was done using PE get
>> and
>> > >> scan
>> > >>>>>>> tests with filtering all cells at server (to not consider n/w
>> > >> bandwidth
>> > >>>>>>> constraints)
>> > >>>>>>>
>> > >>>>>>> -Anoop-
>> > >>>>>>>
>> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
>> > >>>>>> andrew.purtell@gmail.com>
>> > >>>>>>> wrote:
>> > >>>>>>>
>> > >>>>>>>> We have some microbenchmarks, not evidence of differences seen
>> > from
>> > >> a
>> > >>>>>>>> client application. I'm not saying that microbenchmarks are not
>> > >>>> totally
>> > >>>>>>>> necessary and a great start - they are - but that they don't
>> > measure
>> > >>>> an
>> > >>>>>> end
>> > >>>>>>>> goal. Furthermore unless I've missed one somewhere we don't
>> have a
>> > >>>> JIRA
>> > >>>>>> or
>> > >>>>>>>> design doc that states a clear end goal metric like the
>> strawman I
>> > >>>> threw
>> > >>>>>>>> together in my previous mail. A measurable system level goal
>> and
>> > >> some
>> > >>>>>> data
>> > >>>>>>>> from full cluster testing would go a lot further toward letting
>> > all
>> > >> of
>> > >>>>>> us
>> > >>>>>>>> evaluate the potential and payoff of the work. In the meantime
>> we
>> > >>>> should
>> > >>>>>>>> probably be assembling these changes on a branch instead of in
>> > >> trunk,
>> > >>>>>> for
>> > >>>>>>>> as long as the goal is not clearly defined and the payoff and
>> > >>>> potential
>> > >>>>>> for
>> > >>>>>>>> perf regressions is untested and unknown.
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <
>> anoop.hbase@gmail.com>
>> > >>>> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>> Thanks Andy and Lars.  The parent jira has doc attached which
>> > >>>> contains
>> > >>>>>>>> some
>> > >>>>>>>>> perf gain numbers..  We will be doing more tests in next 2
>> weeks
>> > >>>>>> (before
>> > >>>>>>>>> end of this month) and will publish them.   Yes it will be
>> great
>> > if
>> > >>>> it
>> > >>>>>> is
>> > >>>>>>>>> more IST friendly time :-)
>> > >>>>>>>>>
>> > >>>>>>>>> -Anoop-
>> > >>>>>>>>>
>> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
>> > >>>>>>>> andrew.purtell@gmail.com>
>> > >>>>>>>>> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
>> > always
>> > >>>>>> argue
>> > >>>>>>>>>> both side of a discussion and to never take sides easily
>> (drives
>> > >>>> some
>> > >>>>>>>> folks
>> > >>>>>>>>>> crazy).
>> > >>>>>>>>>>
>> > >>>>>>>>>> I can vouch for this (smile)
>> > >>>>>>>>>>
>> > >>>>>>>>>> I also can offer support for off heaping there. At the same
>> time
>> > >> we
>> > >>>> do
>> > >>>>>>>>>> have a gap where we can't point to a timeline of improvements
>> > >> (yet,
>> > >>>>>>>> anyway)
>> > >>>>>>>>>> with benchmarks showing gains where your goals need them. For
>> > >>>> example,
>> > >>>>>>>>>> stock HBase in one JVM can address max N GB for response time
>> > >>>>>>>> distribution
>> > >>>>>>>>>> D; dev version of HBase in off heap branch can address max
>> N' GB
>> > >> for
>> > >>>>>>>>>> distribution D', where N' > N and D > D' (distribution D'
>> > >>>>>> statistically
>> > >>>>>>>>>> shows better/lower response times).
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <
>> larsh@apache.org>
>> > >>>> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I'm in favor of anything that improves performance (and
>> > >> preferably
>> > >>>>>>>>>> doesn't set us back into a world that's worse than C due to
>> the
>> > >> lack
>> > >>>>>> of
>> > >>>>>>>>>> pointers in Java). Never said "I don't like it", it's just
>> that
>> > I'm
>> > >>>>>>>> perhaps
>> > >>>>>>>>>> asking for more numbers and justification in weighing the
>> pros
>> > and
>> > >>>>>> cons.
>> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
>> > always
>> > >>>>>> argue
>> > >>>>>>>>>> both side of a discussion and to never take sides easily
>> (drives
>> > >>>> some
>> > >>>>>>>> folks
>> > >>>>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is
>> a
>> > >>>>>> fighting
>> > >>>>>>>>>> chance that folks on IST can participate. I know that some of
>> > our
>> > >>>>>> folks
>> > >>>>>>>> on
>> > >>>>>>>>>> IST would love to participate in the backup discussion).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd
>> > just
>> > >>>>>> need
>> > >>>>>>>>>> an approx. number of folks.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> -- Lars
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> From: ramkrishna vasudevan <
>> ramkrishna.s.vasudevan@gmail.com>
>> > >>>>>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars
>> > >> hofhansl <
>> > >>>>>>>>>> larsh@apache.org>
>> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
>> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
>> > >> near-term
>> > >>>>>> work
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Hi
>> > >>>>>>>>>>> What time will it be on August 26th?
>> > >>>>>>>>>>> @Lars Ya. I know that you are not generally in favour of this
>> > >>>>>> offheaping
>> > >>>>>>>>>> stuff.  May be if we (from India) can attend this meeting
>> > remotely
>> > >>>>>> your
>> > >>>>>>>>>> thoughts can be discussed and also the current state of this
>> > work.
>> > >>>>>>>>>>> Regards
>> > >>>>>>>>>>> Ram
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <
>> > larsh@apache.org
>> > >>>
>> > >>>>>>>> wrote:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of
>> August
>> > >> 9th.
>> > >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are
>> more
>> > >>>>>>>>>> complicated as we wanted fast per-tenant restores, so data is
>> > >>>>>> "grouped"
>> > >>>>>>>> by
>> > >>>>>>>>>> tenant. Would like to sync up on that (hopefully some of the
>> > folks
>> > >>>> who
>> > >>>>>>>>>> wrote most of the code will be in town, I'll check).
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts (although
>> you
>> > >>>> folks
>> > >>>>>>>>>> usually do not like what I think about the offheap efforts
>> :) ).
>> > >>>>>>>>>>> Would like to add the following topics:
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in
>> the
>> > >>>>>>>>>> timestamps (happy to cover that, unless it's part of the
>> "Time"
>> > >>>> topic)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> - "Replication". We found that replication cannot keep up
>> with
>> > >> high
>> > >>>>>>>>>> write loads, due to the fact that replicated is strictly
>> single
>> > >>>>>> threaded
>> > >>>>>>>>>> per regionserver (even though we have multiple region
>> servers on
>> > >> the
>> > >>>>>>>> sink
>> > >>>>>>>>>> side)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> -- Lars
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> ________________________________
>> > >>>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
>> > >>>>>>>>>>> To: dev <de...@hbase.apache.org>
>> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
>> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
>> > >> near-term
>> > >>>>>> work
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of
>> > >> August.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> --
>> > >>>>>>>>>>> Sean
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>
>> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <
>> > apurtell@apache.org>
>> > >>>>>>>> wrote:
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> I can be up in your area in August.
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <stack@duboce.net
>> >
>> > >>>> wrote:
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
>> > >>>>>> enis.soz@gmail.com>
>> > >>>>>>>>>>>>> wrote:
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Sounds good. It has been a while we did the talk-aton.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something
>> next
>> > >> week
>> > >>>>>> if
>> > >>>>>>>>>>>>>> possible.
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of
>> > August
>> > >>>>>>>> (Mikhail
>> > >>>>>>>>>>>> on
>> > >>>>>>>>>>>>> the 20th).
>> > >>>>>>>>>>>>> St.Ack
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>
>> > >>>>>>>>>>>>>> Enis
>> > >>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <
>> stack@duboce.net>
>> > >>>> wrote:
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together
>> for a
>> > >>>>>> pow-wow.
>> > >>>>>>>>>>>>> There
>> > >>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below
>> > list)
>> > >>>> and
>> > >>>>>> it
>> > >>>>>>>>>>>>> would
>> > >>>>>>>>>>>>>>> be good to meet and whiteboard, surface good ideas that
>> > have
>> > >>>>>> gone
>> > >>>>>>>>>>>>>> dormant
>> > >>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in
>> JIRA-attached
>> > >>>> google
>> > >>>>>>>> doc
>> > >>>>>>>>>>>>>> that
>> > >>>>>>>>>>>>>>> need socializing.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit hat.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Topics we'd go over could include:
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
>> > >>>>>> (Matteo/Stack)
>> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path and
>> > alternate
>> > >>>>>>>> KeyValue
>> > >>>>>>>>>>>>>>> implementation (Anoop/Ram)
>> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
>> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
>> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
>> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
>> > >>>>>>>>>>>>>>> + Time (Enis)
>> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
>> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
>> > >>>>>>>>>>>>>>> + hbase-2.0.0
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic. If
>> you
>> > >>>> want
>> > >>>>>> to
>> > >>>>>>>>>>>>> take
>> > >>>>>>>>>>>>>>> over a topic or put your name by one, just say.  Suggest
>> > that
>> > >>>>>>>>>>>>> discussion
>> > >>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
>> > >>>>>>>>>>>>>>> thought/design/implementation.
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> What do others think?
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> What date would suit folks?
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Anyone want to host?
>> > >>>>>>>>>>>>>>>
>> > >>>>>>>>>>>>>>> Thanks,
>> > >>>>>>>>>>>>>>> Matteo and St.Ack
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> --
>> > >>>>>>>>>>>> Best regards,
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> - Andy
>> > >>>>>>>>>>>>
>> > >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting
>> back. -
>> > >>>> Piet
>> > >>>>>>>> Hein
>> > >>>>>>>>>>>> (via Tom White)
>> > >>
>> >
>>
>
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Biju N <bi...@gmail.com>.
Thanks a lot Stack.

On Thu, Aug 20, 2015 at 2:37 PM, Stack <st...@duboce.net> wrote:

> On Thu, Aug 20, 2015 at 11:13 AM, Biju N <bi...@gmail.com> wrote:
>
> > Is there a way to participate remotely or at least listen in to this
> > meet-up? There will be at least a few who will be interested to dial in
> > from the east coast.
> >
> >
>
> Should be able to get you at least audio. Will post something here on this
> thread and to the meetup page just before the meeting starts.
> St.Ack
>
>
>
> > On Wed, Aug 12, 2015 at 3:29 PM, Stack <st...@duboce.net> wrote:
> >
> > > I posted this meetup notice:
> > > http://www.meetup.com/hackathon/events/224589819/
> > > St.Ack
> > >
> > > On Wed, Aug 12, 2015 at 1:34 AM, Enis Söztutar <en...@apache.org>
> wrote:
> > >
> > > > Agreed, too many fat topics, but all important. I guess we can spend
> > > first
> > > > 10-20 mins on the agenda based on who is in the room and come up
> with a
> > > > shorter list and go from there.
> > > >
> > > > Enis
> > > >
> > > > On Tue, Aug 11, 2015 at 9:23 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > On Mon, Jul 20, 2015 at 1:04 PM, Stephen Jiang <
> > > syuanjiangdev@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > [Let us move back to the main topic - a meeting to talk about the
> > > next
> > > > > > direction on HBASE development]
> > > > > >
> > > > > > Are we firm on the *August 26th* meeting date?
> > > > > >
> > > > > > Given the long list of topics from St.Ack, even a one day meeting
> > > might
> > > > > > not cover all of them (in depth).  We need to either trim the
> topic
> > > > list
> > > > > or
> > > > > > limit the time to discuss a single topic (30 min for one topic
> > > > enough?).
> > > > > >
> > > > > >
> > > > > Thanks for bringing us back to topic Stephen.
> > > > >
> > > > > Yes, lets do 26th. Speak up if this does not suit. I will file a
> > meetup
> > > > > page in an hour or so. Where should we do it? Enis offered his nice
> > > > place.
> > > > > Could try and get space at ours too... in Palo Alto (less 'deep
> > > south', a
> > > > > little easier for the SFers).
> > > > >
> > > > > As to too many topics, in my experience, a bunch of smelly
> engineers
> > > all
> > > > in
> > > > > a room starts to fall apart after a couple of hours especially when
> > > > ranging
> > > > > discussion. Suggest we cut the time-per-topic and list of topics so
> > can
> > > > do
> > > > > in an afternoon. If some topics are too fat, can do break out or
> > > put-off
> > > > to
> > > > > another day and smaller, interested group.
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > Thanks
> > > > > > Stephen
> > > > > >
> > > > > >
> > > > > > On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <
> anoop.hbase@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > >> We will be doing some more large data tests in coming week
> Andy..
> > > >  Will
> > > > > >> report back more.  Also will do a write up , in what all ways
> the
> > > work
> > > > > >> might help us.  As Sean said, we will continue in another thread
> > if
> > > > any
> > > > > >> thing further..  Will soon write back on the test result.
> Thanks.
> > > > > >>
> > > > > >> -Anoop-
> > > > > >>
> > > > > >> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <
> > > > > andrew.purtell@gmail.com
> > > > > >> >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Cool, thanks.
> > > > > >> >
> > > > > >> > Is a 20% latency reduction the most we can expect or do you
> > think
> > > > > there
> > > > > >> is
> > > > > >> > room for more improvement? Just curious.
> > > > > >> >
> > > > > >> > Is latency reduction the only goal? Anything here about
> > supporting
> > > > > >> larger
> > > > > >> > heaps? Is there something we can measure in that regard?
> > > > > >> >
> > > > > >> > Hope you see my point and there's enough here to prime a goals
> > and
> > > > > >> metrics
> > > > > >> > discussion at the pow wow or on the relevant JIRAs.
> > > > > >> >
> > > > > >> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <
> > > > > >> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > > Hi Andy
> > > > > >> > >
> > > > > >> > > Based on our POCs done, we expect around 20% improvement in
> > > > latency.
> > > > > >> For
> > > > > > >> > > scans it will be a little less than 20%.
> > > > > >> > >
> > > > > >> > > Regards
> > > > > >> > > Ram
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <
> > > > > >> > andrew.purtell@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > >> Hi Ram,
> > > > > >> > >>
> > > > > >> > >> Do you have any targets for what you are measuring? What
> are
> > > the
> > > > > >> goals
> > > > > >> > you
> > > > > >> > >> guys are working toward with the off heaping changes?
> > > > > >> > >>
> > > > > >> > >>
> > > > > >> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
> > > > > >> > >>> ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > > >> > >>>
> > > > > >> > >>> Thanks Vladimir.
> > > > > >> > >>> Yeah, the reports that were attached specifically captured
> > the
> > > > > >> 95/99th
> > > > > >> > >>> percentile.
> > > > > >> > >>> The reason for checking the server side perf was to
> > > specifically
> > > > > see
> > > > > >> > the
> > > > > >> > >>> improvement in the server side and also the client was
> > sending
> > > > > large
> > > > > >> > >>> results in multiple threads. So wanted to avoid the n/w
> > > > > >> interference. I
> > > > > >> > >>> think it was a general practice that we were following.
> > > > > > >> > >>> We will do some more tests and get some latest readings
> with
> > > > bigger
> > > > > >> data
> > > > > >> > >>> sets.
> > > > > >> > >>> Sent from mobile.
> > > > > >> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <
> > > > > >> andrew.purtell@gmail.com>
> > > > > >> > >> wrote:
> > > > > >> > >>>>
> > > > > >> > >>>> +1
> > > > > >> > >>>>
> > > > > >> > >>>> Yeah, something like that, with aspirational targets for
> > > > > >> improvement
> > > > > >> > >> from
> > > > > >> > >>>> current releases. Then what to measure, the tests to run,
> > and
> > > > > >> criteria
> > > > > >> > >> for
> > > > > >> > >>>> evaluation are clear and organized and we're able to
> better
> > > > > assess
> > > > > >> how
> > > > > >> > >> the
> > > > > >> > >>>> work in progress is meeting its goals (or not)
> > > > > >> > >>>>
> > > > > >> > >>>>
> > > > > >> > >>>>
> > > > > >> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <
> > > > > >> > vladrodionov@gmail.com
> > > > > >> > >>>
> > > > > >> > >>>> wrote:
> > > > > >> > >>>>
> > > > > >> > >>>>>>> Umbrella jira to make sure we can have blocks cached
> in
> > > > > offheap
> > > > > >> > >> backed
> > > > > >> > >>>>> cache. In the entire read path, we can refer to this
> > offheap
> > > > > >> buffer
> > > > > >> > and
> > > > > >> > >>>>> avoid onheap copying.
> > > > > >> > >>>>>
> > > > > >> > >>>>> I think, on a read path, the most important improvement
> we
> > > > could
> > > > > >> > >> imagine
> > > > > >> > >>>> is
> > > > > >> > >>>>> elimination or reducing of object creations (KVs,
> > iterators
> > > > > etc).
> > > > > >> > >>>>> object reuse, byte buffers reuse or offheap buffers
> reuse,
> > > API
> > > > > >> change
> > > > > >> > >>>> etc.
> > > > > >> > >>>>> If this is a part of this JIRA, then I would easily
> > define a
> > > > > goal:
> > > > > >> > >>>>> improving 95/99% latency of a read operations. Not
> > > > performance,
> > > > > >> but
> > > > > >> > >>>> latency
> > > > > >> > >>>>> matters
> > > > > >> > >>>>>
> > > > > >> > >>>>> -Vlad
> > > > > >> > >>>>>
> > > > > >> > >>>>>
> > > > > >> > >>>>>
> > > > > >> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
> > > > > >> > >>>> andrew.purtell@gmail.com>
> > > > > >> > >>>>> wrote:
> > > > > >> > >>>>>
> > > > > >> > >>>>>> That's not a realistic or useful test scenario, unless
> > the
> > > > goal
> > > > > >> is
> > > > > >> > to
> > > > > >> > >>>>>> accelerate queries where all cells are filtered at the
> > > > server.
> > > > > >> > >>>>>>
> > > > > >> > >>>>>>
> > > > > >> > >>>>>>
> > > > > >> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <
> > > > > anoop.hbase@gmail.com
> > > > > >> >
> > > > > >> > >>>> wrote:
> > > > > >> > >>>>>>>
> > > > > >> > >>>>>>> No Andy. 11425 having doc attached to it. At the end
> of
> > > it,
> > > > we
> > > > > >> have
> > > > > >> > >>>> added
> > > > > >> > >>>>>>> perf numbers in a cluster testing.  This was done
> using
> > PE
> > > > get
> > > > > >> and
> > > > > >> > >> scan
> > > > > >> > >>>>>>> tests with filtering all cells at server (to not
> > consider
> > > > n/w
> > > > > >> > >> bandwidth
> > > > > >> > >>>>>>> constraints)
> > > > > >> > >>>>>>>
> > > > > >> > >>>>>>> -Anoop-
> > > > > >> > >>>>>>>
> > > > > >> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
> > > > > >> > >>>>>> andrew.purtell@gmail.com>
> > > > > >> > >>>>>>> wrote:
> > > > > >> > >>>>>>>
> > > > > >> > >>>>>>>> We have some microbenchmarks, not evidence of
> > differences
> > > > > seen
> > > > > >> > from
> > > > > >> > >> a
> > > > > >> > >>>>>>>> client application. I'm not saying that
> microbenchmarks
> > > are
> > > > > not
> > > > > >> > >>>> totally
> > > > > >> > >>>>>>>> necessary and a great start - they are - but that
> they
> > > > don't
> > > > > >> > measure
> > > > > >> > >>>> an
> > > > > >> > >>>>>> end
> > > > > >> > >>>>>>>> goal. Furthermore unless I've missed one somewhere we
> > > don't
> > > > > >> have a
> > > > > >> > >>>> JIRA
> > > > > >> > >>>>>> or
> > > > > >> > >>>>>>>> design doc that states a clear end goal metric like
> the
> > > > > >> strawman I
> > > > > >> > >>>> threw
> > > > > >> > >>>>>>>> together in my previous mail. A measurable system
> level
> > > > goal
> > > > > >> and
> > > > > >> > >> some
> > > > > >> > >>>>>> data
> > > > > >> > >>>>>>>> from full cluster testing would go a lot further
> toward
> > > > > letting
> > > > > >> > all
> > > > > >> > >> of
> > > > > >> > >>>>>> us
> > > > > >> > >>>>>>>> evaluate the potential and payoff of the work. In the
> > > > > meantime
> > > > > >> we
> > > > > >> > >>>> should
> > > > > >> > >>>>>>>> probably be assembling these changes on a branch
> > instead
> > > of
> > > > > in
> > > > > >> > >> trunk,
> > > > > >> > >>>>>> for
> > > > > >> > >>>>>>>> as long as the goal is not clearly defined and the
> > payoff
> > > > and
> > > > > >> > >>>> potential
> > > > > >> > >>>>>> for
> > > > > >> > >>>>>>>> perf regressions is untested and unknown.
> > > > > >> > >>>>>>>>
> > > > > >> > >>>>>>>>
> > > > > >> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <
> > > > > >> anoop.hbase@gmail.com>
> > > > > >> > >>>> wrote:
> > > > > >> > >>>>>>>>>
> > > > > >> > >>>>>>>>> Thanks Andy and Lars.  The parent jira has doc
> > attached
> > > > > which
> > > > > >> > >>>> contains
> > > > > >> > >>>>>>>> some
> > > > > >> > >>>>>>>>> perf gain numbers..  We will be doing more tests in
> > > next 2
> > > > > >> weeks
> > > > > >> > >>>>>> (before
> > > > > >> > >>>>>>>>> end of this month) and will publish them.   Yes it
> > will
> > > be
> > > > > >> great
> > > > > >> > if
> > > > > >> > >>>> it
> > > > > >> > >>>>>> is
> > > > > >> > >>>>>>>>> more IST friendly time :-)
> > > > > >> > >>>>>>>>>
> > > > > >> > >>>>>>>>> -Anoop-
> > > > > >> > >>>>>>>>>
> > > > > >> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
> > > > > >> > >>>>>>>> andrew.purtell@gmail.com>
> > > > > >> > >>>>>>>>> wrote:
> > > > > >> > >>>>>>>>>
> > > > > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've
> been
> > > > known
> > > > > >> > always
> > > > > >> > >>>>>> argue
> > > > > >> > >>>>>>>>>> both side of a discussion and to never take sides
> > > easily
> > > > > >> (drives
> > > > > >> > >>>> some
> > > > > >> > >>>>>>>> folks
> > > > > >> > >>>>>>>>>> crazy).
> > > > > >> > >>>>>>>>>>
> > > > > >> > >>>>>>>>>> I can vouch for this (smile)
> > > > > >> > >>>>>>>>>>
> > > > > >> > >>>>>>>>>> I also can offer support for off heaping there. At
> > the
> > > > same
> > > > > >> time
> > > > > >> > >> we
> > > > > >> > >>>> do
> > > > > >> > >>>>>>>>>> have a gap where we can't point to a timeline of
> > > > > improvements
> > > > > >> > >> (yet,
> > > > > >> > >>>>>>>> anyway)
> > > > > >> > >>>>>>>>>> with benchmarks showing gains where your goals need
> > > them.
> > > > > For
> > > > > >> > >>>> example,
> > > > > >> > >>>>>>>>>> stock HBase in one JVM can address max N GB for
> > > response
> > > > > time
> > > > > >> > >>>>>>>> distribution
> > > > > >> > >>>>>>>>>> D; dev version of HBase in off heap branch can
> > address
> > > > max
> > > > > >> N' GB
> > > > > >> > >> for
> > > > > >> > >>>>>>>>>> distribution D', where N' > N and D > D'
> > (distribution
> > > D'
> > > > > >> > >>>>>> statistically
> > > > > >> > >>>>>>>>>> shows better/lower response times).
> > > > > >> > >>>>>>>>>>
> > > > > >> > >>>>>>>>>>
> > > > > >> > >>>>>>>>>>
> > > > > >> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <
> > > > > >> larsh@apache.org>
> > > > > >> > >>>> wrote:
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> I'm in favor of anything that improves performance
> > > (and
> > > > > >> > >> preferably
> > > > > >> > >>>>>>>>>> doesn't set us back into a world that's worse than
> C
> > > due
> > > > to
> > > > > >> the
> > > > > >> > >> lack
> > > > > >> > >>>>>> of
> > > > > > >> > >>>>>>>>>> pointers in Java). Never said "I don't like it",
> it's
> > > just
> > > > > >> that
> > > > > >> > I'm
> > > > > >> > >>>>>>>> perhaps
> > > > > >> > >>>>>>>>>> asking for more numbers and justification in
> weighing
> > > the
> > > > > >> pros
> > > > > >> > and
> > > > > >> > >>>>>> cons.
> > > > > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've
> been
> > > > known
> > > > > >> > always
> > > > > >> > >>>>>> argue
> > > > > >> > >>>>>>>>>> both side of a discussion and to never take sides
> > > easily
> > > > > >> (drives
> > > > > >> > >>>> some
> > > > > >> > >>>>>>>> folks
> > > > > >> > >>>>>>>>>> crazy). And Stack's there too, he yell at me where
> > > needed
> > > > > :)
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so
> > > there
> > > > > is
> > > > > >> a
> > > > > >> > >>>>>> fighting
> > > > > >> > >>>>>>>>>> chance that folks on IST can participate. I know
> that
> > > > some
> > > > > of
> > > > > >> > our
> > > > > >> > >>>>>> folks
> > > > > >> > >>>>>>>> on
> > > > > >> > >>>>>>>>>> IST would love to participate in the backup
> > > discussion).
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in
> Downtown
> > > SF.
> > > > > I'd
> > > > > >> > just
> > > > > >> > >>>>>> need
> > > > > >> > >>>>>>>>>> an approx. number of folks.
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> -- Lars
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> From: ramkrishna vasudevan <
> > > > > >> ramkrishna.s.vasudevan@gmail.com>
> > > > > >> > >>>>>>>>>>> To: "dev@hbase.apache.org" <dev@hbase.apache.org
> >;
> > > lars
> > > > > >> > >> hofhansl <
> > > > > >> > >>>>>>>>>> larsh@apache.org>
> > > > > >> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> > > > > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer
> > workshop
> > > on
> > > > > >> > >> near-term
> > > > > >> > >>>>>> work
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> Hi
> > > > > >> > >>>>>>>>>>> What time will it be on August 26th?
> > > > > >> > >>>>>>>>>>> @Lars
> > > > > >> > >>>>>>>>>>> Ya. I know that you are not generally in
> favour
> > > of
> > > > > this
> > > > > >> > >>>>>> offheaping
> > > > > >> > >>>>>>>>>> stuff.  Maybe if we (from India) can attend this
> > > meeting
> > > > > >> > remotely
> > > > > >> > >>>>>> your
> > > > > >> > >>>>>>>>>> thoughts can be discussed and also the current
> state
> > of
> > > > > this
> > > > > >> > work.
> > > > > >> > >>>>>>>>>>> Regards
> > > > > >> > >>>>>>>>>>> Ram
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <
> > > > > >> > larsh@apache.org
> > > > > >> > >>>
> > > > > >> > >>>>>>>> wrote:
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the
> week
> > of
> > > > > >> August
> > > > > >> > >> 9th.
> > > > > >> > >>>>>>>>>>> We have done a _lot_ of work on backups as well -
> > ours
> > > > are
> > > > > >> more
> > > > > >> > >>>>>>>>>> complicated as we wanted fast per-tenant restores,
> so
> > > > data
> > > > > is
> > > > > >> > >>>>>> "grouped"
> > > > > >> > >>>>>>>> by
> > > > > >> > >>>>>>>>>> tenant. Would like to sync up on that (hopefully
> some
> > > of
> > > > > the
> > > > > >> > folks
> > > > > >> > >>>> who
> > > > > >> > >>>>>>>>>> wrote most of the code will be in town, I'll
> check).
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts
> > > > > (although
> > > > > >> you
> > > > > >> > >>>> folks
> > > > > >> > >>>>>>>>>> usually do not like what I think about the offheap
> > > > efforts
> > > > > >> :) ).
> > > > > >> > >>>>>>>>>>> Would like to add the following topics:
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more
> > > bits
> > > > in
> > > > > >> the
> > > > > >> > >>>>>>>>>> timestamps (happy to cover that, unless it's part
> of
> > > the
> > > > > >> "Time"
> > > > > >> > >>>> topic)
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> - "Replication". We found that replication cannot
> > keep
> > > > up
> > > > > >> with
> > > > > >> > >> high
> > > > > >> > >>>>>>>>>> write loads, due to the fact that replication is
> > > strictly
> > > > > >> single
> > > > > >> > >>>>>> threaded
> > > > > >> > >>>>>>>>>> per regionserver (even though we have multiple
> region
> > > > > >> servers on
> > > > > >> > >> the
> > > > > >> > >>>>>>>> sink
> > > > > >> > >>>>>>>>>> side)
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> -- Lars
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> ________________________________
> > > > > >> > >>>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
> > > > > >> > >>>>>>>>>>> To: dev <de...@hbase.apache.org>
> > > > > >> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> > > > > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer
> > workshop
> > > on
> > > > > >> > >> near-term
> > > > > >> > >>>>>> work
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> I'm planning to be in the Bay area the week of the
> > > 24th
> > > > of
> > > > > >> > >> August.
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>> --
> > > > > >> > >>>>>>>>>>> Sean
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <
> > > > > >> > apurtell@apache.org>
> > > > > >> > >>>>>>>> wrote:
> > > > > >> > >>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>> I can be up in your area in August.
> > > > > >> > >>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <
> > > > > stack@duboce.net
> > > > > >> >
> > > > > >> > >>>> wrote:
> > > > > >> > >>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar
> <
> > > > > >> > >>>>>> enis.soz@gmail.com>
> > > > > >> > >>>>>>>>>>>>> wrote:
> > > > > >> > >>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>> Sounds good. It has been a while since we did the
> > > > talk-a-thon.
> > > > > >> > >>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer
> > > > something
> > > > > >> next
> > > > > >> > >> week
> > > > > >> > >>>>>> if
> > > > > >> > >>>>>>>>>>>>>> possible.
> > > > > >> > >>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on
> > 10th
> > > > of
> > > > > >> > August
> > > > > >> > >>>>>>>> (Mikhail
> > > > > >> > >>>>>>>>>>>> on
> > > > > >> > >>>>>>>>>>>>> the 20th).
> > > > > >> > >>>>>>>>>>>>> St.Ack
> > > > > >> > >>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>> Enis
> > > > > >> > >>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <
> > > > > >> stack@duboce.net>
> > > > > >> > >>>> wrote:
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got
> > > together
> > > > > >> for a
> > > > > >> > >>>>>> pow-wow.
> > > > > >> > >>>>>>>>>>>>> There
> > > > > >> > >>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment
> (see
> > > > below
> > > > > >> > list)
> > > > > >> > >>>> and
> > > > > >> > >>>>>> it
> > > > > >> > >>>>>>>>>>>>> would
> > > > > >> > >>>>>>>>>>>>>>> be good to meet and whiteboard, surface good
> > > ideas
> > > > > that
> > > > > >> > have
> > > > > >> > >>>>>> gone
> > > > > >> > >>>>>>>>>>>>>> dormant
> > > > > >> > >>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in
> > > > > >> JIRA-attached
> > > > > >> > >>>> google
> > > > > >> > >>>>>>>> doc
> > > > > >> > >>>>>>>>>>>>>> that
> > > > > >> > >>>>>>>>>>>>>>> need socializing.
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> You can only come if you are wearing your
> > bullshit
> > > > > hat.
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> Topics we'd go over could include:
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M
> > > regions
> > > > > >> > >>>>>> (Matteo/Stack)
> > > > > >> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path
> > and
> > > > > >> > alternate
> > > > > >> > >>>>>>>> KeyValue
> > > > > >> > >>>>>>>>>>>>>>> implementation (Anoop/Ram)
> > > > > >> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> > > > > >> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > > > > >> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
> > > > > >> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> > > > > >> > >>>>>>>>>>>>>>> + Time (Enis)
> > > > > >> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> > > > > >> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> > > > > >> > >>>>>>>>>>>>>>> + hbase-2.0.0
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the
> > > topic.
> > > > > If
> > > > > >> you
> > > > > >> > >>>> want
> > > > > >> > >>>>>> to
> > > > > >> > >>>>>>>>>>>>> take
> > > > > >> > >>>>>>>>>>>>>>> over a topic or put your name by one, just
> say.
> > > > > Suggest
> > > > > >> > that
> > > > > >> > >>>>>>>>>>>>> discussion
> > > > > >> > >>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
> > > > > >> > >>>>>>>>>>>>>>> thought/design/implementation.
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> What do others think?
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> What date would suit folks?
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> Anyone want to host?
> > > > > >> > >>>>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>>> Thanks,
> > > > > >> > >>>>>>>>>>>>>>> Matteo and St.Ack
> > > > > >> > >>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>> --
> > > > > >> > >>>>>>>>>>>> Best regards,
> > > > > >> > >>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>> - Andy
> > > > > >> > >>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>> Problems worthy of attack prove their worth by
> > > hitting
> > > > > >> back. -
> > > > > >> > >>>> Piet
> > > > > >> > >>>>>>>> Hein
> > > > > >> > >>>>>>>>>>>> (via Tom White)
> > > > > >> > >>
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Stack <st...@duboce.net>.
On Thu, Aug 20, 2015 at 11:13 AM, Biju N <bi...@gmail.com> wrote:

> Is there a way to participate remotely or at least listen in to this
> meet-up? There will be at least a few who will be interested to dial in
> from the east coast.
>
>

Should be able to get you at least audio. Will post something here on this
thread and to the meetup page just before the meeting starts.
St.Ack



> On Wed, Aug 12, 2015 at 3:29 PM, Stack <st...@duboce.net> wrote:
>
> > I posted this meetup notice:
> > http://www.meetup.com/hackathon/events/224589819/
> > St.Ack
> >
> > On Wed, Aug 12, 2015 at 1:34 AM, Enis Söztutar <en...@apache.org> wrote:
> >
> > > Agreed, too many fat topics, but all important. I guess we can spend
> > first
> > > 10-20 mins on the agenda based on who is in the room and come up with a
> > > shorter list and go from there.
> > >
> > > Enis
> > >
> > > On Tue, Aug 11, 2015 at 9:23 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > On Mon, Jul 20, 2015 at 1:04 PM, Stephen Jiang <
> > syuanjiangdev@gmail.com>
> > > > wrote:
> > > >
> > > > > [Let us move back to the main topic - a meeting to talk about the
> > next
> > > > > direction on HBASE development]
> > > > >
> > > > > Are we firm on the *August 26th* meeting date?
> > > > >
> > > > > Given the long list of topics from St.Ack, even a one day meeting
> > might
> > > > > not cover all of them (in depth).  We need to either trim the topic
> > > list
> > > > or
> > > > > limit the time to discuss a single topic (30 min for one topic
> > > enough?).
> > > > >
> > > > >
> > > > Thanks for bringing us back to topic Stephen.
> > > >
> > > > Yes, lets do 26th. Speak up if this does not suit. I will file a
> meetup
> > > > page in an hour or so. Where should we do it? Enis offered his nice
> > > place.
> > > > Could try and get space at ours too... in Palo Alto (less 'deep
> > south', a
> > > > little easier for the SFers).
> > > >
> > > > As to too many topics, in my experience, a bunch of smelly engineers
> > all
> > > in
> > > > a room starts to fall apart after a couple of hours especially when
> > > ranging
> > > > discussion. Suggest we cut the time-per-topic and list of topics so
> can
> > > do
> > > > in an afternoon. If some topics are too fat, can do break out or
> > put-off
> > > to
> > > > another day and smaller, interested group.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > >
> > > > > Thanks
> > > > > Stephen
> > > > >
> > > > >
> > > > > On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <anoop.hbase@gmail.com
> >
> > > > wrote:
> > > > >
> > > > >> We will be doing some more large data tests in coming week Andy..
> > >  Will
> > > > >> report back more.  Also will do a write up , in what all ways the
> > work
> > > > >> might help us.  As Sean said, we will continue in another thread
> if
> > > any
> > > > >> thing further..  Will soon write back on the test result.  Thanks.
> > > > >>
> > > > >> -Anoop-
> > > > >>
> > > > >> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <
> > > > andrew.purtell@gmail.com
> > > > >> >
> > > > >> wrote:
> > > > >>
> > > > >> > Cool, thanks.
> > > > >> >
> > > > >> > Is a 20% latency reduction the most we can expect or do you
> think
> > > > there
> > > > >> is
> > > > >> > room for more improvement? Just curious.
> > > > >> >
> > > > >> > Is latency reduction the only goal? Anything here about
> supporting
> > > > >> larger
> > > > >> > heaps? Is there something we can measure in that regard?
> > > > >> >
> > > > >> > Hope you see my point and there's enough here to prime a goals
> and
> > > > >> metrics
> > > > >> > discussion at the pow wow or on the relevant JIRAs.
> > > > >> >
> > > > >> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <
> > > > >> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >> > >
> > > > >> > > Hi Andy
> > > > >> > >
> > > > >> > > Based on our POCs done, we expect around 20% improvement in
> > > latency.
> > > > >> For
> > > > > >> > > scans it will be a little less than 20%.
> > > > >> > >
> > > > >> > > Regards
> > > > >> > > Ram
> > > > >> > >
> > > > >> > >
> > > > >> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <
> > > > >> > andrew.purtell@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > >> Hi Ram,
> > > > >> > >>
> > > > >> > >> Do you have any targets for what you are measuring? What are
> > the
> > > > >> goals
> > > > >> > you
> > > > >> > >> guys are working toward with the off heaping changes?
> > > > >> > >>
> > > > >> > >>
> > > > >> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
> > > > >> > >>> ramkrishna.s.vasudevan@gmail.com> wrote:
> > > > >> > >>>
> > > > >> > >>> Thanks Vladimir.
> > > > >> > >>> Yeah, the reports that were attached specifically captured
> the
> > > > >> 95/99th
> > > > >> > >>> percentile.
> > > > >> > >>> The reason for checking the server side perf was to
> > specifically
> > > > see
> > > > >> > the
> > > > >> > >>> improvement in the server side and also the client was
> sending
> > > > large
> > > > >> > >>> results in multiple threads. So wanted to avoid the n/w
> > > > >> interference. I
> > > > >> > >>> think it was a general practice that we were following.
> > > > > >> > >>> We will do some more tests and get some latest readings with
> > > bigger
> > > > >> data
> > > > >> > >>> sets.
> > > > >> > >>> Sent from mobile.
> > > > >> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <
> > > > >> andrew.purtell@gmail.com>
> > > > >> > >> wrote:
> > > > >> > >>>>
> > > > >> > >>>> +1
> > > > >> > >>>>
> > > > >> > >>>> Yeah, something like that, with aspirational targets for
> > > > >> improvement
> > > > >> > >> from
> > > > >> > >>>> current releases. Then what to measure, the tests to run,
> and
> > > > >> criteria
> > > > >> > >> for
> > > > >> > >>>> evaluation are clear and organized and we're able to better
> > > > assess
> > > > >> how
> > > > >> > >> the
> > > > >> > >>>> work in progress is meeting its goals (or not)
> > > > >> > >>>>
> > > > >> > >>>>
> > > > >> > >>>>
> > > > >> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <
> > > > >> > vladrodionov@gmail.com
> > > > >> > >>>
> > > > >> > >>>> wrote:
> > > > >> > >>>>
> > > > >> > >>>>>>> Umbrella jira to make sure we can have blocks cached in
> > > > offheap
> > > > >> > >> backed
> > > > >> > >>>>> cache. In the entire read path, we can refer to this
> offheap
> > > > >> buffer
> > > > >> > and
> > > > >> > >>>>> avoid onheap copying.
> > > > >> > >>>>>
> > > > >> > >>>>> I think, on a read path, the most important improvement we
> > > could
> > > > >> > >> imagine
> > > > >> > >>>> is
> > > > >> > >>>>> elimination or reducing of object creations (KVs,
> iterators
> > > > etc).
> > > > >> > >>>>> object reuse, byte buffers reuse or offheap buffers reuse,
> > API
> > > > >> change
> > > > >> > >>>> etc.
> > > > >> > >>>>> If this is a part of this JIRA, then I would easily
> define a
> > > > goal:
> > > > > >> > >>>>> improving 95/99% latency of read operations. Not
> > > performance,
> > > > >> but
> > > > >> > >>>> latency
> > > > >> > >>>>> matters
> > > > >> > >>>>>
> > > > >> > >>>>> -Vlad
> > > > >> > >>>>>
> > > > >> > >>>>>
> > > > >> > >>>>>
> > > > >> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
> > > > >> > >>>> andrew.purtell@gmail.com>
> > > > >> > >>>>> wrote:
> > > > >> > >>>>>
> > > > >> > >>>>>> That's not a realistic or useful test scenario, unless
> the
> > > goal
> > > > >> is
> > > > >> > to
> > > > >> > >>>>>> accelerate queries where all cells are filtered at the
> > > server.
> > > > >> > >>>>>>
> > > > >> > >>>>>>
> > > > >> > >>>>>>
> > > > >> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <
> > > > anoop.hbase@gmail.com
> > > > >> >
> > > > >> > >>>> wrote:
> > > > >> > >>>>>>>
> > > > > >> > >>>>>>> No Andy. 11425 has a doc attached to it. At the end of
> > it,
> > > we
> > > > >> have
> > > > >> > >>>> added
> > > > >> > >>>>>>> perf numbers in a cluster testing.  This was done using
> PE
> > > get
> > > > >> and
> > > > >> > >> scan
> > > > >> > >>>>>>> tests with filtering all cells at server (to not
> consider
> > > n/w
> > > > >> > >> bandwidth
> > > > >> > >>>>>>> constraints)
> > > > >> > >>>>>>>
> > > > >> > >>>>>>> -Anoop-
> > > > >> > >>>>>>>
> > > > >> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
> > > > >> > >>>>>> andrew.purtell@gmail.com>
> > > > >> > >>>>>>> wrote:
> > > > >> > >>>>>>>
> > > > >> > >>>>>>>> We have some microbenchmarks, not evidence of
> differences
> > > > seen
> > > > >> > from
> > > > >> > >> a
> > > > >> > >>>>>>>> client application. I'm not saying that microbenchmarks
> > are
> > > > not
> > > > >> > >>>> totally
> > > > >> > >>>>>>>> necessary and a great start - they are - but that they
> > > don't
> > > > >> > measure
> > > > >> > >>>> an
> > > > >> > >>>>>> end
> > > > >> > >>>>>>>> goal. Furthermore unless I've missed one somewhere we
> > don't
> > > > >> have a
> > > > >> > >>>> JIRA
> > > > >> > >>>>>> or
> > > > >> > >>>>>>>> design doc that states a clear end goal metric like the
> > > > >> strawman I
> > > > >> > >>>> threw
> > > > >> > >>>>>>>> together in my previous mail. A measurable system level
> > > goal
> > > > >> and
> > > > >> > >> some
> > > > >> > >>>>>> data
> > > > >> > >>>>>>>> from full cluster testing would go a lot further toward
> > > > letting
> > > > >> > all
> > > > >> > >> of
> > > > >> > >>>>>> us
> > > > >> > >>>>>>>> evaluate the potential and payoff of the work. In the
> > > > meantime
> > > > >> we
> > > > >> > >>>> should
> > > > >> > >>>>>>>> probably be assembling these changes on a branch
> instead
> > of
> > > > in
> > > > >> > >> trunk,
> > > > >> > >>>>>> for
> > > > >> > >>>>>>>> as long as the goal is not clearly defined and the
> payoff
> > > and
> > > > >> > >>>> potential
> > > > >> > >>>>>> for
> > > > >> > >>>>>>>> perf regressions is untested and unknown.
> > > > >> > >>>>>>>>
> > > > >> > >>>>>>>>
> > > > >> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <
> > > > >> anoop.hbase@gmail.com>
> > > > >> > >>>> wrote:
> > > > >> > >>>>>>>>>
> > > > >> > >>>>>>>>> Thanks Andy and Lars.  The parent jira has doc
> attached
> > > > which
> > > > >> > >>>> contains
> > > > >> > >>>>>>>> some
> > > > >> > >>>>>>>>> perf gain numbers..  We will be doing more tests in
> > next 2
> > > > >> weeks
> > > > >> > >>>>>> (before
> > > > >> > >>>>>>>>> end of this month) and will publish them.   Yes it
> will
> > be
> > > > >> great
> > > > >> > if
> > > > >> > >>>> it
> > > > >> > >>>>>> is
> > > > >> > >>>>>>>>> more IST friendly time :-)
> > > > >> > >>>>>>>>>
> > > > >> > >>>>>>>>> -Anoop-
> > > > >> > >>>>>>>>>
> > > > >> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
> > > > >> > >>>>>>>> andrew.purtell@gmail.com>
> > > > >> > >>>>>>>>> wrote:
> > > > >> > >>>>>>>>>
> > > > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been
> > > known
> > > > > >> > to always
> > > > > >> > >>>>>> argue
> > > > > >> > >>>>>>>>>> both sides of a discussion and to never take sides
> > easily
> > > > >> (drives
> > > > >> > >>>> some
> > > > >> > >>>>>>>> folks
> > > > >> > >>>>>>>>>> crazy).
> > > > >> > >>>>>>>>>>
> > > > >> > >>>>>>>>>> I can vouch for this (smile)
> > > > >> > >>>>>>>>>>
> > > > >> > >>>>>>>>>> I also can offer support for off heaping there. At
> the
> > > same
> > > > >> time
> > > > >> > >> we
> > > > >> > >>>> do
> > > > >> > >>>>>>>>>> have a gap where we can't point to a timeline of
> > > > improvements
> > > > >> > >> (yet,
> > > > >> > >>>>>>>> anyway)
> > > > >> > >>>>>>>>>> with benchmarks showing gains where your goals need
> > them.
> > > > For
> > > > >> > >>>> example,
> > > > >> > >>>>>>>>>> stock HBase in one JVM can address max N GB for
> > response
> > > > time
> > > > >> > >>>>>>>> distribution
> > > > >> > >>>>>>>>>> D; dev version of HBase in off heap branch can
> address
> > > max
> > > > >> N' GB
> > > > >> > >> for
> > > > >> > >>>>>>>>>> distribution D', where N' > N and D > D'
> (distribution
> > D'
> > > > >> > >>>>>> statistically
> > > > >> > >>>>>>>>>> shows better/lower response times).
> > > > >> > >>>>>>>>>>
> > > > >> > >>>>>>>>>>
> > > > >> > >>>>>>>>>>
> > > > >> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <
> > > > >> larsh@apache.org>
> > > > >> > >>>> wrote:
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> I'm in favor of anything that improves performance
> > (and
> > > > >> > >> preferably
> > > > >> > >>>>>>>>>> doesn't set us back into a world that's worse than C
> > due
> > > to
> > > > >> the
> > > > >> > >> lack
> > > > >> > >>>>>> of
> > > > > >> > >>>>>>>>>> pointers in Java). Never said "I don't like it", it's
> > just
> > > > >> that
> > > > >> > I'm
> > > > >> > >>>>>>>> perhaps
> > > > >> > >>>>>>>>>> asking for more numbers and justification in weighing
> > the
> > > > >> pros
> > > > >> > and
> > > > >> > >>>>>> cons.
> > > > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been
> > > known
> > > > > >> > to always
> > > > > >> > >>>>>> argue
> > > > > >> > >>>>>>>>>> both sides of a discussion and to never take sides
> > easily
> > > > >> (drives
> > > > >> > >>>> some
> > > > >> > >>>>>>>> folks
> > > > > >> > >>>>>>>>>> crazy). And Stack's there too, he'll yell at me where
> > needed
> > > > :)
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so
> > there
> > > > is
> > > > >> a
> > > > >> > >>>>>> fighting
> > > > >> > >>>>>>>>>> chance that folks on IST can participate. I know that
> > > some
> > > > of
> > > > >> > our
> > > > >> > >>>>>> folks
> > > > >> > >>>>>>>> on
> > > > >> > >>>>>>>>>> IST would love to participate in the backup
> > discussion).
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown
> > SF.
> > > > I'd
> > > > >> > just
> > > > >> > >>>>>> need
> > > > >> > >>>>>>>>>> an approx. number of folks.
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> -- Lars
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> From: ramkrishna vasudevan <
> > > > >> ramkrishna.s.vasudevan@gmail.com>
> > > > >> > >>>>>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>;
> > lars
> > > > >> > >> hofhansl <
> > > > >> > >>>>>>>>>> larsh@apache.org>
> > > > >> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> > > > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer
> workshop
> > on
> > > > >> > >> near-term
> > > > >> > >>>>>> work
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> Hi
> > > > >> > >>>>>>>>>>> What time will it be on August 26th?
> > > > > >> > >>>>>>>>>>> @Lars
> > > > > >> > >>>>>>>>>>> Ya. I know that you are not generally in favour
> > of
> > > > this
> > > > >> > >>>>>> offheaping
> > > > > >> > >>>>>>>>>> stuff.  Maybe if we (from India) can attend this
> > meeting
> > > > >> > remotely
> > > > >> > >>>>>> your
> > > > >> > >>>>>>>>>> thoughts can be discussed and also the current state
> of
> > > > this
> > > > >> > work.
> > > > > >> > >>>>>>>>>>> Regards
> > > > > >> > >>>>>>>>>>> Ram
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <
> > > > >> > larsh@apache.org
> > > > >> > >>>
> > > > >> > >>>>>>>> wrote:
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week
> of
> > > > >> August
> > > > >> > >> 9th.
> > > > >> > >>>>>>>>>>> We have done a _lot_ of work on backups as well -
> ours
> > > are
> > > > >> more
> > > > >> > >>>>>>>>>> complicated as we wanted fast per-tenant restores, so
> > > data
> > > > is
> > > > >> > >>>>>> "grouped"
> > > > >> > >>>>>>>> by
> > > > >> > >>>>>>>>>> tenant. Would like to sync up on that (hopefully some
> > of
> > > > the
> > > > >> > folks
> > > > >> > >>>> who
> > > > >> > >>>>>>>>>> wrote most of the code will be in town, I'll check).
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts
> > > > (although
> > > > >> you
> > > > >> > >>>> folks
> > > > >> > >>>>>>>>>> usually do not like what I think about the offheap
> > > efforts
> > > > >> :) ).
> > > > >> > >>>>>>>>>>> Would like to add the following topics:
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more
> > bits
> > > in
> > > > >> the
> > > > >> > >>>>>>>>>> timestamps (happy to cover that, unless it's part of
> > the
> > > > >> "Time"
> > > > >> > >>>> topic)
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> - "Replication". We found that replication cannot
> keep
> > > up
> > > > >> with
> > > > >> > >> high
> > > > > >> > >>>>>>>>>> write loads, due to the fact that replication is
> > strictly
> > > > >> single
> > > > >> > >>>>>> threaded
> > > > >> > >>>>>>>>>> per regionserver (even though we have multiple region
> > > > >> servers on
> > > > >> > >> the
> > > > >> > >>>>>>>> sink
> > > > >> > >>>>>>>>>> side)
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> -- Lars
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> ________________________________
> > > > >> > >>>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
> > > > >> > >>>>>>>>>>> To: dev <de...@hbase.apache.org>
> > > > >> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> > > > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer
> workshop
> > on
> > > > >> > >> near-term
> > > > >> > >>>>>> work
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> I'm planning to be in the Bay area the week of the
> > 24th
> > > of
> > > > >> > >> August.
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>> --
> > > > >> > >>>>>>>>>>> Sean
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>
> > > > >> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <
> > > > >> > apurtell@apache.org>
> > > > >> > >>>>>>>> wrote:
> > > > >> > >>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>> I can be up in your area in August.
> > > > >> > >>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <
> > > > stack@duboce.net
> > > > >> >
> > > > >> > >>>> wrote:
> > > > >> > >>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> > > > >> > >>>>>> enis.soz@gmail.com>
> > > > >> > >>>>>>>>>>>>> wrote:
> > > > >> > >>>>>>>>>>>>>
> > > > > >> > >>>>>>>>>>>>>> Sounds good. It has been a while since we did the
> > > talk-a-thon.
> > > > >> > >>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer
> > > something
> > > > >> next
> > > > >> > >> week
> > > > >> > >>>>>> if
> > > > >> > >>>>>>>>>>>>>> possible.
> > > > >> > >>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on
> 10th
> > > of
> > > > >> > August
> > > > >> > >>>>>>>> (Mikhail
> > > > >> > >>>>>>>>>>>> on
> > > > >> > >>>>>>>>>>>>> the 20th).
> > > > >> > >>>>>>>>>>>>> St.Ack
> > > > >> > >>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>> Enis
> > > > >> > >>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <
> > > > >> stack@duboce.net>
> > > > >> > >>>> wrote:
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got
> > together
> > > > >> for a
> > > > >> > >>>>>> pow-wow.
> > > > >> > >>>>>>>>>>>>> There
> > > > >> > >>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see
> > > below
> > > > >> > list)
> > > > >> > >>>> and
> > > > >> > >>>>>> it
> > > > >> > >>>>>>>>>>>>> would
> > > > > >> > >>>>>>>>>>>>>>> be good to meet and whiteboard, surface good
> > ideas
> > > > that
> > > > >> > have
> > > > >> > >>>>>> gone
> > > > >> > >>>>>>>>>>>>>> dormant
> > > > >> > >>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in
> > > > >> JIRA-attached
> > > > >> > >>>> google
> > > > >> > >>>>>>>> doc
> > > > >> > >>>>>>>>>>>>>> that
> > > > >> > >>>>>>>>>>>>>>> need socializing.
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> You can only come if you are wearing your
> bullshit
> > > > hat.
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> Topics we'd go over could include:
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M
> > regions
> > > > >> > >>>>>> (Matteo/Stack)
> > > > >> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path
> and
> > > > >> > alternate
> > > > >> > >>>>>>>> KeyValue
> > > > >> > >>>>>>>>>>>>>>> implementation (Anoop/Ram)
> > > > >> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> > > > >> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > > > >> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
> > > > >> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> > > > >> > >>>>>>>>>>>>>>> + Time (Enis)
> > > > >> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> > > > >> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> > > > >> > >>>>>>>>>>>>>>> + hbase-2.0.0
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the
> > topic.
> > > > If
> > > > >> you
> > > > >> > >>>> want
> > > > >> > >>>>>> to
> > > > >> > >>>>>>>>>>>>> take
> > > > >> > >>>>>>>>>>>>>>> over a topic or put your name by one, just say.
> > > > Suggest
> > > > >> > that
> > > > >> > >>>>>>>>>>>>> discussion
> > > > >> > >>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
> > > > >> > >>>>>>>>>>>>>>> thought/design/implementation.
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> What do others think?
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> What date would suit folks?
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> Anyone want to host?
> > > > >> > >>>>>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>>>> Thanks,
> > > > >> > >>>>>>>>>>>>>>> Matteo and St.Ack
> > > > >> > >>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>> --
> > > > >> > >>>>>>>>>>>> Best regards,
> > > > >> > >>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>> - Andy
> > > > >> > >>>>>>>>>>>>
> > > > >> > >>>>>>>>>>>> Problems worthy of attack prove their worth by
> > hitting
> > > > >> back. -
> > > > >> > >>>> Piet
> > > > >> > >>>>>>>> Hein
> > > > >> > >>>>>>>>>>>> (via Tom White)
> > > > >> > >>
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Biju N <bi...@gmail.com>.
Is there a way to participate remotely or at least listen in to this
meet-up? There will be at least a few who will be interested to dial in
from the east coast.

On Wed, Aug 12, 2015 at 3:29 PM, Stack <st...@duboce.net> wrote:

> I posted this meetup notice:
> http://www.meetup.com/hackathon/events/224589819/
> St.Ack

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Stack <st...@duboce.net>.
I posted this meetup notice:
http://www.meetup.com/hackathon/events/224589819/
St.Ack

On Wed, Aug 12, 2015 at 1:34 AM, Enis Söztutar <en...@apache.org> wrote:

> Agreed, too many fat topics, but all important. I guess we can spend first
> 10-20 mins on the agenda based on who is in the room and come up with a
> shorter list and go from there.
>
> Enis
>
> On Tue, Aug 11, 2015 at 9:23 PM, Stack <st...@duboce.net> wrote:
>
> > On Mon, Jul 20, 2015 at 1:04 PM, Stephen Jiang <sy...@gmail.com>
> > wrote:
> >
> > > [Let us move back to the main topic - a meeting to talk about the next
> > > direction on HBASE development]
> > >
> > > Are we firm on the *August 26th* meeting date?
> > >
> > > Given the long list of topics from St.Ack, even a one day meeting might
> > > not cover all of them (in depth).  We need to either trim the topic list or
> > > limit the time to discuss a single topic (30 min for one topic enough?).
> > >
> > >
> > Thanks for bringing us back to topic Stephen.
> >
> > Yes, lets do 26th. Speak up if this does not suit. I will file a meetup
> > page in an hour or so. Where should we do it? Enis offered his nice place.
> > Could try and get space at ours too... in Palo Alto (less 'deep south', a
> > little easier for the SFers).
> >
> > As to too many topics, in my experience, a bunch of smelly engineers all in
> > a room starts to fall apart after a couple of hours especially when ranging
> > discussion. Suggest we cut the time-per-topic and list of topics so can do
> > in an afternoon. If some topics are too fat, can do break out or put-off to
> > another day and smaller, interested group.
> >
> > St.Ack
> >
> >
> >
> >
> > > Thanks
> > > Stephen
> > >
> > >
> > > On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <an...@gmail.com> wrote:
> > >
> > >> We will be doing some more large data tests in coming week Andy..  Will
> > >> report back more.  Also will do a write up , in what all ways the work
> > >> might help us.  As Sean said, we will continue in another thread if any
> > >> thing further..  Will soon write back on the test result.  Thanks.
> > >>
> > >> -Anoop-
> > >>
> > >> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > >> wrote:
> > >>
> > >> > Cool, thanks.
> > >> >
> > >> > Is a 20% latency reduction the most we can expect or do you think there is
> > >> > room for more improvement? Just curious.
> > >> >
> > >> > Is latency reduction the only goal? Anything here about supporting larger
> > >> > heaps? Is there something we can measure in that regard?
> > >> >
> > >> > Hope you see my point and there's enough here to prime a goals and metrics
> > >> > discussion at the pow wow or on the relevant JIRAs.
> > >> >
> > >> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > >> > >
> > >> > > Hi Andy
> > >> > >
> > >> > > Based on our POCs done, we expect around 20% improvement in latency. For
> > >> > > scans it will be little lesser than 20%.
> > >> > >
> > >> > > Regards
> > >> > > Ram
> > >> > >
> > >> > >
> > >> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > >> Hi Ram,
> > >> > >>
> > >> > >> Do you have any targets for what you are measuring? What are the goals you
> > >> > >> guys are working toward with the off heaping changes?
> > >> > >>
> > >> > >>
> > >> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> > >> > >>>
> > >> > >>> Thanks Vladimir.
> > >> > >>> Yeah, the reports that were attached specifically captured the 95/99th
> > >> > >>> percentile.
> > >> > >>> The reason for checking the server side perf was to specifically see the
> > >> > >>> improvement in the server side and also the client was sending large
> > >> > >>> results in multiple threads. So wanted to avoid the n/w interference. I
> > >> > >>> think it was a general practice that we were following.
> > >> > >>> We will do some more tests and get some latest readings with bigger data
> > >> > >>> sets.
> > >> > >>> Sent from mobile.
> > >> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <andrew.purtell@gmail.com> wrote:
> > >> > >>>>
> > >> > >>>> +1
> > >> > >>>>
> > >> > >>>> Yeah, something like that, with aspirational targets for improvement from
> > >> > >>>> current releases. Then what to measure, the tests to run, and criteria for
> > >> > >>>> evaluation are clear and organized and we're able to better assess how the
> > >> > >>>> work in progress is meeting its goals (or not)
> > >> > >>>>
> > >> > >>>>
> > >> > >>>>
> > >> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > >> > >>>> wrote:
> > >> > >>>>
> > >> > >>>>>>> Umbrella jira to make sure we can have blocks cached in offheap backed
> > >> > >>>>> cache. In the entire read path, we can refer to this offheap buffer and
> > >> > >>>>> avoid onheap copying.
> > >> > >>>>>
> > >> > >>>>> I think, on a read path, the most important improvement we could imagine is
> > >> > >>>>> elimination or reducing of object creations (KVs, iterators etc).
> > >> > >>>>> object reuse, byte buffers reuse or offheap buffers reuse, API change etc.
> > >> > >>>>> If this is a part of this JIRA, then I would easily define a goal:
> > >> > >>>>> improving 95/99% latency of a read operations. Not performance, but latency
> > >> > >>>>> matters
> > >> > >>>>>
> > >> > >>>>> -Vlad
> > >> > >>>>>
> > >> > >>>>>
> > >> > >>>>>
> > >> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > >> > >>>>> wrote:
> > >> > >>>>>
> > >> > >>>>>> That's not a realistic or useful test scenario, unless the goal is to
> > >> > >>>>>> accelerate queries where all cells are filtered at the server.
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>
> > >> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <anoop.hbase@gmail.com> wrote:
> > >> > >>>>>>>
> > >> > >>>>>>> No Andy. 11425 having doc attached to it. At the end of it, we have added
> > >> > >>>>>>> perf numbers in a cluster testing.  This was done using PE get and scan
> > >> > >>>>>>> tests with filtering all cells at server (to not consider n/w bandwidth
> > >> > >>>>>>> constraints)
> > >> > >>>>>>>
> > >> > >>>>>>> -Anoop-
> > >> > >>>>>>>
> > >> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > >> > >>>>>>> wrote:
> > >> > >>>>>>>
> > >> > >>>>>>>> We have some microbenchmarks, not evidence of differences seen from a
> > >> > >>>>>>>> client application. I'm not saying that microbenchmarks are not totally
> > >> > >>>>>>>> necessary and a great start - they are - but that they don't measure an end
> > >> > >>>>>>>> goal. Furthermore unless I've missed one somewhere we don't have a JIRA or
> > >> > >>>>>>>> design doc that states a clear end goal metric like the strawman I threw
> > >> > >>>>>>>> together in my previous mail. A measurable system level goal and some data
> > >> > >>>>>>>> from full cluster testing would go a lot further toward letting all of us
> > >> > >>>>>>>> evaluate the potential and payoff of the work. In the meantime we should
> > >> > >>>>>>>> probably be assembling these changes on a branch instead of in trunk, for
> > >> > >>>>>>>> as long as the goal is not clearly defined and the payoff and potential for
> > >> > >>>>>>>> perf regressions is untested and unknown.
> > >> > >>>>>>>>
> > >> > >>>>>>>>
> > >> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <anoop.hbase@gmail.com> wrote:
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> Thanks Andy and Lars.  The parent jira has doc attached which contains some
> > >> > >>>>>>>>> perf gain numbers..  We will be doing more tests in next 2 weeks (before
> > >> > >>>>>>>>> end of this month) and will publish them.   Yes it will be great if it is
> > >> > >>>>>>>>> more IST friendly time :-)
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> -Anoop-
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > >> > >>>>>>>>> wrote:
> > >> > >>>>>>>>>
> > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue
> > >> > >>>>>>>>>> both side of a discussion and to never take sides easily (drives some folks
> > >> > >>>>>>>>>> crazy).
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> I can vouch for this (smile)
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> I also can offer support for off heaping there. At the same time we do
> > >> > >>>>>>>>>> have a gap where we can't point to a timeline of improvements (yet, anyway)
> > >> > >>>>>>>>>> with benchmarks showing gains where your goals need them. For example,
> > >> > >>>>>>>>>> stock HBase in one JVM can address max N GB for response time distribution
> > >> > >>>>>>>>>> D; dev version of HBase in off heap branch can address max N' GB for
> > >> > >>>>>>>>>> distribution D', where N' > N and D > D' (distribution D' statistically
> > >> > >>>>>>>>>> shows better/lower response times).
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <larsh@apache.org> wrote:
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> I'm in favor of anything that improves performance (and preferably
> > >> > >>>>>>>>>> doesn't set us back into a world that's worse than C due to the lack of
> > >> > >>>>>>>>>> pointers in Java). Never said "I don't like it", it's just that I'm perhaps
> > >> > >>>>>>>>>> asking for more numbers and justification in weighing the pros and cons.
> > >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known always argue
> > >> > >>>>>>>>>> both side of a discussion and to never take sides easily (drives some folks
> > >> > >>>>>>>>>> crazy). And Stack's there too, he yell at me where needed :)
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there is a fighting
> > >> > >>>>>>>>>> chance that folks on IST can participate. I know that some of our folks on
> > >> > >>>>>>>>>> IST would love to participate in the backup discussion).
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF. I'd just need
> > >> > >>>>>>>>>> an approx. number of folks.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> -- Lars
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> From: ramkrishna vasudevan <
> > >> ramkrishna.s.vasudevan@gmail.com>
> > >> > >>>>>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars
> > >> > >> hofhansl <
> > >> > >>>>>>>>>> larsh@apache.org>
> > >> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
> > >> > >> near-term
> > >> > >>>>>> work
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> Hi
> > >> > >>>>>>>>>>> What time will it be on August 26th?
> > >> > >>>>>>>>>>> @LarsYa. I know that you are not generally in favour of
> > this
> > >> > >>>>>> offheaping
> > >> > >>>>>>>>>> stuff.  May be if we (from India) can attend this meeting
> > >> > remotely
> > >> > >>>>>> your
> > >> > >>>>>>>>>> thoughts can be discussed and also the current state of
> > this
> > >> > work.
> > >> > >>>>>>>>>>> Regards, Ram
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <
> > >> > larsh@apache.org
> > >> > >>>
> > >> > >>>>>>>> wrote:
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of
> > >> August
> > >> > >> 9th.
> > >> > >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours
> are
> > >> more
> > >> > >>>>>>>>>> complicated as we wanted fast per-tenant restores, so
> data
> > is
> > >> > >>>>>> "grouped"
> > >> > >>>>>>>> by
> > >> > >>>>>>>>>> tenant. Would like to sync up on that (hopefully some of
> > the
> > >> > folks
> > >> > >>>> who
> > >> > >>>>>>>>>> wrote most of the code will be in town, I'll check).
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts
> > (although
> > >> you
> > >> > >>>> folks
> > >> > >>>>>>>>>> usually do not like what I think about the offheap
> efforts
> > >> :) ).
> > >> > >>>>>>>>>>> Would like to add the following topics:
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits
> in
> > >> the
> > >> > >>>>>>>>>> timestamps (happy to cover that, unless it's part of the
> > >> "Time"
> > >> > >>>> topic)
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> - "Replication". We found that replication cannot keep
> up
> > >> with
> > >> > >> high
> > >> > >>>>>>>>>> write loads, due to the fact that replicated is strictly
> > >> single
> > >> > >>>>>> threaded
> > >> > >>>>>>>>>> per regionserver (even though we have multiple region
> > >> servers on
> > >> > >> the
> > >> > >>>>>>>> sink
> > >> > >>>>>>>>>> side)
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> -- Lars
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> ________________________________
> > >> > >>>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
> > >> > >>>>>>>>>>> To: dev <de...@hbase.apache.org>
> > >> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> > >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
> > >> > >> near-term
> > >> > >>>>>> work
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th
> of
> > >> > >> August.
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>> --
> > >> > >>>>>>>>>>> Sean
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>
> > >> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <
> > >> > apurtell@apache.org>
> > >> > >>>>>>>> wrote:
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>> I can be up in your area in August.
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <
> > stack@duboce.net
> > >> >
> > >> > >>>> wrote:
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> > >> > >>>>>> enis.soz@gmail.com>
> > >> > >>>>>>>>>>>>> wrote:
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> Sounds good. It has been a while since we did the
> talk-a-thon.
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer
> something
> > >> next
> > >> > >> week
> > >> > >>>>>> if
> > >> > >>>>>>>>>>>>>> possible.
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th
> of
> > >> > August
> > >> > >>>>>>>> (Mikhail
> > >> > >>>>>>>>>>>> on
> > >> > >>>>>>>>>>>>> the 20th).
> > >> > >>>>>>>>>>>>> St.Ack
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>> Enis
> > >> > >>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <
> > >> stack@duboce.net>
> > >> > >>>> wrote:
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together
> > >> for a
> > >> > >>>>>> pow-wow.
> > >> > >>>>>>>>>>>>> There
> > >> > >>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see
> below
> > >> > list)
> > >> > >>>> and
> > >> > >>>>>> it
> > >> > >>>>>>>>>>>>> would
> > >> > >>>>>>>>>>>>>>> be good to meet and whiteboard, surface good ideas
> > that
> > >> > have
> > >> > >>>>>> gone
> > >> > >>>>>>>>>>>>>> dormant
> > >> > >>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in
> > >> JIRA-attached
> > >> > >>>> google
> > >> > >>>>>>>> doc
> > >> > >>>>>>>>>>>>>> that
> > >> > >>>>>>>>>>>>>>> need socializing.
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit
> > hat.
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Topics we'd go over could include:
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
> > >> > >>>>>> (Matteo/Stack)
> > >> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path and
> > >> > alternate
> > >> > >>>>>>>> KeyValue
> > >> > >>>>>>>>>>>>>>> implementation (Anoop/Ram)
> > >> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> > >> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> > >> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
> > >> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> > >> > >>>>>>>>>>>>>>> + Time (Enis)
> > >> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> > >> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> > >> > >>>>>>>>>>>>>>> + hbase-2.0.0
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic.
> > If
> > >> you
> > >> > >>>> want
> > >> > >>>>>> to
> > >> > >>>>>>>>>>>>> take
> > >> > >>>>>>>>>>>>>>> over a topic or put your name by one, just say.
> > Suggest
> > >> > that
> > >> > >>>>>>>>>>>>> discussion
> > >> > >>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
> > >> > >>>>>>>>>>>>>>> thought/design/implementation.
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> What do others think?
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> What date would suit folks?
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Anyone want to host?
> > >> > >>>>>>>>>>>>>>>
> > >> > >>>>>>>>>>>>>>> Thanks,
> > >> > >>>>>>>>>>>>>>> Matteo and St.Ack
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>> --
> > >> > >>>>>>>>>>>> Best regards,
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>> - Andy
> > >> > >>>>>>>>>>>>
> > >> > >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting
> > >> back. -
> > >> > >>>> Piet
> > >> > >>>>>>>> Hein
> > >> > >>>>>>>>>>>> (via Tom White)
> > >> > >>
> > >> >
> > >>
> > >
> > >
> >
>

Re: DISCUSSION: lets do a developer workshop on near-term work

Posted by Enis Söztutar <en...@apache.org>.
Agreed, too many fat topics, but all important. I guess we can spend the first
10-20 mins on the agenda based on who is in the room and come up with a
shorter list and go from there.

Enis

On Tue, Aug 11, 2015 at 9:23 PM, Stack <st...@duboce.net> wrote:

> On Mon, Jul 20, 2015 at 1:04 PM, Stephen Jiang <sy...@gmail.com>
> wrote:
>
> > [Let us move back to the main topic - a meeting to talk about the next
> > direction on HBASE development]
> >
> > Are we firm on the *August 26th* meeting date?
> >
> > Given the long list of topics from St.Ack, even a one day meeting might
> > not cover all of them (in depth).  We need to either trim the topic list
> or
> > limit the time to discuss a single topic (30 min for one topic enough?).
> >
> >
> Thanks for bringing us back to topic Stephen.
>
> Yes, lets do 26th. Speak up if this does not suit. I will file a meetup
> page in an hour or so. Where should we do it? Enis offered his nice place.
> Could try and get space at ours too... in Palo Alto (less 'deep south', a
> little easier for the SFers).
>
> As to too many topics, in my experience, a bunch of smelly engineers all in
> a room starts to fall apart after a couple of hours especially when ranging
> discussion. Suggest we cut the time-per-topic and list of topics so can do
> in an afternoon. If some topics are too fat, can do break out or put-off to
> another day and smaller, interested group.
>
> St.Ack
>
>
>
>
> > Thanks
> > Stephen
> >
> >
> > On Mon, Jul 20, 2015 at 9:50 AM, Anoop John <an...@gmail.com>
> wrote:
> >
> >> We will be doing some more large data tests in coming week Andy..   Will
> >> report back more.  Also will do a write up , in what all ways the work
> >> might help us.  As Sean said, we will continue in another thread if any
> >> thing further..  Will soon write back on the test result.  Thanks.
> >>
> >> -Anoop-
> >>
> >> On Mon, Jul 20, 2015 at 9:59 PM, Andrew Purtell <
> andrew.purtell@gmail.com
> >> >
> >> wrote:
> >>
> >> > Cool, thanks.
> >> >
> >> > Is a 20% latency reduction the most we can expect or do you think
> there
> >> is
> >> > room for more improvement? Just curious.
> >> >
> >> > Is latency reduction the only goal? Anything here about supporting
> >> larger
> >> > heaps? Is there something we can measure in that regard?
> >> >
> >> > Hope you see my point and there's enough here to prime a goals and
> >> metrics
> >> > discussion at the pow wow or on the relevant JIRAs.
> >> >
> >> > > On Jul 20, 2015, at 4:43 AM, ramkrishna vasudevan <
> >> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >
> >> > > Hi Andy
> >> > >
> >> > > Based on our POCs done, we expect around 20% improvement in latency.
> >> For
> >> > > scans it will be a little less than 20%.
> >> > >
> >> > > Regards
> >> > > Ram
> >> > >
> >> > >
> >> > > On Sun, Jul 19, 2015 at 10:20 AM, Andrew Purtell <
> >> > andrew.purtell@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Hi Ram,
> >> > >>
> >> > >> Do you have any targets for what you are measuring? What are the
> >> goals
> >> > you
> >> > >> guys are working toward with the off heaping changes?
> >> > >>
> >> > >>
> >> > >>>> On Jul 18, 2015, at 9:16 PM, ramkrishna vasudevan <
> >> > >>> ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >>>
> >> > >>> Thanks Vladimir.
> >> > >>> Yeah, the reports that were attached specifically captured the
> >> 95/99th
> >> > >>> percentile.
> >> > >>> The reason for checking the server side perf was to specifically
> see
> >> > the
> >> > >>> improvement in the server side and also the client was sending
> large
> >> > >>> results in multiple threads. So wanted to avoid the n/w
> >> interference. I
> >> > >>> think it was a general practice that we were following.
> >> > >>> We will do some more tests and get some latest readings with bigger
> >> data
> >> > >>> sets.
> >> > >>> Sent from mobile.
> >> > >>>> On Jul 19, 2015 1:05 AM, "Andrew Purtell" <
> >> andrew.purtell@gmail.com>
> >> > >> wrote:
> >> > >>>>
> >> > >>>> +1
> >> > >>>>
> >> > >>>> Yeah, something like that, with aspirational targets for
> >> improvement
> >> > >> from
> >> > >>>> current releases. Then what to measure, the tests to run, and
> >> criteria
> >> > >> for
> >> > >>>> evaluation are clear and organized and we're able to better
> assess
> >> how
> >> > >> the
> >> > >>>> work in progress is meeting its goals (or not)
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>> On Jul 18, 2015, at 12:05 PM, Vladimir Rodionov <
> >> > vladrodionov@gmail.com
> >> > >>>
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>>>> Umbrella jira to make sure we can have blocks cached in
> offheap
> >> > >> backed
> >> > >>>>> cache. In the entire read path, we can refer to this offheap
> >> buffer
> >> > and
> >> > >>>>> avoid onheap copying.
> >> > >>>>>
> >> > >>>>> I think, on a read path, the most important improvement we could
> >> > >> imagine
> >> > >>>> is
> >> > >>>>> elimination or reducing of object creations (KVs, iterators
> etc).
> >> > >>>>> object reuse, byte buffers reuse or offheap buffers reuse, API
> >> change
> >> > >>>> etc.
> >> > >>>>> If this is a part of this JIRA, then I would easily define a
> goal:
> >> > >>>>> improving 95/99% latency of read operations. Not performance,
> >> but
> >> > >>>> latency
> >> > >>>>> matters
> >> > >>>>>
> >> > >>>>> -Vlad
> >> > >>>>>
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> On Sat, Jul 18, 2015 at 11:24 AM, Andrew Purtell <
> >> > >>>> andrew.purtell@gmail.com>
> >> > >>>>> wrote:
> >> > >>>>>
> >> > >>>>>> That's not a realistic or useful test scenario, unless the goal
> >> is
> >> > to
> >> > >>>>>> accelerate queries where all cells are filtered at the server.
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>> On Jul 18, 2015, at 11:02 AM, Anoop John <
> anoop.hbase@gmail.com
> >> >
> >> > >>>> wrote:
> >> > >>>>>>>
> >> > >>>>>>> No Andy. 11425 has a doc attached to it. At the end of it, we
> >> have
> >> > >>>> added
> >> > >>>>>>> perf numbers in a cluster testing.  This was done using PE get
> >> and
> >> > >> scan
> >> > >>>>>>> tests with filtering all cells at server (to not consider n/w
> >> > >> bandwidth
> >> > >>>>>>> constraints)
> >> > >>>>>>>
> >> > >>>>>>> -Anoop-
> >> > >>>>>>>
> >> > >>>>>>> On Sat, Jul 18, 2015 at 9:30 PM, Andrew Purtell <
> >> > >>>>>> andrew.purtell@gmail.com>
> >> > >>>>>>> wrote:
> >> > >>>>>>>
> >> > >>>>>>>> We have some microbenchmarks, not evidence of differences
> seen
> >> > from
> >> > >> a
> >> > >>>>>>>> client application. I'm not saying that microbenchmarks are
> not
> >> > >>>> totally
> >> > >>>>>>>> necessary and a great start - they are - but that they don't
> >> > measure
> >> > >>>> an
> >> > >>>>>> end
> >> > >>>>>>>> goal. Furthermore unless I've missed one somewhere we don't
> >> have a
> >> > >>>> JIRA
> >> > >>>>>> or
> >> > >>>>>>>> design doc that states a clear end goal metric like the
> >> strawman I
> >> > >>>> threw
> >> > >>>>>>>> together in my previous mail. A measurable system level goal
> >> and
> >> > >> some
> >> > >>>>>> data
> >> > >>>>>>>> from full cluster testing would go a lot further toward
> letting
> >> > all
> >> > >> of
> >> > >>>>>> us
> >> > >>>>>>>> evaluate the potential and payoff of the work. In the
> meantime
> >> we
> >> > >>>> should
> >> > >>>>>>>> probably be assembling these changes on a branch instead of
> in
> >> > >> trunk,
> >> > >>>>>> for
> >> > >>>>>>>> as long as the goal is not clearly defined and the payoff and
> >> > >>>> potential
> >> > >>>>>> for
> >> > >>>>>>>> perf regressions is untested and unknown.
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>>>> On Jul 18, 2015, at 8:05 AM, Anoop John <
> >> anoop.hbase@gmail.com>
> >> > >>>> wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>> Thanks Andy and Lars.  The parent jira has doc attached
> which
> >> > >>>> contains
> >> > >>>>>>>> some
> >> > >>>>>>>>> perf gain numbers..  We will be doing more tests in next 2
> >> weeks
> >> > >>>>>> (before
> >> > >>>>>>>>> end of this month) and will publish them.   Yes it will be
> >> great
> >> > if
> >> > >>>> it
> >> > >>>>>> is
> >> > >>>>>>>>> more IST friendly time :-)
> >> > >>>>>>>>>
> >> > >>>>>>>>> -Anoop-
> >> > >>>>>>>>>
> >> > >>>>>>>>> On Fri, Jul 17, 2015 at 9:44 PM, Andrew Purtell <
> >> > >>>>>>>> andrew.purtell@gmail.com>
> >> > >>>>>>>>> wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
> >> > always
> >> > >>>>>> argue
> >> > >>>>>>>>>> both side of a discussion and to never take sides easily
> >> (drives
> >> > >>>> some
> >> > >>>>>>>> folks
> >> > >>>>>>>>>> crazy).
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> I can vouch for this (smile)
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> I also can offer support for off heaping there. At the same
> >> time
> >> > >> we
> >> > >>>> do
> >> > >>>>>>>>>> have a gap where we can't point to a timeline of
> improvements
> >> > >> (yet,
> >> > >>>>>>>> anyway)
> >> > >>>>>>>>>> with benchmarks showing gains where your goals need them.
> For
> >> > >>>> example,
> >> > >>>>>>>>>> stock HBase in one JVM can address max N GB for response
> time
> >> > >>>>>>>> distribution
> >> > >>>>>>>>>> D; dev version of HBase in off heap branch can address max
> >> N' GB
> >> > >> for
> >> > >>>>>>>>>> distribution D', where N' > N and D > D' (distribution D'
> >> > >>>>>> statistically
> >> > >>>>>>>>>> shows better/lower response times).
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>> On Jul 17, 2015, at 6:56 AM, lars hofhansl <
> >> larsh@apache.org>
> >> > >>>> wrote:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> I'm in favor of anything that improves performance (and
> >> > >> preferably
> >> > >>>>>>>>>> doesn't set us back into a world that's worse than C due to
> >> the
> >> > >> lack
> >> > >>>>>> of
> >> > >>>>>>>>>> pointers in Java). Never said "I don't like it", it's just
> >> that
> >> > I'm
> >> > >>>>>>>> perhaps
> >> > >>>>>>>>>> asking for more numbers and justification in weighing the
> >> pros
> >> > and
> >> > >>>>>> cons.
> >> > >>>>>>>>>>> I can represent your side Ram (and Anoop). I've been known
> >> > always
> >> > >>>>>> argue
> >> > >>>>>>>>>> both side of a discussion and to never take sides easily
> >> (drives
> >> > >>>> some
> >> > >>>>>>>> folks
> >> > >>>>>>>>>> crazy). And Stack's there too, he'll yell at me where needed
> :)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Perhaps we can do it a bit later in the evening so there
> is
> >> a
> >> > >>>>>> fighting
> >> > >>>>>>>>>> chance that folks on IST can participate. I know that some
> of
> >> > our
> >> > >>>>>> folks
> >> > >>>>>>>> on
> >> > >>>>>>>>>> IST would love to participate in the backup discussion).
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Like Enis, I'm also happy to host. We're in Downtown SF.
> I'd
> >> > just
> >> > >>>>>> need
> >> > >>>>>>>>>> an approx. number of folks.
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> -- Lars
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> From: ramkrishna vasudevan <
> >> ramkrishna.s.vasudevan@gmail.com>
> >> > >>>>>>>>>>> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars
> >> > >> hofhansl <
> >> > >>>>>>>>>> larsh@apache.org>
> >> > >>>>>>>>>>> Sent: Wednesday, July 15, 2015 10:10 AM
> >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
> >> > >> near-term
> >> > >>>>>> work
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Hi
> >> > >>>>>>>>>>> What time will it be on August 26th?
> >> > >>>>>>>>>>> @LarsYa. I know that you are not generally in favour of
> this
> >> > >>>>>> offheaping
> >> > >>>>>>>>>> stuff.  May be if we (from India) can attend this meeting
> >> > remotely
> >> > >>>>>> your
> >> > >>>>>>>>>> thoughts can be discussed and also the current state of
> this
> >> > work.
> >> > >>>>>>>>>>> Regards, Ram
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> On Wed, Jul 15, 2015 at 9:28 PM, lars hofhansl <
> >> > larsh@apache.org
> >> > >>>
> >> > >>>>>>>> wrote:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Works for me. I'll be back in the Bay Area the week of
> >> August
> >> > >> 9th.
> >> > >>>>>>>>>>> We have done a _lot_ of work on backups as well - ours are
> >> more
> >> > >>>>>>>>>> complicated as we wanted fast per-tenant restores, so data
> is
> >> > >>>>>> "grouped"
> >> > >>>>>>>> by
> >> > >>>>>>>>>> tenant. Would like to sync up on that (hopefully some of
> the
> >> > folks
> >> > >>>> who
> >> > >>>>>>>>>> wrote most of the code will be in town, I'll check).
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> Also interested in the "Time" and "offheap" parts
> (although
> >> you
> >> > >>>> folks
> >> > >>>>>>>>>> usually do not like what I think about the offheap efforts
> >> :) ).
> >> > >>>>>>>>>>> Would like to add the following topics:
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> - "Timestamp Resolution". Or making space for more bits in
> >> the
> >> > >>>>>>>>>> timestamps (happy to cover that, unless it's part of the
> >> "Time"
> >> > >>>> topic)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> - "Replication". We found that replication cannot keep up
> >> with
> >> > >> high
> >> > >>>>>>>>>> write loads, due to the fact that replicated is strictly
> >> single
> >> > >>>>>> threaded
> >> > >>>>>>>>>> per regionserver (even though we have multiple region
> >> servers on
> >> > >> the
> >> > >>>>>>>> sink
> >> > >>>>>>>>>> side)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> - "Spark integration" (Ted Malaska?)
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> OK... Out now to make a "bullshit hat".
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> -- Lars
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> ________________________________
> >> > >>>>>>>>>>> From: Sean Busbey <bu...@cloudera.com>
> >> > >>>>>>>>>>> To: dev <de...@hbase.apache.org>
> >> > >>>>>>>>>>> Sent: Tuesday, July 14, 2015 7:11 PM
> >> > >>>>>>>>>>> Subject: Re: DISCUSSION: lets do a developer workshop on
> >> > >> near-term
> >> > >>>>>> work
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> I'm planning to be in the Bay area the week of the 24th of
> >> > >> August.
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>> --
> >> > >>>>>>>>>>> Sean
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>
> >> > >>>>>>>>>>>> On Jul 14, 2015 7:53 PM, "Andrew Purtell" <
> >> > apurtell@apache.org>
> >> > >>>>>>>> wrote:
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> I can be up in your area in August.
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 5:31 PM, Stack <
> stack@duboce.net
> >> >
> >> > >>>> wrote:
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:39 PM, Enis Söztutar <
> >> > >>>>>> enis.soz@gmail.com>
> >> > >>>>>>>>>>>>> wrote:
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> Sounds good. It has been a while since we did the talk-a-thon.
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> I'll be off starting 25 of July, so I prefer something
> >> next
> >> > >> week
> >> > >>>>>> if
> >> > >>>>>>>>>>>>>> possible.
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> You ever coming back? If so, when? I'm back on 10th of
> >> > August
> >> > >>>>>>>> (Mikhail
> >> > >>>>>>>>>>>> on
> >> > >>>>>>>>>>>>> the 20th).
> >> > >>>>>>>>>>>>> St.Ack
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>> Enis
> >> > >>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> On Tue, Jul 14, 2015 at 3:18 PM, Stack <
> >> stack@duboce.net>
> >> > >>>> wrote:
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Matteo and I were thinking it time devs got together
> >> for a
> >> > >>>>>> pow-wow.
> >> > >>>>>>>>>>>>> There
> >> > >>>>>>>>>>>>>>> is a bunch of stuff in flight at the moment (see below
> >> > list)
> >> > >>>> and
> >> > >>>>>> it
> >> > >>>>>>>>>>>>> would
> >> > >>>>>>>>>>>>>>> be good to meet and whiteboard, surface good ideas
> that
> >> > have
> >> > >>>>>> gone
> >> > >>>>>>>>>>>>>> dormant
> >> > >>>>>>>>>>>>>>> in JIRA, or revisit designs/proposals out in
> >> JIRA-attached
> >> > >>>> google
> >> > >>>>>>>> doc
> >> > >>>>>>>>>>>>>> that
> >> > >>>>>>>>>>>>>>> need socializing.
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> You can only come if you are wearing your bullshit
> hat.
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Topics we'd go over could include:
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> + Our filesystem layout will not work if 1M regions
> >> > >>>>>> (Matteo/Stack)
> >> > >>>>>>>>>>>>>>> + Current state of the offheaping of read path and
> >> > alternate
> >> > >>>>>>>> KeyValue
> >> > >>>>>>>>>>>>>>> implementation (Anoop/Ram)
> >> > >>>>>>>>>>>>>>> + Append rejigger (Elliott)
> >> > >>>>>>>>>>>>>>> + A Pv2-based Assign (Matteo/Steven)
> >> > >>>>>>>>>>>>>>> + Splitting meta/1M regions
> >> > >>>>>>>>>>>>>>> + The revived Backup (Vladimir)
> >> > >>>>>>>>>>>>>>> + Time (Enis)
> >> > >>>>>>>>>>>>>>> + The overloaded SequenceId (Stack)
> >> > >>>>>>>>>>>>>>> + Upstreaming IT testing (Dima/Sean)
> >> > >>>>>>>>>>>>>>> + hbase-2.0.0
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> I put names by folks I know could talk to the topic.
> If
> >> you
> >> > >>>> want
> >> > >>>>>> to
> >> > >>>>>>>>>>>>> take
> >> > >>>>>>>>>>>>>>> over a topic or put your name by one, just say.
> Suggest
> >> > that
> >> > >>>>>>>>>>>>> discussion
> >> > >>>>>>>>>>>>>>> lead off with a 5-10minute on current state of
> >> > >>>>>>>>>>>>>>> thought/design/implementation.
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> What do others think?
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> What date would suit folks?
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Anyone want to host?
> >> > >>>>>>>>>>>>>>>
> >> > >>>>>>>>>>>>>>> Thanks,
> >> > >>>>>>>>>>>>>>> Matteo and St.Ack
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> --
> >> > >>>>>>>>>>>> Best regards,
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> - Andy
> >> > >>>>>>>>>>>>
> >> > >>>>>>>>>>>> Problems worthy of attack prove their worth by hitting
> >> back. -
> >> > >>>> Piet
> >> > >>>>>>>> Hein
> >> > >>>>>>>>>>>> (via Tom White)
> >> > >>
> >> >
> >>
> >
> >
>