Posted to user@hbase.apache.org by stack <st...@duboce.net> on 2009/06/01 01:20:25 UTC

Re: KeyValue and BatchOperation

This is a good place to start: http://wiki.apache.org/hadoop/HowToContribute

This is a version localized to hbase but may be a little stale:
http://wiki.apache.org/hadoop/Hbase/HowToContribute

Start small (smile).

Ask questions if anything's unclear.

Good stuff,

St.Ack


On Sun, May 31, 2009 at 12:53 PM, Sudipto Das <su...@gmail.com> wrote:

> Hi Stack,
>
> Thanks for the comments. Can you give me a pointer on how to submit patches,
> etc.? I am not very accustomed to large version management systems and huge
> projects. During my leisure, I will create and submit patches for these
> enhancements. I will be updating my local code, so there is no hurry in
> committing these changes, if you decide to commit them. :)
>
> Thanks
> Sudipto
>
> PhD Candidate
> CS @ UCSB
> Santa Barbara, CA 93106, USA
> http://www.cs.ucsb.edu/~sudipto
>
>
> On Sat, May 30, 2009 at 2:24 PM, stack <st...@duboce.net> wrote:
>
> > On Thu, May 28, 2009 at 10:33 PM, Sudipto Das <su...@gmail.com> wrote:
> >
> > >
> > > About augmenting HBaseRPC, it would be really helpful if one could extend
> > > the functionality of HBaseRPC by simply subclassing it. But in the present
> > > version, the Invocation class, where the method names are converted to
> > > codes, is private and hence is not accessible from a subclass.
> >
> >
> >
> > These classes were copied down from our parent, Hadoop Core, and modified.
> > Most of the access definitions remain as they were when we copied them --
> > though we had to change some to get our versions of things like
> > ObjectWritable into place.
> >
> > Best thing to do is make a patch.  Be parsimonious about changes made,
> > especially in these classes -- it will make it easier to review your patch
> > and so have it applied, and it'll help going forward as we try to keep up
> > with important RPC changes in Hadoop Core.
> >
> >
> >
> > > Similar is the case with hbase.io.HbaseObjectWritable (btw, is the "Hbase"
> > > intentional? Most other classes are named HBase), where adding new codes for
> > > classes is also private and hence cannot be extended. So right now,
> > > extending HBaseRPC amounts to either modifying HBase code and rebuilding the
> > > jar, or simply copying the code to a local package, modifying it, and
> > > bypassing the original HBaseRPC mechanism. I took the latter shortcut, so I
> > > haven't really thought of a design that can be easily extended.
> >
> >
> > HbaseObjectWritable should be changed, yeah.
> >
> > Again, submit the minimally intrusive patch that will get you what you
> > want.  Would suggest you not do a radical rewrite if you want it committed
> > any time soon.
> >
> >
> > > Another feature in HBaseRPC is that both the client and the server are
> > > synchronous. I can understand the synchronous behavior of the client calls
> > > (the calls to HBase block until the client gets the response from the
> > > remote region server), but synchronous behavior for the server can degrade
> > > the performance of the server. For example, HBaseServer by default has 10
> > > handler threads, and each thread takes up a request and blocks till the
> > > request is completed by HRegionServer. Now if there are 10 updates, all
> > > going to the log, and HDFS holds up a few of the threads for an extended
> > > amount of time, the number of handlers for incoming requests is
> > > considerably reduced. Again, increasing the number of handlers is just a
> > > temporary solution till some scans miss the cache and require remote HDFS
> > > reads. In addition, there are already a large number of threads in the
> > > system. I was wondering if an asynchronous design of HBaseServer would be
> > > useful. Correct me if I am wrong.
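
To make the handler-starvation concern above concrete, here is a minimal Java sketch. It is not HBaseServer code: every name in it (AsyncHandlerSketch, slowLogAppend, handle, the pool sizes, the 20 ms stall) is a hypothetical stand-in. It only shows the general idea of letting a handler thread hand slow work to another pool and return, and it still blocks a thread in that second pool; a real asynchronous design would more likely use non-blocking I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names come from HBase.
class AsyncHandlerSketch {
    // Small pool standing in for the RPC handler threads.
    private final ExecutorService handlers = Executors.newFixedThreadPool(2);
    // Separate pool standing in for threads that wait on slow storage.
    private final ExecutorService ioPool = Executors.newFixedThreadPool(4);

    // Stand-in for a log append that may stall on the file system.
    private String slowLogAppend(int requestId) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "ack-" + requestId;
    }

    // The handler thread does only the cheap step, then is free for the
    // next request; the slow step runs on ioPool.
    CompletableFuture<String> handle(int requestId) {
        return CompletableFuture
            .supplyAsync(() -> requestId, handlers)
            .thenApplyAsync(this::slowLogAppend, ioPool);
    }

    // Submits all requests first, then waits for the acks in request order.
    List<String> handleAll(int count) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            pending.add(handle(i));
        }
        List<String> acks = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            acks.add(f.join());
        }
        return acks;
    }

    void shutdown() {
        handlers.shutdown();
        ioPool.shutdown();
    }
}
```

With this shape, more requests than handler threads can be in flight at once, because a handler is occupied only for the cheap part of each request.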
> >
> >
> >
> > You are not wrong.
> >
> > The RPC needs to be replaced.  Previously, it's not been that much of a
> > priority and we were expecting to just ride along on the coattails of Hadoop
> > Core.
> >
> > Hadoop Core RPC is being revamped, probably by 0.21 (see project avro), but
> > this may or may not meet our needs -- we don't need a generalized RPC since
> > we only have a few request/response types, we're sending data so we need a
> > format that is as compact as possible, and we want async nio.  Some eval.
> > has been going on by one of the lads looking at alternatives.  Devs chatting
> > are thinking this is fundamental for hbase 0.21.0.
> >
> >
> >
> > >
> > > About the log: yes, a logger with multiple log workers can be easily
> > > designed. But ordering the log entries by time of arrival can be tricky,
> > > even if the time is taken atomically using a lock enclosing the time call.
> > > This is because, if we are using System.currentTimeMillis(), the time taken
> > > to acquire a lock, read the time, and release the lock is way less than 1 ms.
> > > Therefore, under high append load, multiple entries will have the same
> > > timestamp. I haven't checked with System.nanoTime() though. On the other
> > > hand, using an AtomicLong and assigning ids using a shared reference works.
> > > This id can also work as the LSN, since a log only requires the LSN to be
> > > monotonically increasing, and a totally ordered queue of log requests from
> > > which the worker threads poll for requests would ensure that.
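
The AtomicLong idea above can be sketched roughly as follows. This is an illustration only, not HLog code: LogRequest, LsnLogQueue, and their methods are hypothetical names, and the edit is a plain String for brevity. One caveat the sketch does not solve: the priority queue orders only the requests currently queued, so a worker could dequeue a higher LSN before a slower producer has enqueued a lower one. A real design would need to assign the id and enqueue in one critical section, or have the workers tolerate gaps.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical log request carrying a log sequence number (LSN).
class LogRequest implements Comparable<LogRequest> {
    final long lsn;
    final String edit;

    LogRequest(long lsn, String edit) {
        this.lsn = lsn;
        this.edit = edit;
    }

    // Order requests by LSN so the queue hands out the smallest LSN first.
    @Override
    public int compareTo(LogRequest other) {
        return Long.compare(lsn, other.lsn);
    }
}

// Hypothetical queue that stamps each edit with a unique, increasing LSN.
class LsnLogQueue {
    private final AtomicLong nextLsn = new AtomicLong(0);
    private final PriorityBlockingQueue<LogRequest> queue =
        new PriorityBlockingQueue<>();

    // Unlike System.currentTimeMillis(), incrementAndGet() never returns
    // the same value twice, even under concurrent appends.
    long append(String edit) {
        long lsn = nextLsn.incrementAndGet();
        queue.put(new LogRequest(lsn, edit));
        return lsn;
    }

    // Worker threads would call this; returns null when the queue is empty.
    LogRequest poll() {
        return queue.poll();
    }
}
```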
> >
> >
> >
> > True.  I was thinking the application timestamp good enough for sorting
> > between the two or three log files that we'd be running concurrently on a
> > particular regionserver.  Order would only really be important on edits to
> > the same cell... but after your exposition, it probably needs more thought.
> >
> >
> > > For the type of log entries, the design of HLog requires that the LogKey
> > > and LogEdit implement Writable and HeapSize. So if we define a combined
> > > type for these two interfaces (just like WritableComparable), then it would
> > > work with the same design of HLog, but would be flexible enough to take a
> > > general class of edits and be reusable as well. I think some tweaking with
> > > the reflection API will be needed in the place where the Hadoop writer is
> > > created, where HLog passes the class types of the Key and Value.
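
A rough Java sketch of the combined-type idea above. To stay self-contained it declares local stand-ins named Writable and HeapSize that only mirror the shape of the Hadoop and HBase interfaces; WritableHeapSize, LongEdit, and GenericLogEntry are hypothetical names, and the heap size returned by LongEdit is an illustrative number, not a measured one. The reflection tweak mentioned above is not covered.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-ins, assumed to mirror the real interfaces' method shapes.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface HeapSize {
    long heapSize();
}

// The combined type, in the spirit of WritableComparable (hypothetical name).
interface WritableHeapSize extends Writable, HeapSize {}

// A toy edit type holding one long.
class LongEdit implements WritableHeapSize {
    private long value;

    LongEdit() {}

    LongEdit(long value) {
        this.value = value;
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(value);
    }

    public void readFields(DataInput in) throws IOException {
        value = in.readLong();
    }

    public long heapSize() {
        return 16L; // illustrative guess, not a measured size
    }
}

// A log entry that accepts any key and edit types satisfying both contracts.
class GenericLogEntry<K extends WritableHeapSize, V extends WritableHeapSize> {
    private final K key;
    private final V edit;

    GenericLogEntry(K key, V edit) {
        this.key = key;
        this.edit = edit;
    }

    long heapSize() {
        return key.heapSize() + edit.heapSize();
    }

    // Serializes key then edit, as a writer for key/value pairs might.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            key.write(out);
            edit.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```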
> > >
> >
> >
> > We'll take the above as a patch.  Sounds reasonable.
> >
> >
> >
> > >
> > > I am working on the above-mentioned things for my work, so if I find them
> > > to be improvements, I can let you know, though it might take some time.
> > > Thanks again for the helpful comments.
> > >
> >
> > Let us know how else we can help.  In particular, you are deep in the guts
> > of hbase so you may need more info about near-future developments; just keep
> > asking questions.
> >
> > St.Ack
> >
> >
> >
> >
> >
> >
> > > On Thu, May 28, 2009 at 9:06 PM, stack <st...@duboce.net> wrote:
> > >
> > > > Answers interwoven below.
> > > >
> > > > On Thu, May 28, 2009 at 7:01 PM, Sudipto Das <su...@cs.ucsb.edu> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am using HBase for development of a system for my research project. I
> > > > > have a few questions regarding some recent API and class changes in HBase
> > > > > which I suppose are to be released in HBase 0.20.
> > > > >
> > > > > * I saw that for internal operations of HBase, a new class
> > > > > org.apache.hadoop.hbase.KeyValue has replaced the usage of a lot of other
> > > > > classes (for example HLogEdit and HStoreKey), but so far there is no
> > > > > change in the client API, which still sends updates via BatchOperation and
> > > > > BatchUpdate, which are then converted to KeyValue. So I am a bit confused
> > > > > whether to use KeyValue or the present BatchOperation class for
> > > > > communicating with the RegionServers. I am not using the HBase client API,
> > > > > so I am not limited by that; I was wondering which class would be a
> > > > > better choice to be compatible with 0.20.
> > > > >
> > > >
> > > > You've caught TRUNK in a state of transition.  HBASE-1304 is the last
> > > > missing piece (HBASE-1234 added KeyValue and before that was the change in
> > > > our store file format).  HBASE-1304 should be going in in the next week or
> > > > so.  It's a fat patch that will deprecate the current API in favor of a new,
> > > > more compact and comprehensive one.
> > > >
> > > > The overarching motivation behind the 0.20.0 refactoring is efficiency and
> > > > speed-up.  The server/client API will change in HBASE-1304 in a manner
> > > > which moves work out to the clients.  Server will probably be passing lists
> > > > of KeyValues to the client to organize.
> > > >
> > > > If you can wait a few more days, you'll have a better idea of the shape of
> > > > things to come (BatchUpdate and BatchOperation will be deprecated).
> > > >
> > > >
> > > > >
> > > > > * What is a good way of determining the HeapSize (in order to implement
> > > > > the hbase.io.HeapSize interface) for newly added classes? I saw that
> > > > > hbase.io.HeapSize has a few new constants that provide sizes of some of
> > > > > the common types, but most HBase internal classes use an assigned value
> > > > > of HEAP_TAX, and I could not figure out how the value was obtained.
> > > >
> > > >
> > > > Value was obtained through study of heap changes when creating particular
> > > > class instances in a controlled heap, by comparing with sizes reported by
> > > > a profiler, and by using an instrumented JVM that reports JVM deep sizeof
> > > > (There are a few. Here's one: http://www.javamex.com/classmexer/... though
> > > > I think our Ryan was using something else).
> > > >
> > > > Ask on list if you need help.  Some of the lads are getting good at this
> > > > stuff.
> > > >
> > > >
> > > >
> > > > > * For IPC, I could not pass a List of Writables since a List is not
> > > > > Writable. Is there any plan for adding a utility class (or is there
> > > > > already any such class available?) that can act as a List of Writables
> > > > > type and can be shipped across the network using IPC?
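
One way such a utility class could look, as a sketch under stated assumptions: the Writable interface below is a local stand-in with the same method shape as Hadoop's, and WritableList, LongValue, and WritableListDemo are hypothetical names, not existing HBase or Hadoop classes. The sketch uses Java 8+ features (Supplier, method references). It writes a length prefix followed by each element, and it does not address the extra step HBase IPC needs, namely registering a class code for the new type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Local stand-in, assumed to mirror the real Writable's method shapes.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Toy element type holding one long.
class LongValue implements Writable {
    long value;

    LongValue() {}

    LongValue(long value) {
        this.value = value;
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(value);
    }

    public void readFields(DataInput in) throws IOException {
        value = in.readLong();
    }
}

// A list of Writables that is itself Writable: element count, then elements.
class WritableList<T extends Writable> implements Writable {
    private final List<T> items = new ArrayList<>();
    private final Supplier<T> factory; // creates empty elements when reading

    WritableList(Supplier<T> factory) {
        this.factory = factory;
    }

    void add(T item) {
        items.add(item);
    }

    List<T> items() {
        return items;
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(items.size());
        for (T item : items) {
            item.write(out);
        }
    }

    public void readFields(DataInput in) throws IOException {
        items.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            T item = factory.get();
            item.readFields(in);
            items.add(item);
        }
    }
}

// Serializes a list to bytes and reads it back into a fresh list.
class WritableListDemo {
    static List<Long> roundTrip(long... values) {
        try {
            WritableList<LongValue> sent = new WritableList<>(LongValue::new);
            for (long v : values) {
                sent.add(new LongValue(v));
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            sent.write(new DataOutputStream(bytes));

            WritableList<LongValue> received = new WritableList<>(LongValue::new);
            received.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));

            List<Long> result = new ArrayList<>();
            for (LongValue v : received.items()) {
                result.add(v.value);
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The factory argument is a design choice for the sketch: the reader has to create empty elements before calling readFields on them, and a Supplier avoids reflection.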
> > > > >
> > > >
> > > >
> > > > Our IPC is a customized version of the Hadoop IPC, passing codes for
> > > > classnames to save on size of messages, among other things.  HBase doesn't
> > > > need a generalized IPC as Hadoop does, so we've cut it down.
> > > >
> > > > If you want to add new types -- I don't think we have a list at the moment;
> > > > the only things close are Map and array -- then you'll need to add a code.
> > > > If this is not easy to do, then let's talk and make it so.  We want to make
> > > > it so hbase is subclassable or amendable, and it's proven so in some regard
> > > > in that there is the Transactional HBase done as a subclass, but we for sure
> > > > haven't made it universally so.  Help us out.
> > > >
> > > >
> > > > >
> > > > > * The HLog right now is very rigid in terms of what it accepts as log
> > > > > entries. Is there something inherent in Hadoop IO that prevents the
> > > > > logger from accepting any Writable as a log edit, rather than the mandatory
> > > > > KeyValue (or HLogEdit in 0.19.x)? This would make the logger flexible and
> > > > > reusable for other uses. Apparently, Hadoop IO just needs Writables. Is
> > > > > there any catch in a generic type for the Log Key and Log Edit?
> > > >
> > > >
> > > >
> > > > As Value, I don't think so.  We've just made it so the value is just the
> > > > value -- not a container with some fluff and then the value (which was
> > > > what HLogEdit was).  If you want this change, just submit a patch and we'll
> > > > commit it.
> > > >
> > > >
> > > > >
> > > > > * As noted in the Bigtable paper, a single logger thread can become a
> > > > > bottleneck for an update-intensive workload with write-ahead logging. I
> > > > > was wondering if an advanced logger will be available in some newer
> > > > > version of HBase? Advanced as in a multi-threaded logger supporting
> > > > > asynchronous appends serialized using a common log sequence number.
> > > > >
> > > >
> > > > This came up in recent discussions up on IRC.  We need to work on this
> > > > from both ends.  We've started to keep an accounting of how long writes
> > > > take.  We're noticing that HDFS can stall quite frequently such that
> > > > appends/syncs can take seconds on occasion.  Because of this we started to
> > > > talk up a pool of log writers.  It wouldn't be hard to do.  We actually
> > > > went ahead and changed the HLogEdit key, adding time-of-addition so that we
> > > > can organize the edits later by their arrival time.
> > > >
> > > > On the other end, some work was done recently to multithread the log
> > > > splitting.  It's made a big improvement over what was there previously but
> > > > is insufficient for serving a user-facing app out of hbase in real time.
> > > > The multithreading can be compounded so both reading and writing are
> > > > multithreaded, but more so, we need to make a system like that described in
> > > > the Bigtable paper where the split of logs is distributed out across the
> > > > cluster MapReduce style.  This is a critical need but, it's thought, won't
> > > > be done till 0.21.0.
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Any comments and suggestions would be helpful.
> > > > >
> > > >
> > > > Thanks for the thoughtful questions.
> > > >
> > > > Let us know how we can help get your project done.
> > > >
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks in advance.
> > > > >
> > > > > Regards
> > > > > Sudipto
> > > > >
> > > > > PhD Candidate
> > > > > CS @ UCSB
> > > > > Santa Barbara, CA 93106, USA
> > > > > http://www.cs.ucsb.edu/~sudipto
> > > > >
> > > >
> > >
> >
>