Posted to user@hbase.apache.org by Alex Baranau <al...@gmail.com> on 2011/04/19 19:25:46 UTC

[ANN]: HBaseWD: Distribute Sequential Writes in HBase

Hello guys,

I'd like to introduce a new small Java project/lib around HBase: HBaseWD. It
aims to help distribute the load across region servers when writing records
with sequential (because of the row key nature) keys. It implements the
solution which was discussed several times on this mailing list (e.g. here:
http://search-hadoop.com/m/gNRA82No5Wk).

Please find the sources at https://github.com/sematext/HBaseWD (a jar of the
current version is also there for convenience). It is very easy to make use
of it: e.g. I added it to one existing project with 1+2 lines of code (one
where I write to HBase and two for configuring the MapReduce job).

Any feedback is highly appreciated!

Please find below a short intro to the lib [1].

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

[1]

Description:
------------
HBaseWD stands for Distributing (sequential) Writes. It was inspired by
discussions on the HBase mailing lists around the problem of choosing between:
* writing records with sequential row keys (e.g. time-series data with a row
  key built from a timestamp)
* using random unique IDs for records

The first approach makes it possible to perform fast range scans by setting
start/stop keys on a Scanner, but creates a single-region-server hot-spotting
problem when writing data (as row keys go in sequence, all records end up
written into a single region at a time).

The second approach aims for the fastest write performance by distributing new
records over random regions, but makes fast range scans over the written data
impossible.

The suggested approach sits between the two above and has proven to perform
well: it distributes records over the cluster during writes while still
allowing range scans over the data. HBaseWD provides a very simple API, which
makes it easy to add to existing code.

Please refer to the unit tests for usage info, as they are intended to serve
as examples.
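The core idea can be sketched in plain Java. This is an illustrative sketch
only, not the library's actual code: the class name and the round-robin
bucket choice are assumptions. A bucket byte is prepended on write and
stripped on read, so sequential keys spread over bucketsCount key ranges
while each range stays internally sorted:

```java
// Illustrative sketch of the one-byte-prefix idea (not HBaseWD's code):
// prepend a bucket byte on write, strip it on read.
class OneBytePrefixSketch {
    private final byte bucketsCount;
    private byte nextPrefix = 0; // round-robin bucket chooser

    OneBytePrefixSketch(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    // Prepend the next bucket byte to the original key.
    byte[] getDistributedKey(byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = nextPrefix;
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        nextPrefix = (byte) ((nextPrefix + 1) % bucketsCount);
        return key;
    }

    // Strip the bucket byte to recover the original key.
    byte[] getOriginalKey(byte[] distributedKey) {
        byte[] key = new byte[distributedKey.length - 1];
        System.arraycopy(distributedKey, 1, key, 0, key.length);
        return key;
    }
}
```

Because each bucket's keys are still sequential within the bucket, a range
scan can be answered by scanning every bucket's sub-range and merging.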

Brief Usage Info (Examples):
----------------------------

Distributing records with sequential keys, written into up to Byte.MAX_VALUE
buckets:

    byte bucketsCount = (byte) 32; // distributing into 32 buckets
    RowKeyDistributor keyDistributor =
        new RowKeyDistributorByOneBytePrefix(bucketsCount);
    for (int i = 0; i < 100; i++) {
      byte[] originalKey = Bytes.toBytes(i); // a sequential key
      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
      ... // add values
      hTable.put(put);
    }


Performing a range scan over the written data (internally, <bucketsCount>
scanners are executed):

    Scan scan = new Scan(startKey, stopKey);
    ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
    for (Result current : rs) {
      ...
    }
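Each of the <bucketsCount> underlying scanners returns rows sorted by
original key within its own bucket, so global order can be restored with a
k-way merge. The following is a minimal plain-Java sketch of that merging
step under that assumption, not HBaseWD's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: merge per-bucket row-key lists (each already
// sorted) back into a single stream in original-key order.
class MergeSketch {
    // Compare byte arrays lexicographically as unsigned bytes,
    // the way HBase orders row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // k-way merge: repeatedly take the smallest head among all buckets.
    static List<byte[]> merge(List<List<byte[]>> buckets) {
        List<byte[]> out = new ArrayList<>();
        int[] pos = new int[buckets.size()];
        while (true) {
            int best = -1;
            for (int i = 0; i < buckets.size(); i++) {
                if (pos[i] >= buckets.get(i).size()) continue;
                if (best < 0 || compareKeys(buckets.get(i).get(pos[i]),
                                            buckets.get(best).get(pos[best])) < 0) {
                    best = i;
                }
            }
            if (best < 0) return out;
            out.add(buckets.get(best).get(pos[best]++));
        }
    }
}
```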

Performing a MapReduce job over the data chunk specified by a Scan:

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "testMapreduceJob");

    Scan scan = new Scan(startKey, stopKey);

    TableMapReduceUtil.initTableMapperJob("table", scan,
      RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);

    // Substituting standard TableInputFormat which was set in
    // TableMapReduceUtil.initTableMapperJob(...)
    job.setInputFormatClass(WdTableInputFormat.class);
    keyDistributor.addInfo(job.getConfiguration());


Extending Row Key Distribution Patterns:
----------------------------------------

HBaseWD is designed to be flexible and to support custom row key distribution
approaches. To define custom row key distribution logic, just extend the
AbstractRowKeyDistributor abstract class, which is really very simple:

    public abstract class AbstractRowKeyDistributor implements Parametrizable {
      public abstract byte[] getDistributedKey(byte[] originalKey);
      public abstract byte[] getOriginalKey(byte[] adjustedKey);
      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
      ... // some utility methods
    }
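For instance, a hash-based scheme could prefix each key with a byte-sum hash.
The sketch below implements the three methods above in plain Java, without
the library's base class (which is not reproduced here); the class name and
hash choice are illustrative assumptions:

```java
// Illustrative custom distribution sketch (not library code): the prefix
// is a sum-of-bytes hash modulo the bucket count, so the distributed key
// is derivable from the original key alone, with no HBase roundtrip.
class HashPrefixSketch {
    private final int maxBuckets;

    HashPrefixSketch(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    private byte bucketFor(byte[] originalKey) {
        int sum = 0;
        for (byte b : originalKey) sum += (b & 0xFF);
        return (byte) (sum % maxBuckets);
    }

    byte[] getDistributedKey(byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = bucketFor(originalKey);
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    byte[] getOriginalKey(byte[] adjustedKey) {
        byte[] key = new byte[adjustedKey.length - 1];
        System.arraycopy(adjustedKey, 1, key, 0, key.length);
        return key;
    }

    // For range scans every bucket may hold keys in the range, so
    // return the key under every possible prefix.
    byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[maxBuckets][];
        for (int i = 0; i < maxBuckets; i++) {
            keys[i] = new byte[originalKey.length + 1];
            keys[i][0] = (byte) i;
            System.arraycopy(originalKey, 0, keys[i], 1, originalKey.length);
        }
        return keys;
    }
}
```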

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Stack <st...@duboce.net>.
On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <al...@gmail.com> wrote:
> Hello guys,
>

And girls!

Thanks for making this addition Alex (and posting the list).

Good stuff,
St.Ack

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
Awesome, I'm going to check it out and use it today. Thank you :)

On Thu, May 19, 2011 at 8:14 AM, Alex Baranau <al...@gmail.com> wrote:

> Implemented RowKeyDistributorByHashPrefix. From README:
>
> Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please
> see the example below. It creates the "distributed key" based on the
> original key value, so that later, when you have the original key and want
> to update the record, you can calculate the distributed key without a
> roundtrip to HBase.
>
> AbstractRowKeyDistributor keyDistributor =
>         new RowKeyDistributorByHashPrefix(
>                   new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));
>
> You can use your own hashing logic here by implementing simple interface:
>
> public static interface Hasher extends Parametrizable {
>   byte[] getHashPrefix(byte[] originalKey);
>   byte[][] getAllPossiblePrefixes();
> }
>
>
> OneByteSimpleHash implements a very simple hash algorithm: the sum of all
> bytes in the row key % maxBuckets. In the example above, 15 is the
> maxBuckets count. You can use a buckets count of up to 255. Please use it
> wisely, as (same as with the one-byte prefix) DistributedScanner will
> instantiate this number of scans under the hood.
>
> With this hash-based row key distributor, you can find out the distributed
> key (and use it to update the record) without a roundtrip to HBase. From
> the unit-test:
>
>     // Testing simple get
>     byte[] originalKey = new byte[] {123, 124, 122};
>
>     Put put = new Put(keyDistributor.getDistributedKey(originalKey));
>     put.add(CF, QUAL, Bytes.toBytes("some"));
>     hTable.put(put);
>
>     byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
>     Result result = hTable.get(new Get(distributedKey));
>     Assert.assertArrayEquals(originalKey,
>         keyDistributor.getOriginalKey(result.getRow()));
>     Assert.assertArrayEquals(Bytes.toBytes("some"),
>         result.getValue(CF, QUAL));
>
>
> NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar
> (downloadable from https://github.com/sematext/HBaseWD)
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> P.S.
> > Can you summarize HBaseWD in your blog
> That is on my todo list! You pushed it higher to the top priority items ;)
>
>
> On Thu, May 19, 2011 at 6:50 AM, Weishung Chung <we...@gmail.com> wrote:
>
>> I have another question about option 2. It seems like I need to handle the
>> distributed scan differently to read from start row to end row, assuming 1
>> byte hash of the original key is used as prefix since the order of the
>> original key range is different from the resulting distributed key range.
>>
>> On Wed, May 18, 2011 at 6:18 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>> > Alex:
>> > Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
>> >
>> > Thanks
>> >
>> > On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <alex.baranov.v@gmail.com
>> > >wrote:
>> >
>> > > There are several options here. E.g.:
>> > >
>> > > 1) Given that you have the "original key" of the record, you can fetch
>> > > the stored record key from HBase and use it to create a Put with
>> > > updated (or new) cells.
>> > >
>> > > Currently you'll need to use a distributed scan for that; there's no
>> > > analogue for the Get operation yet (see
>> > > https://github.com/sematext/HBaseWD/issues/1).
>> > >
>> > > Note: you need to first find out the real key of the stored record by
>> > > fetching data from HBase in case you use the
>> > > RowKeyDistributorByOneBytePrefix included in the current lib.
>> > > Alternatively, see the next option:
>> > >
>> > > 2) You can create your own RowKeyDistributor implementation which
>> > > creates the "distributed key" based on the original key value, so that
>> > > later, when you have the original key and want to update the record,
>> > > you can calculate the distributed key without a roundtrip to HBase.
>> > >
>> > > E.g. your RowKeyDistributor implementation can calculate a 1-byte hash
>> > > of the original key (https://github.com/sematext/HBaseWD/issues/2).
>> > >
>> > >
>> > >
>> > > Either way, you don't need to delete a record in order to update some
>> > > of its cells or add new cells.
>> > >
>> > > Please let me know if you have more Qs!
>> > >
>> > > Alex Baranau
>> > > ----
>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> > HBase
>> > >
>> > > On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
>> > > wrote:
>> > >
>> > > > I have another question. For overwriting, do I need to delete the
>> > > existing
>> > > > one before re-writing it?
>> > > >
>> > > > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <
>> weishung@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Yes, it's simple yet useful. I am integrating it. Thanks alot :)
>> > > > >
>> > > > >
>> > > > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
>> > > alex.baranov.v@gmail.com
>> > > > >wrote:
>> > > > >
>> > > > >> Thanks for the interest!
>> > > > >>
>> > > > >> We are using it in production. It is simple and hence quite
>> stable.
>> > > > Though
>> > > > >> some minor pieces are missing (like
>> > > > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't
>> affect
>> > > > >> stability
>> > > > >> and/or major functionality.
>> > > > >>
>> > > > >> Alex Baranau
>> > > > >> ----
>> > > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>> Hadoop
>> > -
>> > > > >> HBase
>> > > > >>
>> > > > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <
>> > weishung@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >> > What's the status on this package? Is it mature enough?
>> > > > >> >  I am using it in my project, tried out the write method
>> yesterday
>> > > and
>> > > > >> > going
>> > > > >> > to incorporate into read method tomorrow.
>> > > > >> >
>> > > > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
>> > > > alex.baranov.v@gmail.com
>> > > > >> > >wrote:
>> > > > >> >
>> > > > >> > > > The start/end rows may be written twice.
>> > > > >> > >
>> > > > >> > > Yeah, I know. I meant that the size of the startRow+stopRow
>> > > > >> > > data is "bearable" in an attribute value no matter how long
>> > > > >> > > the keys are, since we are already OK with transferring them
>> > > > >> > > initially (i.e. we should be OK with transferring 2x more).
>> > > > >> > >
>> > > > >> > > So, what about the sourceScan attribute value suggestion I
>> > > > >> > > mentioned? If you can tell why it isn't sufficient in your
>> > > > >> > > case, I'd have more info to think about a better suggestion ;)
>> > > > >> > >
>> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
>> > > > >> > > > Maybe the second should be named HBASE-3811-v2.patch<
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
>> > > > >> > > >?
>> > > > >> > >
>> > > > >> > > np. Can do that. Just thought that they (patches) can be
>> > > > >> > > sorted by date to find out the final one (aka "convention over
>> > > > >> > > naming-rules").
>> > > > >> > >
>> > > > >> > > Alex.
>> > > > >> > >
>> > > > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <
>> yuzhihong@gmail.com>
>> > > > wrote:
>> > > > >> > >
>> > > > >> > > > >> Though it might be ok, since we anyways "transfer"
>> > start/stop
>> > > > >> rows
>> > > > >> > > with
>> > > > >> > > > Scan object.
>> > > > >> > > > In write() method, we now have:
>> > > > >> > > >     Bytes.writeByteArray(out, this.startRow);
>> > > > >> > > >     Bytes.writeByteArray(out, this.stopRow);
>> > > > >> > > > ...
>> > > > >> > > >       for (Map.Entry<String, byte[]> attr :
>> > > > >> this.attributes.entrySet())
>> > > > >> > {
>> > > > >> > > >         WritableUtils.writeString(out, attr.getKey());
>> > > > >> > > >         Bytes.writeByteArray(out, attr.getValue());
>> > > > >> > > >       }
>> > > > >> > > > The start/end rows may be written twice.
>> > > > >> > > >
>> > > > >> > > > Of course, you have full control over how to generate the
>> > unique
>> > > > ID
>> > > > >> for
>> > > > >> > > > "sourceScan" attribute.
>> > > > >> > > >
>> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
>> > Maybe
>> > > > the
>> > > > >> > > second
>> > > > >> > > > should be named HBASE-3811-v2.patch<
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
>> > > > >> > > >?
>> > > > >> > > >
>> > > > >> > > > Thanks
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
>> > > > >> > alex.baranov.v@gmail.com
>> > > > >> > > >wrote:
>> > > > >> > > >
>> > > > >> > > >> > Can you remove the first version ?
>> > > > >> > > >> Isn't it ok to keep it in JIRA issue?
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> > In HBaseWD, can you use reflection to detect whether
>> Scan
>> > > > >> supports
>> > > > >> > > >> setAttribute() ?
>> > > > >> > > >> > If it does, can you encode start row and end row as
>> > > > "sourceScan"
>> > > > >> > > >> attribute ?
>> > > > >> > > >>
>> > > > >> > > >> Yeah, something like this is going to be implemented.
>> > > > >> > > >> Though I'd still want to hear from the devs the story about
>> > > > >> > > >> the Scan version.
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> > One consideration is that start row or end row may be
>> quite
>> > > > long.
>> > > > >> > > >>
>> > > > >> > > >> Yeah, that was my thought too at first. Though it might be
>> > > > >> > > >> OK, since we transfer start/stop rows with the Scan object
>> > > > >> > > >> anyway.
>> > > > >> > > >>
>> > > > >> > > >> > What do you think ?
>> > > > >> > > >>
>> > > > >> > > >> I'd love to hear from you whether this variant I mentioned
>> > > > >> > > >> is what we are looking at here:
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> > From what I understand, you want to distinguish scans
>> fired
>> > > by
>> > > > >> the
>> > > > >> > > same
>> > > > >> > > >> distributed scan.
>> > > > >> > > >> > I.e. group scans which were fired by single distributed
>> > scan.
>> > > > If
>> > > > >> > > that's
>> > > > >> > > >> what you want, distributed
>> > > > >> > > >> > scan can generate unique ID and set, say "sourceScan"
>> > > attribute
>> > > > >> to
>> > > > >> > its
>> > > > >> > > >> value. This way we'll
>> > > > >> > > >> > have <# of distinct "sourceScan" attribute values> =
>> > <number
>> > > of
>> > > > >> > > >> distributed scans invoked by
>> > > > >> > > >> > client side> and two scans on server side will have the
>> > same
>> > > > >> > > >> "sourceScan" attribute iff they
>> > > > >> > > >> > "belong" to same distributed scan.
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> Alex Baranau
>> > > > >> > > >> ----
>> > > > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> -
>> > > > Hadoop
>> > > > >> -
>> > > > >> > > >> HBase
>> > > > >> > > >>
>> > > > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <
>> yuzhihong@gmail.com
>> > >
>> > > > >> wrote:
>> > > > >> > > >>
>> > > > >> > > >>> Alex:
>> > > > >> > > >>> Your second patch looks good.
>> > > > >> > > >>> Can you remove the first version ?
>> > > > >> > > >>>
>> > > > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
>> > > > supports
>> > > > >> > > >>> setAttribute() ?
>> > > > >> > > >>> If it does, can you encode start row and end row as
>> > > "sourceScan"
>> > > > >> > > >>> attribute ?
>> > > > >> > > >>>
>> > > > >> > > >>> One consideration is that start row or end row may be
>> quite
>> > > > long.
>> > > > >> > > >>> Ideally we should store hash code of source Scan object
>> as
>> > > > >> > "sourceScan"
>> > > > >> > > >>> attribute. But Scan doesn't implement hashCode(). We can
>> add
>> > > it,
>> > > > >> that
>> > > > >> > > would
>> > > > >> > > >>> require running all Scan related tests.
>> > > > >> > > >>>
>> > > > >> > > >>> What do you think ?
>> > > > >> > > >>>
>> > > > >> > > >>> Thanks
>> > > > >> > > >>>
>> > > > >> > > >>>
>> > > > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
>> > > > >> > > alex.baranov.v@gmail.com>wrote:
>> > > > >> > > >>>
>> > > > >> > > >>>> Sorry for the delay in response (public holidays here).
>> > > > >> > > >>>>
>> > > > >> > > >>>> This depends on what info you are looking for on server
>> > side.
>> > > > >> > > >>>>
>> > > > >> > > >>>> From what I understand, you want to distinguish scans
>> fired
>> > > by
>> > > > >> the
>> > > > >> > > same
>> > > > >> > > >>>> distributed scan. I.e. group scans which were fired by
>> > single
>> > > > >> > > distributed
>> > > > >> > > >>>> scan. If that's what you want, distributed scan can
>> > generate
>> > > > >> unique
>> > > > >> > ID
>> > > > >> > > and
>> > > > >> > > >>>> set, say "sourceScan" attribute to its value. This way
>> > we'll
>> > > > have
>> > > > >> <#
>> > > > >> > > of
>> > > > >> > > >>>> distinct "sourceScan" attribute values> = <number of
>> > > > distributed
>> > > > >> > scans
>> > > > >> > > >>>> invoked by client side> and two scans on server side
>> will
>> > > have
>> > > > >> the
>> > > > >> > > same
>> > > > >> > > >>>> "sourceScan" attribute iff they "belong" to same
>> > distributed
>> > > > >> scan.
>> > > > >> > > >>>>
>> > > > >> > > >>>> Is this what are you looking for?
>> > > > >> > > >>>>
>> > > > >> > > >>>> Alex Baranau
>> > > > >> > > >>>>
>> > > > >> > > >>>> P.S. attached patch for HBASE-3811<
>> > > > >> > > https://issues.apache.org/jira/browse/HBASE-3811>
>> > > > >> > > >>>> .
>> > > > >> > > >>>> P.S-2. should this conversation be moved to dev list?
>> > > > >> > > >>>>
>> > > > >> > > >>>> ----
>> > > > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
>> Nutch
>> > -
>> > > > >> Hadoop
>> > > > >> > -
>> > > > >> > > >>>> HBase
>> > > > >> > > >>>>
>> > > > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <
>> > yuzhihong@gmail.com
>> > > >
>> > > > >> > wrote:
>> > > > >> > > >>>>
>> > > > >> > > >>>>> Alex:
>> > > > >> > > >>>>> What type of identification should we put in the map of
>> > the
>> > > > Scan
>> > > > >> > > object
>> > > > >> > > >>>>> ?
>> > > > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But
>> > the
>> > > > user
>> > > > >> > can
>> > > > >> > > >>>>> use same distributor on multiple scans.
>> > > > >> > > >>>>>
>> > > > >> > > >>>>> Please share your thought.
>> > > > >> > > >>>>>
>> > > > >> > > >>>>>
>> > > > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
>> > > > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
>> > > > >> > > >>>>>
>> > > > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
>> > > > >> > > >>>>>>
>> > > > >> > > >>>>>> Alex Baranau
>> > > > >> > > >>>>>> ----
>> > > > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
>> > Nutch
>> > > -
>> > > > >> > Hadoop
>> > > > >> > > -
>> > > > >> > > >>>>>> HBase
>> > > > >> > > >>>>>>
>> > > > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
>> > > yuzhihong@gmail.com
>> > > > >
>> > > > >> > > wrote:
>> > > > >> > > >>>>>>
>> > > > >> > > >>>>>> > My plan was to make regions that have active
>> scanners
>> > > more
>> > > > >> > stable
>> > > > >> > > -
>> > > > >> > > >>>>>> trying
>> > > > >> > > >>>>>> > not to move them when balancing.
>> > > > >> > > >>>>>> > I prefer second approach - adding custom
>> attribute(s)
>> > to
>> > > > Scan
>> > > > >> so
>> > > > >> > > >>>>>> that the
>> > > > >> > > >>>>>> > Scans created by the method below can be 'grouped'.
>> > > > >> > > >>>>>> >
>> > > > >> > > >>>>>> > If you can file a JIRA, that would be great.
>> > > > >> > > >>>>>> >
>> > > > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
>> > > > >> > > >>>>>> alex.baranov.v@gmail.com
>> > > > >> > > >>>>>> > >wrote:
>> > > > >> > > >>>>>> >
>> > > > >> > > >>>>>> > > Aha, so you want to "count" it as single scan (or
>> > just
>> > > > >> > > >>>>>> differently) when
>> > > > >> > > >>>>>> > > determining the load?
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > The current code looks like this:
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > class DistributedScanner:
>> > > > >> > > >>>>>> > >   public static DistributedScanner create(HTable hTable,
>> > > > >> > > >>>>>> > >       Scan original, AbstractRowKeyDistributor keyDistributor)
>> > > > >> > > >>>>>> > >       throws IOException {
>> > > > >> > > >>>>>> > >     byte[][] startKeys =
>> > > > >> > > >>>>>> > >         keyDistributor.getAllDistributedKeys(original.getStartRow());
>> > > > >> > > >>>>>> > >     byte[][] stopKeys =
>> > > > >> > > >>>>>> > >         keyDistributor.getAllDistributedKeys(original.getStopRow());
>> > > > >> > > >>>>>> > >     Scan[] scans = new Scan[startKeys.length];
>> > > > >> > > >>>>>> > >     for (byte i = 0; i < startKeys.length; i++) {
>> > > > >> > > >>>>>> > >       scans[i] = new Scan(original);
>> > > > >> > > >>>>>> > >       scans[i].setStartRow(startKeys[i]);
>> > > > >> > > >>>>>> > >       scans[i].setStopRow(stopKeys[i]);
>> > > > >> > > >>>>>> > >     }
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > >     ResultScanner[] rss = new ResultScanner[startKeys.length];
>> > > > >> > > >>>>>> > >     for (byte i = 0; i < scans.length; i++) {
>> > > > >> > > >>>>>> > >       rss[i] = hTable.getScanner(scans[i]);
>> > > > >> > > >>>>>> > >     }
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > >     return new DistributedScanner(rss);
>> > > > >> > > >>>>>> > >   }
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > This is client code. To make these scans
>> > "identifiable"
>> > > > we
>> > > > >> > need
>> > > > >> > > to
>> > > > >> > > >>>>>> either
>> > > > >> > > >>>>>> > > use some different (derived from Scan) class or
>> add
>> > > some
>> > > > >> > > attribute
>> > > > >> > > >>>>>> to
>> > > > >> > > >>>>>> > them.
>> > > > >> > > >>>>>> > > There's no API for doing the latter. But we can do
>> > the
>> > > > >> former,
>> > > > >> > > but
>> > > > >> > > >>>>>> I
>> > > > >> > > >>>>>> > don't
>> > > > >> > > >>>>>> > > really like the idea of creating extra class (with
>> no
>> > > > extra
>> > > > >> > > >>>>>> > functionality)
>> > > > >> > > >>>>>> > > just to distinguish it from the base one.
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > If you can share why/how do you want to treat them
>> > > > >> differently
>> > > > >> > > on
>> > > > >> > > >>>>>> server
>> > > > >> > > >>>>>> > > side, that would be helpful.
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > Alex Baranau
>> > > > >> > > >>>>>> > > ----
>> > > > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene
>> -
>> > > > Nutch
>> > > > >> -
>> > > > >> > > >>>>>> Hadoop -
>> > > > >> > > >>>>>> > HBase
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
>> > > > >> yuzhihong@gmail.com>
>> > > > >> > > >>>>>> wrote:
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > > My request would be to make the distributed scan
>> > > > >> > identifiable
>> > > > >> > > >>>>>> from
>> > > > >> > > >>>>>> > server
>> > > > >> > > >>>>>> > > > side.
>> > > > >> > > >>>>>> > > > :-)
>> > > > >> > > >>>>>> > > >
>> > > > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
>> > > > >> > > >>>>>> > alex.baranov.v@gmail.com
>> > > > >> > > >>>>>> > > > >wrote:
>> > > > >> > > >>>>>> > > >
>> > > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number
>> of
>> > > > >> regions
>> > > > >> > for
>> > > > >> > > >>>>>> the
>> > > > >> > > >>>>>> > > > underlying
>> > > > >> > > >>>>>> > > > > > table.
>> > > > >> > > >>>>>> > > > >
>> > > > >> > > >>>>>> > > > > True: e.g. when there's only one region that
>> > holds
>> > > > data
>> > > > >> > for
>> > > > >> > > >>>>>> the whole
>> > > > >> > > >>>>>> > > > table
>> > > > >> > > >>>>>> > > > > (not many records in table yet), distributed
>> scan
>> > > > will
>> > > > >> > fire
>> > > > >> > > N
>> > > > >> > > >>>>>> scans
>> > > > >> > > >>>>>> > > > against
>> > > > >> > > >>>>>> > > > > the same region.
>> > > > >> > > >>>>>> > > > > On the other hand, in case there are huge
>> number
>> > of
>> > > > >> > regions
>> > > > >> > > >>>>>> for
>> > > > >> > > >>>>>> > single
>> > > > >> > > >>>>>> > > > > table, each scan can span over multiple
>> regions.
>> > > > >> > > >>>>>> > > > >
>> > > > >> > > >>>>>> > > > > > I need to deal with normal scan and
>> > "distributed
>> > > > >> scan"
>> > > > >> > at
>> > > > >> > > >>>>>> server
>> > > > >> > > >>>>>> > > side.
>> > > > >> > > >>>>>> > > > >
>> > > > >> > > >>>>>> > > > > With current implementation "distributed" scan
>> > > won't
>> > > > be
>> > > > >> > > >>>>>> recognized as
>> > > > >> > > >>>>>> > > > > something special on the server side. It will
>> be
>> > an
>> > > > >> > ordinary
>> > > > >> > > >>>>>> scan.
>> > > > >> > > >>>>>> > > Though
>> > > > >> > > >>>>>> > > > > the number of scan will increase, given that
>> the
>> > > > >> typical
>> > > > >> > > >>>>>> situation is
>> > > > >> > > >>>>>> > > > "many
>> > > > >> > > >>>>>> > > > > regions for single table", the scans of the
>> same
>> > > > >> > > "distributed
>> > > > >> > > >>>>>> scan"
>> > > > >> > > >>>>>> > are
>> > > > >> > > >>>>>> > > > > likely not to hit the same region.
>> > > > >> > > >>>>>> > > > >
>> > > > >> > > >>>>>> > > > > Not sure if I answered your questions here.
>> Feel
>> > > free
>> > > > >> to
>> > > > >> > ask
>> > > > >> > > >>>>>> more ;)
>> > > > >> > > >>>>>> > > > >
>> > > > >> > > >>>>>> > > > > Alex Baranau
>> > > > >> > > >>>>>> > > > > ----
>> > > > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr -
>> > Lucene
>> > > -
>> > > > >> Nutch
>> > > > >> > -
>> > > > >> > > >>>>>> Hadoop -
>> > > > >> > > >>>>>> > > > HBase
>> > > > >> > > >>>>>> > > > >
>> > > > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
>> > > > >> > > yuzhihong@gmail.com>
>> > > > >> > > >>>>>> wrote:
>> > > > >> > > >>>>>> > > > >
>> > > > >> > > >>>>>> > > > > > Alex:
>> > > > >> > > >>>>>> > > > > > If you read this, you would know why I
>> asked:
>> > > > >> > > >>>>>> > > > > >
>> > https://issues.apache.org/jira/browse/HBASE-3679
>> > > > >> > > >>>>>> > > > > >
>> > > > >> > > >>>>>> > > > > > I need to deal with normal scan and
>> > "distributed
>> > > > >> scan"
>> > > > >> > at
>> > > > >> > > >>>>>> server
>> > > > >> > > >>>>>> > > side.
>> > > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number
>> of
>> > > > >> regions
>> > > > >> > for
>> > > > >> > > >>>>>> the
>> > > > >> > > >>>>>> > > > underlying
>> > > > >> > > >>>>>> > > > > > table.
>> > > > >> > > >>>>>> > > > > >
>> > > > >> > > >>>>>> > > > > > Cheers
>> > > > >> > > >>>>>> > > > > >
>> > > > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex
>> Baranau
>> > <
>> > > > >> > > >>>>>> > > > alex.baranov.v@gmail.com
>> > > > >> > > >>>>>> > > > > > >wrote:
>> > > > >> > > >>>>>> > > > > >
>> > > > >> > > >>>>>> > > > > > > Hi Ted,
>> > > > >> > > >>>>>> > > > > > >
>> > > > >> > > >>>>>> > > > > > > We currently use this tool in the scenario
>> > > where
>> > > > >> data
>> > > > >> > is
>> > > > >> > > >>>>>> consumed
>> > > > >> > > >>>>>> > > by
>> > > > >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the
>> > > > >> performance
>> > > > >> > of
>> > > > >> > > >>>>>> pure
>> > > > >> > > >>>>>> > > > > "distributed
>> > > > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I
>> > > expect
>> > > > >> it
>> > > > >> > to
>> > > > >> > > be
>> > > > >> > > >>>>>> close
>> > > > >> > > >>>>>> > to
>> > > > >> > > >>>>>> > > > > > simple
>> > > > >> > > >>>>>> > > > > > > scan performance, or may be sometimes even
>> > > faster
>> > > > >> > > >>>>>> depending on
>> > > > >> > > >>>>>> > your
>> > > > >> > > >>>>>> > > > > data
>> > > > >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
>> > > > timeseries
>> > > > >> > data
>> > > > >> > > >>>>>> > > (sequential)
>> > > > >> > > >>>>>> > > > > > which
>> > > > >> > > >>>>>> > > > > > > is written into the single region at a
>> time,
>> > > then
>> > > > >> e.g.
>> > > > >> > > if
>> > > > >> > > >>>>>> you
>> > > > >> > > >>>>>> > > access
>> > > > >> > > >>>>>> > > > > > delta
>> > > > >> > > >>>>>> > > > > > > for further processing/analysis (esp. if
>> from
>> > > not
>> > > > >> > single
>> > > > >> > > >>>>>> client)
>> > > > >> > > >>>>>> > > > these
>> > > > >> > > >>>>>> > > > > > > scans
>> > > > >> > > >>>>>> > > > > > > are likely to hit the same region or
>> couple
>> > of
>> > > > >> regions
>> > > > >> > > at
>> > > > >> > > >>>>>> a time,
>> > > > >> > > >>>>>> > > > which
>> > > > >> > > >>>>>> > > > > > may
>> > > > >> > > >>>>>> > > > > > > perform worse comparing to many scans
>> hitting
>> > > > data
>> > > > >> > that
>> > > > >> > > is
>> > > > >> > > >>>>>> much
>> > > > >> > > >>>>>> > > > better
>> > > > >> > > >>>>>> > > > > > > spread over region servers.
>> > > > >> > > >>>>>> > > > > > >
>> > > > >> > > >>>>>> > > > > > > As for the map-reduce job, the approach should not affect reading
>> > > > >> > > >>>>>> > > > > > > performance at all: it's just that there are bucketsCount times more
>> > > > >> > > >>>>>> > > > > > > splits and hence bucketsCount times more Map tasks. In many cases this
>> > > > >> > > >>>>>> > > > > > > even improves overall performance of the MR job, since work is better
>> > > > >> > > >>>>>> > > > > > > distributed over the cluster (esp. when the aim is to constantly
>> > > > >> > > >>>>>> > > > > > > process the incoming delta, which usually resides in one or just a
>> > > > >> > > >>>>>> > > > > > > couple of regions depending on processing frequency).
>> > > > >> > > >>>>>> > > > > > >
>> > > > >> > > >>>>>> > > > > > > If you can share details on your case, that will help to understand
>> > > > >> > > >>>>>> > > > > > > what effect(s) to expect from using this approach.
>> > > > >> > > >>>>>> > > > > > >
>> > > > >> > > >>>>>> > > > > > > Alex Baranau
>> > > > >> > > >>>>>> > > > > > > ----
>> > > > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > > >> > > >>>>>> > > > > > >
>> > > > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > >> > > >>>>>> > > > > > >
>> > > > >> > > >>>>>> > > > > > > > Interesting project, Alex.
>> > > > >> > > >>>>>> > > > > > > > Since there are bucketsCount scanners compared to one scanner
>> > > > >> > > >>>>>> > > > > > > > originally, have you performed load testing to see the impact?
>> > > > >> > > >>>>>> > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > Thanks
>> > > > >> > > >>>>>> > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <alex.baranov.v@gmail.com>wrote:
>> > > > >> > > >>>>>> > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Hello guys,
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small Java project/lib around HBase:
>> > > > >> > > >>>>>> > > > > > > > > HBaseWD. It is aimed to help with distribution of the load (across
>> > > > >> > > >>>>>> > > > > > > > > region servers) when writing sequential (because of the row key
>> > > > >> > > >>>>>> > > > > > > > > nature) records. It implements the solution which was discussed
>> > > > >> > > >>>>>> > > > > > > > > several times on this mailing list (e.g. here:
>> > > > >> > > >>>>>> > > > > > > > > http://search-hadoop.com/m/gNRA82No5Wk).
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Please find the sources at https://github.com/sematext/HBaseWD
>> > > > >> > > >>>>>> > > > > > > > > (there's also a jar of the current version for convenience). It is
>> > > > >> > > >>>>>> > > > > > > > > very easy to make use of it: e.g. I added it to one existing
>> > > > >> > > >>>>>> > > > > > > > > project with 1+2 lines of code (one where I write to HBase and 2
>> > > > >> > > >>>>>> > > > > > > > > for configuring the MapReduce job).
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Please find below the short intro to the lib [1].
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Alex Baranau
>> > > > >> > > >>>>>> > > > > > > > > ----
>> > > > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > [1]
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Description:
>> > > > >> > > >>>>>> > > > > > > > > ------------
>> > > > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential) Writes. It was
>> > > > >> > > >>>>>> > > > > > > > > inspired by discussions on HBase mailing lists around the problem
>> > > > >> > > >>>>>> > > > > > > > > of choosing between:
>> > > > >> > > >>>>>> > > > > > > > > * writing records with sequential row keys (e.g. time-series data
>> > > > >> > > >>>>>> > > > > > > > >   with a row key built based on ts)
>> > > > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > The first approach makes it possible to perform fast range scans
>> > > > >> > > >>>>>> > > > > > > > > by setting start/stop keys on the Scanner, but it creates a single
>> > > > >> > > >>>>>> > > > > > > > > region server hot-spotting problem when writing data (as row keys
>> > > > >> > > >>>>>> > > > > > > > > go in sequence, all records end up written into a single region at
>> > > > >> > > >>>>>> > > > > > > > > a time).
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > The second approach aims for the fastest writing performance by
>> > > > >> > > >>>>>> > > > > > > > > distributing new records over random regions, but it makes fast
>> > > > >> > > >>>>>> > > > > > > > > range scans over the written data impossible.
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > The suggested approach sits in the middle of the two above and has
>> > > > >> > > >>>>>> > > > > > > > > proved to perform well by distributing records over the cluster
>> > > > >> > > >>>>>> > > > > > > > > during writing while still allowing range scans over the data.
>> > > > >> > > >>>>>> > > > > > > > > HBaseWD provides a very simple API to work with, which makes it
>> > > > >> > > >>>>>> > > > > > > > > easy to use with existing code.
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Please refer to the unit tests for lib usage info, as they are
>> > > > >> > > >>>>>> > > > > > > > > aimed to act as examples.
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
>> > > > >> > > >>>>>> > > > > > > > > ----------------------------
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys which are being written
>> > > > >> > > >>>>>> > > > > > > > > into up to Byte.MAX_VALUE buckets:
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
>> > > > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
>> > > > >> > > >>>>>> > > > > > > > >        new RowKeyDistributorByOneBytePrefix(bucketsCount);
>> > > > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
>> > > > >> > > >>>>>> > > > > > > > >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
>> > > > >> > > >>>>>> > > > > > > > >      ... // add values
>> > > > >> > > >>>>>> > > > > > > > >      hTable.put(put);
>> > > > >> > > >>>>>> > > > > > > > >    }
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Performing a range scan over written data (internally
>> > > > >> > > >>>>>> > > > > > > > > <bucketsCount> scanners are executed):
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>> > > > >> > > >>>>>> > > > > > > > >    ResultScanner rs = DistributedScanner.create(hTable, scan,
>> > > > >> > > >>>>>> > > > > > > > >        keyDistributor);
>> > > > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
>> > > > >> > > >>>>>> > > > > > > > >      ...
>> > > > >> > > >>>>>> > > > > > > > >    }
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Performing a mapreduce job over the written data chunk specified
>> > > > >> > > >>>>>> > > > > > > > > by a Scan:
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > >    Configuration conf = HBaseConfiguration.create();
>> > > > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
>> > > > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class, ImmutableBytesWritable.class,
>> > > > >> > > >>>>>> > > > > > > > >      Result.class, job);
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > >    // Substituting standard TableInputFormat which was set in
>> > > > >> > > >>>>>> > > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
>> > > > >> > > >>>>>> > > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
>> > > > >> > > >>>>>> > > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
>> > > > >> > > >>>>>> > > > > > > > > -----------------------------------------
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to support custom row key
>> > > > >> > > >>>>>> > > > > > > > > distribution approaches. To define custom row key distributing
>> > > > >> > > >>>>>> > > > > > > > > logic, just implement the AbstractRowKeyDistributor abstract
>> > > > >> > > >>>>>> > > > > > > > > class, which is really very simple:
>> > > > >> > > >>>>>> > > > > > > > >
>> > > > >> > > >>>>>> > > > > > > > >    public abstract class AbstractRowKeyDistributor implements
>> > > > >> > > >>>>>> > > > > > > > >        Parametrizable {
>> > > > >> > > >>>>>> > > > > > > > >      public abstract byte[] getDistributedKey(byte[] originalKey);
>> > > > >> > > >>>>>> > > > > > > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
>> > > > >> > > >>>>>> > > > > > > > >      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
>> > > > >> > > >>>>>> > > > > > > > >      ... // some utility methods
>> > > > >> > > >>>>>> > > > > > > > >    }

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
Awesome, I'm going to check it out and use it today. Thank you :)

On Thu, May 19, 2011 at 8:14 AM, Alex Baranau <al...@gmail.com>wrote:

> Implemented RowKeyDistributorByHashPrefix. From README:
>
> Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix; please see
> the example below. It creates the "distributed key" based on the original key
> value, so that later, when you have the original key and want to update the
> record, you can calculate the distributed key without a roundtrip to HBase.
>
> AbstractRowKeyDistributor keyDistributor =
>         new RowKeyDistributorByHashPrefix(
>                   new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));
>
> You can use your own hashing logic here by implementing simple interface:
>
> public static interface Hasher extends Parametrizable {
>   byte[] getHashPrefix(byte[] originalKey);
>   byte[][] getAllPossiblePrefixes();
> }
>
>
> OneByteSimpleHash implements a very simple hash algorithm: the sum of all
> bytes in the row key % maxBuckets. In the example above, 15 is the maxBuckets
> count. You can use a buckets count of up to 255. Please use it wisely: as with
> the one-byte prefix distributor, the distributed scanner will instantiate this
> number of scans under the hood.
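The summing hash described above is easy to sketch outside the library. The following is a hypothetical stand-alone illustration of the idea (class and method names are mine, not the lib's API): the prefix is the byte-sum of the key modulo maxBuckets, so the same key always lands in the same bucket.

```java
// Hypothetical sketch of the described hashing idea, NOT HBaseWD's code:
// prefix = (unsigned sum of key bytes) % maxBuckets.
public class SimpleHashSketch {
    private final int maxBuckets;

    public SimpleHashSketch(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    public byte[] getHashPrefix(byte[] originalKey) {
        int sum = 0;
        for (byte b : originalKey) {
            sum += b & 0xFF; // treat bytes as unsigned
        }
        return new byte[] { (byte) (sum % maxBuckets) };
    }

    public byte[][] getAllPossiblePrefixes() {
        byte[][] prefixes = new byte[maxBuckets][];
        for (int i = 0; i < maxBuckets; i++) {
            prefixes[i] = new byte[] { (byte) i };
        }
        return prefixes;
    }

    public static void main(String[] args) {
        SimpleHashSketch h = new SimpleHashSketch(15);
        byte[] key = new byte[] {123, 124, 122};
        // the same key always maps to the same bucket (deterministic hash)
        System.out.println(h.getHashPrefix(key)[0] == h.getHashPrefix(key)[0]);
        // a distributed scanner would need one scan per possible prefix
        System.out.println(h.getAllPossiblePrefixes().length);
    }
}
```

The determinism is what removes the HBase roundtrip: knowing the original key is enough to recompute the bucket.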
>
> With this row key hash-based distributor, you can find out the distributed
> key (and use it to update the record) without roundtrip to HBase. From
> unit-test:
>
>     // Testing simple get
>     byte[] originalKey = new byte[] {123, 124, 122};
>
>     Put put = new Put(keyDistributor.getDistributedKey(originalKey));
>     put.add(CF, QUAL, Bytes.toBytes("some"));
>     hTable.put(put);
>
>     byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
>     Result result = hTable.get(new Get(distributedKey));
>     Assert.assertArrayEquals(originalKey,
>         keyDistributor.getOriginalKey(result.getRow()));
>     Assert.assertArrayEquals(Bytes.toBytes("some"),
>         result.getValue(CF, QUAL));
>
>
> NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar
> (downloadable from https://github.com/sematext/HBaseWD)
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> P.S.
> > Can you summarize HBaseWD in your blog
> That is on my todo list! You pushed it higher to the top priority items ;)
>
>
> On Thu, May 19, 2011 at 6:50 AM, Weishung Chung <we...@gmail.com>wrote:
>
>> I have another question about option 2. It seems like I need to handle the
>> distributed scan differently to read from start row to end row, assuming 1
>> byte hash of the original key is used as prefix since the order of the
>> original key range is different from the resulting distributed key range.
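The concern above is real: each per-bucket scan returns rows in original-key order only within its own bucket, so a distributed scanner has to merge results across buckets to restore global ordering. A minimal sketch of such a merge (an assumed, simplified illustration using plain sorted iterators rather than HBase ResultScanners):

```java
import java.util.*;

// Sketch of merging N ascending per-bucket streams into one ascending
// stream, the way results from bucketed scans must be recombined.
public class MergeSortedStreams {
    public static List<String> merge(List<Iterator<String>> streams) {
        // priority queue keyed on each stream's current head element
        PriorityQueue<Object[]> pq =
            new PriorityQueue<>(Comparator.comparing((Object[] e) -> (String) e[0]));
        for (Iterator<String> it : streams) {
            if (it.hasNext()) pq.add(new Object[] { it.next(), it });
        }
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            Object[] head = pq.poll();
            out.add((String) head[0]);
            @SuppressWarnings("unchecked")
            Iterator<String> it = (Iterator<String>) head[1];
            if (it.hasNext()) pq.add(new Object[] { it.next(), it });
        }
        return out;
    }

    public static void main(String[] args) {
        // three "buckets", each internally sorted by original key
        List<Iterator<String>> buckets = Arrays.asList(
            Arrays.asList("a", "d", "g").iterator(),
            Arrays.asList("b", "e").iterator(),
            Arrays.asList("c", "f").iterator());
        System.out.println(merge(buckets));
    }
}
```

The merge compares original keys (i.e. keys with the bucket prefix already stripped), which is why clients can still consume results in plain start-row-to-end-row order.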
>>
>> On Wed, May 18, 2011 at 6:18 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>> > Alex:
>> > Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
>> >
>> > Thanks
>> >
>> > On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <alex.baranov.v@gmail.com
>> > >wrote:
>> >
>> > > There are several options here. E.g.:
>> > >
>> > > 1) Given that you have "original key" of the record, you can fetch the
>> > > stored record key from HBase and use it to create Put with updated (or
>> > new)
>> > > cells.
>> > >
>> > > Currently you'll need to use the distributed scan for that; there's no
>> > > analogue for the Get operation yet (see
>> > > https://github.com/sematext/HBaseWD/issues/1).
>> > >
>> > > Note: you need to first find out the real key of the stored record by
>> > > fetching data from HBase in case you use the
>> > > RowKeyDistributorByOneBytePrefix included in the current lib.
>> > > Alternatively, see the next option:
>> > >
>> > > 2) You can create your own RowKeyDistributor implementation which creates
>> > > the "distributed key" based on the original key value, so that later, when
>> > > you have the original key and want to update the record, you can calculate
>> > > the distributed key without a roundtrip to HBase.
>> > >
>> > > E.g., in your RowKeyDistributor implementation you can calculate a 1-byte
>> > > hash of the original key (https://github.com/sematext/HBaseWD/issues/2).
>> > >
>> > >
>> > >
>> > > Either way, you don't need to delete the record in order to update some
>> > > of its cells or add new cells.
>> > >
>> > > Please let me know if you have more Qs!
>> > >
>> > > Alex Baranau
>> > > ----
>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> > HBase
>> > >
>> > > On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
>> > > wrote:
>> > >
>> > > > I have another question. For overwriting, do I need to delete the
>> > > existing
>> > > > one before re-writing it?
>> > > >
>> > > > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <
>> weishung@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
>> > > > >
>> > > > >
>> > > > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
>> > > alex.baranov.v@gmail.com
>> > > > >wrote:
>> > > > >
>> > > > >> Thanks for the interest!
>> > > > >>
>> > > > >> We are using it in production. It is simple and hence quite stable.
>> > > > >> Though some minor pieces are missing (like
>> > > > >> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
>> > > > >> stability and/or major functionality.
>> > > > >>
>> > > > >> Alex Baranau
>> > > > >> ----
>> > > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>> Hadoop
>> > -
>> > > > >> HBase
>> > > > >>
>> > > > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <
>> > weishung@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >> > What's the status on this package? Is it mature enough?
>> > > > >> > I am using it in my project; I tried out the write method yesterday
>> > > > >> > and am going to incorporate it into the read method tomorrow.
>> > > > >> >
>> > > > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
>> > > > alex.baranov.v@gmail.com
>> > > > >> > >wrote:
>> > > > >> >
>> > > > >> > > > The start/end rows may be written twice.
>> > > > >> > >
>> > > > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data
>> > > > >> > > is "bearable" in an attribute value no matter how long they (the
>> > > > >> > > keys) are, since we are already OK with transferring them
>> > > > >> > > initially (i.e. we should be OK with transferring 2x more).
>> > > > >> > >
>> > > > >> > > So, what about the suggestion of the sourceScan attribute value I
>> > > > >> > > mentioned? If you can tell me why it isn't sufficient in your
>> > > > >> > > case, I'd have more info to think about a better suggestion ;)
>> > > > >> > >
>> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
>> > > > >> > > > Maybe the second should be named HBASE-3811-v2.patch<
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
>> > > > >> > > >?
>> > > > >> > >
>> > > > >> > > np. Can do that. Just thought that the patches can be sorted by
>> > > > >> > > date to find the final one (aka "convention over naming-rules").
>> > > > >> > >
>> > > > >> > > Alex.
>> > > > >> > >
>> > > > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <
>> yuzhihong@gmail.com>
>> > > > wrote:
>> > > > >> > >
>> > > > >> > > > >> Though it might be ok, since we anyways "transfer"
>> > start/stop
>> > > > >> rows
>> > > > >> > > with
>> > > > >> > > > Scan object.
>> > > > >> > > > In the write() method, we now have:
>> > > > >> > > >     Bytes.writeByteArray(out, this.startRow);
>> > > > >> > > >     Bytes.writeByteArray(out, this.stopRow);
>> > > > >> > > > ...
>> > > > >> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
>> > > > >> > > >         WritableUtils.writeString(out, attr.getKey());
>> > > > >> > > >         Bytes.writeByteArray(out, attr.getValue());
>> > > > >> > > >       }
>> > > > >> > > > The start/end rows may be written twice.
>> > > > >> > > >
>> > > > >> > > > Of course, you have full control over how to generate the
>> > unique
>> > > > ID
>> > > > >> for
>> > > > >> > > > "sourceScan" attribute.
>> > > > >> > > >
>> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
>> > Maybe
>> > > > the
>> > > > >> > > second
>> > > > >> > > > should be named HBASE-3811-v2.patch<
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
>> > > > >> > > >?
>> > > > >> > > >
>> > > > >> > > > Thanks
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
>> > > > >> > alex.baranov.v@gmail.com
>> > > > >> > > >wrote:
>> > > > >> > > >
>> > > > >> > > >> > Can you remove the first version ?
>> > > > >> > > >> Isn't it ok to keep it in JIRA issue?
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
>> > > > >> > > >> > supports setAttribute()?
>> > > > >> > > >> > If it does, can you encode the start row and end row as a
>> > > > >> > > >> > "sourceScan" attribute?
>> > > > >> > > >>
>> > > > >> > > >> Yeah, something like this is going to be implemented. Though
>> > > > >> > > >> I'd still want to hear from the devs the story about the Scan
>> > > > >> > > >> version.
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> > One consideration is that the start row or end row may be
>> > > > >> > > >> > quite long.
>> > > > >> > > >>
>> > > > >> > > >> Yeah, that was my thought too at first. Though it might be OK,
>> > > > >> > > >> since we "transfer" start/stop rows with the Scan object anyway.
>> > > > >> > > >>
>> > > > >> > > >> > What do you think ?
>> > > > >> > > >>
>> > > > >> > > >> I'd love to hear from you whether this variant I mentioned is
>> > > > >> > > >> what we are looking at here:
>> > > > >> > > >>
>> > > > >> > > >>
>> > > > >> > > >> > From what I understand, you want to distinguish scans fired by
>> > > > >> > > >> > the same distributed scan, i.e. group scans which were fired by
>> > > > >> > > >> > a single distributed scan. If that's what you want, the
>> > > > >> > > >> > distributed scan can generate a unique ID and set, say, a
>> > > > >> > > >> > "sourceScan" attribute to its value. This way we'll have
>> > > > >> > > >> > <# of distinct "sourceScan" attribute values> = <number of
>> > > > >> > > >> > distributed scans invoked by the client side>, and two scans on
>> > > > >> > > >> > the server side will have the same "sourceScan" attribute iff
>> > > > >> > > >> > they "belong" to the same distributed scan.
>> > > > >> > > >>
>> > > > >> > > >>
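The grouping invariant discussed in this thread can be illustrated without HBase at all. The sketch below is a stand-in (the real code would call Scan#setAttribute on the per-bucket Scan objects): every scan produced by one distributed scan carries the same randomly generated ID, while different distributed scans carry different IDs.

```java
import java.util.*;

// Stand-in illustration of the "sourceScan" grouping idea: a Scan is
// simulated by a plain attributes map, NOT the HBase client API.
public class SourceScanGrouping {
    public static List<Map<String, String>> fireDistributedScan(int buckets) {
        // one unique ID per distributed scan, shared by all its bucket scans
        String sourceScanId = UUID.randomUUID().toString();
        List<Map<String, String>> scans = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            Map<String, String> scan = new HashMap<>();
            scan.put("sourceScan", sourceScanId);
            scans.add(scan);
        }
        return scans;
    }

    public static void main(String[] args) {
        List<Map<String, String>> a = fireDistributedScan(4);
        List<Map<String, String>> b = fireDistributedScan(4);
        // all bucket scans of one distributed scan share a single ID ...
        Set<String> idsA = new HashSet<>();
        for (Map<String, String> s : a) idsA.add(s.get("sourceScan"));
        System.out.println(idsA.size());
        // ... while two distributed scans get different IDs
        System.out.println(a.get(0).get("sourceScan").equals(b.get(0).get("sourceScan")));
    }
}
```

With this scheme, the server side can group scans by the attribute value: two scans belong to the same distributed scan exactly when their "sourceScan" values match.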
>> > > > >> > > >> Alex Baranau
>> > > > >> > > >> ----
>> > > > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> -
>> > > > Hadoop
>> > > > >> -
>> > > > >> > > >> HBase
>> > > > >> > > >>
>> > > > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <
>> yuzhihong@gmail.com
>> > >
>> > > > >> wrote:
>> > > > >> > > >>
>> > > > >> > > >>> Alex:
>> > > > >> > > >>> Your second patch looks good.
>> > > > >> > > >>> Can you remove the first version ?
>> > > > >> > > >>>
>> > > > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
>> > > > supports
>> > > > >> > > >>> setAttribute() ?
>> > > > >> > > >>> If it does, can you encode start row and end row as
>> > > "sourceScan"
>> > > > >> > > >>> attribute ?
>> > > > >> > > >>>
>> > > > >> > > >>> One consideration is that start row or end row may be
>> quite
>> > > > long.
>> > > > >> > > >>> Ideally we should store hash code of source Scan object
>> as
>> > > > >> > "sourceScan"
>> > > > >> > > >>> attribute. But Scan doesn't implement hashCode(). We can
>> add
>> > > it,
>> > > > >> that
>> > > > >> > > would
>> > > > >> > > >>> require running all Scan related tests.
>> > > > >> > > >>>
>> > > > >> > > >>> What do you think ?
>> > > > >> > > >>>
>> > > > >> > > >>> Thanks
>> > > > >> > > >>>
>> > > > >> > > >>>
>> > > > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
>> > > > >> > > alex.baranov.v@gmail.com>wrote:
>> > > > >> > > >>>
>> > > > >> > > >>>> Sorry for the delay in response (public holidays here).
>> > > > >> > > >>>>
>> > > > >> > > >>>> This depends on what info you are looking for on the server side.
>> > > > >> > > >>>>
>> > > > >> > > >>>> From what I understand, you want to distinguish scans fired by
>> > > > >> > > >>>> the same distributed scan, i.e. group scans which were fired by
>> > > > >> > > >>>> a single distributed scan. If that's what you want, the
>> > > > >> > > >>>> distributed scan can generate a unique ID and set, say, a
>> > > > >> > > >>>> "sourceScan" attribute to its value. This way we'll have
>> > > > >> > > >>>> <# of distinct "sourceScan" attribute values> = <number of
>> > > > >> > > >>>> distributed scans invoked by the client side>, and two scans on
>> > > > >> > > >>>> the server side will have the same "sourceScan" attribute iff
>> > > > >> > > >>>> they "belong" to the same distributed scan.
>> > > > >> > > >>>>
>> > > > >> > > >>>> Is this what are you looking for?
>> > > > >> > > >>>>
>> > > > >> > > >>>> Alex Baranau
>> > > > >> > > >>>>
>> > > > >> > > >>>> P.S. attached patch for HBASE-3811<
>> > > > >> > > https://issues.apache.org/jira/browse/HBASE-3811>
>> > > > >> > > >>>> .
>> > > > >> > > >>>> P.S-2. should this conversation be moved to dev list?
>> > > > >> > > >>>>
>> > > > >> > > >>>> ----
>> > > > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
>> Nutch
>> > -
>> > > > >> Hadoop
>> > > > >> > -
>> > > > >> > > >>>> HBase
>> > > > >> > > >>>>
>> > > > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <
>> > yuzhihong@gmail.com
>> > > >
>> > > > >> > wrote:
>> > > > >> > > >>>>
>> > > > >> > > >>>>> Alex:
>> > > > >> > > >>>>> What type of identification should we put in the map of
>> > the
>> > > > Scan
>> > > > >> > > object
>> > > > >> > > >>>>> ?
>> > > > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But
>> > the
>> > > > user
>> > > > >> > can
>> > > > >> > > >>>>> use same distributor on multiple scans.
>> > > > >> > > >>>>>
>> > > > >> > > >>>>> Please share your thought.
>> > > > >> > > >>>>>
>> > > > >> > > >>>>>
>> > > > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
>> > > > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
>> > > > >> > > >>>>>
>> > > > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
>> > > > >> > > >>>>>>
>> > > > >> > > >>>>>> Alex Baranau
>> > > > >> > > >>>>>> ----
>> > > > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
>> > Nutch
>> > > -
>> > > > >> > Hadoop
>> > > > >> > > -
>> > > > >> > > >>>>>> HBase
>> > > > >> > > >>>>>>
>> > > > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
>> > > yuzhihong@gmail.com
>> > > > >
>> > > > >> > > wrote:
>> > > > >> > > >>>>>>
>> > > > >> > > >>>>>> > My plan was to make regions that have active scanners more
>> > > > >> > > >>>>>> > stable - trying not to move them when balancing.
>> > > > >> > > >>>>>> > I prefer the second approach - adding custom attribute(s)
>> > > > >> > > >>>>>> > to Scan so that the Scans created by the method below can
>> > > > >> > > >>>>>> > be 'grouped'.
>> > > > >> > > >>>>>> >
>> > > > >> > > >>>>>> > If you can file a JIRA, that would be great.
>> > > > >> > > >>>>>> >
>> > > > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
>> > > > >> > > >>>>>> alex.baranov.v@gmail.com
>> > > > >> > > >>>>>> > >wrote:
>> > > > >> > > >>>>>> >
>> > > > >> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or just
>> > > > >> > > >>>>>> > > differently) when determining the load?
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > The current code looks like this (in class DistributedScanner):
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > >  public static DistributedScanner create(HTable hTable,
>> > > > >> > > >>>>>> > >      Scan original, AbstractRowKeyDistributor keyDistributor)
>> > > > >> > > >>>>>> > >      throws IOException {
>> > > > >> > > >>>>>> > >    byte[][] startKeys =
>> > > > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStartRow());
>> > > > >> > > >>>>>> > >    byte[][] stopKeys =
>> > > > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStopRow());
>> > > > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
>> > > > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
>> > > > >> > > >>>>>> > >      scans[i] = new Scan(original);
>> > > > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
>> > > > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
>> > > > >> > > >>>>>> > >    }
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
>> > > > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
>> > > > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
>> > > > >> > > >>>>>> > >    }
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > >    return new DistributedScanner(rss);
>> > > > >> > > >>>>>> > >  }
>> > > > >> > > >>>>>> > >
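The per-bucket scan ranges in that code come from getAllDistributedKeys. A minimal hypothetical sketch of what a one-byte-prefix distributor could return (names are mine; this is not the library's source) shows why the number of scans equals the bucket count:

```java
import java.util.Arrays;

// Hypothetical one-byte-prefix key distributor sketch: for N buckets it
// yields N candidate keys, each the original key with a bucket byte prepended.
public class OneBytePrefixSketch {
    private final byte buckets;

    public OneBytePrefixSketch(byte buckets) {
        this.buckets = buckets;
    }

    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[buckets][];
        for (byte b = 0; b < buckets; b++) {
            keys[b] = new byte[originalKey.length + 1];
            keys[b][0] = b; // bucket prefix
            System.arraycopy(originalKey, 0, keys[b], 1, originalKey.length);
        }
        return keys;
    }

    public static void main(String[] args) {
        byte[][] keys = new OneBytePrefixSketch((byte) 3)
            .getAllDistributedKeys(new byte[] {10, 20});
        System.out.println(keys.length);              // one key per bucket
        System.out.println(Arrays.toString(keys[1])); // bucket byte, then key
    }
}
```

Prepending the bucket byte keeps original-key order inside each bucket, which is exactly what lets each of the N scans use a simple (prefixed) start/stop range.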
>> > > > >> > > >>>>>> > > This is client code. To make these scans "identifiable"
>> > > > >> > > >>>>>> > > we need to either use some different class (derived from
>> > > > >> > > >>>>>> > > Scan) or add some attribute to them. There's no API for
>> > > > >> > > >>>>>> > > doing the latter. We can do the former, but I don't
>> > > > >> > > >>>>>> > > really like the idea of creating an extra class (with no
>> > > > >> > > >>>>>> > > extra functionality) just to distinguish it from the
>> > > > >> > > >>>>>> > > base one.
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > If you can share why/how you want to treat them
>> > > > >> > > >>>>>> > > differently on the server side, that would be helpful.
>> > > > >> > > >>>>>> > >
>> > > > >> > > >>>>>> > > Alex Baranau
>> > > > >> > > >>>>>> > > ----
>> > > > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene
>> -
>> > > > Nutch
>> > > > >> -
>> > > > >> > > >>>>>> Hadoop -
>> > > > >> > > >>>>>> > HBase
>> > > > >> > > >>>>>> > >
On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> My request would be to make the distributed scan identifiable from server
> side. :-)
On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:

> Basically bucketsCount may not equal number of regions for the underlying
> table.

True: e.g. when there's only one region that holds data for the whole table
(not many records in the table yet), a distributed scan will fire N scans
against the same region. On the other hand, in case there is a huge number of
regions for a single table, each scan can span multiple regions.

> I need to deal with normal scan and "distributed scan" at server side.

With the current implementation a "distributed" scan won't be recognized as
something special on the server side. It will be an ordinary scan. Though the
number of scans will increase, given that the typical situation is "many
regions for a single table", the scans of the same "distributed scan" are
likely not to hit the same region.

Not sure if I answered your questions here. Feel free to ask more ;)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:

Alex:
If you read this, you would know why I asked:
https://issues.apache.org/jira/browse/HBASE-3679

I need to deal with normal scan and "distributed scan" at server side.
Basically bucketsCount may not equal number of regions for the underlying
table.

Cheers
On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:

Hi Ted,

We currently use this tool in the scenario where data is consumed by
MapReduce jobs, so we haven't tested the performance of a pure "distributed
scan" (i.e. N scans instead of 1) a lot. I expect it to be close to simple
scan performance, or maybe sometimes even faster depending on your data
access patterns. E.g. in case you write time-series (sequential) data, which
goes into a single region at a time, then if you access the delta for further
processing/analysis (esp. from more than one client), these scans are likely
to hit the same region or a couple of regions at a time, which may perform
worse compared to many scans hitting data that is much better spread over
region servers.

As for a MapReduce job, the approach should not affect reading performance at
all: it's just that there are bucketsCount times more splits and hence
bucketsCount times more Map tasks. In many cases this even improves overall
performance of the MR job since work is better distributed over the cluster
(esp. in the situation when the aim is to constantly process the incoming
delta, which usually resides in one or just a couple of regions depending on
processing frequency).

If you can share details on your case, that will help to understand what
effect(s) to expect from using this approach.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:

Interesting project, Alex.
Since there're bucketsCount scanners compared to one scanner originally,
have you performed load testing to see the impact?

Thanks
On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:

[original HBaseWD announcement quoted in full; see the first message in this thread]

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
Implemented RowKeyDistributorByHashPrefix. From README:

Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please see
the example below. It creates the "distributed key" based on the original key
value, so that later, when you have the original key and want to update the
record, you can calculate the distributed key without a roundtrip to HBase.

AbstractRowKeyDistributor keyDistributor =
        new RowKeyDistributorByHashPrefix(
                  new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));

You can use your own hashing logic here by implementing a simple interface:

public static interface Hasher extends Parametrizable {
  byte[] getHashPrefix(byte[] originalKey);
  byte[][] getAllPossiblePrefixes();
}


OneByteSimpleHash implements a very simple hash algorithm: the sum of all
bytes in the row key % maxBuckets. In the example above, 15 is the maxBuckets
count. You can use a buckets count of up to 255. Please use it wisely, as
(just like with the one-byte prefix distributor) DistributedScanner will
instantiate this number of scans under the hood.

With this hash-based row key distributor, you can find out the distributed
key (and use it to update the record) without a roundtrip to HBase. From a
unit test:

    // Testing simple get
    byte[] originalKey = new byte[] {123, 124, 122};
    Put put = new Put(keyDistributor.getDistributedKey(originalKey));
    put.add(CF, QUAL, Bytes.toBytes("some"));
    hTable.put(put);

    byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
    Result result = hTable.get(new Get(distributedKey));
    Assert.assertArrayEquals(originalKey,
        keyDistributor.getOriginalKey(result.getRow()));
    Assert.assertArrayEquals(Bytes.toBytes("some"),
        result.getValue(CF, QUAL));

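A note on scanning with a hash-based distributor: since the distributed key
order differs from the original key order, the per-bucket scan results have
to be merged back to present rows in original-key order. Below is a rough
sketch of such a k-way merge over already-fetched per-bucket key lists. It is
illustrative only -- the library's DistributedScanner may do this
differently, and real code would merge live ResultScanner streams rather
than lists:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hedged sketch: merge N per-bucket result streams (each already sorted by
// original key within its bucket) into one stream sorted by original key.
public class BucketMergeSketch {

    // Unsigned lexicographic comparison -- the order HBase uses for row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static List<byte[]> merge(List<List<byte[]>> buckets) {
        // Each queue entry is {bucketIndex, positionWithinBucket}; the queue
        // is ordered by the key those coordinates point at.
        PriorityQueue<int[]> pq = new PriorityQueue<>((x, y) ->
            compareKeys(buckets.get(x[0]).get(x[1]), buckets.get(y[0]).get(y[1])));
        for (int i = 0; i < buckets.size(); i++) {
            if (!buckets.get(i).isEmpty()) pq.add(new int[] {i, 0});
        }
        List<byte[]> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            merged.add(buckets.get(top[0]).get(top[1]));
            if (top[1] + 1 < buckets.get(top[0]).size()) {
                pq.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<byte[]>> buckets = new ArrayList<>();
        buckets.add(Arrays.asList(new byte[]{1}, new byte[]{5}));
        buckets.add(Arrays.asList(new byte[]{2}, new byte[]{9}));
        for (byte[] k : merge(buckets)) {
            System.out.println(k[0]); // prints 1, 2, 5, 9
        }
    }
}
```

Within a single bucket the distributed order equals the original order (the
prefix is constant there), which is why merging the per-bucket streams is
enough to restore a global original-key ordering.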

NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar
(downloadable from https://github.com/sematext/HBaseWD)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

P.S.
> Can you summarize HBaseWD in your blog
That is on my todo list! You pushed it higher to the top priority items ;)

On Thu, May 19, 2011 at 6:50 AM, Weishung Chung <we...@gmail.com> wrote:

> I have another question about option 2. It seems like I need to handle the
> distributed scan differently to read from start row to end row, assuming 1
> byte hash of the original key is used as prefix since the order of the
> original key range is different from the resulting distributed key range.
>
> On Wed, May 18, 2011 at 6:18 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Alex:
> > Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
> >
> > Thanks
> >
> > On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <alex.baranov.v@gmail.com
> > >wrote:
> >
> > > There are several options here. E.g.:
> > >
> > > 1) Given that you have "original key" of the record, you can fetch the
> > > stored record key from HBase and use it to create Put with updated (or
> > new)
> > > cells.
> > >
> > > Currently you'll need to use a distributed scan for that; there's no
> > > analogue for the Get operation yet (see
> > > https://github.com/sematext/HBaseWD/issues/1).
> > >
> > > Note: you first need to find out the real key of the stored record by
> > > fetching data from HBase in case you use the
> > > RowKeyDistributorByOneBytePrefix included in the current lib.
> > > Alternatively, see the next option:
> > >
> > > 2) You can create your own RowKeyDistributor implementation which will
> > > create the "distributed key" based on the original key value, so that
> > > later, when you have the original key and want to update the record,
> > > you can calculate the distributed key without a roundtrip to HBase.
> > >
> > > E.g. in your RowKeyDistributor implementation you can calculate a
> > > 1-byte hash of the original key
> > > (https://github.com/sematext/HBaseWD/issues/2).
> > >
> > >
> > >
> > > Either way, you don't need to delete the record to update some of its
> > > cells or to add new cells.
> > >
> > > Please let me know if you have more Qs!
> > >
> > > Alex Baranau
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > HBase
> > >
> > > On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> > > wrote:
> > >
> > > > I have another question. For overwriting, do I need to delete the
> > > existing
> > > > one before re-writing it?
> > > >
> > > > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <weishung@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Yes, it's simple yet useful. I am integrating it. Thanks alot :)
> > > > >
> > > > >
> > > > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> > > alex.baranov.v@gmail.com
> > > > >wrote:
> > > > >
> > > > >> Thanks for the interest!
> > > > >>
> > > > >> We are using it in production. It is simple and hence quite
> stable.
> > > > Though
> > > > >> some minor pieces are missing (like
> > > > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't affect
> > > > >> stability
> > > > >> and/or major functionality.
> > > > >>
> > > > >> Alex Baranau
> > > > >> ----
> > > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> Hadoop
> > -
> > > > >> HBase
> > > > >>
> > > > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <
> > weishung@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > What's the status on this package? Is it mature enough?
> > > > >> >  I am using it in my project, tried out the write method
> yesterday
> > > and
> > > > >> > going
> > > > >> > to incorporate into read method tomorrow.
> > > > >> >
> > > > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > > > alex.baranov.v@gmail.com
> > > > >> > >wrote:
> > > > >> >
> > > > >> > > > The start/end rows may be written twice.
> > > > >> > >
> > > > >> > > Yeah, I know. I meant that the size of the startRow+stopRow
> > > > >> > > data is "bearable" in an attribute value no matter how long
> > > > >> > > they (the keys) are, since we are already OK with transferring
> > > > >> > > them initially (i.e. we should be OK with transferring 2x
> > > > >> > > more).
> > > > >> > >
> > > > >> > > So, what about the suggestion of the sourceScan attribute
> > > > >> > > value I mentioned? If you can tell why it isn't sufficient in
> > > > >> > > your case, I'd have more info to think about a better
> > > > >> > > suggestion ;)
> > > > >> > >
> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > > > >> > > > (https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch)?
> > > > >> > >
> > > > >> > > np. Can do that. Just thought that they (the patches) can be
> > > > >> > > sorted by date to find out the final one (aka "convention over
> > > > >> > > naming rules").
> > > > >> > >
> > > > >> > > Alex.
> > > > >> > >
> > > > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yuzhihong@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > >> Though it might be ok, since we anyways "transfer"
> > start/stop
> > > > >> rows
> > > > >> > > with
> > > > >> > > > Scan object.
> > > > >> > > > In write() method, we now have:
> > > > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > > > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > > > >> > > > ...
> > > > >> > > >       for (Map.Entry<String, byte[]> attr :
> > > > >> this.attributes.entrySet())
> > > > >> > {
> > > > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > > > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > > > >> > > >       }
> > > > >> > > > The start/end rows may be written twice.
> > > > >> > > >
> > > > >> > > > Of course, you have full control over how to generate the
> > unique
> > > > ID
> > > > >> for
> > > > >> > > > "sourceScan" attribute.
> > > > >> > > >
> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > > > >> > > > (https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch)?
> > > > >> > > >
> > > > >> > > > Thanks
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > > > >> > alex.baranov.v@gmail.com
> > > > >> > > >wrote:
> > > > >> > > >
> > > > >> > > >> > Can you remove the first version ?
> > > > >> > > >> Isn't it ok to keep it in JIRA issue?
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > > > >> supports
> > > > >> > > >> setAttribute() ?
> > > > >> > > >> > If it does, can you encode start row and end row as
> > > > "sourceScan"
> > > > >> > > >> attribute ?
> > > > >> > > >>
> > > > >> > > >> Yeah, smth like this is going to be implemented. Though I'd
> > > > >> > > >> still want to hear from the devs the story about the Scan
> > > > >> > > >> version.
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> > One consideration is that start row or end row may be
> > > > >> > > >> > quite long.
> > > > >> > > >>
> > > > >> > > >> Yeah, that was my thought too at first. Though it might be
> > > > >> > > >> OK, since we anyway "transfer" start/stop rows with the
> > > > >> > > >> Scan object.
> > > > >> > > >>
> > > > >> > > >> > What do you think ?
> > > > >> > > >>
> > > > >> > > >> I'd love to hear from you whether this variant I mentioned
> > > > >> > > >> is what we are looking at here:
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> > From what I understand, you want to distinguish scans
> fired
> > > by
> > > > >> the
> > > > >> > > same
> > > > >> > > >> distributed scan.
> > > > >> > > >> > I.e. group scans which were fired by single distributed
> > scan.
> > > > If
> > > > >> > > that's
> > > > >> > > >> what you want, distributed
> > > > >> > > >> > scan can generate unique ID and set, say "sourceScan"
> > > attribute
> > > > >> to
> > > > >> > its
> > > > >> > > >> value. This way we'll
> > > > >> > > >> > have <# of distinct "sourceScan" attribute values> =
> > <number
> > > of
> > > > >> > > >> distributed scans invoked by
> > > > >> > > >> > client side> and two scans on server side will have the
> > same
> > > > >> > > >> "sourceScan" attribute iff they
> > > > >> > > >> > "belong" to same distributed scan.
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> Alex Baranau
> > > > >> > > >> ----
> > > > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> -
> > > > Hadoop
> > > > >> -
> > > > >> > > >> HBase
> > > > >> > > >>
> > > > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > > >> wrote:
> > > > >> > > >>
> > > > >> > > >>> Alex:
> > > > >> > > >>> Your second patch looks good.
> > > > >> > > >>> Can you remove the first version ?
> > > > >> > > >>>
> > > > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > > > supports
> > > > >> > > >>> setAttribute() ?
> > > > >> > > >>> If it does, can you encode start row and end row as
> > > "sourceScan"
> > > > >> > > >>> attribute ?
> > > > >> > > >>>
> > > > >> > > >>> One consideration is that start row or end row may be
> quite
> > > > long.
> > > > >> > > >>> Ideally we should store hash code of source Scan object as
> > > > >> > "sourceScan"
> > > > >> > > >>> attribute. But Scan doesn't implement hashCode(). We can
> add
> > > it,
> > > > >> that
> > > > >> > > would
> > > > >> > > >>> require running all Scan related tests.
> > > > >> > > >>>
> > > > >> > > >>> What do you think ?
> > > > >> > > >>>
> > > > >> > > >>> Thanks
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > > > >> > > alex.baranov.v@gmail.com>wrote:
> > > > >> > > >>>
> > > > >> > > >>>> Sorry for the delay in response (public holidays here).
> > > > >> > > >>>>
> > > > >> > > >>>> This depends on what info you are looking for on server
> > side.
> > > > >> > > >>>>
> > > > >> > > >>>> From what I understand, you want to distinguish scans
> fired
> > > by
> > > > >> the
> > > > >> > > same
> > > > >> > > >>>> distributed scan. I.e. group scans which were fired by
> > single
> > > > >> > > distributed
> > > > >> > > >>>> scan. If that's what you want, distributed scan can
> > generate
> > > > >> unique
> > > > >> > ID
> > > > >> > > and
> > > > >> > > >>>> set, say "sourceScan" attribute to its value. This way
> > we'll
> > > > have
> > > > >> <#
> > > > >> > > of
> > > > >> > > >>>> distinct "sourceScan" attribute values> = <number of
> > > > distributed
> > > > >> > scans
> > > > >> > > >>>> invoked by client side> and two scans on server side will
> > > have
> > > > >> the
> > > > >> > > same
> > > > >> > > >>>> "sourceScan" attribute iff they "belong" to same
> > distributed
> > > > >> scan.
> > > > >> > > >>>>
> > > > >> > > >>>> Is this what are you looking for?
> > > > >> > > >>>>
> > > > >> > > >>>> Alex Baranau
> > > > >> > > >>>>
> > > > >> > > >>>> P.S. attached patch for HBASE-3811<
> > > > >> > > https://issues.apache.org/jira/browse/HBASE-3811>
> > > > >> > > >>>> .
> > > > >> > > >>>> P.S-2. should this conversation be moved to dev list?
> > > > >> > > >>>>
> > > > >> > > >>>> ----
> > > > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
> Nutch
> > -
> > > > >> Hadoop
> > > > >> > -
> > > > >> > > >>>> HBase
> > > > >> > > >>>>
> > > > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <
> > yuzhihong@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > > >>>>
> > > > >> > > >>>>> Alex:
> > > > >> > > >>>>> What type of identification should we put in the map of
> > the
> > > > Scan
> > > > >> > > object
> > > > >> > > >>>>> ?
> > > > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But
> > the
> > > > user
> > > > >> > can
> > > > >> > > >>>>> use same distributor on multiple scans.
> > > > >> > > >>>>>
> > > > >> > > >>>>> Please share your thought.
> > > > >> > > >>>>>
> > > > >> > > >>>>>
> > > > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > > > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > > > >> > > >>>>>
> > > > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> Alex Baranau
> > > > >> > > >>>>>> ----
> > > > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
> > Nutch
> > > -
> > > > >> > Hadoop
> > > > >> > > -
> > > > >> > > >>>>>> HBase
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
> > > yuzhihong@gmail.com
> > > > >
> > > > >> > > wrote:
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> > My plan was to make regions that have active scanners
> > > more
> > > > >> > stable
> > > > >> > > -
> > > > >> > > >>>>>> trying
> > > > >> > > >>>>>> > not to move them when balancing.
> > > > >> > > >>>>>> > I prefer second approach - adding custom attribute(s)
> > to
> > > > Scan
> > > > >> so
> > > > >> > > >>>>>> that the
> > > > >> > > >>>>>> > Scans created by the method below can be 'grouped'.
> > > > >> > > >>>>>> >
> > > > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > > > >> > > >>>>>> >
> > > > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > > > >> > > >>>>>> alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > >wrote:
> > > > >> > > >>>>>> >
> > > > >> > > >>>>>> > > Aha, so you want to "count" it as single scan (or
> > just
> > > > >> > > >>>>>> differently) when
> > > > >> > > >>>>>> > > determining the load?
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > The current code looks like this:
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > class DistributedScanner:
> > > > >> > > >>>>>> > >  public static DistributedScanner create(HTable
> > hTable,
> > > > >> Scan
> > > > >> > > >>>>>> original,
> > > > >> > > >>>>>> > > AbstractRowKeyDistributor keyDistributor) throws
> > > > >> IOException {
> > > > >> > > >>>>>> > >    byte[][] startKeys =
> > > > >> > > >>>>>> > >
> > > > >> keyDistributor.getAllDistributedKeys(original.getStartRow());
> > > > >> > > >>>>>> > >    byte[][] stopKeys =
> > > > >> > > >>>>>> > >
> > > > >> keyDistributor.getAllDistributedKeys(original.getStopRow());
> > > > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > > > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > > > >> > > >>>>>> > >      scans[i] = new Scan(original);
> > > > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > > > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > > > >> > > >>>>>> > >    }
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > >    ResultScanner[] rss = new
> > > > >> ResultScanner[startKeys.length];
> > > > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > > > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > > > >> > > >>>>>> > >    }
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > >    return new DistributedScanner(rss);
> > > > >> > > >>>>>> > >  }
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > This is client code. To make these scans
> > "identifiable"
> > > > we
> > > > >> > need
> > > > >> > > to
> > > > >> > > >>>>>> either
> > > > >> > > >>>>>> > > use some different (derived from Scan) class or add
> > > some
> > > > >> > > attribute
> > > > >> > > >>>>>> to
> > > > >> > > >>>>>> > them.
> > > > >> > > >>>>>> > > There's no API for doing the latter. But we can do
> > the
> > > > >> former,
> > > > >> > > but
> > > > >> > > >>>>>> I
> > > > >> > > >>>>>> > don't
> > > > >> > > >>>>>> > > really like the idea of creating extra class (with
> no
> > > > extra
> > > > >> > > >>>>>> > functionality)
> > > > >> > > >>>>>> > > just to distinguish it from the base one.
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > If you can share why/how do you want to treat them
> > > > >> differently
> > > > >> > > on
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > side, that would be helpful.
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > Alex Baranau
> > > > >> > > >>>>>> > > ----
> > > > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene
> -
> > > > Nutch
> > > > >> -
> > > > >> > > >>>>>> Hadoop -
> > > > >> > > >>>>>> > HBase
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> > > > >> yuzhihong@gmail.com>
> > > > >> > > >>>>>> wrote:
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > > My request would be to make the distributed scan
> > > > >> > identifiable
> > > > >> > > >>>>>> from
> > > > >> > > >>>>>> > server
> > > > >> > > >>>>>> > > > side.
> > > > >> > > >>>>>> > > > :-)
> > > > >> > > >>>>>> > > >
> > > > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > > > >> > > >>>>>> > alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > > > >wrote:
> > > > >> > > >>>>>> > > >
> > > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number
> of
> > > > >> regions
> > > > >> > for
> > > > >> > > >>>>>> the
> > > > >> > > >>>>>> > > > underlying
> > > > >> > > >>>>>> > > > > > table.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > True: e.g. when there's only one region that
> > holds
> > > > data
> > > > >> > for
> > > > >> > > >>>>>> the whole
> > > > >> > > >>>>>> > > > table
> > > > >> > > >>>>>> > > > > (not many records in table yet), distributed
> scan
> > > > will
> > > > >> > fire
> > > > >> > > N
> > > > >> > > >>>>>> scans
> > > > >> > > >>>>>> > > > against
> > > > >> > > >>>>>> > > > > the same region.
> > > > >> > > >>>>>> > > > > On the other hand, in case there are huge
> number
> > of
> > > > >> > regions
> > > > >> > > >>>>>> for
> > > > >> > > >>>>>> > single
> > > > >> > > >>>>>> > > > > table, each scan can span over multiple
> regions.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > > I need to deal with normal scan and
> > "distributed
> > > > >> scan"
> > > > >> > at
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > side.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > With current implementation "distributed" scan
> > > won't
> > > > be
> > > > >> > > >>>>>> recognized as
> > > > >> > > >>>>>> > > > > something special on the server side. It will
> be
> > an
> > > > >> > ordinary
> > > > >> > > >>>>>> scan.
> > > > >> > > >>>>>> > > Though
> > > > >> > > >>>>>> > > > > the number of scan will increase, given that
> the
> > > > >> typical
> > > > >> > > >>>>>> situation is
> > > > >> > > >>>>>> > > > "many
> > > > >> > > >>>>>> > > > > regions for single table", the scans of the
> same
> > > > >> > > "distributed
> > > > >> > > >>>>>> scan"
> > > > >> > > >>>>>> > are
> > > > >> > > >>>>>> > > > > likely not to hit the same region.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > Not sure if I answered your questions here.
> Feel
> > > free
> > > > >> to
> > > > >> > ask
> > > > >> > > >>>>>> more ;)
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > Alex Baranau
> > > > >> > > >>>>>> > > > > ----
> > > > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr -
> > Lucene
> > > -
> > > > >> Nutch
> > > > >> > -
> > > > >> > > >>>>>> Hadoop -
> > > > >> > > >>>>>> > > > HBase
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > > > >> > > yuzhihong@gmail.com>
> > > > >> > > >>>>>> wrote:
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > > Alex:
> > > > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > > > >> > > >>>>>> > > > > >
> > https://issues.apache.org/jira/browse/HBASE-3679
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > I need to deal with normal scan and
> > "distributed
> > > > >> scan"
> > > > >> > at
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > side.
> > > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number
> of
> > > > >> regions
> > > > >> > for
> > > > >> > > >>>>>> the
> > > > >> > > >>>>>> > > > underlying
> > > > >> > > >>>>>> > > > > > table.
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > Cheers
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex
> Baranau
> > <
> > > > >> > > >>>>>> > > > alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > > > > > >wrote:
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > > Hi Ted,
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > We currently use this tool in the scenario
> > > where
> > > > >> data
> > > > >> > is
> > > > >> > > >>>>>> consumed
> > > > >> > > >>>>>> > > by
> > > > >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the
> > > > >> performance
> > > > >> > of
> > > > >> > > >>>>>> pure
> > > > >> > > >>>>>> > > > > "distributed
> > > > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I
> > > expect
> > > > >> it
> > > > >> > to
> > > > >> > > be
> > > > >> > > >>>>>> close
> > > > >> > > >>>>>> > to
> > > > >> > > >>>>>> > > > > > simple
> > > > >> > > >>>>>> > > > > > > scan performance, or may be sometimes even
> > > faster
> > > > >> > > >>>>>> depending on
> > > > >> > > >>>>>> > your
> > > > >> > > >>>>>> > > > > data
> > > > >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
> > > > timeseries
> > > > >> > data
> > > > >> > > >>>>>> > > (sequential)
> > > > >> > > >>>>>> > > > > > which
> > > > >> > > >>>>>> > > > > > > is written into the single region at a
> time,
> > > then
> > > > >> e.g.
> > > > >> > > if
> > > > >> > > >>>>>> you
> > > > >> > > >>>>>> > > access
> > > > >> > > >>>>>> > > > > > delta
> > > > >> > > >>>>>> > > > > > > for further processing/analysis (esp. if
> from
> > > not
> > > > >> > single
> > > > >> > > >>>>>> client)
> > > > >> > > >>>>>> > > > these
> > > > >> > > >>>>>> > > > > > > scans
> > > > >> > > >>>>>> > > > > > > are likely to hit the same region or couple
> > of
> > > > >> regions
> > > > >> > > at
> > > > >> > > >>>>>> a time,
> > > > >> > > >>>>>> > > > which
> > > > >> > > >>>>>> > > > > > may
> > > > >> > > >>>>>> > > > > > > perform worse comparing to many scans
> hitting
> > > > data
> > > > >> > that
> > > > >> > > is
> > > > >> > > >>>>>> much
> > > > >> > > >>>>>> > > > better
> > > > >> > > >>>>>> > > > > > > spread over region servers.
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > As for map-reduce job the approach should
> not
> > > > >> affect
> > > > >> > > >>>>>> reading
> > > > >> > > >>>>>> > > > > performance
> > > > >> > > >>>>>> > > > > > at
> > > > >> > > >>>>>> > > > > > > all: it's just that there are bucketsCount
> > > times
> > > > >> more
> > > > >> > > >>>>>> splits and
> > > > >> > > >>>>>> > > > hence
> > > > >> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many
> > > cases
> > > > >> this
> > > > >> > > even
> > > > >> > > >>>>>> > improves
> > > > >> > > >>>>>> > > > > > overall
> > > > >> > > >>>>>> > > > > > > performance of the MR job since work is
> > better
> > > > >> > > distributed
> > > > >> > > >>>>>> over
> > > > >> > > >>>>>> > > > cluster
> > > > >> > > >>>>>> > > > > > > (esp. in situation when the aim is to
> > > constantly
> > > > >> > process
> > > > >> > > >>>>>> the
> > > > >> > > >>>>>> > coming
> > > > >> > > >>>>>> > > > > delta
> > > > >> > > >>>>>> > > > > > > which usually resides in one or just couple
> > of
> > > > >> regions
> > > > >> > > >>>>>> depending
> > > > >> > > >>>>>> > on
> > > > >> > > >>>>>> > > > > > > processing frequency).
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > If you can share details on your case, that
> > > will
> > > > >> help
> > > > >> > to
> > > > >> > > >>>>>> > understand
> > > > >> > > >>>>>> > > > > what
> > > > >> > > >>>>>> > > > > > > effect(s) to expect from using this
> approach.
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > Alex Baranau
> > > > >> > > >>>>>> > > > > > > ----
> > > > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > > > Lucene
> > > > >> -
> > > > >> > > Nutch
> > > > >> > > >>>>>> -
> > > > >> > > >>>>>> > Hadoop
> > > > >> > > >>>>>> > > -
> > > > >> > > >>>>>> > > > > > HBase
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > > > >> > > >>>>>> yuzhihong@gmail.com>
> > > > >> > > >>>>>> > > wrote:
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > > > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners
> > compared
> > > > to
> > > > >> one
> > > > >> > > >>>>>> scanner
> > > > >> > > >>>>>> > > > > > originally,
> > > > >> > > >>>>>> > > > > > > > have you performed load testing to see
> the
> > > > impact
> > > > >> ?
> > > > >> > > >>>>>> > > > > > > >
> > > > >> > > >>>>>> > > > > > > > Thanks
> > > > >> > > >>>>>> > > > > > > >
> > > > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex
> > > Baranau
> > > > <
> > > > >> > > >>>>>> > > > > > alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > > > > > > > >wrote:
> > > > >> > > >>>>>> > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Hello guys,
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java
> > > > >> project/lib
> > > > >> > > >>>>>> around
> > > > >> > > >>>>>> > > HBase:
> > > > >> > > >>>>>> > > > > > > HBaseWD.
> > > > >> > > >>>>>> > > > > > > > > It
> > > > >> > > >>>>>> > > > > > > > > is aimed to help with distribution of
> the
> > > > load
> > > > >> > > (across
> > > > >> > > >>>>>> > > > > regionservers)
> > > > >> > > >>>>>> > > > > > > > when
> > > > >> > > >>>>>> > > > > > > > > writing sequential (becasue of the row
> > key
> > > > >> nature)
> > > > >> > > >>>>>> records.
> > > > >> > > >>>>>> > It
> > > > >> > > >>>>>> > > > > > > implements
> > > > >> > > >>>>>> > > > > > > > > the solution which was discussed
> several
> > > > times
> > > > >> on
> > > > >> > > this
> > > > >> > > >>>>>> > mailing
> > > > >> > > >>>>>> > > > list
> > > > >> > > >>>>>> > > > > > > (e.g.
> > > > >> > > >>>>>> > > > > > > > > here:
> > > http://search-hadoop.com/m/gNRA82No5Wk
> > > > ).
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Please find the sources at
> > > > >> > > >>>>>> > > > > > https://github.com/sematext/HBaseWD(there's
> > > > >> > > >>>>>> > > > > > > > > also
> > > > >> > > >>>>>> > > > > > > > > a jar of current version for
> > convenience).
> > > It
> > > > >> is
> > > > >> > > very
> > > > >> > > >>>>>> easy to
> > > > >> > > >>>>>> > > > make
> > > > >> > > >>>>>> > > > > > use
> > > > >> > > >>>>>> > > > > > > of
> > > > >> > > >>>>>> > > > > > > > > it: e.g. I added it to one existing
> > project
> > > > >> with
> > > > >> > 1+2
> > > > >> > > >>>>>> lines of
> > > > >> > > >>>>>> > > > code
> > > > >> > > >>>>>> > > > > > (one
> > > > >> > > >>>>>> > > > > > > > > where I write to HBase and 2 for
> > > configuring
> > > > >> > > MapReduce
> > > > >> > > >>>>>> job).
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Please find below the short intro to
> the
> > > lib
> > > > >> [1].
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Alex Baranau
> > > > >> > > >>>>>> > > > > > > > > ----
> > > > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ ::
> Solr
> > -
> > > > >> Lucene
> > > > >> > -
> > > > >> > > >>>>>> Nutch -
> > > > >> > > >>>>>> > > > Hadoop
> > > > >> > > >>>>>> > > > > -
> > > > >> > > >>>>>> > > > > > > > HBase
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > [1]
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Description:
> > > > >> > > >>>>>> > > > > > > > > ------------
> > > > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing
> > > (sequential)
> > > > >> > Writes.
> > > > >> > > >>>>>> It was
> > > > >> > > >>>>>> > > > > inspired
> > > > >> > > >>>>>> > > > > > by
> > > > >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists
> around
> > > the
> > > > >> > > problem
> > > > >> > > >>>>>> of
> > > > >> > > >>>>>> > > choosing
> > > > >> > > >>>>>> > > > > > > > between:
> > > > >> > > >>>>>> > > > > > > > > * writing records with sequential row
> > keys
> > > > >> (e.g.
> > > > >> > > >>>>>> time-series
> > > > >> > > >>>>>> > > data
> > > > >> > > >>>>>> > > > > > with
> > > > >> > > >>>>>> > > > > > > > row
> > > > >> > > >>>>>> > > > > > > > > key
> > > > >> > > >>>>>> > > > > > > > >  built based on ts)
> > > > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > First approach makes possible to
> perform
> > > fast
> > > > >> > range
> > > > >> > > >>>>>> scans
> > > > >> > > >>>>>> > with
> > > > >> > > >>>>>> > > > help
> > > > >> > > >>>>>> > > > > > of
> > > > >> > > >>>>>> > > > > > > > > setting
> > > > >> > > >>>>>> > > > > > > > > start/stop keys on Scanner, but creates
> > > > single
> > > > >> > > region
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > > > > > hot-spotting
> > > > >> > > >>>>>> > > > > > > > > problem upon writing data (as row keys
> go
> > > in
> > > > >> > > sequence
> > > > >> > > >>>>>> all
> > > > >> > > >>>>>> > > records
> > > > >> > > >>>>>> > > > > end
> > > > >> > > >>>>>> > > > > > > up
> > > > >> > > >>>>>> > > > > > > > > written into a single region at a
> time).
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Second approach aims for fastest
> writing
> > > > >> > performance
> > > > >> > > >>>>>> by
> > > > >> > > >>>>>> > > > > distributing
> > > > >> > > >>>>>> > > > > > > new
> > > > >> > > >>>>>> > > > > > > > > records over random regions but makes
> not
> > > > >> possible
> > > > >> > > >>>>>> doing fast
> > > > >> > > >>>>>> > > > range
> > > > >> > > >>>>>> > > > > > > scans
> > > > >> > > >>>>>> > > > > > > > > against written data.
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > The suggested approach stays in the
> > middle
> > > of
> > > > >> the
> > > > >> > > two
> > > > >> > > >>>>>> above
> > > > >> > > >>>>>> > and
> > > > >> > > >>>>>> > > > > > proved
> > > > >> > > >>>>>> > > > > > > to
> > > > >> > > >>>>>> > > > > > > > > perform well by distributing records
> over
> > > the
> > > > >> > > cluster
> > > > >> > > >>>>>> during
> > > > >> > > >>>>>> > > > > writing
> > > > >> > > >>>>>> > > > > > > data
> > > > >> > > >>>>>> > > > > > > > > while allowing range scans over it.
> > HBaseWD
> > > > >> > provides
> > > > >> > > >>>>>> very
> > > > >> > > >>>>>> > > simple
> > > > >> > > >>>>>> > > > > API
> > > > >> > > >>>>>> > > > > > to
> > > > >> > > >>>>>> > > > > > > > > work with which makes it perfect to use
> > > with
> > > > >> > > existing
> > > > >> > > >>>>>> code.
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Please refer to unit-tests for lib
> usage
> > > info
> > > > >> as
> > > > >> > > they
> > > > >> > > >>>>>> aimed
> > > > >> > > >>>>>> > to
> > > > >> > > >>>>>> > > > act
> > > > >> > > >>>>>> > > > > as
> > > > >> > > >>>>>> > > > > > > > > example.
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > > > >> > > >>>>>> > > > > > > > > ----------------------------
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Distributing records with sequential
> keys
> > > > which
> > > > >> > are
> > > > >> > > >>>>>> being
> > > > >> > > >>>>>> > > written
> > > > >> > > >>>>>> > > > > in
> > > > >> > > >>>>>> > > > > > up
> > > > >> > > >>>>>> > > > > > > > to
> > > > >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; //
> > > > >> distributing
> > > > >> > > into
> > > > >> > > >>>>>> 32
> > > > >> > > >>>>>> > > buckets
> > > > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > > > >> > > >>>>>> > > > > > > > >                           new
> > > > >> > > >>>>>> > > > > > > > >
> > > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > > > >> > > >>>>>> > > > > > > > >      Put put = new
> > > > >> > > >>>>>> > > > > >
> > > Put(keyDistributor.getDistributedKey(originalKey));
> > > > >> > > >>>>>> > > > > > > > >      ... // add values
> > > > >> > > >>>>>> > > > > > > > >      hTable.put(put);
> > > > >> > > >>>>>> > > > > > > > >    }
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Performing a range scan over written
> data
> > > > >> > > (internally
> > > > >> > > >>>>>> > > > > <bucketsCount>
> > > > >> > > >>>>>> > > > > > > > > scanners
> > > > >> > > >>>>>> > > > > > > > > executed):
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey,
> > stopKey);
> > > > >> > > >>>>>> > > > > > > > >    ResultScanner rs =
> > > > >> > > >>>>>> DistributedScanner.create(hTable, scan,
> > > > >> > > >>>>>> > > > > > > > > keyDistributor);
> > > > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> > > > >> > > >>>>>> > > > > > > > >      ...
> > > > >> > > >>>>>> > > > > > > > >    }
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Performing mapreduce job over written
> > data
> > > > >> chunk
> > > > >> > > >>>>>> specified by
> > > > >> > > >>>>>> > > > Scan:
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >    Configuration conf =
> > > > >> > HBaseConfiguration.create();
> > > > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf,
> > > > "testMapreduceJob");
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey,
> > stopKey);
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >
> > > > >>  TableMapReduceUtil.initTableMapperJob("table",
> > > > >> > > >>>>>> scan,
> > > > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class,
> > > > >> > > >>>>>> ImmutableBytesWritable.class,
> > > > >> > > >>>>>> > > > > > > Result.class,
> > > > >> > > >>>>>> > > > > > > > > job);
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >    // Substituting standard
> > > TableInputFormat
> > > > >> which
> > > > >> > > was
> > > > >> > > >>>>>> set in
> > > > >> > > >>>>>> > > > > > > > >    //
> > > > >> TableMapReduceUtil.initTableMapperJob(...)
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > >  job.setInputFormatClass(WdTableInputFormat.class);
> > > > >> > > >>>>>> > > > > > > > >
> > > > >>  keyDistributor.addInfo(job.getConfiguration());
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing
> Patterns:
> > > > >> > > >>>>>> > > > > > > > >
> -----------------------------------------
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and
> to
> > > > >> support
> > > > >> > > >>>>>> custom row
> > > > >> > > >>>>>> > > key
> > > > >> > > >>>>>> > > > > > > > > distribution
> > > > >> > > >>>>>> > > > > > > > > approaches. To define custom row key
> > > > >> distributing
> > > > >> > > >>>>>> logic just
> > > > >> > > >>>>>> > > > > > implement
> > > > >> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract
> class
> > > > which
> > > > >> is
> > > > >> > > >>>>>> really very
> > > > >> > > >>>>>> > > > > simple:
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > >    public abstract class
> > > > >> AbstractRowKeyDistributor
> > > > >> > > >>>>>> implements
> > > > >> > > >>>>>> > > > > > > > > Parametrizable {
> > > > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > > > >> > getDistributedKey(byte[]
> > > > >> > > >>>>>> > > > originalKey);
> > > > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > > > >> getOriginalKey(byte[]
> > > > >> > > >>>>>> > adjustedKey);
> > > > >> > > >>>>>> > > > > > > > >      public abstract byte[][]
> > > > >> > > >>>>>> getAllDistributedKeys(byte[]
> > > > >> > > >>>>>> > > > > > > originalKey);
> > > > >> > > >>>>>> > > > > > > > >      ... // some utility methods
> > > > >> > > >>>>>> > > > > > > > >    }
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > >
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > >
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> >
> > > > >> > > >>>>>>
> > > > >> > > >>>>>
> > > > >> > > >>>>>
> > > > >> > > >>>>
> > > > >> > > >>>
> > > > >> > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
Implemented RowKeyDistributorByHashPrefix. From README:

Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix; please
see the example below. It creates the "distributed key" based on the original
key value, so that later, when you have the original key and want to update
the record, you can calculate the distributed key without a roundtrip to
HBase.

AbstractRowKeyDistributor keyDistributor =
        new RowKeyDistributorByHashPrefix(
                  new RowKeyDistributorByHashPrefix.OneByteSimpleHash(15));

You can use your own hashing logic here by implementing a simple interface:

public static interface Hasher extends Parametrizable {
  byte[] getHashPrefix(byte[] originalKey);
  byte[][] getAllPossiblePrefixes();
}
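
For illustration, a custom Hasher along the lines of OneByteSimpleHash might
look like the sketch below. It is a standalone class (the Parametrizable
methods from the interface above are omitted), and it is only a sketch of the
sum-modulo scheme described in the README, not the library's actual
implementation:

```java
// Hypothetical Hasher: one-byte prefix computed as the sum of all
// key bytes modulo maxBuckets (a sketch, not HBaseWD's own code).
public class SumModHasher {
    private final int maxBuckets;  // must be 1..255 for a one-byte prefix

    public SumModHasher(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    public byte[] getHashPrefix(byte[] originalKey) {
        int sum = 0;
        for (byte b : originalKey) {
            sum += b & 0xFF;  // treat bytes as unsigned before summing
        }
        return new byte[] { (byte) (sum % maxBuckets) };
    }

    public byte[][] getAllPossiblePrefixes() {
        byte[][] prefixes = new byte[maxBuckets][];
        for (int i = 0; i < maxBuckets; i++) {
            prefixes[i] = new byte[] { (byte) i };
        }
        return prefixes;
    }
}
```

Because the prefix is a pure function of the key, two writes of the same
original key always land in the same bucket, which is what makes
roundtrip-free updates possible.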


OneByteSimpleHash implements a very simple hash algorithm: the sum of all
bytes in the row key, modulo maxBuckets. In the example above, 15 is the
maxBuckets count. You can use a buckets count of up to 255. Please use it
wisely: as with the one-byte prefix distributor, DistributedScanner will
instantiate this number of scans under the hood.

With this hash-based row key distributor, you can compute the distributed
key (and use it to update the record) without a roundtrip to HBase. From a
unit test:

    // Testing simple get
    byte[] originalKey = new byte[] {123, 124, 122};
    Put put = new Put(keyDistributor.getDistributedKey(originalKey));
    put.add(CF, QUAL, Bytes.toBytes("some"));
    hTable.put(put);

    byte[] distributedKey = keyDistributor.getDistributedKey(originalKey);
    Result result = hTable.get(new Get(distributedKey));
    Assert.assertArrayEquals(originalKey,
        keyDistributor.getOriginalKey(result.getRow()));
    Assert.assertArrayEquals(Bytes.toBytes("some"),
        result.getValue(CF, QUAL));
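
The getOriginalKey() call in the test works because a prefix-based
distributor only prepends bytes to the key, so stripping a prefix of known
length recovers the original key. A minimal sketch of that round trip,
assuming a fixed one-byte prefix (a hypothetical helper, not HBaseWD's API):

```java
import java.util.Arrays;

// Sketch: round trip between original and distributed keys for a
// fixed one-byte prefix.
public class OneBytePrefixCodec {
    public static byte[] distribute(byte prefix, byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = prefix;  // bucket prefix goes first
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    public static byte[] original(byte[] distributedKey) {
        // drop the single prefix byte to recover the original key
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }
}
```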


NOTE: This feature is included in hbasewd-0.1.0-SNAPSHOT-2011.05.19.jar
(downloadable from https://github.com/sematext/HBaseWD)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

P.S.
> Can you summarize HBaseWD in your blog
That is on my todo list! You pushed it higher to the top priority items ;)

On Thu, May 19, 2011 at 6:50 AM, Weishung Chung <we...@gmail.com> wrote:

> I have another question about option 2. It seems like I need to handle the
> distributed scan differently to read from start row to end row, assuming 1
> byte hash of the original key is used as prefix since the order of the
> original key range is different from the resulting distributed key range.
>
> On Wed, May 18, 2011 at 6:18 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Alex:
> > Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
> >
> > Thanks
> >
> > On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <alex.baranov.v@gmail.com
> > >wrote:
> >
> > > There are several options here. E.g.:
> > >
> > > 1) Given that you have "original key" of the record, you can fetch the
> > > stored record key from HBase and use it to create Put with updated (or
> > new)
> > > cells.
> > >
> > > Currently you'll need to use a distributed scan for that; there's no
> > > analogue for the Get operation yet (see
> > > https://github.com/sematext/HBaseWD/issues/1).
> > >
> > > Note: you first need to find out the real key of the stored record by
> > > fetching data from HBase if you use the RowKeyDistributorByOneBytePrefix
> > > included in the current lib. Alternatively, see the next option:
> > >
> > > 2) You can create your own RowKeyDistributor implementation which will
> > > create "distributed key" based on original key value so that later when
> > you
> > > have original key and want to update the record you can calculate
> > > distributed key without roundtrip to HBase.
> > >
> > > E.g. in your RowKeyDistributor implementation you can calculate a 1-byte
> > > hash of the original key (https://github.com/sematext/HBaseWD/issues/2).
> > >
> > >
> > >
> > > Either way, you don't need to delete the record to update some of its
> > > cells or to add new cells.
> > >
> > > Please let me know if you have more Qs!
> > >
> > > Alex Baranau
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > HBase
> > >
> > > On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> > > wrote:
> > >
> > > > I have another question. For overwriting, do I need to delete the
> > > existing
> > > > one before re-writing it?
> > > >
> > > > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <weishung@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Yes, it's simple yet useful. I am integrating it. Thanks alot :)
> > > > >
> > > > >
> > > > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> > > alex.baranov.v@gmail.com
> > > > >wrote:
> > > > >
> > > > >> Thanks for the interest!
> > > > >>
> > > > >> We are using it in production. It is simple and hence quite
> stable.
> > > > Though
> > > > >> some minor pieces are missing (like
> > > > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't affect
> > > > >> stability
> > > > >> and/or major functionality.
> > > > >>
> > > > >> Alex Baranau
> > > > >> ----
> > > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> Hadoop
> > -
> > > > >> HBase
> > > > >>
> > > > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <
> > weishung@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > What's the status on this package? Is it mature enough?
> > > > >> >  I am using it in my project, tried out the write method
> yesterday
> > > and
> > > > >> > going
> > > > >> > to incorporate into read method tomorrow.
> > > > >> >
> > > > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > > > alex.baranov.v@gmail.com
> > > > >> > >wrote:
> > > > >> >
> > > > >> > > > The start/end rows may be written twice.
> > > > >> > >
> > > > >> > > Yeah, I know. I meant that size of startRow+stopRow data is
> > > > "bearable"
> > > > >> in
> > > > >> > > attribute value no matter how long are they (keys), since we
> > > already
> > > > >> OK
> > > > >> > > with
> > > > >> > > transferring them initially (i.e. we should be OK with
> > > transferring
> > > > 2x
> > > > >> > > times
> > > > >> > > more).
> > > > >> > >
> > > > >> > > So, what about the suggestion of sourceScan attribute value I
> > > > >> mentioned?
> > > > >> > If
> > > > >> > > you can tell why it isn't sufficient in your case, I'd have
> more
> > > > info
> > > > >> to
> > > > >> > > think about better suggestion ;)
> > > > >> > >
> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > > >> > > > Maybe the second should be named HBASE-3811-v2.patch<
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > > > >> > > >?
> > > > >> > >
> > > > >> > > np. Can do that. Just thought that they (patches) can be
> sorted
> > by
> > > > >> date
> > > > >> > to
> > > > >> > > find out the final one (aka "convention over naming-rules").
> > > > >> > >
> > > > >> > > Alex.
> > > > >> > >
> > > > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yuzhihong@gmail.com
> >
> > > > wrote:
> > > > >> > >
> > > > >> > > > >> Though it might be ok, since we anyways "transfer"
> > start/stop
> > > > >> rows
> > > > >> > > with
> > > > >> > > > Scan object.
> > > > >> > > > In write() method, we now have:
> > > > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > > > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > > > >> > > > ...
> > > > >> > > >       for (Map.Entry<String, byte[]> attr :
> > > > >> this.attributes.entrySet())
> > > > >> > {
> > > > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > > > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > > > >> > > >       }
> > > > >> > > > The start/end rows may be written twice.
> > > > >> > > >
> > > > >> > > > Of course, you have full control over how to generate the
> > unique
> > > > ID
> > > > >> for
> > > > >> > > > "sourceScan" attribute.
> > > > >> > > >
> > > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > Maybe
> > > > the
> > > > >> > > second
> > > > >> > > > should be named HBASE-3811-v2.patch<
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > > > >> > > >?
> > > > >> > > >
> > > > >> > > > Thanks
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > > > >> > alex.baranov.v@gmail.com
> > > > >> > > >wrote:
> > > > >> > > >
> > > > >> > > >> > Can you remove the first version ?
> > > > >> > > >> Isn't it ok to keep it in JIRA issue?
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > > > >> supports
> > > > >> > > >> setAttribute() ?
> > > > >> > > >> > If it does, can you encode start row and end row as
> > > > "sourceScan"
> > > > >> > > >> attribute ?
> > > > >> > > >>
> > > > >> > > >> Yeah, smth like this is going to be implemented. Though I'd
> > > still
> > > > >> want
> > > > >> > > to
> > > > >> > > >> hear from the devs the story about Scan version.
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> > One consideration is that start row or end row may be
> quite
> > > > long.
> > > > >> > > >>
> > > > >> > > >> Yeah, that is was my though too at first. Though it might
> be
> > > ok,
> > > > >> since
> > > > >> > > we
> > > > >> > > >> anyways "transfer" start/stop rows with Scan object.
> > > > >> > > >>
> > > > >> > > >> > What do you think ?
> > > > >> > > >>
> > > > >> > > >> I'd love to hear from you is this variant I mentioned is
> what
> > > we
> > > > >> are
> > > > >> > > >> looking at here:
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> > From what I understand, you want to distinguish scans
> fired
> > > by
> > > > >> the
> > > > >> > > same
> > > > >> > > >> distributed scan.
> > > > >> > > >> > I.e. group scans which were fired by single distributed
> > scan.
> > > > If
> > > > >> > > that's
> > > > >> > > >> what you want, distributed
> > > > >> > > >> > scan can generate unique ID and set, say "sourceScan"
> > > attribute
> > > > >> to
> > > > >> > its
> > > > >> > > >> value. This way we'll
> > > > >> > > >> > have <# of distinct "sourceScan" attribute values> =
> > <number
> > > of
> > > > >> > > >> distributed scans invoked by
> > > > >> > > >> > client side> and two scans on server side will have the
> > same
> > > > >> > > >> "sourceScan" attribute iff they
> > > > >> > > >> > "belong" to same distributed scan.
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > > >> Alex Baranau
> > > > >> > > >> ----
> > > > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> -
> > > > Hadoop
> > > > >> -
> > > > >> > > >> HBase
> > > > >> > > >>
> > > > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > > >> wrote:
> > > > >> > > >>
> > > > >> > > >>> Alex:
> > > > >> > > >>> Your second patch looks good.
> > > > >> > > >>> Can you remove the first version ?
> > > > >> > > >>>
> > > > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > > > supports
> > > > >> > > >>> setAttribute() ?
> > > > >> > > >>> If it does, can you encode start row and end row as
> > > "sourceScan"
> > > > >> > > >>> attribute ?
> > > > >> > > >>>
> > > > >> > > >>> One consideration is that start row or end row may be
> quite
> > > > long.
> > > > >> > > >>> Ideally we should store hash code of source Scan object as
> > > > >> > "sourceScan"
> > > > >> > > >>> attribute. But Scan doesn't implement hashCode(). We can
> add
> > > it,
> > > > >> that
> > > > >> > > would
> > > > >> > > >>> require running all Scan related tests.
> > > > >> > > >>>
> > > > >> > > >>> What do you think ?
> > > > >> > > >>>
> > > > >> > > >>> Thanks
> > > > >> > > >>>
> > > > >> > > >>>
> > > > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > > > >> > > alex.baranov.v@gmail.com>wrote:
> > > > >> > > >>>
> > > > >> > > >>>> Sorry for the delay in response (public holidays here).
> > > > >> > > >>>>
> > > > >> > > >>>> This depends on what info you are looking for on server
> > side.
> > > > >> > > >>>>
> > > > >> > > >>>> From what I understand, you want to distinguish scans
> fired
> > > by
> > > > >> the
> > > > >> > > same
> > > > >> > > >>>> distributed scan. I.e. group scans which were fired by
> > single
> > > > >> > > distributed
> > > > >> > > >>>> scan. If that's what you want, distributed scan can
> > generate
> > > > >> unique
> > > > >> > ID
> > > > >> > > and
> > > > >> > > >>>> set, say "sourceScan" attribute to its value. This way
> > we'll
> > > > have
> > > > >> <#
> > > > >> > > of
> > > > >> > > >>>> distinct "sourceScan" attribute values> = <number of
> > > > distributed
> > > > >> > scans
> > > > >> > > >>>> invoked by client side> and two scans on server side will
> > > have
> > > > >> the
> > > > >> > > same
> > > > >> > > >>>> "sourceScan" attribute iff they "belong" to same
> > distributed
> > > > >> scan.
> > > > >> > > >>>>
> > > > >> > > >>>> Is this what are you looking for?
> > > > >> > > >>>>
> > > > >> > > >>>> Alex Baranau
> > > > >> > > >>>>
> > > > >> > > >>>> P.S. attached patch for HBASE-3811<
> > > > >> > > https://issues.apache.org/jira/browse/HBASE-3811>
> > > > >> > > >>>> .
> > > > >> > > >>>> P.S-2. should this conversation be moved to dev list?
> > > > >> > > >>>>
> > > > >> > > >>>> ----
> > > > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
> Nutch
> > -
> > > > >> Hadoop
> > > > >> > -
> > > > >> > > >>>> HBase
> > > > >> > > >>>>
> > > > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <
> > yuzhihong@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > > >>>>
> > > > >> > > >>>>> Alex:
> > > > >> > > >>>>> What type of identification should we put in the map of
> > the
> > > > Scan
> > > > >> > > object
> > > > >> > > >>>>> ?
> > > > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But
> > the
> > > > user
> > > > >> > can
> > > > >> > > >>>>> use same distributor on multiple scans.
> > > > >> > > >>>>>
> > > > >> > > >>>>> Please share your thought.
> > > > >> > > >>>>>
> > > > >> > > >>>>>
> > > > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > > > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > > > >> > > >>>>>
> > > > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> Alex Baranau
> > > > >> > > >>>>>> ----
> > > > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene -
> > Nutch
> > > -
> > > > >> > Hadoop
> > > > >> > > -
> > > > >> > > >>>>>> HBase
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
> > > yuzhihong@gmail.com
> > > > >
> > > > >> > > wrote:
> > > > >> > > >>>>>>
> > > > >> > > >>>>>> > My plan was to make regions that have active scanners
> > > more
> > > > >> > stable
> > > > >> > > -
> > > > >> > > >>>>>> trying
> > > > >> > > >>>>>> > not to move them when balancing.
> > > > >> > > >>>>>> > I prefer second approach - adding custom attribute(s)
> > to
> > > > Scan
> > > > >> so
> > > > >> > > >>>>>> that the
> > > > >> > > >>>>>> > Scans created by the method below can be 'grouped'.
> > > > >> > > >>>>>> >
> > > > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > > > >> > > >>>>>> >
> > > > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > > > >> > > >>>>>> alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > >wrote:
> > > > >> > > >>>>>> >
> > > > >> > > >>>>>> > > Aha, so you want to "count" it as single scan (or
> > just
> > > > >> > > >>>>>> differently) when
> > > > >> > > >>>>>> > > determining the load?
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > The current code looks like this:
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > class DistributedScanner:
> > > > >> > > >>>>>> > >  public static DistributedScanner create(HTable
> > hTable,
> > > > >> Scan
> > > > >> > > >>>>>> original,
> > > > >> > > >>>>>> > > AbstractRowKeyDistributor keyDistributor) throws
> > > > >> IOException {
> > > > >> > > >>>>>> > >    byte[][] startKeys =
> > > > >> > > >>>>>> > >
> > > > >> keyDistributor.getAllDistributedKeys(original.getStartRow());
> > > > >> > > >>>>>> > >    byte[][] stopKeys =
> > > > >> > > >>>>>> > >
> > > > >> keyDistributor.getAllDistributedKeys(original.getStopRow());
> > > > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > > > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > > > >> > > >>>>>> > >      scans[i] = new Scan(original);
> > > > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > > > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > > > >> > > >>>>>> > >    }
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > >    ResultScanner[] rss = new
> > > > >> ResultScanner[startKeys.length];
> > > > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > > > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > > > >> > > >>>>>> > >    }
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > >    return new DistributedScanner(rss);
> > > > >> > > >>>>>> > >  }
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > This is client code. To make these scans
> > "identifiable"
> > > > we
> > > > >> > need
> > > > >> > > to
> > > > >> > > >>>>>> either
> > > > >> > > >>>>>> > > use some different (derived from Scan) class or add
> > > some
> > > > >> > > attribute
> > > > >> > > >>>>>> to
> > > > >> > > >>>>>> > them.
> > > > >> > > >>>>>> > > There's no API for doing the latter. But we can do
> > the
> > > > >> former,
> > > > >> > > but
> > > > >> > > >>>>>> I
> > > > >> > > >>>>>> > don't
> > > > >> > > >>>>>> > > really like the idea of creating extra class (with
> no
> > > > extra
> > > > >> > > >>>>>> > functionality)
> > > > >> > > >>>>>> > > just to distinguish it from the base one.
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > If you can share why/how do you want to treat them
> > > > >> differently
> > > > >> > > on
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > side, that would be helpful.
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > Alex Baranau
> > > > >> > > >>>>>> > > ----
> > > > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene
> -
> > > > Nutch
> > > > >> -
> > > > >> > > >>>>>> Hadoop -
> > > > >> > > >>>>>> > HBase
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> > > > >> yuzhihong@gmail.com>
> > > > >> > > >>>>>> wrote:
> > > > >> > > >>>>>> > >
> > > > >> > > >>>>>> > > > My request would be to make the distributed scan
> > > > >> > identifiable
> > > > >> > > >>>>>> from
> > > > >> > > >>>>>> > server
> > > > >> > > >>>>>> > > > side.
> > > > >> > > >>>>>> > > > :-)
> > > > >> > > >>>>>> > > >
> > > > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > > > >> > > >>>>>> > alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > > > >wrote:
> > > > >> > > >>>>>> > > >
> > > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number
> of
> > > > >> regions
> > > > >> > for
> > > > >> > > >>>>>> the
> > > > >> > > >>>>>> > > > underlying
> > > > >> > > >>>>>> > > > > > table.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > True: e.g. when there's only one region that
> > holds
> > > > data
> > > > >> > for
> > > > >> > > >>>>>> the whole
> > > > >> > > >>>>>> > > > table
> > > > >> > > >>>>>> > > > > (not many records in table yet), distributed
> scan
> > > > will
> > > > >> > fire
> > > > >> > > N
> > > > >> > > >>>>>> scans
> > > > >> > > >>>>>> > > > against
> > > > >> > > >>>>>> > > > > the same region.
> > > > >> > > >>>>>> > > > > On the other hand, in case there are huge
> number
> > of
> > > > >> > regions
> > > > >> > > >>>>>> for
> > > > >> > > >>>>>> > single
> > > > >> > > >>>>>> > > > > table, each scan can span over multiple
> regions.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > > I need to deal with normal scan and
> > "distributed
> > > > >> scan"
> > > > >> > at
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > side.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > With current implementation "distributed" scan
> > > won't
> > > > be
> > > > >> > > >>>>>> recognized as
> > > > >> > > >>>>>> > > > > something special on the server side. It will
> be
> > an
> > > > >> > ordinary
> > > > >> > > >>>>>> scan.
> > > > >> > > >>>>>> > > Though
> > > > >> > > >>>>>> > > > > the number of scan will increase, given that
> the
> > > > >> typical
> > > > >> > > >>>>>> situation is
> > > > >> > > >>>>>> > > > "many
> > > > >> > > >>>>>> > > > > regions for single table", the scans of the
> same
> > > > >> > > "distributed
> > > > >> > > >>>>>> scan"
> > > > >> > > >>>>>> > are
> > > > >> > > >>>>>> > > > > likely not to hit the same region.
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > Not sure if I answered your questions here.
> Feel
> > > free
> > > > >> to
> > > > >> > ask
> > > > >> > > >>>>>> more ;)
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > Alex Baranau
> > > > >> > > >>>>>> > > > > ----
> > > > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr -
> > Lucene
> > > -
> > > > >> Nutch
> > > > >> > -
> > > > >> > > >>>>>> Hadoop -
> > > > >> > > >>>>>> > > > HBase
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > > > >> > > yuzhihong@gmail.com>
> > > > >> > > >>>>>> wrote:
> > > > >> > > >>>>>> > > > >
> > > > >> > > >>>>>> > > > > > Alex:
> > > > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > > > >> > > >>>>>> > > > > >
> > https://issues.apache.org/jira/browse/HBASE-3679
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > I need to deal with normal scan and
> > "distributed
> > > > >> scan"
> > > > >> > at
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > side.
> > > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number
> of
> > > > >> regions
> > > > >> > for
> > > > >> > > >>>>>> the
> > > > >> > > >>>>>> > > > underlying
> > > > >> > > >>>>>> > > > > > table.
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > Cheers
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex
> Baranau
> > <
> > > > >> > > >>>>>> > > > alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > > > > > >wrote:
> > > > >> > > >>>>>> > > > > >
> > > > >> > > >>>>>> > > > > > > Hi Ted,
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > We currently use this tool in the scenario
> > > where
> > > > >> data
> > > > >> > is
> > > > >> > > >>>>>> consumed
> > > > >> > > >>>>>> > > by
> > > > >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the
> > > > >> performance
> > > > >> > of
> > > > >> > > >>>>>> pure
> > > > >> > > >>>>>> > > > > "distributed
> > > > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I
> > > expect
> > > > >> it
> > > > >> > to
> > > > >> > > be
> > > > >> > > >>>>>> close
> > > > >> > > >>>>>> > to
> > > > >> > > >>>>>> > > > > > simple
> > > > >> > > >>>>>> > > > > > > scan performance, or may be sometimes even
> > > faster
> > > > >> > > >>>>>> depending on
> > > > >> > > >>>>>> > your
> > > > >> > > >>>>>> > > > > data
> > > > >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
> > > > timeseries
> > > > >> > data
> > > > >> > > >>>>>> > > (sequential)
> > > > >> > > >>>>>> > > > > > which
> > > > >> > > >>>>>> > > > > > > is written into the single region at a
> time,
> > > then
> > > > >> e.g.
> > > > >> > > if
> > > > >> > > >>>>>> you
> > > > >> > > >>>>>> > > access
> > > > >> > > >>>>>> > > > > > delta
> > > > >> > > >>>>>> > > > > > > for further processing/analysis (esp. if
> from
> > > not
> > > > >> > single
> > > > >> > > >>>>>> client)
> > > > >> > > >>>>>> > > > these
> > > > >> > > >>>>>> > > > > > > scans
> > > > >> > > >>>>>> > > > > > > are likely to hit the same region or couple
> > of
> > > > >> regions
> > > > >> > > at
> > > > >> > > >>>>>> a time,
> > > > >> > > >>>>>> > > > which
> > > > >> > > >>>>>> > > > > > may
> > > > >> > > >>>>>> > > > > > > perform worse comparing to many scans
> hitting
> > > > data
> > > > >> > that
> > > > >> > > is
> > > > >> > > >>>>>> much
> > > > >> > > >>>>>> > > > better
> > > > >> > > >>>>>> > > > > > > spread over region servers.
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > As for map-reduce job the approach should
> not
> > > > >> affect
> > > > >> > > >>>>>> reading
> > > > >> > > >>>>>> > > > > performance
> > > > >> > > >>>>>> > > > > > at
> > > > >> > > >>>>>> > > > > > > all: it's just that there are bucketsCount
> > > times
> > > > >> more
> > > > >> > > >>>>>> splits and
> > > > >> > > >>>>>> > > > hence
> > > > >> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many
> > > cases
> > > > >> this
> > > > >> > > even
> > > > >> > > >>>>>> > improves
> > > > >> > > >>>>>> > > > > > overall
> > > > >> > > >>>>>> > > > > > > performance of the MR job since work is
> > better
> > > > >> > > distributed
> > > > >> > > >>>>>> over
> > > > >> > > >>>>>> > > > cluster
> > > > >> > > >>>>>> > > > > > > (esp. in situation when the aim is to
> > > constantly
> > > > >> > process
> > > > >> > > >>>>>> the
> > > > >> > > >>>>>> > coming
> > > > >> > > >>>>>> > > > > delta
> > > > >> > > >>>>>> > > > > > > which usually resides in one or just couple
> > of
> > > > >> regions
> > > > >> > > >>>>>> depending
> > > > >> > > >>>>>> > on
> > > > >> > > >>>>>> > > > > > > processing frequency).
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > If you can share details on your case, that
> > > will
> > > > >> help
> > > > >> > to
> > > > >> > > >>>>>> > understand
> > > > >> > > >>>>>> > > > > what
> > > > >> > > >>>>>> > > > > > > effect(s) to expect from using this
> approach.
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > Alex Baranau
> > > > >> > > >>>>>> > > > > > > ----
> > > > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > > > Lucene
> > > > >> -
> > > > >> > > Nutch
> > > > >> > > >>>>>> -
> > > > >> > > >>>>>> > Hadoop
> > > > >> > > >>>>>> > > -
> > > > >> > > >>>>>> > > > > > HBase
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > > > >> > > >>>>>> yuzhihong@gmail.com>
> > > > >> > > >>>>>> > > wrote:
> > > > >> > > >>>>>> > > > > > >
> > > > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > > > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners
> > compared
> > > > to
> > > > >> one
> > > > >> > > >>>>>> scanner
> > > > >> > > >>>>>> > > > > > originally,
> > > > >> > > >>>>>> > > > > > > > have you performed load testing to see
> the
> > > > impact
> > > > >> ?
> > > > >> > > >>>>>> > > > > > > >
> > > > >> > > >>>>>> > > > > > > > Thanks
> > > > >> > > >>>>>> > > > > > > >
> > > > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex
> > > Baranau
> > > > <
> > > > >> > > >>>>>> > > > > > alex.baranov.v@gmail.com
> > > > >> > > >>>>>> > > > > > > > >wrote:
> > > > >> > > >>>>>> > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Hello guys,
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java
> > > > >> project/lib
> > > > >> > > >>>>>> around
> > > > >> > > >>>>>> > > HBase:
> > > > >> > > >>>>>> > > > > > > HBaseWD.
> > > > >> > > >>>>>> > > > > > > > > It
> > > > >> > > >>>>>> > > > > > > > > is aimed to help with distribution of
> the
> > > > load
> > > > >> > > (across
> > > > >> > > >>>>>> > > > > regionservers)
> > > > >> > > >>>>>> > > > > > > > when
> > > > >> > > >>>>>> > > > > > > > > writing sequential (becasue of the row
> > key
> > > > >> nature)
> > > > >> > > >>>>>> records.
> > > > >> > > >>>>>> > It
> > > > >> > > >>>>>> > > > > > > implements
> > > > >> > > >>>>>> > > > > > > > > the solution which was discussed
> several
> > > > times
> > > > >> on
> > > > >> > > this
> > > > >> > > >>>>>> > mailing
> > > > >> > > >>>>>> > > > list
> > > > >> > > >>>>>> > > > > > > (e.g.
> > > > >> > > >>>>>> > > > > > > > > here:
> > > http://search-hadoop.com/m/gNRA82No5Wk
> > > > ).
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Please find the sources at
> > > > >> > > >>>>>> > > > > > https://github.com/sematext/HBaseWD(there's
> > > > >> > > >>>>>> > > > > > > > > also
> > > > >> > > >>>>>> > > > > > > > > a jar of current version for
> > convenience).
> > > It
> > > > >> is
> > > > >> > > very
> > > > >> > > >>>>>> easy to
> > > > >> > > >>>>>> > > > make
> > > > >> > > >>>>>> > > > > > use
> > > > >> > > >>>>>> > > > > > > of
> > > > >> > > >>>>>> > > > > > > > > it: e.g. I added it to one existing
> > project
> > > > >> with
> > > > >> > 1+2
> > > > >> > > >>>>>> lines of
> > > > >> > > >>>>>> > > > code
> > > > >> > > >>>>>> > > > > > (one
> > > > >> > > >>>>>> > > > > > > > > where I write to HBase and 2 for
> > > configuring
> > > > >> > > MapReduce
> > > > >> > > >>>>>> job).
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Please find below the short intro to
> the
> > > lib
> > > > >> [1].
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Alex Baranau
> > > > >> > > >>>>>> > > > > > > > > ----
> > > > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ ::
> Solr
> > -
> > > > >> Lucene
> > > > >> > -
> > > > >> > > >>>>>> Nutch -
> > > > >> > > >>>>>> > > > Hadoop
> > > > >> > > >>>>>> > > > > -
> > > > >> > > >>>>>> > > > > > > > HBase
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > [1]
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Description:
> > > > >> > > >>>>>> > > > > > > > > ------------
> > > > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing
> > > (sequential)
> > > > >> > Writes.
> > > > >> > > >>>>>> It was
> > > > >> > > >>>>>> > > > > inspired
> > > > >> > > >>>>>> > > > > > by
> > > > >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists
> around
> > > the
> > > > >> > > problem
> > > > >> > > >>>>>> of
> > > > >> > > >>>>>> > > choosing
> > > > >> > > >>>>>> > > > > > > > between:
> > > > >> > > >>>>>> > > > > > > > > * writing records with sequential row
> > keys
> > > > >> (e.g.
> > > > >> > > >>>>>> time-series
> > > > >> > > >>>>>> > > data
> > > > >> > > >>>>>> > > > > > with
> > > > >> > > >>>>>> > > > > > > > row
> > > > >> > > >>>>>> > > > > > > > > key
> > > > >> > > >>>>>> > > > > > > > >  built based on ts)
> > > > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > First approach makes possible to
> perform
> > > fast
> > > > >> > range
> > > > >> > > >>>>>> scans
> > > > >> > > >>>>>> > with
> > > > >> > > >>>>>> > > > help
> > > > >> > > >>>>>> > > > > > of
> > > > >> > > >>>>>> > > > > > > > > setting
> > > > >> > > >>>>>> > > > > > > > > start/stop keys on Scanner, but creates
> > > > single
> > > > >> > > region
> > > > >> > > >>>>>> server
> > > > >> > > >>>>>> > > > > > > hot-spotting
> > > > >> > > >>>>>> > > > > > > > > problem upon writing data (as row keys
> go
> > > in
> > > > >> > > sequence
> > > > >> > > >>>>>> all
> > > > >> > > >>>>>> > > records
> > > > >> > > >>>>>> > > > > end
> > > > >> > > >>>>>> > > > > > > up
> > > > >> > > >>>>>> > > > > > > > > written into a single region at a
> time).
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > Second approach aims for fastest
> writing
> > > > >> > performance
> > > > >> > > >>>>>> by
> > > > >> > > >>>>>> > > > > distributing
> > > > >> > > >>>>>> > > > > > > new
> > > > >> > > >>>>>> > > > > > > > > records over random regions but makes
> not
> > > > >> possible
> > > > >> > > >>>>>> doing fast
> > > > >> > > >>>>>> > > > range
> > > > >> > > >>>>>> > > > > > > scans
> > > > >> > > >>>>>> > > > > > > > > against written data.
> > > > >> > > >>>>>> > > > > > > > >
> > > > >> > > >>>>>> > > > > > > > > The suggested approach stays in the
> > middle
> > > of
> > > > >> the
> > > > >> > > two
> > > > >> > > >>>>>> above
> > > > >> > > >>>>>> > and
> > > > >> > > >>>>>> > > > > > proved
> > > > >> > > >>>>>> > > > > > > to
> > > > >> > > >>>>>> > > > > > > > > perform well by distributing records
> over
> > > the
> > > > >> > > cluster
> > > > >> > > >>>>>> during
> > > > >> > > >>>>>> > > > > writing
> > > > >> > > >>>>>> > > > > > > data
work with which makes it perfect to use with existing code.

Please refer to the unit-tests for lib usage info, as they are aimed to act
as examples.

Brief Usage Info (Examples):
----------------------------

Distributing records with sequential keys which are being written in up to
Byte.MAX_VALUE buckets:

   byte bucketsCount = (byte) 32; // distributing into 32 buckets
   RowKeyDistributor keyDistributor =
                          new RowKeyDistributorByOneBytePrefix(bucketsCount);
   for (int i = 0; i < 100; i++) {
     Put put = new Put(keyDistributor.getDistributedKey(originalKey));
     ... // add values
     hTable.put(put);
   }

Performing a range scan over written data (internally <bucketsCount>
scanners are executed):

   Scan scan = new Scan(startKey, stopKey);
   ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
   for (Result current : rs) {
     ...
   }

Performing a mapreduce job over the chunk of written data specified by Scan:

   Configuration conf = HBaseConfiguration.create();
   Job job = new Job(conf, "testMapreduceJob");

   Scan scan = new Scan(startKey, stopKey);

   TableMapReduceUtil.initTableMapperJob("table", scan,
     RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);

   // Substituting standard TableInputFormat which was set in
   // TableMapReduceUtil.initTableMapperJob(...)
   job.setInputFormatClass(WdTableInputFormat.class);
   keyDistributor.addInfo(job.getConfiguration());


Extending Row Keys Distributing Patterns:
-----------------------------------------

HBaseWD is designed to be flexible and to support custom row key
distribution approaches. To define custom row key distributing logic just
implement the AbstractRowKeyDistributor abstract class, which is really very
simple:

   public abstract class AbstractRowKeyDistributor implements Parametrizable {
     public abstract byte[] getDistributedKey(byte[] originalKey);
     public abstract byte[] getOriginalKey(byte[] adjustedKey);
     public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
     ... // some utility methods
   }
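The one-byte-prefix scheme used in the examples above can be illustrated in
plain Java. The sketch below is a self-contained stand-in, not the library's
actual code (HBaseWD's RowKeyDistributorByOneBytePrefix chooses the bucket
internally; here it is passed explicitly so the mechanics are visible):

```java
import java.util.Arrays;

// Standalone sketch of the one-byte-prefix idea: a bucket byte is
// prepended to the original key on write, and a range scan fans out
// to one key range per bucket.
public class OneBytePrefixSketch {
    private final byte bucketsCount;

    public OneBytePrefixSketch(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    // The bucket is a parameter here purely for illustration; any
    // assignment that spreads sequential keys across buckets works.
    public byte[] getDistributedKey(byte[] originalKey, byte bucket) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = bucket;
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    // Recovering the original key is just dropping the prefix byte.
    public byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    // One key per bucket: this is what lets a distributed scan build
    // bucketsCount start/stop pairs covering the same logical range.
    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[bucketsCount][];
        for (byte b = 0; b < bucketsCount; b++) {
            keys[b] = getDistributedKey(originalKey, b);
        }
        return keys;
    }
}
```

getAllDistributedKeys is the piece that makes range scans possible again:
a distributed scan derives one scanner per bucket from it, as in the
DistributedScanner example above.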

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
I have another question about option 2. If a 1-byte hash of the original
key is used as the prefix, it seems I need to handle the distributed scan
differently when reading from start row to end row, since the ordering of
the original key range differs from that of the resulting distributed key
range.
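The reordering described here has a standard remedy: within a single bucket
the prefix is constant, so each per-bucket scan still returns rows in
original-key order, and a client can restore global order with a k-way
merge. Below is a toy sketch of that merge over in-memory sorted lists
standing in for per-bucket scanner results (it is conceptually what a
distributed scanner needs to do, not HBaseWD's actual code):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Toy illustration: each bucket yields results already sorted by the
// original key (the bucket prefix is constant within a bucket), so a
// k-way merge by original key restores global order across buckets.
public class BucketMergeSketch {
    public static List<String> mergeByOriginalKey(List<List<String>> perBucket) {
        // Priority queue of the current head of each bucket's stream,
        // ordered by the (original) key.
        PriorityQueue<Map.Entry<String, Iterator<String>>> pq =
            new PriorityQueue<>(Map.Entry.comparingByKey());
        for (List<String> bucket : perBucket) {
            Iterator<String> it = bucket.iterator();
            if (it.hasNext()) {
                pq.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            // Emit the smallest head, then advance that bucket's stream.
            Map.Entry<String, Iterator<String>> head = pq.poll();
            merged.add(head.getKey());
            Iterator<String> it = head.getValue();
            if (it.hasNext()) {
                pq.add(new AbstractMap.SimpleEntry<>(it.next(), it));
            }
        }
        return merged;
    }
}
```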

On Wed, May 18, 2011 at 6:18 PM, Ted Yu <yu...@gmail.com> wrote:

> Alex:
> Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
>
> Thanks
>
> On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > There are several options here. E.g.:
> >
> > 1) Given that you have the "original key" of the record, you can fetch
> > the stored record key from HBase and use it to create a Put with updated
> > (or new) cells.
> >
> > Currently you'll need to use a distributed scan for that; there's no
> > analogue of the Get operation yet (see
> > https://github.com/sematext/HBaseWD/issues/1).
> >
> > Note: you need to first find out the real key of the stored record by
> > fetching data from HBase in case you use the
> > RowKeyDistributorByOneBytePrefix included in the current lib.
> > Alternatively, see the next option:
> >
> > 2) You can create your own RowKeyDistributor implementation which
> > creates the "distributed key" based on the original key value, so that
> > later, when you have the original key and want to update the record, you
> > can calculate the distributed key without a roundtrip to HBase.
> >
> > E.g. your RowKeyDistributor implementation can calculate a 1-byte hash
> > of the original key (https://github.com/sematext/HBaseWD/issues/2).
> >
> > Either way, you don't need to delete a record to update some of its
> > cells or to add new cells.
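Option 2 can be sketched as follows. This is a hypothetical, self-contained
illustration (the class name and the hash choice are mine, not the
library's): because the prefix is a pure function of the original key, the
distributed key can be recomputed at update time without reading from HBase.

```java
import java.util.Arrays;

// Sketch of the option-2 idea: derive the one-byte prefix from a hash
// of the original key, so the distributed key is recomputable from the
// original key alone (no HBase roundtrip before an update).
public class OneByteHashDistributorSketch {
    private final int maxBuckets;

    public OneByteHashDistributorSketch(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    private byte bucketOf(byte[] originalKey) {
        // Any stable hash works; mask to non-negative, then fold into range.
        return (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % maxBuckets);
    }

    public byte[] getDistributedKey(byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = bucketOf(originalKey);
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    public byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }
}
```

The key property is determinism: calling getDistributedKey twice with the
same original key yields the same row key, so a Put for an update lands on
the previously written row.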
> >
> > Please let me know if you have more Qs!
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> HBase
> >
> > On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> > wrote:
> >
> > > I have another question. For overwriting, do I need to delete the
> > > existing one before re-writing it?
> > >
> > > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com>
> > > wrote:
> > >
> > > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> > > >
> > > >
> > > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> > alex.baranov.v@gmail.com
> > > >wrote:
> > > >
> > > >> Thanks for the interest!
> > > >>
> > > >> We are using it in production. It is simple and hence quite stable.
> > > >> Though some minor pieces are missing (like
> > > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't affect
> > > >> stability and/or major functionality.
> > > >>
> > > >> Alex Baranau
> > > >> ----
> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >>
> > > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <
> weishung@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > What's the status on this package? Is it mature enough?
> > > >> > I am using it in my project: I tried out the write method yesterday
> > > >> > and am going to incorporate it into the read method tomorrow.
> > > >> >
> > > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > > alex.baranov.v@gmail.com
> > > >> > >wrote:
> > > >> >
> > > >> > > > The start/end rows may be written twice.
> > > >> > >
> > > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data
> > > >> > > is "bearable" in an attribute value no matter how long they (the
> > > >> > > keys) are, since we are already OK with transferring them
> > > >> > > initially (i.e. we should be OK with transferring 2x more).
> > > >> > >
> > > >> > > So, what about the suggestion of the sourceScan attribute value I
> > > >> > > mentioned? If you can tell why it isn't sufficient in your case,
> > > >> > > I'd have more info to think about a better suggestion ;)
> > > >> > >
> > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > > >> > >
> > > >> > > np. Can do that. Just thought that they (the patches) can be
> > > >> > > sorted by date to find the final one (aka "convention over
> > > >> > > naming-rules").
> > > >> > >
> > > >> > > Alex.
> > > >> > >
> > > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com>
> > > wrote:
> > > >> > >
> > > >> > > > >> Though it might be ok, since we anyways "transfer" start/stop
> > > >> > > > >> rows with Scan object.
> > > >> > > > In the write() method, we now have:
> > > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > > >> > > > ...
> > > >> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> > > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > > >> > > >       }
> > > >> > > > The start/end rows may be written twice.
> > > >> > > >
> > > >> > > > Of course, you have full control over how to generate the unique
> > > >> > > > ID for the "sourceScan" attribute.
> > > >> > > >
> > > >> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe
> > > >> > > > the second should be named HBASE-3811-v2.patch
> > > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > > >> > > >
> > > >> > > > Thanks
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > > >> > alex.baranov.v@gmail.com
> > > >> > > >wrote:
> > > >> > > >
> > > >> > > >> > Can you remove the first version ?
> > > >> > > >> Isn't it ok to keep it in the JIRA issue?
> > > >> > > >>
> > > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > > >> > > >> > supports setAttribute() ?
> > > >> > > >> > If it does, can you encode start row and end row as
> > > >> > > >> > "sourceScan" attribute ?
> > > >> > > >>
> > > >> > > >> Yeah, smth like this is going to be implemented. Though I'd
> > > >> > > >> still want to hear from the devs the story about the Scan
> > > >> > > >> version.
> > > >> > > >>
> > > >> > > >> > One consideration is that start row or end row may be quite
> > > >> > > >> > long.
> > > >> > > >>
> > > >> > > >> Yeah, that was my thought too at first. Though it might be ok,
> > > >> > > >> since we anyways "transfer" start/stop rows with the Scan
> > > >> > > >> object.
> > > >> > > >>
> > > >> > > >> > What do you think ?
> > > >> > > >>
> > > >> > > >> I'd love to hear from you whether this variant I mentioned is
> > > >> > > >> what we are looking at here:
> > > >> > > >>
> > > >> > > >> > From what I understand, you want to distinguish scans fired
> > > >> > > >> > by the same distributed scan.
> > > >> > > >> > I.e. group scans which were fired by a single distributed
> > > >> > > >> > scan. If that's what you want, the distributed scan can
> > > >> > > >> > generate a unique ID and set, say, a "sourceScan" attribute
> > > >> > > >> > to its value. This way we'll have <# of distinct "sourceScan"
> > > >> > > >> > attribute values> = <number of distributed scans invoked by
> > > >> > > >> > client side> and two scans on the server side will have the
> > > >> > > >> > same "sourceScan" attribute iff they "belong" to the same
> > > >> > > >> > distributed scan.
> > > >> > > >>
> > > >> > > >> Alex Baranau
> > > >> > > >> ----
> > > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > > Hadoop
> > > >> -
> > > >> > > >> HBase
> > > >> > > >>
> > > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yuzhihong@gmail.com
> >
> > > >> wrote:
> > > >> > > >>
> > > >> > > >>> Alex:
> > > >> > > >>> Your second patch looks good.
> > > >> > > >>> Can you remove the first version ?
> > > >> > > >>>
> > > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > > >> > > >>> supports setAttribute() ?
> > > >> > > >>> If it does, can you encode start row and end row as a
> > > >> > > >>> "sourceScan" attribute ?
> > > >> > > >>>
> > > >> > > >>> One consideration is that start row or end row may be quite
> > > >> > > >>> long.
> > > >> > > >>> Ideally we should store the hash code of the source Scan
> > > >> > > >>> object as the "sourceScan" attribute. But Scan doesn't
> > > >> > > >>> implement hashCode(). We can add it; that would require
> > > >> > > >>> running all Scan-related tests.
> > > >> > > >>>
> > > >> > > >>> What do you think ?
> > > >> > > >>>
> > > >> > > >>> Thanks
> > > >> > > >>>
> > > >> > > >>>
> > > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > > >> > > alex.baranov.v@gmail.com>wrote:
> > > >> > > >>>
> > > >> > > >>>> Sorry for the delay in response (public holidays here).
> > > >> > > >>>>
> > > >> > > >>>> This depends on what info you are looking for on the server
> > > >> > > >>>> side.
> > > >> > > >>>>
> > > >> > > >>>> From what I understand, you want to distinguish scans fired
> > > >> > > >>>> by the same distributed scan, i.e. group scans which were
> > > >> > > >>>> fired by a single distributed scan. If that's what you want,
> > > >> > > >>>> the distributed scan can generate a unique ID and set, say, a
> > > >> > > >>>> "sourceScan" attribute to its value. This way we'll have <#
> > > >> > > >>>> of distinct "sourceScan" attribute values> = <number of
> > > >> > > >>>> distributed scans invoked by client side>, and two scans on
> > > >> > > >>>> the server side will have the same "sourceScan" attribute iff
> > > >> > > >>>> they "belong" to the same distributed scan.
> > > >> > > >>>>
> > > >> > > >>>> Is this what you are looking for?
> > > >> > > >>>>
> > > >> > > >>>> Alex Baranau
> > > >> > > >>>>
> > > >> > > >>>> P.S. attached patch for HBASE-3811
> > > >> > > >>>> <https://issues.apache.org/jira/browse/HBASE-3811>.
> > > >> > > >>>> P.S-2. should this conversation be moved to the dev list?
> > > >> > > >>>>
> > > >> > > >>>> ----
> > > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>
> > > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > > >> > wrote:
> > > >> > > >>>>
> > > >> > > >>>>> Alex:
> > > >> > > >>>>> What type of identification should we put in the map of the
> > > >> > > >>>>> Scan object ?
> > > >> > > >>>>> I am thinking of using the Id of the RowKeyDistributor. But
> > > >> > > >>>>> the user can use the same distributor on multiple scans.
> > > >> > > >>>>>
> > > >> > > >>>>> Please share your thought.
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>>>
> > > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > > >> > > >>>>>>
> > > >> > > >>>>>> Alex Baranau
> > > >> > > >>>>>> ----
> > > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>>
> > > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
> > yuzhihong@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > >>>>>>
> > > >> > > >>>>>> > My plan was to make regions that have active scanners
> > > >> > > >>>>>> > more stable - trying not to move them when balancing.
> > > >> > > >>>>>> > I prefer the second approach - adding custom attribute(s)
> > > >> > > >>>>>> > to Scan so that the Scans created by the method below can
> > > >> > > >>>>>> > be 'grouped'.
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > > >> > > >>>>>> alex.baranov.v@gmail.com
> > > >> > > >>>>>> > >wrote:
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or
> > > >> > > >>>>>> > > just differently) when determining the load?
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > The current code looks like this:
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > class DistributedScanner:
> > > >> > > >>>>>> > >   public static DistributedScanner create(HTable hTable,
> > > >> > > >>>>>> > >       Scan original,
> > > >> > > >>>>>> > >       AbstractRowKeyDistributor keyDistributor) throws IOException {
> > > >> > > >>>>>> > >     byte[][] startKeys =
> > > >> > > >>>>>> > >         keyDistributor.getAllDistributedKeys(original.getStartRow());
> > > >> > > >>>>>> > >     byte[][] stopKeys =
> > > >> > > >>>>>> > >         keyDistributor.getAllDistributedKeys(original.getStopRow());
> > > >> > > >>>>>> > >     Scan[] scans = new Scan[startKeys.length];
> > > >> > > >>>>>> > >     for (byte i = 0; i < startKeys.length; i++) {
> > > >> > > >>>>>> > >       scans[i] = new Scan(original);
> > > >> > > >>>>>> > >       scans[i].setStartRow(startKeys[i]);
> > > >> > > >>>>>> > >       scans[i].setStopRow(stopKeys[i]);
> > > >> > > >>>>>> > >     }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > >     ResultScanner[] rss = new ResultScanner[startKeys.length];
> > > >> > > >>>>>> > >     for (byte i = 0; i < scans.length; i++) {
> > > >> > > >>>>>> > >       rss[i] = hTable.getScanner(scans[i]);
> > > >> > > >>>>>> > >     }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > >     return new DistributedScanner(rss);
> > > >> > > >>>>>> > >   }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > This is client code. To make these scans "identifiable"
> > > >> > > >>>>>> > > we need to either use some different (derived from
> > > >> > > >>>>>> > > Scan) class or add some attribute to them. There's no
> > > >> > > >>>>>> > > API for doing the latter. We can do the former, but I
> > > >> > > >>>>>> > > don't really like the idea of creating an extra class
> > > >> > > >>>>>> > > (with no extra functionality) just to distinguish it
> > > >> > > >>>>>> > > from the base one.
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > If you can share why/how you want to treat them
> > > >> > > >>>>>> > > differently on the server side, that would be helpful.
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > Alex Baranau
> > > >> > > >>>>>> > > ----
> > > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> > > >> yuzhihong@gmail.com>
> > > >> > > >>>>>> wrote:
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > > My request would be to make the distributed scan
> > > >> > > >>>>>> > > > identifiable from the server side.
> > > >> > > >>>>>> > > > :-)
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > > >> > > >>>>>> > alex.baranov.v@gmail.com
> > > >> > > >>>>>> > > > >wrote:
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > > >> > > >>>>>> > > > > > regions for the underlying table.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > True: e.g. when there's only one region that holds
> > > >> > > >>>>>> > > > > data for the whole table (not many records in the
> > > >> > > >>>>>> > > > > table yet), a distributed scan will fire N scans
> > > >> > > >>>>>> > > > > against the same region.
> > > >> > > >>>>>> > > > > On the other hand, in case there is a huge number
> > > >> > > >>>>>> > > > > of regions for a single table, each scan can span
> > > >> > > >>>>>> > > > > multiple regions.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > > >> > > >>>>>> > > > > > scan" at server side.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > With the current implementation a "distributed"
> > > >> > > >>>>>> > > > > scan won't be recognized as something special on
> > > >> > > >>>>>> > > > > the server side. It will be an ordinary scan.
> > > >> > > >>>>>> > > > > Though the number of scans will increase, given
> > > >> > > >>>>>> > > > > that the typical situation is "many regions for a
> > > >> > > >>>>>> > > > > single table", the scans of the same "distributed
> > > >> > > >>>>>> > > > > scan" are likely not to hit the same region.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel
> > > >> > > >>>>>> > > > > free to ask more ;)
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > Alex Baranau
> > > >> > > >>>>>> > > > > ----
> > > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > > >> > > yuzhihong@gmail.com>
> > > >> > > >>>>>> wrote:
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > > Alex:
> > > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > > >> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > > >> > > >>>>>> > > > > > scan" at server side.
> > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > > >> > > >>>>>> > > > > > regions for the underlying table.
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > Cheers
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau
> <
> > > >> > > >>>>>> > > > alex.baranov.v@gmail.com
> > > >> > > >>>>>> > > > > > >wrote:
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > > Hi Ted,
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > We currently use this tool in a scenario where
> > > >> > > >>>>>> > > > > > > data is consumed by MapReduce jobs, so we
> > > >> > > >>>>>> > > > > > > haven't tested the performance of a pure
> > > >> > > >>>>>> > > > > > > "distributed scan" (i.e. N scans instead of 1)
> > > >> > > >>>>>> > > > > > > a lot. I expect it to be close to simple scan
> > > >> > > >>>>>> > > > > > > performance, or maybe sometimes even faster
> > > >> > > >>>>>> > > > > > > depending on your data access patterns. E.g. in
> > > >> > > >>>>>> > > > > > > case you write timeseries (sequential) data
> > > >> > > >>>>>> > > > > > > which goes into a single region at a time, then
> > > >> > > >>>>>> > > > > > > if you access the delta for further
> > > >> > > >>>>>> > > > > > > processing/analysis (esp. from more than a
> > > >> > > >>>>>> > > > > > > single client), these scans are likely to hit
> > > >> > > >>>>>> > > > > > > the same region or a couple of regions at a
> > > >> > > >>>>>> > > > > > > time, which may perform worse compared to many
> > > >> > > >>>>>> > > > > > > scans hitting data that is much better spread
> > > >> > > >>>>>> > > > > > > over the region servers.
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > As for a map-reduce job, the approach should
> > > >> > > >>>>>> > > > > > > not affect reading performance at all: it's
> > > >> > > >>>>>> > > > > > > just that there are bucketsCount times more
> > > >> > > >>>>>> > > > > > > splits and hence bucketsCount times more Map
> > > >> > > >>>>>> > > > > > > tasks. In many cases this even improves the
> > > >> > > >>>>>> > > > > > > overall performance of the MR job since work is
> > > >> > > >>>>>> > > > > > > better distributed over the cluster (esp. in
> > > >> > > >>>>>> > > > > > > the situation when the aim is to constantly
> > > >> > > >>>>>> > > > > > > process the incoming delta, which usually
> > > >> > > >>>>>> > > > > > > resides in one or just a couple of regions
> > > >> > > >>>>>> > > > > > > depending on processing frequency).
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > If you can share details of your case, that
> > > >> > > >>>>>> > > > > > > will help to understand what effect(s) to
> > > >> > > >>>>>> > > > > > > expect from using this approach.
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > Alex Baranau
> > > >> > > >>>>>> > > > > > > ----
> > > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > > >> > > >>>>>> yuzhihong@gmail.com>
> > > >> > > >>>>>> > > wrote:
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners compared
> > > >> > > >>>>>> > > > > > > > to one scanner originally, have you performed
> > > >> > > >>>>>> > > > > > > > load testing to see the impact ?
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > Thanks
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau
> > > >> > > >>>>>> > > > > > > > <alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>>>> > > > > > > > > [quoted announcement snipped; see the top
> > > >> > > >>>>>> > > > > > > > > of this thread for the full text]
> > > >> > > >>>>>> > > > > > > > > ----------------------------
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys
> > > which
> > > >> > are
> > > >> > > >>>>>> being
> > > >> > > >>>>>> > > written
> > > >> > > >>>>>> > > > > in
> > > >> > > >>>>>> > > > > > up
> > > >> > > >>>>>> > > > > > > > to
> > > >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; //
> > > >> distributing
> > > >> > > into
> > > >> > > >>>>>> 32
> > > >> > > >>>>>> > > buckets
> > > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > > >> > > >>>>>> > > > > > > > >                           new
> > > >> > > >>>>>> > > > > > > > >
> > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > > >> > > >>>>>> > > > > > > > >      Put put = new
> > > >> > > >>>>>> > > > > >
> > Put(keyDistributor.getDistributedKey(originalKey));
> > > >> > > >>>>>> > > > > > > > >      ... // add values
> > > >> > > >>>>>> > > > > > > > >      hTable.put(put);
> > > >> > > >>>>>> > > > > > > > >    }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Performing a range scan over written data
> > > >> > > (internally
> > > >> > > >>>>>> > > > > <bucketsCount>
> > > >> > > >>>>>> > > > > > > > > scanners
> > > >> > > >>>>>> > > > > > > > > executed):
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey,
> stopKey);
> > > >> > > >>>>>> > > > > > > > >    ResultScanner rs =
> > > >> > > >>>>>> DistributedScanner.create(hTable, scan,
> > > >> > > >>>>>> > > > > > > > > keyDistributor);
> > > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> > > >> > > >>>>>> > > > > > > > >      ...
> > > >> > > >>>>>> > > > > > > > >    }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Performing mapreduce job over written
> data
> > > >> chunk
> > > >> > > >>>>>> specified by
> > > >> > > >>>>>> > > > Scan:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    Configuration conf =
> > > >> > HBaseConfiguration.create();
> > > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf,
> > > "testMapreduceJob");
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey,
> stopKey);
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >>  TableMapReduceUtil.initTableMapperJob("table",
> > > >> > > >>>>>> scan,
> > > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class,
> > > >> > > >>>>>> ImmutableBytesWritable.class,
> > > >> > > >>>>>> > > > > > > Result.class,
> > > >> > > >>>>>> > > > > > > > > job);
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    // Substituting standard
> > TableInputFormat
> > > >> which
> > > >> > > was
> > > >> > > >>>>>> set in
> > > >> > > >>>>>> > > > > > > > >    //
> > > >> TableMapReduceUtil.initTableMapperJob(...)
> > > >> > > >>>>>> > > > > > > > >
> > > >> > >  job.setInputFormatClass(WdTableInputFormat.class);
> > > >> > > >>>>>> > > > > > > > >
> > > >>  keyDistributor.addInfo(job.getConfiguration());
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > > >> > > >>>>>> > > > > > > > > -----------------------------------------
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
> > > >> support
> > > >> > > >>>>>> custom row
> > > >> > > >>>>>> > > key
> > > >> > > >>>>>> > > > > > > > > distribution
> > > >> > > >>>>>> > > > > > > > > approaches. To define custom row key
> > > >> distributing
> > > >> > > >>>>>> logic just
> > > >> > > >>>>>> > > > > > implement
> > > >> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class
> > > which
> > > >> is
> > > >> > > >>>>>> really very
> > > >> > > >>>>>> > > > > simple:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    public abstract class
> > > >> AbstractRowKeyDistributor
> > > >> > > >>>>>> implements
> > > >> > > >>>>>> > > > > > > > > Parametrizable {
> > > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > > >> > getDistributedKey(byte[]
> > > >> > > >>>>>> > > > originalKey);
> > > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > > >> getOriginalKey(byte[]
> > > >> > > >>>>>> > adjustedKey);
> > > >> > > >>>>>> > > > > > > > >      public abstract byte[][]
> > > >> > > >>>>>> getAllDistributedKeys(byte[]
> > > >> > > >>>>>> > > > > > > originalKey);
> > > >> > > >>>>>> > > > > > > > >      ... // some utility methods
> > > >> > > >>>>>> > > > > > > > >    }
> > > >> > > >>>>>> > > > > > > > >
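To make the one-byte-prefix idea above concrete, here is a hedged, self-contained sketch. It is a simplified pure-Java model, not the HBaseWD source; the round-robin prefix choice is an assumption made for illustration. Sequential keys get a rotating one-byte prefix, so consecutive writes land in different key ranges (and hence, typically, different regions).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of one-byte-prefix key distribution (not the HBaseWD
// source): a rotating prefix byte is prepended to each sequential key, so
// consecutive writes spread across bucketsCount distinct key ranges.
public class OneBytePrefixModel {
    private final byte bucketsCount;
    private byte next = 0; // round-robin prefix choice (an assumed strategy)

    public OneBytePrefixModel(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    // Prepend the current bucket byte, then advance the round-robin counter.
    public byte[] getDistributedKey(byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = next;
        next = (byte) ((next + 1) % bucketsCount);
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    // Strip the prefix byte to recover the original key.
    public byte[] getOriginalKey(byte[] adjustedKey) {
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    public static void main(String[] args) {
        OneBytePrefixModel d = new OneBytePrefixModel((byte) 4);
        Set<Byte> prefixesUsed = new TreeSet<>();
        for (int i = 0; i < 8; i++) {
            byte[] key = d.getDistributedKey(String.format("row%03d", i).getBytes());
            prefixesUsed.add(key[0]);
        }
        // 8 sequential writes were spread over all 4 buckets.
        System.out.println(prefixesUsed.size()); // prints 4
    }
}
```

The point of the model: writes cost one extra byte per key, and the original key is always recoverable by dropping that byte.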

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
I have another question about option 2. It seems I need to handle the
distributed scan differently to read from start row to end row: assuming a
1-byte hash of the original key is used as the prefix, the order of the
original key range differs from the resulting distributed key range.
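One way to address the ordering concern above is to merge the per-bucket result streams by original key. The sketch below is a simplified pure-Java model under stated assumptions, not the DistributedScanner source: it assumes each bucket's rows come back sorted by distributed key, which (with a fixed one-byte prefix per bucket) means sorted by original key within the bucket, so a k-way merge restores the overall start-to-end order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model (not HBaseWD code): rows live under
// [1-byte bucket prefix] + [original key]; each bucket scan yields original
// keys in sorted order, and a merge restores the global order.
public class DistributedScanOrder {

    // Assumed bucketing scheme for the model: bucket = hash(key) % bucketsCount.
    static int bucket(byte[] key, int bucketsCount) {
        return Math.abs(Arrays.hashCode(key)) % bucketsCount;
    }

    // Merge per-bucket sorted lists of original keys back into global order.
    // For illustration everything is pushed into one heap; a real scanner
    // would merge lazily, pulling one row at a time from each bucket scanner.
    static List<String> mergeBuckets(List<List<String>> buckets) {
        PriorityQueue<String> heap = new PriorityQueue<>();
        for (List<String> b : buckets) heap.addAll(b);
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) out.add(heap.poll());
        return out;
    }

    public static void main(String[] args) {
        int bucketsCount = 3;
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < bucketsCount; i++) buckets.add(new ArrayList<>());
        // Sequential original keys scatter across buckets on write...
        for (String k : new String[] {"row01", "row02", "row03", "row04", "row05"}) {
            buckets.get(bucket(k.getBytes(), bucketsCount)).add(k);
        }
        // ...but each bucket list stays individually sorted, so merging by
        // original key restores the sequential order for range reads.
        System.out.println(mergeBuckets(buckets)); // prints [row01, row02, row03, row04, row05]
    }
}
```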

On Wed, May 18, 2011 at 6:18 PM, Ted Yu <yu...@gmail.com> wrote:

> Alex:
> Can you summarize HBaseWD in your blog, including points 1 and 2 below ?
>
> Thanks
>
> On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > There are several options here. E.g.:
> >
> > 1) Given that you have the "original key" of the record, you can fetch the
> > stored record key from HBase and use it to create a Put with updated (or
> > new) cells.
> >
> > Currently you'll need to use a distributed scan for that; there's no
> > analogue of the Get operation yet (see
> > https://github.com/sematext/HBaseWD/issues/1).
> >
> > Note: you first need to find out the real key of the stored record by
> > fetching data from HBase in case you use the RowKeyDistributorByOneBytePrefix
> > included in the current lib. Alternatively, see the next option:
> >
> > 2) You can create your own RowKeyDistributor implementation which derives
> > the "distributed key" from the original key value, so that later, when you
> > have the original key and want to update the record, you can calculate the
> > distributed key without a roundtrip to HBase.
> >
> > E.g. your RowKeyDistributor implementation can calculate a 1-byte hash of
> > the original key (https://github.com/sematext/HBaseWD/issues/2).
> >
> >
> >
> > Either way, you don't need to delete a record to update some of its cells
> > or add new cells.
> >
> > Please let me know if you have more Qs!
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> HBase
> >
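Option 2 above can be sketched as follows. This is a hypothetical implementation, not HBaseWD code: the class name is invented, and the method names merely follow the AbstractRowKeyDistributor signatures quoted earlier in the thread. Because the prefix is a deterministic function of the original key, the same distributed key can be recomputed for an update without reading anything back from HBase.

```java
import java.util.Arrays;

// Hypothetical sketch of option 2: derive the one-byte prefix from a hash of
// the original key, so the distributed key can be recomputed from the
// original key alone -- no HBase roundtrip needed before an update.
public class HashPrefixDistributor {
    private final int bucketsCount;

    public HashPrefixDistributor(int bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    // Deterministic: same original key always maps to the same bucket.
    public byte[] getDistributedKey(byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = hashPrefix(originalKey);
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    public byte[] getOriginalKey(byte[] adjustedKey) {
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    // All possible prefixed variants of a key -- used to fan a range scan
    // out into one scan per bucket.
    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[bucketsCount][];
        for (int b = 0; b < bucketsCount; b++) {
            keys[b] = new byte[originalKey.length + 1];
            keys[b][0] = (byte) b;
            System.arraycopy(originalKey, 0, keys[b], 1, originalKey.length);
        }
        return keys;
    }

    private byte hashPrefix(byte[] key) {
        return (byte) (Math.abs(Arrays.hashCode(key)) % bucketsCount);
    }

    public static void main(String[] args) {
        HashPrefixDistributor d = new HashPrefixDistributor(32);
        byte[] original = "event-0001".getBytes();
        byte[] distributed = d.getDistributedKey(original);
        // Same input always yields the same distributed key, so an update can
        // target the stored row without first reading its key back.
        System.out.println(Arrays.equals(distributed, d.getDistributedKey(original))); // prints true
        System.out.println(Arrays.equals(original, d.getOriginalKey(distributed)));    // prints true
    }
}
```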
> > On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> > wrote:
> >
> > > I have another question. For overwriting, do I need to delete the
> > > existing one before re-writing it?
> > >
> > > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com>
> > > wrote:
> > >
> > > > Yes, it's simple yet useful. I am integrating it. Thanks alot :)
> > > >
> > > >
> > > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >
> > > >> Thanks for the interest!
> > > >>
> > > >> We are using it in production. It is simple and hence quite stable.
> > > >> Though some minor pieces are missing (like
> > > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't affect
> > > >> stability and/or major functionality.
> > > >>
> > > >> Alex Baranau
> > > >> ----
> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >>
> > > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <weishung@gmail.com> wrote:
> > > >>
> > > >> > What's the status on this package? Is it mature enough?
> > > >> > I am using it in my project, tried out the write method yesterday and
> > > >> > am going to incorporate it into the read method tomorrow.
> > > >> >
> > > >> > > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> >
> > > >> > > > The start/end rows may be written twice.
> > > >> > >
> > > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data is
> > > >> > > "bearable" in the attribute value no matter how long the keys are,
> > > >> > > since we are already OK with transferring them initially (i.e. we
> > > >> > > should be OK with transferring 2x as much).
> > > >> > >
> > > >> > > So, what about the suggestion of the sourceScan attribute value I
> > > >> > > mentioned? If you can tell why it isn't sufficient in your case,
> > > >> > > I'd have more info to think about a better suggestion ;)
> > > >> > >
> > > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > > >> > >
> > > >> > > np. Can do that. Just thought that they (patches) can be sorted by
> > > >> > > date to find out the final one (aka "convention over naming-rules").
> > > >> > >
> > > >> > > Alex.
> > > >> > >
> > > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > >
> > > >> > > > >> Though it might be ok, since we anyways "transfer" start/stop
> > > >> > > > >> rows with the Scan object.
> > > >> > > > In write() method, we now have:
> > > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > > >> > > > ...
> > > >> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> > > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > > >> > > >       }
> > > >> > > > The start/end rows may be written twice.
> > > >> > > >
> > > >> > > > Of course, you have full control over how to generate the unique
> > > >> > > > ID for the "sourceScan" attribute.
> > > >> > > >
> > > >> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe
> > > >> > > > the second should be named HBASE-3811-v2.patch
> > > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > > >> > > >
> > > >> > > > Thanks
> > > >> > > >
> > > >> > > >
> > > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> > > >
> > > >> > > >> > Can you remove the first version ?
> > > >> > > >> Isn't it ok to keep it in JIRA issue?
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > > >> > > >> > supports setAttribute() ?
> > > >> > > >> > If it does, can you encode start row and end row as
> > > >> > > >> > "sourceScan" attribute ?
> > > >> > > >>
> > > >> > > >> Yeah, something like this is going to be implemented. Though I'd
> > > >> > > >> still want to hear from the devs the story about the Scan version.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> > One consideration is that start row or end row may be quite
> > > >> > > >> > long.
> > > >> > > >>
> > > >> > > >> Yeah, that was my thought too at first. Though it might be OK,
> > > >> > > >> since we anyway "transfer" start/stop rows with the Scan object.
> > > >> > > >>
> > > >> > > >> > What do you think ?
> > > >> > > >>
> > > >> > > >> I'd love to hear from you whether the variant I mentioned is
> > > >> > > >> what we are looking at here:
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> > From what I understand, you want to distinguish scans fired by
> > > >> > > >> > the same distributed scan. I.e. group scans which were fired by
> > > >> > > >> > a single distributed scan. If that's what you want, distributed
> > > >> > > >> > scan can generate a unique ID and set, say, a "sourceScan"
> > > >> > > >> > attribute to its value. This way we'll have <# of distinct
> > > >> > > >> > "sourceScan" attribute values> = <number of distributed scans
> > > >> > > >> > invoked by client side>, and two scans on the server side will
> > > >> > > >> > have the same "sourceScan" attribute iff they "belong" to the
> > > >> > > >> > same distributed scan.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >> Alex Baranau
> > > >> > > >> ----
> > > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>
> > > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > > >>
> > > >> > > >>> Alex:
> > > >> > > >>> Your second patch looks good.
> > > >> > > >>> Can you remove the first version ?
> > > >> > > >>>
> > > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > > >> > > >>> supports setAttribute() ?
> > > >> > > >>> If it does, can you encode start row and end row as
> > > >> > > >>> "sourceScan" attribute ?
> > > >> > > >>>
> > > >> > > >>> One consideration is that start row or end row may be quite long.
> > > >> > > >>> Ideally we should store the hash code of the source Scan object
> > > >> > > >>> as the "sourceScan" attribute. But Scan doesn't implement
> > > >> > > >>> hashCode(). We can add it; that would require running all
> > > >> > > >>> Scan-related tests.
> > > >> > > >>>
> > > >> > > >>> What do you think ?
> > > >> > > >>>
> > > >> > > >>> Thanks
> > > >> > > >>>
> > > >> > > >>>
> > > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>
> > > >> > > >>>> Sorry for the delay in response (public holidays here).
> > > >> > > >>>>
> > > >> > > >>>> This depends on what info you are looking for on the server side.
> > > >> > > >>>>
> > > >> > > >>>> From what I understand, you want to distinguish scans fired by
> > > >> > > >>>> the same distributed scan. I.e. group scans which were fired by
> > > >> > > >>>> a single distributed scan. If that's what you want, the
> > > >> > > >>>> distributed scan can generate a unique ID and set, say, a
> > > >> > > >>>> "sourceScan" attribute to its value. This way we'll have <# of
> > > >> > > >>>> distinct "sourceScan" attribute values> = <number of distributed
> > > >> > > >>>> scans invoked by client side>, and two scans on the server side
> > > >> > > >>>> will have the same "sourceScan" attribute iff they "belong" to
> > > >> > > >>>> the same distributed scan.
> > > >> > > >>>>
> > > >> > > >>>> Is this what you are looking for?
> > > >> > > >>>>
> > > >> > > >>>> Alex Baranau
> > > >> > > >>>>
> > > >> > > >>>> P.S. attached patch for HBASE-3811
> > > >> > > >>>> <https://issues.apache.org/jira/browse/HBASE-3811>.
> > > >> > > >>>> P.S-2. should this conversation be moved to the dev list?
> > > >> > > >>>>
> > > >> > > >>>> ----
> > > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>
> > > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > > >>>>
> > > >> > > >>>>> Alex:
> > > >> > > >>>>> What type of identification should we put in the map of the
> > > >> > > >>>>> Scan object?
> > > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But the
> > > >> > > >>>>> user can use the same distributor on multiple scans.
> > > >> > > >>>>>
> > > >> > > >>>>> Please share your thought.
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>>>
> > > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > > >> > > >>>>>>
> > > >> > > >>>>>> Alex Baranau
> > > >> > > >>>>>> ----
> > > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>>
> > > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > > >>>>>>
> > > >> > > >>>>>> > My plan was to make regions that have active scanners more
> > > >> > > >>>>>> > stable - trying not to move them when balancing.
> > > >> > > >>>>>> > I prefer the second approach - adding custom attribute(s)
> > > >> > > >>>>>> > to Scan so that the Scans created by the method below can
> > > >> > > >>>>>> > be 'grouped'.
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>>>> >
> > > >> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or just
> > > >> > > >>>>>> > > differently) when determining the load?
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > The current code looks like this:
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > class DistributedScanner:
> > > >> > > >>>>>> > >  public static DistributedScanner create(HTable hTable, Scan original,
> > > >> > > >>>>>> > >      AbstractRowKeyDistributor keyDistributor) throws IOException {
> > > >> > > >>>>>> > >    byte[][] startKeys =
> > > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStartRow());
> > > >> > > >>>>>> > >    byte[][] stopKeys =
> > > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStopRow());
> > > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > > >> > > >>>>>> > >      scans[i] = new Scan(original);
> > > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > > >> > > >>>>>> > >    }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> > > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > > >> > > >>>>>> > >    }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > >    return new DistributedScanner(rss);
> > > >> > > >>>>>> > >  }
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > This is client code. To make these scans "identifiable" we
> > > >> > > >>>>>> > > need to either use some different class (derived from Scan)
> > > >> > > >>>>>> > > or add some attribute to them. There's no API for doing the
> > > >> > > >>>>>> > > latter. We can do the former, but I don't really like the
> > > >> > > >>>>>> > > idea of creating an extra class (with no extra
> > > >> > > >>>>>> > > functionality) just to distinguish it from the base one.
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > If you can share why/how you want to treat them differently
> > > >> > > >>>>>> > > on the server side, that would be helpful.
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > Alex Baranau
> > > >> > > >>>>>> > > ----
> > > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
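The "sourceScan" grouping discussed in this part of the thread can be modeled in a few lines. This is a hypothetical pure-Java sketch, not HBase's Scan API: `ScanModel` and `fanOut` are invented stand-ins; the real proposal was to use Scan's attribute map. All scans fired by one distributed scan share a generated unique ID, so the server side could group them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the "sourceScan" grouping idea (not the HBase API):
// each scan carries an attribute map, and all scans fanned out by one
// distributed scan share a generated unique ID.
public class SourceScanGrouping {
    // Simplified stand-in for a Scan with attributes.
    static class ScanModel {
        final Map<String, byte[]> attributes = new HashMap<>();
    }

    // Fan one logical range scan out into bucketsCount scans that all carry
    // the same "sourceScan" ID.
    static ScanModel[] fanOut(int bucketsCount) {
        byte[] sourceScanId = UUID.randomUUID().toString().getBytes();
        ScanModel[] scans = new ScanModel[bucketsCount];
        for (int i = 0; i < bucketsCount; i++) {
            scans[i] = new ScanModel();
            scans[i].attributes.put("sourceScan", sourceScanId);
        }
        return scans;
    }

    public static void main(String[] args) {
        ScanModel[] scans = fanOut(4);
        // All fan-out scans carry the same "sourceScan" value...
        System.out.println(Arrays.equals(
            scans[0].attributes.get("sourceScan"),
            scans[3].attributes.get("sourceScan")));
        // ...while a second distributed scan gets a different ID, so the two
        // groups remain distinguishable.
        System.out.println(Arrays.equals(
            scans[0].attributes.get("sourceScan"),
            fanOut(4)[0].attributes.get("sourceScan")));
    }
}
```

This gives exactly the property described above: number of distinct "sourceScan" values equals the number of distributed scans invoked, and two server-side scans share the value iff they belong to the same distributed scan.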
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> > > > My request would be to make the distributed scan
> > > >> > > >>>>>> > > > identifiable from the server side.
> > > >> > > >>>>>> > > > :-)
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > > >> > > >>>>>> > > > > > regions for the underlying table.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > True: e.g. when there's only one region that holds data
> > > >> > > >>>>>> > > > > for the whole table (not many records in the table yet),
> > > >> > > >>>>>> > > > > a distributed scan will fire N scans against the same
> > > >> > > >>>>>> > > > > region. On the other hand, in case there are a huge
> > > >> > > >>>>>> > > > > number of regions for a single table, each scan can span
> > > >> > > >>>>>> > > > > multiple regions.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > > >> > > >>>>>> > > > > > scan" at server side.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > With the current implementation a "distributed" scan
> > > >> > > >>>>>> > > > > won't be recognized as something special on the server
> > > >> > > >>>>>> > > > > side. It will be an ordinary scan. Though the number of
> > > >> > > >>>>>> > > > > scans will increase, given that the typical situation is
> > > >> > > >>>>>> > > > > "many regions for a single table", the scans of the same
> > > >> > > >>>>>> > > > > "distributed scan" are likely not to hit the same region.
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel free to
> > > >> > > >>>>>> > > > > ask more ;)
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > Alex Baranau
> > > >> > > >>>>>> > > > > ----
> > > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > > > > Alex:
> > > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > > >> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed scan"
> > > >> > > >>>>>> > > > > > at server side.
> > > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of regions
> > > >> > > >>>>>> > > > > > for the underlying table.
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > Cheers
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > > > > Hi Ted,
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > We currently use this tool in the scenario where
> > > >> > > >>>>>> > > > > > > data is consumed by MapReduce jobs, so we haven't
> > > >> > > >>>>>> > > > > > > tested the performance of a pure "distributed scan"
> > > >> > > >>>>>> > > > > > > (i.e. N scans instead of 1) a lot. I expect it to be
> > > >> > > >>>>>> > > > > > > close to simple scan performance, or maybe sometimes
> > > >> > > >>>>>> > > > > > > even faster depending on your data access patterns.
> > > >> > > >>>>>> > > > > > > E.g. in case you write timeseries (sequential) data
> > > >> > > >>>>>> > > > > > > which is written into a single region at a time, then
> > > >> > > >>>>>> > > > > > > if you access the delta for further
> > > >> > > >>>>>> > > > > > > processing/analysis (esp. if from more than a single
> > > >> > > >>>>>> > > > > > > client) these scans are likely to hit the same region
> > > >> > > >>>>>> > > > > > > or a couple of regions at a time, which may perform
> > > >> > > >>>>>> > > > > > > worse compared to many scans hitting data that is
> > > >> > > >>>>>> > > > > > > much better spread over region servers.
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > As for a map-reduce job, the approach should not
> > > >> > > >>>>>> > > > > > > affect reading performance at all: it's just that
> > > >> > > >>>>>> > > > > > > there are bucketsCount times more splits and hence
> > > >> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many cases this
> > > >> > > >>>>>> > > > > > > even improves overall performance of the MR job since
> > > >> > > >>>>>> > > > > > > work is better distributed over the cluster (esp. in
> > > >> > > >>>>>> > > > > > > the situation when the aim is to constantly process
> > > >> > > >>>>>> > > > > > > the incoming delta, which usually resides in one or
> > > >> > > >>>>>> > > > > > > just a couple of regions depending on processing
> > > >> > > >>>>>> > > > > > > frequency).
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > If you can share details on your case, that will
> > > >> > > >>>>>> > > > > > > help to understand what effect(s) to expect from
> > > >> > > >>>>>> > > > > > > using this approach.
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > Alex Baranau
> > > >> > > >>>>>> > > > > > > ----
> > > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners compared to
> > > >> > > >>>>>> > > > > > > > one scanner originally, have you performed load
> > > >> > > >>>>>> > > > > > > > testing to see the impact?
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > Thanks
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Hello guys,
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java project/lib
> > > >> > > >>>>>> > > > > > > > > around HBase: HBaseWD. It is aimed to help with
> > > >> > > >>>>>> > > > > > > > > distribution of the load (across regionservers)
> > > >> > > >>>>>> > > > > > > > > when writing sequential (because of the row key
> > > >> > > >>>>>> > > > > > > > > nature) records. It implements the solution which
> > > >> > > >>>>>> > > > > > > > > was discussed several times on this mailing list
> > > >> > > >>>>>> > > > > > > > > (e.g. here: http://search-hadoop.com/m/gNRA82No5Wk).
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Please find the sources at
> > > >> > > >>>>>> > > > > > > > > https://github.com/sematext/HBaseWD (there's also
> > > >> > > >>>>>> > > > > > > > > a jar of the current version for convenience). It
> > > >> > > >>>>>> > > > > > > > > is very easy to make use of it: e.g. I added it to
> > > >> > > >>>>>> > > > > > > > > one existing project with 1+2 lines of code (one
> > > >> > > >>>>>> > > > > > > > > where I write to HBase and 2 for configuring the
> > > >> > > >>>>>> > > > > > > > > MapReduce job).
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Please find below the short intro to the lib [1].
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Alex Baranau
> > > >> > > >>>>>> > > > > > > > > ----
> > > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > [1]
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Description:
> > > >> > > >>>>>> > > > > > > > > ------------
> > > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential)
> > > >> > > >>>>>> > > > > > > > > Writes. It was inspired by
> > > >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists around the
> > > >> > > problem
> > > >> > > >>>>>> of
> > > >> > > >>>>>> > > choosing
> > > >> > > >>>>>> > > > > > > > between:
> > > >> > > >>>>>> > > > > > > > > * writing records with sequential row
> keys
> > > >> (e.g.
> > > >> > > >>>>>> time-series
> > > >> > > >>>>>> > > data
> > > >> > > >>>>>> > > > > > with
> > > >> > > >>>>>> > > > > > > > row
> > > >> > > >>>>>> > > > > > > > > key
> > > >> > > >>>>>> > > > > > > > >  built based on ts)
> > > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > First approach makes possible to perform
> > fast
> > > >> > range
> > > >> > > >>>>>> scans
> > > >> > > >>>>>> > with
> > > >> > > >>>>>> > > > help
> > > >> > > >>>>>> > > > > > of
> > > >> > > >>>>>> > > > > > > > > setting
> > > >> > > >>>>>> > > > > > > > > start/stop keys on Scanner, but creates
> > > single
> > > >> > > region
> > > >> > > >>>>>> server
> > > >> > > >>>>>> > > > > > > hot-spotting
> > > >> > > >>>>>> > > > > > > > > problem upon writing data (as row keys go
> > in
> > > >> > > sequence
> > > >> > > >>>>>> all
> > > >> > > >>>>>> > > records
> > > >> > > >>>>>> > > > > end
> > > >> > > >>>>>> > > > > > > up
> > > >> > > >>>>>> > > > > > > > > written into a single region at a time).
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Second approach aims for fastest writing
> > > >> > performance
> > > >> > > >>>>>> by
> > > >> > > >>>>>> > > > > distributing
> > > >> > > >>>>>> > > > > > > new
> > > >> > > >>>>>> > > > > > > > > records over random regions but makes not
> > > >> possible
> > > >> > > >>>>>> doing fast
> > > >> > > >>>>>> > > > range
> > > >> > > >>>>>> > > > > > > scans
> > > >> > > >>>>>> > > > > > > > > against written data.
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > The suggested approach stays in the
> middle
> > of
> > > >> the
> > > >> > > two
> > > >> > > >>>>>> above
> > > >> > > >>>>>> > and
> > > >> > > >>>>>> > > > > > proved
> > > >> > > >>>>>> > > > > > > to
> > > >> > > >>>>>> > > > > > > > > perform well by distributing records over
> > the
> > > >> > > cluster
> > > >> > > >>>>>> during
> > > >> > > >>>>>> > > > > writing
> > > >> > > >>>>>> > > > > > > data
> > > >> > > >>>>>> > > > > > > > > while allowing range scans over it.
> HBaseWD
> > > >> > provides
> > > >> > > >>>>>> very
> > > >> > > >>>>>> > > simple
> > > >> > > >>>>>> > > > > API
> > > >> > > >>>>>> > > > > > to
> > > >> > > >>>>>> > > > > > > > > work with which makes it perfect to use
> > with
> > > >> > > existing
> > > >> > > >>>>>> code.
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Please refer to unit-tests for lib usage
> > info
> > > >> as
> > > >> > > they
> > > >> > > >>>>>> aimed
> > > >> > > >>>>>> > to
> > > >> > > >>>>>> > > > act
> > > >> > > >>>>>> > > > > as
> > > >> > > >>>>>> > > > > > > > > example.
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > > >> > > >>>>>> > > > > > > > > ----------------------------
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys
> > > which
> > > >> > are
> > > >> > > >>>>>> being
> > > >> > > >>>>>> > > written
> > > >> > > >>>>>> > > > > in
> > > >> > > >>>>>> > > > > > up
> > > >> > > >>>>>> > > > > > > > to
> > > >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; //
> > > >> distributing
> > > >> > > into
> > > >> > > >>>>>> 32
> > > >> > > >>>>>> > > buckets
> > > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > > >> > > >>>>>> > > > > > > > >                           new
> > > >> > > >>>>>> > > > > > > > >
> > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > > >> > > >>>>>> > > > > > > > >      Put put = new
> > > >> > > >>>>>> > > > > >
> > Put(keyDistributor.getDistributedKey(originalKey));
> > > >> > > >>>>>> > > > > > > > >      ... // add values
> > > >> > > >>>>>> > > > > > > > >      hTable.put(put);
> > > >> > > >>>>>> > > > > > > > >    }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Performing a range scan over written data
> > > >> > > (internally
> > > >> > > >>>>>> > > > > <bucketsCount>
> > > >> > > >>>>>> > > > > > > > > scanners
> > > >> > > >>>>>> > > > > > > > > executed):
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey,
> stopKey);
> > > >> > > >>>>>> > > > > > > > >    ResultScanner rs =
> > > >> > > >>>>>> DistributedScanner.create(hTable, scan,
> > > >> > > >>>>>> > > > > > > > > keyDistributor);
> > > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> > > >> > > >>>>>> > > > > > > > >      ...
> > > >> > > >>>>>> > > > > > > > >    }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Performing mapreduce job over written
> data
> > > >> chunk
> > > >> > > >>>>>> specified by
> > > >> > > >>>>>> > > > Scan:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    Configuration conf =
> > > >> > HBaseConfiguration.create();
> > > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf,
> > > "testMapreduceJob");
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey,
> stopKey);
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >>  TableMapReduceUtil.initTableMapperJob("table",
> > > >> > > >>>>>> scan,
> > > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class,
> > > >> > > >>>>>> ImmutableBytesWritable.class,
> > > >> > > >>>>>> > > > > > > Result.class,
> > > >> > > >>>>>> > > > > > > > > job);
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    // Substituting standard
> > TableInputFormat
> > > >> which
> > > >> > > was
> > > >> > > >>>>>> set in
> > > >> > > >>>>>> > > > > > > > >    //
> > > >> TableMapReduceUtil.initTableMapperJob(...)
> > > >> > > >>>>>> > > > > > > > >
> > > >> > >  job.setInputFormatClass(WdTableInputFormat.class);
> > > >> > > >>>>>> > > > > > > > >
> > > >>  keyDistributor.addInfo(job.getConfiguration());
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > > >> > > >>>>>> > > > > > > > > -----------------------------------------
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
> > > >> support
> > > >> > > >>>>>> custom row
> > > >> > > >>>>>> > > key
> > > >> > > >>>>>> > > > > > > > > distribution
> > > >> > > >>>>>> > > > > > > > > approaches. To define custom row key
> > > >> distributing
> > > >> > > >>>>>> logic just
> > > >> > > >>>>>> > > > > > implement
> > > >> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class
> > > which
> > > >> is
> > > >> > > >>>>>> really very
> > > >> > > >>>>>> > > > > simple:
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > > >    public abstract class
> > > >> AbstractRowKeyDistributor
> > > >> > > >>>>>> implements
> > > >> > > >>>>>> > > > > > > > > Parametrizable {
> > > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > > >> > getDistributedKey(byte[]
> > > >> > > >>>>>> > > > originalKey);
> > > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > > >> getOriginalKey(byte[]
> > > >> > > >>>>>> > adjustedKey);
> > > >> > > >>>>>> > > > > > > > >      public abstract byte[][]
> > > >> > > >>>>>> getAllDistributedKeys(byte[]
> > > >> > > >>>>>> > > > > > > originalKey);
> > > >> > > >>>>>> > > > > > > > >      ... // some utility methods
> > > >> > > >>>>>> > > > > > > > >    }
> > > >> > > >>>>>> > > > > > > > >
> > > >> > > >>>>>> > > > > > > >
> > > >> > > >>>>>> > > > > > >
> > > >> > > >>>>>> > > > > >
> > > >> > > >>>>>> > > > >
> > > >> > > >>>>>> > > >
> > > >> > > >>>>>> > >
> > > >> > > >>>>>> >
> > > >> > > >>>>>>
> > > >> > > >>>>>
> > > >> > > >>>>>
> > > >> > > >>>>
> > > >> > > >>>
> > > >> > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
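The AbstractRowKeyDistributor contract quoted above can be implemented in many ways; one variant discussed later in this thread is deriving the one-byte bucket prefix from a hash of the original key, so the distributed key is computable from the original key alone. The following is an illustrative standalone sketch under that assumption — the class name and hash choice are hypothetical, and this is not the code shipped with HBaseWD:

```java
import java.util.Arrays;

// Hypothetical sketch: a distributor that prepends a one-byte bucket prefix
// computed as a hash of the original key. Because the prefix is a pure
// function of the key, the "distributed key" can be recomputed client-side
// without a read from HBase.
public class HashPrefixDistributorSketch {
    private final byte bucketsCount;

    public HashPrefixDistributorSketch(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    public byte[] getDistributedKey(byte[] originalKey) {
        // mask to keep the hash non-negative before taking the modulo
        byte prefix = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] key = new byte[originalKey.length + 1];
        key[0] = prefix;
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    public byte[] getOriginalKey(byte[] adjustedKey) {
        // drop the one-byte prefix to recover the original key
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        // one candidate key per bucket; used to fan out scans
        byte[][] keys = new byte[bucketsCount][];
        for (byte i = 0; i < bucketsCount; i++) {
            byte[] key = new byte[originalKey.length + 1];
            key[0] = i;
            System.arraycopy(originalKey, 0, key, 1, originalKey.length);
            keys[i] = key;
        }
        return keys;
    }
}
```

A real implementation would extend AbstractRowKeyDistributor and implement Parametrizable so the distributor can be reconstructed from the MapReduce job configuration.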

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Ted Yu <yu...@gmail.com>.
Alex:
Can you summarize HBaseWD in your blog, including points 1 and 2 below?

Thanks

On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <al...@gmail.com>wrote:

> There are several options here. E.g.:
>
> 1) Given that you have the "original key" of the record, you can fetch the
> stored record key from HBase and use it to create a Put with updated (or
> new) cells.
>
> Currently you'll need to use a distributed scan for that; there's no
> analogue for the Get operation yet (see
> https://github.com/sematext/HBaseWD/issues/1).
>
> Note: you need to first find out the real key of the stored record by
> fetching data from HBase in case you use the
> RowKeyDistributorByOneBytePrefix included in the current lib.
> Alternatively, see the next option:
>
> 2) You can create your own RowKeyDistributor implementation which creates
> the "distributed key" based on the original key value, so that later, when
> you have the original key and want to update the record, you can calculate
> the distributed key without a roundtrip to HBase.
>
> E.g. your RowKeyDistributor implementation can calculate a 1-byte hash of
> the original key (https://github.com/sematext/HBaseWD/issues/2).
>
> Either way you don't need to delete a record to update some of its cells
> or add new cells.
>
> Please let me know if you have more Qs!
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > I have another question. For overwriting, do I need to delete the
> > existing one before re-writing it?
> >
> > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com>
> > wrote:
> >
> > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> > >
> > >
> > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> alex.baranov.v@gmail.com
> > >wrote:
> > >
> > >> Thanks for the interest!
> > >>
> > >> We are using it in production. It is simple and hence quite stable.
> > >> Though some minor pieces are missing (like
> > >> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
> > >> stability and/or major functionality.
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > >> HBase
> > >>
> > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com>
> > >> wrote:
> > >>
> > >> > What's the status on this package? Is it mature enough?
> > >> > I am using it in my project, tried out the write method yesterday and
> > >> > am going to incorporate it into the read method tomorrow.
> > >> >
> > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > alex.baranov.v@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > > The start/end rows may be written twice.
> > >> > >
> > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data is
> > >> > > "bearable" in an attribute value no matter how long they (the keys)
> > >> > > are, since we are already OK with transferring them initially (i.e.
> > >> > > we should be OK with transferring 2x more).
> > >> > >
> > >> > > So, what about the suggestion of the sourceScan attribute value I
> > >> > > mentioned? If you can tell why it isn't sufficient in your case, I'd
> > >> > > have more info to think about a better suggestion ;)
> > >> > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >> > >
> > >> > > np. Can do that. Just thought that they (the patches) can be sorted
> > >> > > by date to find the final one (aka "convention over naming rules").
> > >> > >
> > >> > > Alex.
> > >> > >
> > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > >> Though it might be ok, since we anyway "transfer" start/stop
> > >> > > > >> rows with the Scan object.
> > >> > > > In write() method, we now have:
> > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > >> > > > ...
> > >> > > >       for (Map.Entry<String, byte[]> attr :
> > >> this.attributes.entrySet())
> > >> > {
> > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > >> > > >       }
> > >> > > > The start/end rows may be written twice.
> > >> > > >
> > >> > > > Of course, you have full control over how to generate the unique
> > >> > > > ID for the "sourceScan" attribute.
> > >> > > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe
> > >> > > > the second should be named HBASE-3811-v2.patch
> > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >> > > >
> > >> > > > Thanks
> > >> > > >
> > >> > > >
> > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > >> > > > alex.baranov.v@gmail.com>wrote:
> > >> > > >
> > >> > > >> > Can you remove the first version ?
> > >> > > >> Isn't it ok to keep it in the JIRA issue?
> > >> > > >>
> > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > >> > > >> > supports setAttribute() ?
> > >> > > >> > If it does, can you encode start row and end row as a
> > >> > > >> > "sourceScan" attribute ?
> > >> > > >>
> > >> > > >> Yeah, something like this is going to be implemented. Though I'd
> > >> > > >> still want to hear from the devs the story about the Scan version.
> > >> > > >>
> > >> > > >> > One consideration is that start row or end row may be quite long.
> > >> > > >>
> > >> > > >> Yeah, that was my thought too at first. Though it might be ok,
> > >> > > >> since we anyway "transfer" start/stop rows with the Scan object.
> > >> > > >>
> > >> > > >> > What do you think ?
> > >> > > >>
> > >> > > >> I'd love to hear from you whether the variant I mentioned is what
> > >> > > >> we are looking at here:
> > >> > > >>
> > >> > > >> > From what I understand, you want to distinguish scans fired by
> > >> > > >> > the same distributed scan. I.e. group scans which were fired by
> > >> > > >> > a single distributed scan. If that's what you want, the
> > >> > > >> > distributed scan can generate a unique ID and set, say, a
> > >> > > >> > "sourceScan" attribute to its value. This way we'll have <# of
> > >> > > >> > distinct "sourceScan" attribute values> = <number of distributed
> > >> > > >> > scans invoked by the client side>, and two scans on the server
> > >> > > >> > side will have the same "sourceScan" attribute iff they "belong"
> > >> > > >> > to the same distributed scan.
> > >> > > >>
> > >> > > >> Alex Baranau
> > >> > > >> ----
> > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >> > > >> Hadoop - HBase
> > >> > > >>
> > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yuzhihong@gmail.com>
> > >> > > >> wrote:
> > >> > > >>
> > >> > > >>> Alex:
> > >> > > >>> Your second patch looks good.
> > >> > > >>> Can you remove the first version ?
> > >> > > >>>
> > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > >> > > >>> supports setAttribute() ?
> > >> > > >>> If it does, can you encode start row and end row as a
> > >> > > >>> "sourceScan" attribute ?
> > >> > > >>>
> > >> > > >>> One consideration is that the start row or end row may be quite
> > >> > > >>> long. Ideally we should store the hash code of the source Scan
> > >> > > >>> object as the "sourceScan" attribute. But Scan doesn't implement
> > >> > > >>> hashCode(). We can add it; that would require running all
> > >> > > >>> Scan-related tests.
> > >> > > >>>
> > >> > > >>> What do you think ?
> > >> > > >>>
> > >> > > >>> Thanks
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > >> > > >>> alex.baranov.v@gmail.com>wrote:
> > >> > > >>>
> > >> > > >>>> Sorry for the delay in response (public holidays here).
> > >> > > >>>>
> > >> > > >>>> This depends on what info you are looking for on the server side.
> > >> > > >>>>
> > >> > > >>>> From what I understand, you want to distinguish scans fired by
> > >> > > >>>> the same distributed scan. I.e. group scans which were fired by
> > >> > > >>>> a single distributed scan. If that's what you want, the
> > >> > > >>>> distributed scan can generate a unique ID and set, say, a
> > >> > > >>>> "sourceScan" attribute to its value. This way we'll have <# of
> > >> > > >>>> distinct "sourceScan" attribute values> = <number of distributed
> > >> > > >>>> scans invoked by the client side>, and two scans on the server
> > >> > > >>>> side will have the same "sourceScan" attribute iff they "belong"
> > >> > > >>>> to the same distributed scan.
> > >> > > >>>>
> > >> > > >>>> Is this what you are looking for?
> > >> > > >>>>
> > >> > > >>>> Alex Baranau
> > >> > > >>>>
> > >> > > >>>> P.S. attached a patch for HBASE-3811
> > >> > > >>>> <https://issues.apache.org/jira/browse/HBASE-3811>.
> > >> > > >>>> P.S-2. should this conversation be moved to the dev list?
> > >> > > >>>>
> > >> > > >>>> ----
> > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >> > > >>>> Hadoop - HBase
> > >> > > >>>>
> > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yuzhihong@gmail.com>
> > >> > > >>>> wrote:
> > >> > > >>>>
> > >> > > >>>>> Alex:
> > >> > > >>>>> What type of identification should we put in the map of the
> > >> > > >>>>> Scan object ?
> > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But the
> > >> > > >>>>> user can use the same distributor on multiple scans.
> > >> > > >>>>>
> > >> > > >>>>> Please share your thoughts.
> > >> > > >>>>>
> > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > >> > > >>>>>>
> > >> > > >>>>>> Alex Baranau
> > >> > > >>>>>> ----
> > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >> > > >>>>>> Hadoop - HBase
> > >> > > >>>>>>
> > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yuzhihong@gmail.com>
> > >> > > >>>>>> wrote:
> > >> > > >>>>>>
> > >> > > >>>>>> > My plan was to make regions that have active scanners more
> > >> > > >>>>>> > stable - trying not to move them when balancing.
> > >> > > >>>>>> > I prefer the second approach - adding custom attribute(s) to
> > >> > > >>>>>> > Scan so that the Scans created by the method below can be
> > >> > > >>>>>> > 'grouped'.
> > >> > > >>>>>> >
> > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > >> > > >>>>>> >
> > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > >> > > >>>>>> > alex.baranov.v@gmail.com>wrote:
> > >> > > >>>>>> >
> > >> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or just
> > >> > > >>>>>> > > differently) when determining the load?
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > The current code looks like this:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > class DistributedScanner:
> > >> > > >>>>>> > >  public static DistributedScanner create(HTable hTable,
> > >> > > >>>>>> > >      Scan original,
> > >> > > >>>>>> > >      AbstractRowKeyDistributor keyDistributor) throws IOException {
> > >> > > >>>>>> > >    byte[][] startKeys =
> > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStartRow());
> > >> > > >>>>>> > >    byte[][] stopKeys =
> > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStopRow());
> > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > >> > > >>>>>> > >      scans[i] = new Scan(original);
> > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    return new DistributedScanner(rss);
> > >> > > >>>>>> > >  }
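Once create() has opened the per-bucket scanners, DistributedScanner combines them into a single result stream. One plausible way to keep results globally ordered by row key is a heap-based k-way merge of the N sorted scanners. The sketch below is illustrative only — plain Java over String keys, an assumption about the merging strategy rather than the actual HBaseWD code:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative sketch: merge N per-bucket scanners (each yielding keys in
// sorted order) into one globally key-ordered stream using a min-heap over
// the scanners' current head elements.
public class MergingScannerSketch {
    public static List<String> merge(List<Iterator<String>> scanners) {
        // heap entry: (current row key, index of the scanner it came from)
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(Map.Entry.comparingByKey());
        for (int i = 0; i < scanners.size(); i++) {
            if (scanners.get(i).hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(scanners.get(i).next(), i));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<String, Integer> head = heap.poll();
            out.add(head.getKey());
            // refill from the scanner that produced the smallest key
            Iterator<String> src = scanners.get(head.getValue());
            if (src.hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(src.next(), head.getValue()));
            }
        }
        return out;
    }
}
```

Note that with a one-byte-prefix distributor each bucket's scanner returns prefixed keys, so a real merge would compare the original (prefix-stripped) keys instead.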
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > This is client code. To make these scans "identifiable"
> > >> > > >>>>>> > > we need to either use some different (derived from Scan)
> > >> > > >>>>>> > > class or add some attribute to them. There's no API for
> > >> > > >>>>>> > > doing the latter. We can do the former, but I don't really
> > >> > > >>>>>> > > like the idea of creating an extra class (with no extra
> > >> > > >>>>>> > > functionality) just to distinguish it from the base one.
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > If you can share why/how you want to treat them
> > >> > > >>>>>> > > differently on the server side, that would be helpful.
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > Alex Baranau
> > >> > > >>>>>> > > ----
> > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > >> > > >>>>>> > > - Hadoop - HBase
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> > >> > > >>>>>> > > yuzhihong@gmail.com> wrote:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > > My request would be to make the distributed scan
> > >> > > >>>>>> > > > identifiable from the server side.
> > >> > > >>>>>> > > > :-)
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > >> > > >>>>>> > > > alex.baranov.v@gmail.com>wrote:
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > > > Basically bucketsCount may not equal the number of
> > >> > > >>>>>> > > > > > regions for the underlying table.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > True: e.g. when there's only one region that holds
> > >> > > >>>>>> > > > > data for the whole table (not many records in the
> > >> > > >>>>>> > > > > table yet), a distributed scan will fire N scans
> > >> > > >>>>>> > > > > against the same region. On the other hand, in case
> > >> > > >>>>>> > > > > there is a huge number of regions for a single table,
> > >> > > >>>>>> > > > > each scan can span multiple regions.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > >> > > >>>>>> > > > > > scan" at server side.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > With the current implementation a "distributed" scan
> > >> > > >>>>>> > > > > won't be recognized as something special on the server
> > >> > > >>>>>> > > > > side. It will be an ordinary scan. Though the number
> > >> > > >>>>>> > > > > of scans will increase, given that the typical
> > >> > > >>>>>> > > > > situation is "many regions for a single table", the
> > >> > > >>>>>> > > > > scans of the same "distributed scan" are likely not to
> > >> > > >>>>>> > > > > hit the same region.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel free
> > >> > > >>>>>> > > > > to ask more ;)
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Alex Baranau
> > >> > > >>>>>> > > > > ----
> > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > >> > > >>>>>> > > > > Nutch - Hadoop - HBase
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > >> > > >>>>>> > > > > yuzhihong@gmail.com> wrote:
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > Alex:
> > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > >> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > >> > > >>>>>> > > > > > scan" at server side.
> > >> > > >>>>>> > > > > > Basically bucketsCount may not equal the number of
> > >> > > >>>>>> > > > > > regions for the underlying table.
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > Cheers
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> > >> > > >>>>>> > > > > > alex.baranov.v@gmail.com>wrote:
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > > Hi Ted,
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > We currently use this tool in the scenario where
> > >> > > >>>>>> > > > > > > data is consumed by MapReduce jobs, so we haven't
> > >> > > >>>>>> > > > > > > tested the performance of a pure "distributed
> > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I expect
> > >> > > >>>>>> > > > > > > it to be close to simple scan performance, or
> > >> > > >>>>>> > > > > > > maybe sometimes even faster depending on your data
> > >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
> > >> > > >>>>>> > > > > > > time-series (sequential) data, which goes into a
> > >> > > >>>>>> > > > > > > single region at a time, then if you access the
> > >> > > >>>>>> > > > > > > delta for further processing/analysis (esp. from
> > >> > > >>>>>> > > > > > > more than one client), those scans are likely to
> > >> > > >>>>>> > > > > > > hit the same region or a couple of regions at a
> > >> > > >>>>>> > > > > > > time, which may perform worse compared to many
> > >> > > >>>>>> > > > > > > scans hitting data that is much better spread over
> > >> > > >>>>>> > > > > > > the region servers.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > As for a MapReduce job, the approach should not
> > >> > > >>>>>> > > > > > > affect reading performance at all: it's just that
> > >> > > >>>>>> > > > > > > there are bucketsCount times more splits and hence
> > >> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many cases
> > >> > > >>>>>> > > > > > > this even improves overall performance of the MR
> > >> > > >>>>>> > > > > > > job since work is better distributed over the
> > >> > > >>>>>> > > > > > > cluster (esp. when the aim is to constantly
> > >> > > >>>>>> > > > > > > process the incoming delta, which usually resides
> > >> > > >>>>>> > > > > > > in one or just a couple of regions depending on
> > >> > > >>>>>> > > > > > > processing frequency).
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > If you can share details on your case, that will
> > >> > > >>>>>> > > > > > > help to understand what effect(s) to expect from
> > >> > > >>>>>> > > > > > > using this approach.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > ----
> > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene
> > >> > > >>>>>> > > > > > > - Nutch - Hadoop - HBase
> > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > >> > > >>>>>> yuzhihong@gmail.com>
> > >> > > >>>>>> > > wrote:
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners compared
> > to
> > >> one
> > >> > > >>>>>> scanner
> > >> > > >>>>>> > > > > > originally,
> > >> > > >>>>>> > > > > > > > have you performed load testing to see the
> > impact
> > >> ?
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > Thanks
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex
> Baranau
> > <
> > >> > > >>>>>> > > > > > alex.baranov.v@gmail.com
> > >> > > >>>>>> > > > > > > > >wrote:
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > > Hello guys,
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java
> > >> project/lib
> > >> > > >>>>>> around
> > >> > > >>>>>> > > HBase:
> > >> > > >>>>>> > > > > > > HBaseWD.
> > >> > > >>>>>> > > > > > > > > It
> > >> > > >>>>>> > > > > > > > > is aimed to help with distribution of the
> > load
> > >> > > (across
> > >> > > >>>>>> > > > > regionservers)
> > >> > > >>>>>> > > > > > > > when
> > >> > > >>>>>> > > > > > > > > writing sequential (because of the row key
> > >> nature)
> > >> > > >>>>>> records.
> > >> > > >>>>>> > It
> > >> > > >>>>>> > > > > > > implements
> > >> > > >>>>>> > > > > > > > > the solution which was discussed several
> > times
> > >> on
> > >> > > this
> > >> > > >>>>>> > mailing
> > >> > > >>>>>> > > > list
> > >> > > >>>>>> > > > > > > (e.g.
> > >> > > >>>>>> > > > > > > > > here:
> http://search-hadoop.com/m/gNRA82No5Wk
> > ).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find the sources at
> > >> > > >>>>>> > > > > > https://github.com/sematext/HBaseWD (there's
> > >> > > >>>>>> > > > > > > > > also
> > >> > > >>>>>> > > > > > > > > a jar of current version for convenience).
> It
> > >> is
> > >> > > very
> > >> > > >>>>>> easy to
> > >> > > >>>>>> > > > make
> > >> > > >>>>>> > > > > > use
> > >> > > >>>>>> > > > > > > of
> > >> > > >>>>>> > > > > > > > > it: e.g. I added it to one existing project
> > >> with
> > >> > 1+2
> > >> > > >>>>>> lines of
> > >> > > >>>>>> > > > code
> > >> > > >>>>>> > > > > > (one
> > >> > > >>>>>> > > > > > > > > where I write to HBase and 2 for
> configuring
> > >> > > MapReduce
> > >> > > >>>>>> job).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find below the short intro to the
> lib
> > >> [1].
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > > > ----
> > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > >> Lucene
> > >> > -
> > >> > > >>>>>> Nutch -
> > >> > > >>>>>> > > > Hadoop
> > >> > > >>>>>> > > > > -
> > >> > > >>>>>> > > > > > > > HBase
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > [1]
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Description:
> > >> > > >>>>>> > > > > > > > > ------------
> > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing
> (sequential)
> > >> > Writes.
> > >> > > >>>>>> It was
> > >> > > >>>>>> > > > > inspired
> > >> > > >>>>>> > > > > > by
> > >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists around
> the
> > >> > > problem
> > >> > > >>>>>> of
> > >> > > >>>>>> > > choosing
> > >> > > >>>>>> > > > > > > > between:
> > >> > > >>>>>> > > > > > > > > * writing records with sequential row keys
> > >> (e.g.
> > >> > > >>>>>> time-series
> > >> > > >>>>>> > > data
> > >> > > >>>>>> > > > > > with
> > >> > > >>>>>> > > > > > > > row
> > >> > > >>>>>> > > > > > > > > key
> > >> > > >>>>>> > > > > > > > >  built based on ts)
> > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The first approach makes it possible to perform
> fast
> > >> > range
> > >> > > >>>>>> scans
> > >> > > >>>>>> > with
> > >> > > >>>>>> > > > help
> > >> > > >>>>>> > > > > > of
> > >> > > >>>>>> > > > > > > > > setting
> > >> > > >>>>>> > > > > > > > > start/stop keys on Scanner, but creates
> > single
> > >> > > region
> > >> > > >>>>>> server
> > >> > > >>>>>> > > > > > > hot-spotting
> > >> > > >>>>>> > > > > > > > > problem upon writing data (as row keys go
> in
> > >> > > sequence
> > >> > > >>>>>> all
> > >> > > >>>>>> > > records
> > >> > > >>>>>> > > > > end
> > >> > > >>>>>> > > > > > > up
> > >> > > >>>>>> > > > > > > > > written into a single region at a time).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The second approach aims for the fastest write
> > >> > performance
> > >> > > >>>>>> by
> > >> > > >>>>>> > > > > distributing
> > >> > > >>>>>> > > > > > > new
> > >> > > >>>>>> > > > > > > > > records over random regions, but makes fast
> > >> range
> > >> > > >>>>>> scans over the
> > >> > > >>>>>> > > > > > > > > written data impossible.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The suggested approach stays in the middle
> of
> > >> the
> > >> > > two
> > >> > > >>>>>> above
> > >> > > >>>>>> > and
> > >> > > >>>>>> > > > > > proved
> > >> > > >>>>>> > > > > > > to
> > >> > > >>>>>> > > > > > > > > perform well by distributing records over
> the
> > >> > > cluster
> > >> > > >>>>>> during
> > >> > > >>>>>> > > > > writing
> > >> > > >>>>>> > > > > > > data
> > >> > > >>>>>> > > > > > > > > while allowing range scans over it. HBaseWD
> > >> > provides
> > >> > > >>>>>> very
> > >> > > >>>>>> > > simple
> > >> > > >>>>>> > > > > API
> > >> > > >>>>>> > > > > > to
> > >> > > >>>>>> > > > > > > > > work with which makes it perfect to use
> with
> > >> > > existing
> > >> > > >>>>>> code.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please refer to the unit tests for lib usage
> > >> > > >>>>>> > > > > > > > > info, as they are intended to act as examples.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > >> > > >>>>>> > > > > > > > > ----------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys
> > which
> > >> > are
> > >> > > >>>>>> being
> > >> > > >>>>>> > > written
> > >> > > >>>>>> > > > > in
> > >> > > >>>>>> > > > > > up
> > >> > > >>>>>> > > > > > > > to
> > >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; //
> > >> distributing
> > >> > > into
> > >> > > >>>>>> 32
> > >> > > >>>>>> > > buckets
> > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > >> > > >>>>>> > > > > > > > >                           new
> > >> > > >>>>>> > > > > > > > >
> > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > >> > > >>>>>> > > > > > > > >      Put put = new
> > >> > > >>>>>> > > > > >
> Put(keyDistributor.getDistributedKey(originalKey));
> > >> > > >>>>>> > > > > > > > >      ... // add values
> > >> > > >>>>>> > > > > > > > >      hTable.put(put);
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing a range scan over written data
> > >> > > (internally
> > >> > > >>>>>> > > > > <bucketsCount>
> > >> > > >>>>>> > > > > > > > > scanners
> > >> > > >>>>>> > > > > > > > > executed):
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >    ResultScanner rs =
> > >> > > >>>>>> DistributedScanner.create(hTable, scan,
> > >> > > >>>>>> > > > > > > > > keyDistributor);
> > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> > >> > > >>>>>> > > > > > > > >      ...
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing mapreduce job over written data
> > >> chunk
> > >> > > >>>>>> specified by
> > >> > > >>>>>> > > > Scan:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Configuration conf =
> > >> > HBaseConfiguration.create();
> > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf,
> > "testMapreduceJob");
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >>  TableMapReduceUtil.initTableMapperJob("table",
> > >> > > >>>>>> scan,
> > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class,
> > >> > > >>>>>> ImmutableBytesWritable.class,
> > >> > > >>>>>> > > > > > > Result.class,
> > >> > > >>>>>> > > > > > > > > job);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    // Substituting standard
> TableInputFormat
> > >> which
> > >> > > was
> > >> > > >>>>>> set in
> > >> > > >>>>>> > > > > > > > >    //
> > >> TableMapReduceUtil.initTableMapperJob(...)
> > >> > > >>>>>> > > > > > > > >
> > >> > >  job.setInputFormatClass(WdTableInputFormat.class);
> > >> > > >>>>>> > > > > > > > >
> > >>  keyDistributor.addInfo(job.getConfiguration());
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > >> > > >>>>>> > > > > > > > > -----------------------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
> > >> support
> > >> > > >>>>>> custom row
> > >> > > >>>>>> > > key
> > >> > > >>>>>> > > > > > > > > distribution
> > >> > > >>>>>> > > > > > > > > approaches. To define custom row key
> > >> distributing
> > >> > > >>>>>> logic just
> > >> > > >>>>>> > > > > > implement
> > >> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class
> > which
> > >> is
> > >> > > >>>>>> really very
> > >> > > >>>>>> > > > > simple:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    public abstract class
> > >> AbstractRowKeyDistributor
> > >> > > >>>>>> implements
> > >> > > >>>>>> > > > > > > > > Parametrizable {
> > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > >> > getDistributedKey(byte[]
> > >> > > >>>>>> > > > originalKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > >> getOriginalKey(byte[]
> > >> > > >>>>>> > adjustedKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[][]
> > >> > > >>>>>> getAllDistributedKeys(byte[]
> > >> > > >>>>>> > > > > > > originalKey);
> > >> > > >>>>>> > > > > > > > >      ... // some utility methods
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Ted Yu <yu...@gmail.com>.
Alex:
Can you summarize HBaseWD in your blog, including points 1 and 2 below ?

Thanks

On Wed, May 18, 2011 at 8:03 AM, Alex Baranau <al...@gmail.com>wrote:

> There are several options here. E.g.:
>
> 1) Given that you have "original key" of the record, you can fetch the
> stored record key from HBase and use it to create Put with updated (or new)
> cells.
>
> Currently you'll need to use a distributed scan for that; there's no
> analogue
> for Get operation yet (see https://github.com/sematext/HBaseWD/issues/1).
>
> Note: you need to first find out the real key of stored record by fetching
> data from HBase in case you use included in current lib
> RowKeyDistributorByOneBytePrefix. Alternatively, see next option:
>
> 2) You can create your own RowKeyDistributor implementation which will
> create "distributed key" based on original key value so that later when you
> have original key and want to update the record you can calculate
> distributed key without roundtrip to HBase.
>
> E.g. in your RowKeyDistributor implementation you can calculate a 1-byte hash of
> original key (https://github.com/sematext/HBaseWD/issues/2).
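Option 2 can be illustrated with a stand-alone sketch (the class name below is a hypothetical stand-in, not the actual HBaseWD implementation; it mirrors the `AbstractRowKeyDistributor` methods quoted later in the thread). Because the bucket prefix is a pure function of the original key, the distributed key can be recomputed at update time without a round-trip to HBase:

```java
import java.util.Arrays;

// Hypothetical sketch of a hash-based distributor: prepend a 1-byte
// bucket prefix derived from the original key itself.
public class HashPrefixDistributorSketch {
    private final byte bucketsCount;

    public HashPrefixDistributorSketch(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    public byte[] getDistributedKey(byte[] originalKey) {
        // Mask to keep the hash non-negative before taking the modulo.
        byte prefix = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] result = new byte[originalKey.length + 1];
        result[0] = prefix;
        System.arraycopy(originalKey, 0, result, 1, originalKey.length);
        return result;
    }

    public byte[] getOriginalKey(byte[] adjustedKey) {
        // Drop the 1-byte bucket prefix.
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    public static void main(String[] args) {
        HashPrefixDistributorSketch d = new HashPrefixDistributorSketch((byte) 32);
        byte[] original = "event-20110518-0001".getBytes();
        byte[] distributed = d.getDistributedKey(original);
        // Same original key always maps to the same bucket prefix:
        System.out.println(Arrays.equals(distributed, d.getDistributedKey(original))); // true
        System.out.println(Arrays.equals(original, d.getOriginalKey(distributed)));    // true
    }
}
```

With this scheme an update is just another `Put` with `getDistributedKey(originalKey)` as the row key; HBase versioning handles the overwrite.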
>
>
>
> Either way you don't need to delete a record to update some of its cells or
> add new cells.
>
> Please let me know if you have more Qs!
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > I have another question. For overwriting, do I need to delete the
> existing
> > one before re-writing it?
> >
> > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com>
> > wrote:
> >
> > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> > >
> > >
> > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> alex.baranov.v@gmail.com
> > >wrote:
> > >
> > >> Thanks for the interest!
> > >>
> > >> We are using it in production. It is simple and hence quite stable.
> > Though
> > >> some minor pieces are missing (like
> > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't affect
> > >> stability
> > >> and/or major functionality.
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > >> HBase
> > >>
> > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com>
> > >> wrote:
> > >>
> > >> > What's the status on this package? Is it mature enough?
> > >> >  I am using it in my project, tried out the write method yesterday
> and
> > >> > going
> > >> > to incorporate into read method tomorrow.
> > >> >
> > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > alex.baranov.v@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > > The start/end rows may be written twice.
> > >> > >
> > >> > > Yeah, I know. I meant that size of startRow+stopRow data is
> > "bearable"
> > >> in
> > >> > > attribute value no matter how long are they (keys), since we
> already
> > >> OK
> > >> > > with
> > >> > > transferring them initially (i.e. we should be OK with
> transferring
> > 2x
> > >> > > times
> > >> > > more).
> > >> > >
> > >> > > So, what about the suggestion of sourceScan attribute value I
> > >> mentioned?
> > >> > If
> > >> > > you can tell why it isn't sufficient in your case, I'd have more
> > info
> > >> to
> > >> > > think about a better suggestion ;)
> > >> > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > >> > > > Maybe the second should be named HBASE-3811-v2.patch<
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > >> > > >?
> > >> > >
> > >> > > np. Can do that. Just thought that they (patches) can be sorted by
> > >> date
> > >> > to
> > >> > > find out the final one (aka "convention over naming-rules").
> > >> > >
> > >> > > Alex.
> > >> > >
> > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > >> Though it might be ok, since we anyways "transfer" start/stop
> > >> rows
> > >> > > with
> > >> > > > Scan object.
> > >> > > > In write() method, we now have:
> > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > >> > > > ...
> > >> > > >       for (Map.Entry<String, byte[]> attr :
> > >> this.attributes.entrySet())
> > >> > {
> > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > >> > > >       }
> > >> > > > The start/end rows may be written twice.
> > >> > > >
> > >> > > > Of course, you have full control over how to generate the unique
> > ID
> > >> for
> > >> > > > "sourceScan" attribute.
> > >> > > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe
> > the
> > >> > > second
> > >> > > > should be named HBASE-3811-v2.patch<
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > >> > > >?
> > >> > > >
> > >> > > > Thanks
> > >> > > >
> > >> > > >
> > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > >> > alex.baranov.v@gmail.com
> > >> > > >wrote:
> > >> > > >
> > >> > > >> > Can you remove the first version ?
> > >> > > >> Isn't it ok to keep it in JIRA issue?
> > >> > > >>
> > >> > > >>
> > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > >> supports
> > >> > > >> setAttribute() ?
> > >> > > >> > If it does, can you encode start row and end row as
> > "sourceScan"
> > >> > > >> attribute ?
> > >> > > >>
> > >> > > >> Yeah, smth like this is going to be implemented. Though I'd
> still
> > >> want
> > >> > > to
> > >> > > >> hear from the devs the story about Scan version.
> > >> > > >>
> > >> > > >>
> > >> > > >> > One consideration is that start row or end row may be quite
> > long.
> > >> > > >>
> > >> > > >> Yeah, that was my thought too at first. Though it might be
> ok,
> > >> since
> > >> > > we
> > >> > > >> anyways "transfer" start/stop rows with Scan object.
> > >> > > >>
> > >> > > >> > What do you think ?
> > >> > > >>
> > >> > > >> I'd love to hear from you is this variant I mentioned is what
> we
> > >> are
> > >> > > >> looking at here:
> > >> > > >>
> > >> > > >>
> > >> > > >> > From what I understand, you want to distinguish scans fired
> by
> > >> the
> > >> > > same
> > >> > > >> distributed scan.
> > >> > > >> > I.e. group scans which were fired by single distributed scan.
> > If
> > >> > > that's
> > >> > > >> what you want, distributed
> > >> > > >> > scan can generate unique ID and set, say "sourceScan"
> attribute
> > >> to
> > >> > its
> > >> > > >> value. This way we'll
> > >> > > >> > have <# of distinct "sourceScan" attribute values> = <number
> of
> > >> > > >> distributed scans invoked by
> > >> > > >> > client side> and two scans on server side will have the same
> > >> > > >> "sourceScan" attribute iff they
> > >> > > >> > "belong" to same distributed scan.
> > >> > > >>
> > >> > > >>
> > >> > > >> Alex Baranau
> > >> > > >> ----
> > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > Hadoop
> > >> -
> > >> > > >> HBase
> > >> > > >>
> > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com>
> > >> wrote:
> > >> > > >>
> > >> > > >>> Alex:
> > >> > > >>> Your second patch looks good.
> > >> > > >>> Can you remove the first version ?
> > >> > > >>>
> > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > supports
> > >> > > >>> setAttribute() ?
> > >> > > >>> If it does, can you encode start row and end row as
> "sourceScan"
> > >> > > >>> attribute ?
> > >> > > >>>
> > >> > > >>> One consideration is that start row or end row may be quite
> > long.
> > >> > > >>> Ideally we should store hash code of source Scan object as
> > >> > "sourceScan"
> > >> > > >>> attribute. But Scan doesn't implement hashCode(). We can add
> it,
> > >> that
> > >> > > would
> > >> > > >>> require running all Scan related tests.
> > >> > > >>>
> > >> > > >>> What do you think ?
> > >> > > >>>
> > >> > > >>> Thanks
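The reflection check suggested here can be sketched with plain JDK reflection (the class and the stand-in `FakeScan` below are illustrative, not HBaseWD or HBase code):

```java
import java.lang.reflect.Method;

// Sketch: probe at runtime whether a Scan class exposes
// setAttribute(String, byte[]), so the lib could support both older
// and newer HBase client versions without a hard compile-time check.
public class ScanFeatureProbe {

    public static boolean hasSetAttribute(Class<?> scanClass) {
        try {
            scanClass.getMethod("setAttribute", String.class, byte[].class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Stand-in for a Scan version that does support attributes.
    public static class FakeScan {
        public void setAttribute(String name, byte[] value) { /* no-op */ }
    }

    public static void main(String[] args) {
        System.out.println(hasSetAttribute(FakeScan.class)); // true
        System.out.println(hasSetAttribute(String.class));   // false
    }
}
```

If the probe succeeds, the "sourceScan" attribute can be set reflectively on the same `Method` instance; otherwise the lib falls back to attribute-less behavior.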
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > >> > > alex.baranov.v@gmail.com>wrote:
> > >> > > >>>
> > >> > > >>>> Sorry for the delay in response (public holidays here).
> > >> > > >>>>
> > >> > > >>>> This depends on what info you are looking for on server side.
> > >> > > >>>>
> > >> > > >>>> From what I understand, you want to distinguish scans fired
> by
> > >> the
> > >> > > same
> > >> > > >>>> distributed scan. I.e. group scans which were fired by single
> > >> > > distributed
> > >> > > >>>> scan. If that's what you want, distributed scan can generate
> > >> unique
> > >> > ID
> > >> > > and
> > >> > > >>>> set, say "sourceScan" attribute to its value. This way we'll
> > have
> > >> <#
> > >> > > of
> > >> > > >>>> distinct "sourceScan" attribute values> = <number of
> > distributed
> > >> > scans
> > >> > > >>>> invoked by client side> and two scans on server side will
> have
> > >> the
> > >> > > same
> > >> > > >>>> "sourceScan" attribute iff they "belong" to same distributed
> > >> scan.
> > >> > > >>>>
> > >> > > >>>> Is this what are you looking for?
> > >> > > >>>>
> > >> > > >>>> Alex Baranau
> > >> > > >>>>
> > >> > > >>>> P.S. attached patch for HBASE-3811<
> > >> > > https://issues.apache.org/jira/browse/HBASE-3811>
> > >> > > >>>> .
> > >> > > >>>> P.S-2. should this conversation be moved to dev list?
> > >> > > >>>>
> > >> > > >>>> ----
> > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >> Hadoop
> > >> > -
> > >> > > >>>> HBase
> > >> > > >>>>
> > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yuzhihong@gmail.com
> >
> > >> > wrote:
> > >> > > >>>>
> > >> > > >>>>> Alex:
> > >> > > >>>>> What type of identification should we put in the map of the
> > Scan
> > >> > > object
> > >> > > >>>>> ?
> > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But the
> > user
> > >> > can
> > >> > > >>>>> use same distributor on multiple scans.
> > >> > > >>>>>
> > >> > > >>>>> Please share your thought.
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > >> > > >>>>>>
> > >> > > >>>>>> Alex Baranau
> > >> > > >>>>>> ----
> > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> -
> > >> > Hadoop
> > >> > > -
> > >> > > >>>>>> HBase
> > >> > > >>>>>>
> > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > >> > > wrote:
> > >> > > >>>>>>
> > >> > > >>>>>> > My plan was to make regions that have active scanners
> more
> > >> > stable
> > >> > > -
> > >> > > >>>>>> trying
> > >> > > >>>>>> > not to move them when balancing.
> > >> > > >>>>>> > I prefer second approach - adding custom attribute(s) to
> > Scan
> > >> so
> > >> > > >>>>>> that the
> > >> > > >>>>>> > Scans created by the method below can be 'grouped'.
> > >> > > >>>>>> >
> > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > >> > > >>>>>> >
> > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > >> > > >>>>>> alex.baranov.v@gmail.com
> > >> > > >>>>>> > >wrote:
> > >> > > >>>>>> >
> > >> > > >>>>>> > > Aha, so you want to "count" it as single scan (or just
> > >> > > >>>>>> differently) when
> > >> > > >>>>>> > > determining the load?
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > The current code looks like this:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > class DistributedScanner:
> > >> > > >>>>>> > >  public static DistributedScanner create(HTable hTable,
> > >> Scan
> > >> > > >>>>>> original,
> > >> > > >>>>>> > > AbstractRowKeyDistributor keyDistributor) throws
> > >> IOException {
> > >> > > >>>>>> > >    byte[][] startKeys =
> > >> > > >>>>>> > >
> > >> keyDistributor.getAllDistributedKeys(original.getStartRow());
> > >> > > >>>>>> > >    byte[][] stopKeys =
> > >> > > >>>>>> > >
> > >> keyDistributor.getAllDistributedKeys(original.getStopRow());
> > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > >> > > >>>>>> > >      scans[i] = new Scan(original);
> > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    ResultScanner[] rss = new
> > >> ResultScanner[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    return new DistributedScanner(rss);
> > >> > > >>>>>> > >  }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > This is client code. To make these scans "identifiable"
> > we
> > >> > need
> > >> > > to
> > >> > > >>>>>> either
> > >> > > >>>>>> > > use some different (derived from Scan) class or add
> some
> > >> > > attribute
> > >> > > >>>>>> to
> > >> > > >>>>>> > them.
> > >> > > >>>>>> > > There's no API for doing the latter. But we can do the
> > >> former,
> > >> > > but
> > >> > > >>>>>> I
> > >> > > >>>>>> > don't
> > >> > > >>>>>> > > really like the idea of creating extra class (with no
> > extra
> > >> > > >>>>>> > functionality)
> > >> > > >>>>>> > > just to distinguish it from the base one.
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > If you can share why/how you want to treat them
> > >> differently
> > >> > > on
> > >> > > >>>>>> server
> > >> > > >>>>>> > > side, that would be helpful.
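For reference, the getAllDistributedKeys() fan-out used by create() above can be sketched stand-alone for the one-byte-prefix scheme (illustrative code, not the actual HBaseWD class): one logical [startRow, stopRow) range expands into bucketsCount concrete ranges, one per bucket, regardless of how many regions the table currently has.

```java
import java.util.Arrays;

// Sketch of the one-byte-prefix expansion: each bucket i gets the key
// { i, originalKey... }, so pairing up the expanded start and stop keys
// yields one scan range per bucket.
public class BucketRangesSketch {

    public static byte[][] allDistributedKeys(byte[] originalKey, byte bucketsCount) {
        byte[][] keys = new byte[bucketsCount][];
        for (byte i = 0; i < bucketsCount; i++) {
            byte[] k = new byte[originalKey.length + 1];
            k[0] = i;
            System.arraycopy(originalKey, 0, k, 1, originalKey.length);
            keys[i] = k;
        }
        return keys;
    }

    public static void main(String[] args) {
        byte[][] starts = allDistributedKeys("aaa".getBytes(), (byte) 4);
        byte[][] stops  = allDistributedKeys("zzz".getBytes(), (byte) 4);
        for (int i = 0; i < starts.length; i++) {
            System.out.println(Arrays.toString(starts[i]) + " -> " + Arrays.toString(stops[i]));
        }
    }
}
```

This makes the point above concrete: the number of scans is fixed by bucketsCount, and whether they hit one region or many depends only on how the table is currently split.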
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > Alex Baranau
> > >> > > >>>>>> > > ----
> > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > Nutch
> > >> -
> > >> > > >>>>>> Hadoop -
> > >> > > >>>>>> > HBase
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> > >> yuzhihong@gmail.com>
> > >> > > >>>>>> wrote:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > > My request would be to make the distributed scan
> > >> > identifiable
> > >> > > >>>>>> from
> > >> > > >>>>>> > server
> > >> > > >>>>>> > > > side.
> > >> > > >>>>>> > > > :-)
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > >> > > >>>>>> > alex.baranov.v@gmail.com
> > >> > > >>>>>> > > > >wrote:
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > >> regions
> > >> > for
> > >> > > >>>>>> the
> > >> > > >>>>>> > > > underlying
> > >> > > >>>>>> > > > > > table.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > True: e.g. when there's only one region that holds
> > data
> > >> > for
> > >> > > >>>>>> the whole
> > >> > > >>>>>> > > > table
> > >> > > >>>>>> > > > > (not many records in table yet), distributed scan
> > will
> > >> > fire
> > >> > > N
> > >> > > >>>>>> scans
> > >> > > >>>>>> > > > against
> > >> > > >>>>>> > > > > the same region.
> > >> > > >>>>>> > > > > On the other hand, in case there is a huge number of
> > >> > regions
> > >> > > >>>>>> for
> > >> > > >>>>>> > single
> > >> > > >>>>>> > > > > table, each scan can span over multiple regions.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > >> scan"
> > >> > at
> > >> > > >>>>>> server
> > >> > > >>>>>> > > side.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > With current implementation "distributed" scan
> won't
> > be
> > >> > > >>>>>> recognized as
> > >> > > >>>>>> > > > > something special on the server side. It will be an
> > >> > ordinary
> > >> > > >>>>>> scan.
> > >> > > >>>>>> > > Though
> > >> > > >>>>>> > > > > the number of scans will increase, given that the
> > >> typical
> > >> > > >>>>>> situation is
> > >> > > >>>>>> > > > "many
> > >> > > >>>>>> > > > > regions for single table", the scans of the same
> > >> > > "distributed
> > >> > > >>>>>> scan"
> > >> > > >>>>>> > are
> > >> > > >>>>>> > > > > likely not to hit the same region.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel
> free
> > >> to
> > >> > ask
> > >> > > >>>>>> more ;)
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Alex Baranau
> > >> > > >>>>>> > > > > ----
> > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene
> -
> > >> Nutch
> > >> > -
> > >> > > >>>>>> Hadoop -
> > >> > > >>>>>> > > > HBase
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > >> > > yuzhihong@gmail.com>
> > >> > > >>>>>> wrote:
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > Alex:
> > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > >> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > >> scan"
> > >> > at
> > >> > > >>>>>> server
> > >> > > >>>>>> > > side.
> > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > >> regions
> > >> > for
> > >> > > >>>>>> the
> > >> > > >>>>>> > > > underlying
> > >> > > >>>>>> > > > > > table.
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > Cheers
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> > >> > > >>>>>> > > > alex.baranov.v@gmail.com
> > >> > > >>>>>> > > > > > >wrote:
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > > Hi Ted,
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > We currently use this tool in the scenario
> where
> > >> data
> > >> > is
> > >> > > >>>>>> consumed
> > >> > > >>>>>> > > by
> > >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the
> > >> performance
> > >> > of
> > >> > > >>>>>> pure
> > >> > > >>>>>> > > > > "distributed
> > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I
> expect
> > >> it
> > >> > to
> > >> > > be
> > >> > > >>>>>> close
> > >> > > >>>>>> > to
> > >> > > >>>>>> > > > > > simple
> > >> > > >>>>>> > > > > > > scan performance, or may be sometimes even
> faster
> > >> > > >>>>>> depending on
> > >> > > >>>>>> > your
> > >> > > >>>>>> > > > > data
> > >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
> > timeseries
> > >> > data
> > >> > > >>>>>> > > (sequential)
> > >> > > >>>>>> > > > > > which
> > >> > > >>>>>> > > > > > > is written into the single region at a time,
> then
> > >> e.g.
> > >> > > if
> > >> > > >>>>>> you
> > >> > > >>>>>> > > access
> > >> > > >>>>>> > > > > > delta
> > >> > > >>>>>> > > > > > > for further processing/analysis (esp. if from
> not
> > >> > single
> > >> > > >>>>>> client)
> > >> > > >>>>>> > > > these
> > >> > > >>>>>> > > > > > > scans
> > >> > > >>>>>> > > > > > > are likely to hit the same region or couple of
> > >> regions
> > >> > > at
> > >> > > >>>>>> a time,
> > >> > > >>>>>> > > > which
> > >> > > >>>>>> > > > > > may
> > >> > > >>>>>> > > > > > > perform worse comparing to many scans hitting
> > data
> > >> > that
> > >> > > is
> > >> > > >>>>>> much
> > >> > > >>>>>> > > > better
> > >> > > >>>>>> > > > > > > spread over region servers.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > As for map-reduce job the approach should not
> > >> affect
> > >> > > >>>>>> reading
> > >> > > >>>>>> > > > > performance
> > >> > > >>>>>> > > > > > at
> > >> > > >>>>>> > > > > > > all: it's just that there are bucketsCount
> times
> > >> more
> > >> > > >>>>>> splits and
> > >> > > >>>>>> > > > hence
> > >> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many
> cases
> > >> this
> > >> > > even
> > >> > > >>>>>> > improves
> > >> > > >>>>>> > > > > > overall
> > >> > > >>>>>> > > > > > > performance of the MR job since work is better
> > >> > > distributed
> > >> > > >>>>>> over
> > >> > > >>>>>> > > > cluster
> > >> > > >>>>>> > > > > > > (esp. in situation when the aim is to
> constantly
> > >> > process
> > >> > > >>>>>> the
> > >> > > >>>>>> > coming
> > >> > > >>>>>> > > > > delta
> > >> > > >>>>>> > > > > > > which usually resides in one or just couple of
> > >> regions
> > >> > > >>>>>> depending
> > >> > > >>>>>> > on
> > >> > > >>>>>> > > > > > > processing frequency).
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > If you can share details on your case, that
> will
> > >> help
> > >> > to
> > >> > > >>>>>> > understand
> > >> > > >>>>>> > > > > what
> > >> > > >>>>>> > > > > > > effect(s) to expect from using this approach.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > ----
> > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > Lucene
> > >> -
> > >> > > Nutch
> > >> > > >>>>>> -
> > >> > > >>>>>> > Hadoop
> > >> > > >>>>>> > > -
> > >> > > >>>>>> > > > > > HBase
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > >> > > >>>>>> yuzhihong@gmail.com>
> > >> > > >>>>>> > > wrote:
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners compared to
> > >> > > >>>>>> > > > > > > > one scanner originally, have you performed load
> > >> > > >>>>>> > > > > > > > testing to see the impact?
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > Thanks
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex
> Baranau
> > <
> > >> > > >>>>>> > > > > > alex.baranov.v@gmail.com
> > >> > > >>>>>> > > > > > > > >wrote:
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > > Hello guys,
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java
> > >> > > >>>>>> > > > > > > > > project/lib around HBase: HBaseWD. It is aimed
> > >> > > >>>>>> > > > > > > > > to help with distribution of the load (across
> > >> > > >>>>>> > > > > > > > > regionservers) when
> > >> > > >>>>>> > > > > > > > > writing sequential (because of the row key
> > >> > > >>>>>> > > > > > > > > nature) records. It
> > >> > > >>>>>> > > > > > > implements
> > >> > > >>>>>> > > > > > > > > the solution which was discussed several times
> > >> > > >>>>>> > > > > > > > > on this mailing list (e.g. here:
> > >> > > >>>>>> > > > > > > > > http://search-hadoop.com/m/gNRA82No5Wk).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find the sources at
> > >> > > >>>>>> > > > > > > > > https://github.com/sematext/HBaseWD (there's
> > >> > > >>>>>> > > > > > > > > also a jar of the current version for
> > >> > > >>>>>> > > > > > > > > convenience). It is very easy to make use of
> > >> > > >>>>>> > > > > > > > > it: e.g. I added it to one existing project
> > >> > > >>>>>> > > > > > > > > with 1+2 lines of code (one where I write to
> > >> > > >>>>>> > > > > > > > > HBase and 2 for configuring the MapReduce job).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find below the short intro to the
> lib
> > >> [1].
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > > > ----
> > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > >> Lucene
> > >> > -
> > >> > > >>>>>> Nutch -
> > >> > > >>>>>> > > > Hadoop
> > >> > > >>>>>> > > > > -
> > >> > > >>>>>> > > > > > > > HBase
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > [1]
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Description:
> > >> > > >>>>>> > > > > > > > > ------------
> > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential)
> > >> > > >>>>>> > > > > > > > > Writes. It was inspired by discussions on HBase
> > >> > > >>>>>> > > > > > > > > mailing lists around the problem of choosing
> > >> > > >>>>>> > > > > > > > > between:
> > >> > > >>>>>> > > > > > > > > * writing records with sequential row keys
> > >> > > >>>>>> > > > > > > > >   (e.g. time-series data with row key built
> > >> > > >>>>>> > > > > > > > >   based on ts)
> > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The first approach makes it possible to
> > >> > > >>>>>> > > > > > > > > perform fast range scans by setting start/stop
> > >> > > >>>>>> > > > > > > > > keys on the Scanner, but creates a single
> > >> > > >>>>>> > > > > > > > > region server hot-spotting problem when writing
> > >> > > >>>>>> > > > > > > > > data (as row keys go in sequence, all records
> > >> > > >>>>>> > > > > > > > > end up written into a single region at a time).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The second approach aims for the fastest write
> > >> > > >>>>>> > > > > > > > > performance by distributing new records over
> > >> > > >>>>>> > > > > > > > > random regions, but makes fast range scans over
> > >> > > >>>>>> > > > > > > > > the written data impossible.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The suggested approach sits between the two
> > >> > > >>>>>> > > > > > > > > above and has proved to perform well by
> > >> > > >>>>>> > > > > > > > > distributing records over the cluster during
> > >> > > >>>>>> > > > > > > > > writes while still allowing range scans over
> > >> > > >>>>>> > > > > > > > > the data. HBaseWD provides a very simple API,
> > >> > > >>>>>> > > > > > > > > which makes it easy to use with existing code.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please refer to the unit tests for lib usage
> > >> > > >>>>>> > > > > > > > > info, as they are meant to act as examples.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > >> > > >>>>>> > > > > > > > > ----------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys
> > >> > > >>>>>> > > > > > > > > which are being written in up to
> > >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > >> > > >>>>>> > > > > > > > >        new RowKeyDistributorByOneBytePrefix(bucketsCount);
> > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > >> > > >>>>>> > > > > > > > >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
> > >> > > >>>>>> > > > > > > > >      ... // add values
> > >> > > >>>>>> > > > > > > > >      hTable.put(put);
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing a range scan over written data
> > >> > > >>>>>> > > > > > > > > (internally <bucketsCount> scanners are
> > >> > > >>>>>> > > > > > > > > executed):
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >    ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
> > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> > >> > > >>>>>> > > > > > > > >      ...
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing a mapreduce job over the written
> > >> > > >>>>>> > > > > > > > > data chunk specified by Scan:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Configuration conf = HBaseConfiguration.create();
> > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class, ImmutableBytesWritable.class,
> > >> > > >>>>>> > > > > > > > >      Result.class, job);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    // Substituting standard TableInputFormat which was set in
> > >> > > >>>>>> > > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
> > >> > > >>>>>> > > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
> > >> > > >>>>>> > > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > >> > > >>>>>> > > > > > > > > -----------------------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
> > >> > > >>>>>> > > > > > > > > support custom row key distribution approaches.
> > >> > > >>>>>> > > > > > > > > To define custom row key distributing logic
> > >> > > >>>>>> > > > > > > > > just implement the AbstractRowKeyDistributor
> > >> > > >>>>>> > > > > > > > > abstract class, which is really very simple:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    public abstract class AbstractRowKeyDistributor implements Parametrizable {
> > >> > > >>>>>> > > > > > > > >      public abstract byte[] getDistributedKey(byte[] originalKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
> > >> > > >>>>>> > > > > > > > >      ... // some utility methods
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
Thank you. I like the second option better, as it avoids the roundtrip to HBase.
I am trying it out now.
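For reference, here is a minimal standalone sketch of the hash-based distributor idea from option 2 below. The class and method names are illustrative: it mirrors the contract of HBaseWD's AbstractRowKeyDistributor but does not extend it, so it can be read (and run) on its own.

```java
// Sketch only (not HBaseWD's shipped code): a key distributor that
// prefixes each key with a 1-byte hash of the key itself. Because the
// prefix is derived from the original key, the distributed key can be
// recomputed later without any roundtrip to HBase.
public class OneByteHashKeyDistributor {
    private final int bucketsCount;

    public OneByteHashKeyDistributor(int bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    // Deterministic 1-byte "hash" of the key, in [0, bucketsCount).
    private byte bucketOf(byte[] key) {
        int h = 0;
        for (byte b : key) {
            h = 31 * h + b;
        }
        return (byte) (((h % bucketsCount) + bucketsCount) % bucketsCount);
    }

    public byte[] getDistributedKey(byte[] originalKey) {
        byte[] result = new byte[originalKey.length + 1];
        result[0] = bucketOf(originalKey);
        System.arraycopy(originalKey, 0, result, 1, originalKey.length);
        return result;
    }

    public byte[] getOriginalKey(byte[] adjustedKey) {
        return java.util.Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    // All possible distributed keys for a given original key; this is what
    // a distributed scan uses to build its per-bucket start/stop keys.
    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] result = new byte[bucketsCount][];
        for (int i = 0; i < bucketsCount; i++) {
            result[i] = new byte[originalKey.length + 1];
            result[i][0] = (byte) i;
            System.arraycopy(originalKey, 0, result[i], 1, originalKey.length);
        }
        return result;
    }
}
```

With such a distributor, an update becomes a plain Put on getDistributedKey(originalKey), with no prior read from HBase.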

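Relatedly, the distributed scan mentioned below has to merge the per-bucket result streams back into original-key order on the client side. A simplified standalone sketch of that merging, with plain string lists standing in for the per-bucket HBase ResultScanners (this is not HBaseWD's actual DistributedScanner code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified illustration: merge per-bucket result streams, each already
// sorted by original key within its bucket, into one stream sorted by
// original key, using a min-heap over the head of each stream.
public class BucketMerge {
    public static List<String> merge(final List<List<String>> buckets) {
        // Heap entries are {bucketIndex, positionInBucket}, ordered by the
        // key each entry currently points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> buckets.get(e[0]).get(e[1])));
        for (int i = 0; i < buckets.size(); i++) {
            if (!buckets.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(buckets.get(top[0]).get(top[1]));
            if (top[1] + 1 < buckets.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

The heap keeps one cursor per bucket, so the merge is O(n log b) for n results and b buckets, which is why adding buckets costs little on the read path.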
On Wed, May 18, 2011 at 10:03 AM, Alex Baranau <al...@gmail.com>wrote:

> There are several options here. E.g.:
>
> 1) Given that you have "original key" of the record, you can fetch the
> stored record key from HBase and use it to create Put with updated (or new)
> cells.
>
> Currently you'll need to use a distributed scan for that; there's no
> analogue of the Get operation yet (see https://github.com/sematext/HBaseWD/issues/1).
>
> Note: you first need to find out the real key of a stored record by
> fetching data from HBase if you use the RowKeyDistributorByOneBytePrefix
> included in the current lib. Alternatively, see the next option:
>
> 2) You can create your own RowKeyDistributor implementation which derives
> the "distributed key" from the original key value, so that later, when
> you have the original key and want to update the record, you can
> calculate the distributed key without a roundtrip to HBase.
>
> E.g. your RowKeyDistributor implementation can calculate a 1-byte hash of
> the original key (https://github.com/sematext/HBaseWD/issues/2).
>
>
>
> Either way, you don't need to delete a record to update some of its
> cells or add new cells.
>
> Please let me know if you have more Qs!
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > I have another question. For overwriting, do I need to delete the
> existing
> > one before re-writing it?
> >
> > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com>
> > wrote:
> >
> > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> > >
> > >
> > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> alex.baranov.v@gmail.com
> > >wrote:
> > >
> > >> Thanks for the interest!
> > >>
> > >> We are using it in production. It is simple and hence quite stable.
> > >> Though some minor pieces are missing (e.g.
> > >> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
> > >> stability or major functionality.
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > >> HBase
> > >>
> > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com>
> > >> wrote:
> > >>
> > >> > What's the status on this package? Is it mature enough?
> > >> > I am using it in my project: I tried out the write method yesterday
> > >> > and am going to incorporate it into the read method tomorrow.
> > >> >
> > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > alex.baranov.v@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > > The start/end rows may be written twice.
> > >> > >
> > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data
> > >> > > is "bearable" in an attribute value no matter how long the keys
> > >> > > are, since we are already OK with transferring them initially
> > >> > > (i.e. we should be OK with transferring twice as much).
> > >> > >
> > >> > > So, what about the sourceScan attribute value suggestion I
> > >> > > mentioned? If you can tell me why it isn't sufficient in your
> > >> > > case, I'd have more info to think about a better suggestion ;)
> > >> > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > >> > > > Maybe the second should be named HBASE-3811-v2.patch<
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > >> > > >?
> > >> > >
> > >> > > np. Can do that. Just thought that they (patches) can be sorted
> > >> > > by date to find the final one (aka "convention over naming-rules").
> > >> > >
> > >> > > Alex.
> > >> > >
> > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > >> Though it might be ok, since we anyways "transfer" start/stop
> > >> rows
> > >> > > with
> > >> > > > Scan object.
> > >> > > > In write() method, we now have:
> > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > >> > > > ...
> > >> > > >       for (Map.Entry<String, byte[]> attr :
> > >> this.attributes.entrySet())
> > >> > {
> > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > >> > > >       }
> > >> > > > The start/end rows may be written twice.
> > >> > > >
> > >> > > > Of course, you have full control over how to generate the unique
> > ID
> > >> for
> > >> > > > "sourceScan" attribute.
> > >> > > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe
> > the
> > >> > > second
> > >> > > > should be named HBASE-3811-v2.patch<
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > >> > > >?
> > >> > > >
> > >> > > > Thanks
> > >> > > >
> > >> > > >
> > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > >> > alex.baranov.v@gmail.com
> > >> > > >wrote:
> > >> > > >
> > >> > > >> > Can you remove the first version ?
> > >> > > >> Isn't it ok to keep it in JIRA issue?
> > >> > > >>
> > >> > > >>
> > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> > >> supports
> > >> > > >> setAttribute() ?
> > >> > > >> > If it does, can you encode start row and end row as
> > "sourceScan"
> > >> > > >> attribute ?
> > >> > > >>
> > >> > > >> Yeah, something like this is going to be implemented. Though
> > >> > > >> I'd still want to hear from the devs about the Scan versioning
> > >> > > >> story.
> > >> > > >>
> > >> > > >>
> > >> > > >> > One consideration is that start row or end row may be quite
> > long.
> > >> > > >>
> > >> > > >> Yeah, that was my thought too at first. Though it might be OK,
> > >> > > >> since we transfer start/stop rows with the Scan object anyway.
> > >> > > >>
> > >> > > >> > What do you think ?
> > >> > > >>
> > >> > > >> I'd love to hear from you is this variant I mentioned is what
> we
> > >> are
> > >> > > >> looking at here:
> > >> > > >>
> > >> > > >>
> > >> > > >> > From what I understand, you want to distinguish scans fired
> by
> > >> the
> > >> > > same
> > >> > > >> distributed scan.
> > >> > > >> > I.e. group scans which were fired by single distributed scan.
> > If
> > >> > > that's
> > >> > > >> what you want, distributed
> > >> > > >> > scan can generate unique ID and set, say "sourceScan"
> attribute
> > >> to
> > >> > its
> > >> > > >> value. This way we'll
> > >> > > >> > have <# of distinct "sourceScan" attribute values> = <number
> of
> > >> > > >> distributed scans invoked by
> > >> > > >> > client side> and two scans on server side will have the same
> > >> > > >> "sourceScan" attribute iff they
> > >> > > >> > "belong" to same distributed scan.
> > >> > > >>
> > >> > > >>
> > >> > > >> Alex Baranau
> > >> > > >> ----
> > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > Hadoop
> > >> -
> > >> > > >> HBase
> > >> > > >>
> > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com>
> > >> wrote:
> > >> > > >>
> > >> > > >>> Alex:
> > >> > > >>> Your second patch looks good.
> > >> > > >>> Can you remove the first version ?
> > >> > > >>>
> > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > supports
> > >> > > >>> setAttribute() ?
> > >> > > >>> If it does, can you encode start row and end row as
> "sourceScan"
> > >> > > >>> attribute ?
> > >> > > >>>
> > >> > > >>> One consideration is that start row or end row may be quite
> > long.
> > >> > > >>> Ideally we should store the hash code of the source Scan
> > >> > > >>> object as the "sourceScan" attribute. But Scan doesn't
> > >> > > >>> implement hashCode(). We can add it, though that would require
> > >> > > >>> running all Scan-related tests.
> > >> > > >>>
> > >> > > >>> What do you think ?
> > >> > > >>>
> > >> > > >>> Thanks
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > >> > > alex.baranov.v@gmail.com>wrote:
> > >> > > >>>
> > >> > > >>>> Sorry for the delay in response (public holidays here).
> > >> > > >>>>
> > >> > > >>>> This depends on what info you are looking for on server side.
> > >> > > >>>>
> > >> > > >>>> From what I understand, you want to distinguish scans fired
> by
> > >> the
> > >> > > same
> > >> > > >>>> distributed scan. I.e. group scans which were fired by single
> > >> > > distributed
> > >> > > >>>> scan. If that's what you want, distributed scan can generate
> > >> unique
> > >> > ID
> > >> > > and
> > >> > > >>>> set, say "sourceScan" attribute to its value. This way we'll
> > have
> > >> <#
> > >> > > of
> > >> > > >>>> distinct "sourceScan" attribute values> = <number of
> > distributed
> > >> > scans
> > >> > > >>>> invoked by client side> and two scans on server side will
> have
> > >> the
> > >> > > same
> > >> > > >>>> "sourceScan" attribute iff they "belong" to same distributed
> > >> scan.
> > >> > > >>>>
> > >> > > >>>> Is this what are you looking for?
> > >> > > >>>>
> > >> > > >>>> Alex Baranau
> > >> > > >>>>
> > >> > > >>>> P.S. attached patch for HBASE-3811<
> > >> > > https://issues.apache.org/jira/browse/HBASE-3811>
> > >> > > >>>> .
> > >> > > >>>> P.S-2. should this conversation be moved to dev list?
> > >> > > >>>>
> > >> > > >>>> ----
> > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >> Hadoop
> > >> > -
> > >> > > >>>> HBase
> > >> > > >>>>
> > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yuzhihong@gmail.com
> >
> > >> > wrote:
> > >> > > >>>>
> > >> > > >>>>> Alex:
> > >> > > >>>>> What type of identification should we put in the map of the
> > Scan
> > >> > > object
> > >> > > >>>>> ?
> > >> > > >>>>> I am thinking of using the ID of the RowKeyDistributor. But
> > >> > > >>>>> the user can use the same distributor on multiple scans.
> > >> > > >>>>>
> > >> > > >>>>> Please share your thought.
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > >> > > >>>>>>
> > >> > > >>>>>> Alex Baranau
> > >> > > >>>>>> ----
> > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> -
> > >> > Hadoop
> > >> > > -
> > >> > > >>>>>> HBase
> > >> > > >>>>>>
> > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <
> yuzhihong@gmail.com
> > >
> > >> > > wrote:
> > >> > > >>>>>>
> > >> > > >>>>>> > My plan was to make regions that have active scanners
> more
> > >> > stable
> > >> > > -
> > >> > > >>>>>> trying
> > >> > > >>>>>> > not to move them when balancing.
> > >> > > >>>>>> > I prefer second approach - adding custom attribute(s) to
> > Scan
> > >> so
> > >> > > >>>>>> that the
> > >> > > >>>>>> > Scans created by the method below can be 'grouped'.
> > >> > > >>>>>> >
> > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > >> > > >>>>>> >
> > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > >> > > >>>>>> alex.baranov.v@gmail.com
> > >> > > >>>>>> > >wrote:
> > >> > > >>>>>> >
> > >> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or
> > >> > > >>>>>> > > just differently) when determining the load?
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > The current code looks like this:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > class DistributedScanner:
> > >> > > >>>>>> > >  public static DistributedScanner create(HTable hTable, Scan original,
> > >> > > >>>>>> > >      AbstractRowKeyDistributor keyDistributor) throws IOException {
> > >> > > >>>>>> > >    byte[][] startKeys = keyDistributor.getAllDistributedKeys(original.getStartRow());
> > >> > > >>>>>> > >    byte[][] stopKeys = keyDistributor.getAllDistributedKeys(original.getStopRow());
> > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > >> > > >>>>>> > >      scans[i] = new Scan(original);
> > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    return new DistributedScanner(rss);
> > >> > > >>>>>> > >  }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > This is client code. To make these scans "identifiable"
> > >> > > >>>>>> > > we need to either use a different class (derived from
> > >> > > >>>>>> > > Scan) or add some attribute to them.
> > >> > > >>>>>> > > There's no API for doing the latter. We could do the
> > >> > > >>>>>> > > former, but I don't really like the idea of creating an
> > >> > > >>>>>> > > extra class (with no extra functionality) just to
> > >> > > >>>>>> > > distinguish it from the base one.
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > If you can share why/how you want to treat them
> > >> > > >>>>>> > > differently on the server side, that would be helpful.
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > Alex Baranau
> > >> > > >>>>>> > > ----
> > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > Nutch
> > >> -
> > >> > > >>>>>> Hadoop -
> > >> > > >>>>>> > HBase
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> > >> yuzhihong@gmail.com>
> > >> > > >>>>>> wrote:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > > My request would be to make the distributed scan
> > >> > identifiable
> > >> > > >>>>>> from
> > >> > > >>>>>> > server
> > >> > > >>>>>> > > > side.
> > >> > > >>>>>> > > > :-)
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > >> > > >>>>>> > alex.baranov.v@gmail.com
> > >> > > >>>>>> > > > >wrote:
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > >> regions
> > >> > for
> > >> > > >>>>>> the
> > >> > > >>>>>> > > > underlying
> > >> > > >>>>>> > > > > > table.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > True: e.g. when there's only one region that holds
> > >> > > >>>>>> > > > > data for the whole table (not many records in the
> > >> > > >>>>>> > > > > table yet), a distributed scan will fire N scans
> > >> > > >>>>>> > > > > against the same region.
> > >> > > >>>>>> > > > > On the other hand, when there is a huge number of
> > >> > > >>>>>> > > > > regions for a single table, each scan can span
> > >> > > >>>>>> > > > > multiple regions.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > >> scan"
> > >> > at
> > >> > > >>>>>> server
> > >> > > >>>>>> > > side.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > With the current implementation a "distributed" scan
> > >> > > >>>>>> > > > > won't be recognized as something special on the
> > >> > > >>>>>> > > > > server side. It will be an ordinary scan. Though the
> > >> > > >>>>>> > > > > number of scans will increase, given that the typical
> > >> > > >>>>>> > > > > situation is "many regions for a single table", the
> > >> > > >>>>>> > > > > scans of the same "distributed scan" are likely not
> > >> > > >>>>>> > > > > to hit the same region.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel
> free
> > >> to
> > >> > ask
> > >> > > >>>>>> more ;)
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Alex Baranau
> > >> > > >>>>>> > > > > ----
> > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene
> -
> > >> Nutch
> > >> > -
> > >> > > >>>>>> Hadoop -
> > >> > > >>>>>> > > > HBase
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > >> > > yuzhihong@gmail.com>
> > >> > > >>>>>> wrote:
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > Alex:
> > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > >> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> > >> scan"
> > >> > at
> > >> > > >>>>>> server
> > >> > > >>>>>> > > side.
> > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> > >> regions
> > >> > for
> > >> > > >>>>>> the
> > >> > > >>>>>> > > > underlying
> > >> > > >>>>>> > > > > > table.
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > Cheers
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> > >> > > >>>>>> > > > alex.baranov.v@gmail.com
> > >> > > >>>>>> > > > > > >wrote:
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > > Hi Ted,
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > We currently use this tool in the scenario
> where
> > >> data
> > >> > is
> > >> > > >>>>>> consumed
> > >> > > >>>>>> > > by
> > >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the
> > >> performance
> > >> > of
> > >> > > >>>>>> pure
> > >> > > >>>>>> > > > > "distributed
> > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I
> expect
> > >> it
> > >> > to
> > >> > > be
> > >> > > >>>>>> close
> > >> > > >>>>>> > to
> > >> > > >>>>>> > > > > > simple
> > >> > > >>>>>> > > > > > > scan performance, or may be sometimes even
> faster
> > >> > > >>>>>> depending on
> > >> > > >>>>>> > your
> > >> > > >>>>>> > > > > data
> > >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
> > timeseries
> > >> > data
> > >> > > >>>>>> > > (sequential)
> > >> > > >>>>>> > > > > > which
> > >> > > >>>>>> > > > > > > is written into the single region at a time,
> then
> > >> e.g.
> > >> > > if
> > >> > > >>>>>> you
> > >> > > >>>>>> > > access
> > >> > > >>>>>> > > > > > delta
> > >> > > >>>>>> > > > > > > for further processing/analysis (esp. if from
> not
> > >> > single
> > >> > > >>>>>> client)
> > >> > > >>>>>> > > > these
> > >> > > >>>>>> > > > > > > scans
> > >> > > >>>>>> > > > > > > are likely to hit the same region or couple of
> > >> > > >>>>>> > > > > > > regions at a time, which may perform worse compared
> > >> > > >>>>>> > > > > > > to many scans hitting data that is much better
> > >> > > >>>>>> > > > > > > spread over region servers.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > As for a map-reduce job, the approach should not
> > >> > > >>>>>> > > > > > > affect reading performance at all: there are simply
> > >> > > >>>>>> > > > > > > bucketsCount times more splits and hence
> > >> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many cases
> > >> > > >>>>>> > > > > > > this even improves the overall performance of the
> > >> > > >>>>>> > > > > > > MR job since work is better distributed over the
> > >> > > >>>>>> > > > > > > cluster (esp. when the aim is to constantly process
> > >> > > >>>>>> > > > > > > the incoming delta, which usually resides in one or
> > >> > > >>>>>> > > > > > > just a couple of regions depending on processing
> > >> > > >>>>>> > > > > > > frequency).
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > If you can share details on your case, that
> will
> > >> help
> > >> > to
> > >> > > >>>>>> > understand
> > >> > > >>>>>> > > > > what
> > >> > > >>>>>> > > > > > > effect(s) to expect from using this approach.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > ----
> > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > Lucene
> > >> -
> > >> > > Nutch
> > >> > > >>>>>> -
> > >> > > >>>>>> > Hadoop
> > >> > > >>>>>> > > -
> > >> > > >>>>>> > > > > > HBase
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > >> > > >>>>>> yuzhihong@gmail.com>
> > >> > > >>>>>> > > wrote:
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners compared
> > to
> > >> one
> > >> > > >>>>>> scanner
> > >> > > >>>>>> > > > > > originally,
> > >> > > >>>>>> > > > > > > > have you performed load testing to see the
> > impact
> > >> ?
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > Thanks
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex
> Baranau
> > <
> > >> > > >>>>>> > > > > > alex.baranov.v@gmail.com
> > >> > > >>>>>> > > > > > > > >wrote:
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > > Hello guys,
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java project/lib around HBase: HBaseWD.
> > >> > > >>>>>> > > > > > > > > It is aimed to help with distribution of the load (across regionservers)
> > >> > > >>>>>> > > > > > > > > when writing sequential (because of the row key nature) records. It
> > >> > > >>>>>> > > > > > > > > implements the solution which was discussed several times on this
> > >> > > >>>>>> > > > > > > > > mailing list (e.g. here: http://search-hadoop.com/m/gNRA82No5Wk).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find the sources at https://github.com/sematext/HBaseWD (there's
> > >> > > >>>>>> > > > > > > > > also a jar of the current version for convenience). It is very easy to
> > >> > > >>>>>> > > > > > > > > make use of it: e.g. I added it to one existing project with 1+2 lines of
> > >> > > >>>>>> > > > > > > > > code (one where I write to HBase and 2 for configuring the MapReduce job).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find below the short intro to the lib [1].
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > > > ----
> > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > [1]
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Description:
> > >> > > >>>>>> > > > > > > > > ------------
> > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential) Writes. It was inspired by
> > >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists around the problem of choosing between:
> > >> > > >>>>>> > > > > > > > > * writing records with sequential row keys (e.g. time-series data with
> > >> > > >>>>>> > > > > > > > >   row key built based on ts)
> > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The first approach makes it possible to perform fast range scans by
> > >> > > >>>>>> > > > > > > > > setting start/stop keys on the Scanner, but creates a single region
> > >> > > >>>>>> > > > > > > > > server hot-spotting problem upon writing data (as row keys go in
> > >> > > >>>>>> > > > > > > > > sequence, all records end up written into a single region at a time).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The second approach aims for the fastest writing performance by
> > >> > > >>>>>> > > > > > > > > distributing new records over random regions, but makes fast range scans
> > >> > > >>>>>> > > > > > > > > against the written data impossible.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The suggested approach stays in the middle of the two above and proved to
> > >> > > >>>>>> > > > > > > > > perform well by distributing records over the cluster during writing
> > >> > > >>>>>> > > > > > > > > while still allowing range scans over the data. HBaseWD provides a very
> > >> > > >>>>>> > > > > > > > > simple API, which makes it easy to use with existing code.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please refer to the unit-tests for lib usage info, as they are aimed to
> > >> > > >>>>>> > > > > > > > > act as examples.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > >> > > >>>>>> > > > > > > > > ----------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys which are being written in up
> > >> > > >>>>>> > > > > > > > > to Byte.MAX_VALUE buckets:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > >> > > >>>>>> > > > > > > > >                           new RowKeyDistributorByOneBytePrefix(bucketsCount);
> > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > >> > > >>>>>> > > > > > > > >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
> > >> > > >>>>>> > > > > > > > >      ... // add values
> > >> > > >>>>>> > > > > > > > >      hTable.put(put);
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing a range scan over written data (internally <bucketsCount>
> > >> > > >>>>>> > > > > > > > > scanners executed):
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >    ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
> > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> > >> > > >>>>>> > > > > > > > >      ...
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing mapreduce job over written data chunk specified by Scan:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Configuration conf = HBaseConfiguration.create();
> > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    // Substituting standard TableInputFormat which was set in
> > >> > > >>>>>> > > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
> > >> > > >>>>>> > > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
> > >> > > >>>>>> > > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > >> > > >>>>>> > > > > > > > > -----------------------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to support custom row key
> > >> > > >>>>>> > > > > > > > > distribution approaches. To define custom row key distributing logic just
> > >> > > >>>>>> > > > > > > > > implement the AbstractRowKeyDistributor abstract class, which is really
> > >> > > >>>>>> > > > > > > > > very simple:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    public abstract class AbstractRowKeyDistributor implements Parametrizable {
> > >> > > >>>>>> > > > > > > > >      public abstract byte[] getDistributedKey(byte[] originalKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
> > >> > > >>>>>> > > > > > > > >      ... // some utility methods
> > >> > > >>>>>> > > > > > > > >    }

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
Thank you, I like the second option better to avoid the roundtrip to HBase.
I am trying it out now.

On Wed, May 18, 2011 at 10:03 AM, Alex Baranau <al...@gmail.com>wrote:

> There are several options here. E.g.:
>
> 1) Given that you have "original key" of the record, you can fetch the
> stored record key from HBase and use it to create Put with updated (or new)
> cells.
>
> Currently you'll need to use a distributed scan for that; there's no analogue
> for the Get operation yet (see https://github.com/sematext/HBaseWD/issues/1).
>
> Note: you need to first find out the real key of the stored record by fetching
> data from HBase in case you use the RowKeyDistributorByOneBytePrefix included
> in the current lib. Alternatively, see the next option:
>
> 2) You can create your own RowKeyDistributor implementation which will
> create "distributed key" based on original key value so that later when you
> have original key and want to update the record you can calculate
> distributed key without roundtrip to HBase.
>
> E.g. in your RowKeyDistributor implementation you can calculate a 1-byte hash
> of the original key (https://github.com/sematext/HBaseWD/issues/2).
>
>
>
> Either way, you don't need to delete a record to update some of its cells or
> add new cells.
>
> Please let me know if you have more Qs!
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
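The point of option 2 above is that an update becomes just another put under a recomputed key. A tiny runnable sketch of that idea, with a sorted map standing in for the HBase table (the names and prefix format here are illustrative, not the HBaseWD API):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class UpdateWithoutRoundtripSketch {
    static final int BUCKETS = 32;

    // Hash-derived prefix computed from the original key (option 2):
    // the same original key always yields the same distributed key.
    static String distributedKey(String originalKey) {
        int bucket = (originalKey.hashCode() & 0x7fffffff) % BUCKETS;
        return String.format("%02d-%s", bucket, originalKey);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>(); // stands in for HTable

        // Initial write.
        table.put(distributedKey("user-42"), "v1");

        // Update later: no Get/Scan roundtrip and no Delete needed --
        // just recompute the key and put again.
        table.put(distributedKey("user-42"), "v2");

        System.out.println(table.size());                         // 1
        System.out.println(table.get(distributedKey("user-42"))); // v2
    }
}
```

With the one-byte-prefix distributor from the lib this shortcut is not available, since the prefix there is not derived from the key; that is exactly the difference between options 1 and 2 above.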
>
> On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > I have another question. For overwriting, do I need to delete the existing
> > one before re-writing it?
> >
> > On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com>
> > wrote:
> >
> > > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> > >
> > >
> > > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <
> alex.baranov.v@gmail.com
> > >wrote:
> > >
> > >> Thanks for the interest!
> > >>
> > >> We are using it in production. It is simple and hence quite stable.
> > Though
> > >> some minor pieces are missing (like
> > >> https://github.com/sematext/HBaseWD/issues/1) this doesn't affect
> > >> stability
> > >> and/or major functionality.
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > >> HBase
> > >>
> > >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com>
> > >> wrote:
> > >>
> > >> > What's the status on this package? Is it mature enough?
> > >> > I am using it in my project, tried out the write method yesterday and
> > >> > am going to incorporate it into the read method tomorrow.
> > >> >
> > >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <
> > alex.baranov.v@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > > The start/end rows may be written twice.
> > >> > >
> > >> > > Yeah, I know. I meant that the size of the startRow+stopRow data is
> > >> > > "bearable" in an attribute value no matter how long they (the keys) are,
> > >> > > since we are already OK with transferring them initially (i.e. we should
> > >> > > be OK with transferring 2x more).
> > >> > >
> > >> > > So, what about the suggestion of the sourceScan attribute value I
> > >> > > mentioned? If you can tell why it isn't sufficient in your case, I'd
> > >> > > have more info to think about a better suggestion ;)
> > >> > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA.
> > >> > > > Maybe the second should be named HBASE-3811-v2.patch
> > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >> > >
> > >> > > np. Can do that. Just thought that they (the patches) can be sorted by
> > >> > > date to find out the final one (aka "convention over naming-rules").
> > >> > >
> > >> > > Alex.
> > >> > >
> > >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > >> > >
> > >> > > > >> Though it might be ok, since we anyways "transfer" start/stop rows
> > >> > > > >> with Scan object.
> > >> > > > In write() method, we now have:
> > >> > > >     Bytes.writeByteArray(out, this.startRow);
> > >> > > >     Bytes.writeByteArray(out, this.stopRow);
> > >> > > > ...
> > >> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> > >> > > >         WritableUtils.writeString(out, attr.getKey());
> > >> > > >         Bytes.writeByteArray(out, attr.getValue());
> > >> > > >       }
> > >> > > > The start/end rows may be written twice.
> > >> > > >
> > >> > > > Of course, you have full control over how to generate the unique ID
> > >> > > > for the "sourceScan" attribute.
> > >> > > >
> > >> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> > >> > > > second should be named HBASE-3811-v2.patch
> > >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> > >> > > >
> > >> > > > Thanks
> > >> > > >
> > >> > > >
> > >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > >> > alex.baranov.v@gmail.com
> > >> > > >wrote:
> > >> > > >
> > >> > > >> > Can you remove the first version ?
> > >> > > >> Isn't it ok to keep it in JIRA issue?
> > >> > > >>
> > >> > > >>
> > >> > > >> > In HBaseWD, can you use reflection to detect whether Scan supports
> > >> > > >> > setAttribute() ?
> > >> > > >> > If it does, can you encode start row and end row as "sourceScan"
> > >> > > >> > attribute ?
> > >> > > >>
> > >> > > >> Yeah, smth like this is going to be implemented. Though I'd
> still
> > >> want
> > >> > > to
> > >> > > >> hear from the devs the story about Scan version.
> > >> > > >>
> > >> > > >>
> > >> > > >> > One consideration is that start row or end row may be quite long.
> > >> > > >>
> > >> > > >> Yeah, that was my thought too at first. Though it might be ok, since
> > >> > > >> we anyways "transfer" start/stop rows with Scan object.
> > >> > > >>
> > >> > > >> > What do you think ?
> > >> > > >>
> > >> > > >> I'd love to hear from you whether this variant I mentioned is what
> > >> > > >> we are looking at here:
> > >> > > >>
> > >> > > >>
> > >> > > >> > From what I understand, you want to distinguish scans fired by the
> > >> > > >> > same distributed scan. I.e. group scans which were fired by a single
> > >> > > >> > distributed scan. If that's what you want, the distributed scan can
> > >> > > >> > generate a unique ID and set, say, a "sourceScan" attribute to its
> > >> > > >> > value. This way we'll have <# of distinct "sourceScan" attribute
> > >> > > >> > values> = <number of distributed scans invoked by client side>, and
> > >> > > >> > two scans on server side will have the same "sourceScan" attribute
> > >> > > >> > iff they "belong" to the same distributed scan.
> > >> > > >>
> > >> > > >>
> > >> > > >> Alex Baranau
> > >> > > >> ----
> > >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >> > > >>
> > >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com>
> > >> wrote:
> > >> > > >>
> > >> > > >>> Alex:
> > >> > > >>> Your second patch looks good.
> > >> > > >>> Can you remove the first version ?
> > >> > > >>>
> > >> > > >>> In HBaseWD, can you use reflection to detect whether Scan
> > supports
> > >> > > >>> setAttribute() ?
> > >> > > >>> If it does, can you encode start row and end row as
> "sourceScan"
> > >> > > >>> attribute ?
> > >> > > >>>
> > >> > > >>> One consideration is that start row or end row may be quite
> > long.
> > >> > > >>> Ideally we should store hash code of source Scan object as
> > >> > "sourceScan"
> > >> > > >>> attribute. But Scan doesn't implement hashCode(). We can add
> it,
> > >> that
> > >> > > would
> > >> > > >>> require running all Scan related tests.
> > >> > > >>>
> > >> > > >>> What do you think ?
> > >> > > >>>
> > >> > > >>> Thanks
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > >> > > alex.baranov.v@gmail.com>wrote:
> > >> > > >>>
> > >> > > >>>> Sorry for the delay in response (public holidays here).
> > >> > > >>>>
> > >> > > >>>> This depends on what info you are looking for on server side.
> > >> > > >>>>
> > >> > > >>>> From what I understand, you want to distinguish scans fired by the
> > >> > > >>>> same distributed scan, i.e. group scans which were fired by a single
> > >> > > >>>> distributed scan. If that's what you want, the distributed scan can
> > >> > > >>>> generate a unique ID and set, say, a "sourceScan" attribute to its
> > >> > > >>>> value. This way we'll have <# of distinct "sourceScan" attribute
> > >> > > >>>> values> = <number of distributed scans invoked by the client side>,
> > >> > > >>>> and two scans on the server side will have the same "sourceScan"
> > >> > > >>>> attribute iff they "belong" to the same distributed scan.
> > >> > > >>>>
> > >> > > >>>> Is this what are you looking for?
> > >> > > >>>>
> > >> > > >>>> Alex Baranau
> > >> > > >>>>
> > >> > > >>>> P.S. attached patch for HBASE-3811
> > >> > > >>>> <https://issues.apache.org/jira/browse/HBASE-3811>.
> > >> > > >>>> P.S-2. should this conversation be moved to dev list?
> > >> > > >>>>
> > >> > > >>>> ----
> > >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
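The "sourceScan" grouping idea discussed in the message above is easy to prototype outside HBase: generate one unique ID per distributed scan and stamp it on each per-bucket scan. A minimal sketch, with a plain attribute map standing in for Scan attributes ("sourceScan" is the attribute name proposed in this thread; everything else here is illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class SourceScanTagSketch {
    // Build the per-bucket "scans" of one distributed scan; each carries
    // the same generated "sourceScan" attribute so the server side can
    // group them back together.
    static List<Map<String, byte[]>> makeDistributedScans(int bucketsCount) {
        byte[] sourceScanId = UUID.randomUUID().toString().getBytes();
        List<Map<String, byte[]>> scans = new ArrayList<>();
        for (int i = 0; i < bucketsCount; i++) {
            Map<String, byte[]> attrs = new HashMap<>(); // stands in for Scan attributes
            attrs.put("sourceScan", sourceScanId);
            scans.add(attrs);
        }
        return scans;
    }

    public static void main(String[] args) {
        List<Map<String, byte[]>> group = makeDistributedScans(32);
        Set<String> ids = new HashSet<>();
        for (Map<String, byte[]> s : group) {
            ids.add(new String(s.get("sourceScan")));
        }
        System.out.println(group.size()); // 32
        System.out.println(ids.size());   // 1: all scans share one sourceScan id
    }
}
```

Two distributed scans would produce two distinct "sourceScan" values, which is exactly the grouping property described above.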
> > >> > > >>>>
> > >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > > >>>>
> > >> > > >>>>> Alex:
> > >> > > >>>>> What type of identification should we put in the map of the Scan
> > >> > > >>>>> object ?
> > >> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But the user can
> > >> > > >>>>> use the same distributor on multiple scans.
> > >> > > >>>>>
> > >> > > >>>>> Please share your thought.
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > >> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > >> > > >>>>>>
> > >> > > >>>>>> Alex Baranau
> > >> > > >>>>>> ----
> > >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >> > > >>>>>>
> > >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > > >>>>>>
> > >> > > >>>>>> > My plan was to make regions that have active scanners
> more
> > >> > stable
> > >> > > -
> > >> > > >>>>>> trying
> > >> > > >>>>>> > not to move them when balancing.
> > >> > > >>>>>> > I prefer second approach - adding custom attribute(s) to
> > Scan
> > >> so
> > >> > > >>>>>> that the
> > >> > > >>>>>> > Scans created by the method below can be 'grouped'.
> > >> > > >>>>>> >
> > >> > > >>>>>> > If you can file a JIRA, that would be great.
> > >> > > >>>>>> >
> > >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > >> > > >>>>>> alex.baranov.v@gmail.com
> > >> > > >>>>>> > >wrote:
> > >> > > >>>>>> >
> > >> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or just
> > >> > > >>>>>> > > differently) when determining the load?
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > The current code looks like this:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > class DistributedScanner:
> > >> > > >>>>>> > >  public static DistributedScanner create(HTable hTable, Scan original,
> > >> > > >>>>>> > >      AbstractRowKeyDistributor keyDistributor) throws IOException {
> > >> > > >>>>>> > >    byte[][] startKeys =
> > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStartRow());
> > >> > > >>>>>> > >    byte[][] stopKeys =
> > >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStopRow());
> > >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > >> > > >>>>>> > >      scans[i] = new Scan(original);
> > >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> > >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > >> > > >>>>>> > >    }
> > >> > > >>>>>> > >
> > >> > > >>>>>> > >    return new DistributedScanner(rss);
> > >> > > >>>>>> > >  }
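One detail the snippet above leaves to the DistributedScanner(rss) constructor is merging the per-bucket result streams back into a single stream sorted by original key. A standalone sketch of that merge, with plain sorted lists in place of HBase ResultScanners (a heap-based k-way merge; this mirrors the idea, not the library's exact implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSortedStreamsSketch {
    // Merge N streams, each already sorted by original key, into one
    // globally sorted stream -- what a distributed scanner must do after
    // stripping the bucket prefix from each returned row key.
    public static List<String> merge(List<List<String>> buckets) {
        // Heap entries: {bucketIndex, positionInBucket}, ordered by current key.
        PriorityQueue<int[]> pq = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> buckets.get(e[0]).get(e[1])));
        for (int i = 0; i < buckets.size(); i++) {
            if (!buckets.get(i).isEmpty()) {
                pq.add(new int[]{i, 0});
            }
        }
        List<String> result = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            List<String> bucket = buckets.get(top[0]);
            result.add(bucket.get(top[1]));
            if (top[1] + 1 < bucket.size()) {
                pq.add(new int[]{top[0], top[1] + 1});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> buckets = Arrays.asList(
            Arrays.asList("a", "d"),
            Arrays.asList("b", "e"),
            Arrays.asList("c"));
        System.out.println(merge(buckets)); // [a, b, c, d, e]
    }
}
```

Each of the N scanners only ever advances one position at a time, so memory stays proportional to the number of buckets, not to the result size.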
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > This is client code. To make these scans "identifiable" we need
> > >> > > >>>>>> > > to either use some different (derived from Scan) class or add
> > >> > > >>>>>> > > some attribute to them. There's no API for doing the latter. We
> > >> > > >>>>>> > > can do the former, but I don't really like the idea of creating
> > >> > > >>>>>> > > an extra class (with no extra functionality) just to distinguish
> > >> > > >>>>>> > > it from the base one.
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > If you can share why/how you want to treat them differently on
> > >> > > >>>>>> > > the server side, that would be helpful.
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > Alex Baranau
> > >> > > >>>>>> > > ----
> > >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > > >>>>>> > >
> > >> > > >>>>>> > > > My request would be to make the distributed scan identifiable
> > >> > > >>>>>> > > > from the server side.
> > >> > > >>>>>> > > > :-)
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau
> > >> > > >>>>>> > > > <alex.baranov.v@gmail.com> wrote:
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of regions for
> > >> > > >>>>>> > > > > > the underlying table.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > True: e.g. when there's only one region that holds data for
> > >> > > >>>>>> > > > > the whole table (not many records in the table yet), a
> > >> > > >>>>>> > > > > distributed scan will fire N scans against the same region.
> > >> > > >>>>>> > > > > On the other hand, in case there is a huge number of regions
> > >> > > >>>>>> > > > > for a single table, each scan can span multiple regions.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at server side.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > With the current implementation a "distributed" scan won't be
> > >> > > >>>>>> > > > > recognized as something special on the server side. It will
> > >> > > >>>>>> > > > > be an ordinary scan. Though the number of scans will
> > >> > > >>>>>> > > > > increase, given that the typical situation is "many regions
> > >> > > >>>>>> > > > > for a single table", the scans of the same "distributed scan"
> > >> > > >>>>>> > > > > are likely not to hit the same region.
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel free to ask more ;)
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > Alex Baranau
> > >> > > >>>>>> > > > > ----
> > >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > > > > Alex:
> > >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > >> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at
> > >> > > >>>>>> > > > > > server side. Basically bucketsCount may not equal number of
> > >> > > >>>>>> > > > > > regions for the underlying table.
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > Cheers
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau
> > >> > > >>>>>> > > > > > <alex.baranov.v@gmail.com> wrote:
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > > > > Hi Ted,
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > We currently use this tool in the scenario where data is consumed by
> > >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the performance of pure "distributed
> > >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I expect it to be close to
> > >> > > >>>>>> > > > > > > simple scan performance, or maybe sometimes even faster depending on your
> > >> > > >>>>>> > > > > > > data access patterns. E.g. in case you write timeseries data (sequential)
> > >> > > >>>>>> > > > > > > which is written into a single region at a time, then e.g. if you access
> > >> > > >>>>>> > > > > > > the delta for further processing/analysis (esp. if not from a single
> > >> > > >>>>>> > > > > > > client) these scans are likely to hit the same region or a couple of
> > >> > > >>>>>> > > > > > > regions at a time, which may perform worse compared to many scans hitting
> > >> > > >>>>>> > > > > > > data that is much better spread over region servers.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > As for a map-reduce job, the approach should not affect reading
> > >> > > >>>>>> > > > > > > performance at all: it's just that there are bucketsCount times more
> > >> > > >>>>>> > > > > > > splits and hence bucketsCount times more Map tasks. In many cases this
> > >> > > >>>>>> > > > > > > even improves overall performance of the MR job since work is better
> > >> > > >>>>>> > > > > > > distributed over the cluster (esp. when the aim is to constantly process
> > >> > > >>>>>> > > > > > > the incoming delta, which usually resides in one or just a couple of
> > >> > > >>>>>> > > > > > > regions depending on processing frequency).
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > If you can share details on your case, that will help to understand what
> > >> > > >>>>>> > > > > > > effect(s) to expect from using this approach.
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > ----
> > >> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > > > > Interesting project, Alex.
> > >> > > >>>>>> > > > > > > > Since there're bucketsCount scanners compared to one scanner originally,
> > >> > > >>>>>> > > > > > > > have you performed load testing to see the impact ?
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > Thanks
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau
> > >> > > >>>>>> > > > > > > > <alex.baranov.v@gmail.com> wrote:
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > > > > Hello guys,
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > I'd like to introduce a new small java
> > >> project/lib
> > >> > > >>>>>> around
> > >> > > >>>>>> > > HBase:
> > >> > > >>>>>> > > > > > > HBaseWD.
> > >> > > >>>>>> > > > > > > > > It
> > >> > > >>>>>> > > > > > > > > is aimed to help with distribution of the
> > load
> > >> > > (across
> > >> > > >>>>>> > > > > regionservers)
> > >> > > >>>>>> > > > > > > > when
> > >> > > >>>>>> > > > > > > > > writing sequential (becasue of the row key
> > >> nature)
> > >> > > >>>>>> records.
> > >> > > >>>>>> > It
> > >> > > >>>>>> > > > > > > implements
> > >> > > >>>>>> > > > > > > > > the solution which was discussed several
> > times
> > >> on
> > >> > > this
> > >> > > >>>>>> > mailing
> > >> > > >>>>>> > > > list
> > >> > > >>>>>> > > > > > > (e.g.
> > >> > > >>>>>> > > > > > > > > here:
> http://search-hadoop.com/m/gNRA82No5Wk
> > ).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find the sources at
> > >> > > >>>>>> > > > > > https://github.com/sematext/HBaseWD(there's
> > >> > > >>>>>> > > > > > > > > also
> > >> > > >>>>>> > > > > > > > > a jar of current version for convenience).
> It
> > >> is
> > >> > > very
> > >> > > >>>>>> easy to
> > >> > > >>>>>> > > > make
> > >> > > >>>>>> > > > > > use
> > >> > > >>>>>> > > > > > > of
> > >> > > >>>>>> > > > > > > > > it: e.g. I added it to one existing project
> > >> with
> > >> > 1+2
> > >> > > >>>>>> lines of
> > >> > > >>>>>> > > > code
> > >> > > >>>>>> > > > > > (one
> > >> > > >>>>>> > > > > > > > > where I write to HBase and 2 for
> configuring
> > >> > > MapReduce
> > >> > > >>>>>> job).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please find below the short intro to the
> lib
> > >> [1].
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Alex Baranau
> > >> > > >>>>>> > > > > > > > > ----
> > >> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr -
> > >> Lucene
> > >> > -
> > >> > > >>>>>> Nutch -
> > >> > > >>>>>> > > > Hadoop
> > >> > > >>>>>> > > > > -
> > >> > > >>>>>> > > > > > > > HBase
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > [1]
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Description:
> > >> > > >>>>>> > > > > > > > > ------------
> > >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing
> (sequential)
> > >> > Writes.
> > >> > > >>>>>> It was
> > >> > > >>>>>> > > > > inspired
> > >> > > >>>>>> > > > > > by
> > >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists around
> the
> > >> > > problem
> > >> > > >>>>>> of
> > >> > > >>>>>> > > choosing
> > >> > > >>>>>> > > > > > > > between:
> > >> > > >>>>>> > > > > > > > > * writing records with sequential row keys
> > >> (e.g.
> > >> > > >>>>>> time-series
> > >> > > >>>>>> > > data
> > >> > > >>>>>> > > > > > with
> > >> > > >>>>>> > > > > > > > row
> > >> > > >>>>>> > > > > > > > > key
> > >> > > >>>>>> > > > > > > > >  built based on ts)
> > >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > First approach makes possible to perform
> fast
> > >> > range
> > >> > > >>>>>> scans
> > >> > > >>>>>> > with
> > >> > > >>>>>> > > > help
> > >> > > >>>>>> > > > > > of
> > >> > > >>>>>> > > > > > > > > setting
> > >> > > >>>>>> > > > > > > > > start/stop keys on Scanner, but creates
> > single
> > >> > > region
> > >> > > >>>>>> server
> > >> > > >>>>>> > > > > > > hot-spotting
> > >> > > >>>>>> > > > > > > > > problem upon writing data (as row keys go
> in
> > >> > > sequence
> > >> > > >>>>>> all
> > >> > > >>>>>> > > records
> > >> > > >>>>>> > > > > end
> > >> > > >>>>>> > > > > > > up
> > >> > > >>>>>> > > > > > > > > written into a single region at a time).
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Second approach aims for fastest writing
> > >> > performance
> > >> > > >>>>>> by
> > >> > > >>>>>> > > > > distributing
> > >> > > >>>>>> > > > > > > new
> > >> > > >>>>>> > > > > > > > > records over random regions but makes not
> > >> possible
> > >> > > >>>>>> doing fast
> > >> > > >>>>>> > > > range
> > >> > > >>>>>> > > > > > > scans
> > >> > > >>>>>> > > > > > > > > against written data.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > The suggested approach stays in the middle
> of
> > >> the
> > >> > > two
> > >> > > >>>>>> above
> > >> > > >>>>>> > and
> > >> > > >>>>>> > > > > > proved
> > >> > > >>>>>> > > > > > > to
> > >> > > >>>>>> > > > > > > > > perform well by distributing records over
> the
> > >> > > cluster
> > >> > > >>>>>> during
> > >> > > >>>>>> > > > > writing
> > >> > > >>>>>> > > > > > > data
> > >> > > >>>>>> > > > > > > > > while allowing range scans over it. HBaseWD
> > >> > provides
> > >> > > >>>>>> very
> > >> > > >>>>>> > > simple
> > >> > > >>>>>> > > > > API
> > >> > > >>>>>> > > > > > to
> > >> > > >>>>>> > > > > > > > > work with which makes it perfect to use
> with
> > >> > > existing
> > >> > > >>>>>> code.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Please refer to unit-tests for lib usage
> info
> > >> as
> > >> > > they
> > >> > > >>>>>> aimed
> > >> > > >>>>>> > to
> > >> > > >>>>>> > > > act
> > >> > > >>>>>> > > > > as
> > >> > > >>>>>> > > > > > > > > example.
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > >> > > >>>>>> > > > > > > > > ----------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Distributing records with sequential keys
> > which
> > >> > are
> > >> > > >>>>>> being
> > >> > > >>>>>> > > written
> > >> > > >>>>>> > > > > in
> > >> > > >>>>>> > > > > > up
> > >> > > >>>>>> > > > > > > > to
> > >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; //
> > >> distributing
> > >> > > into
> > >> > > >>>>>> 32
> > >> > > >>>>>> > > buckets
> > >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > >> > > >>>>>> > > > > > > > >                           new
> > >> > > >>>>>> > > > > > > > >
> > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > >> > > >>>>>> > > > > > > > >      Put put = new
> > >> > > >>>>>> > > > > >
> Put(keyDistributor.getDistributedKey(originalKey));
> > >> > > >>>>>> > > > > > > > >      ... // add values
> > >> > > >>>>>> > > > > > > > >      hTable.put(put);
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing a range scan over written data
> > >> > > (internally
> > >> > > >>>>>> > > > > <bucketsCount>
> > >> > > >>>>>> > > > > > > > > scanners
> > >> > > >>>>>> > > > > > > > > executed):
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >    ResultScanner rs =
> > >> > > >>>>>> DistributedScanner.create(hTable, scan,
> > >> > > >>>>>> > > > > > > > > keyDistributor);
> > >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> > >> > > >>>>>> > > > > > > > >      ...
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Performing mapreduce job over written data
> > >> chunk
> > >> > > >>>>>> specified by
> > >> > > >>>>>> > > > Scan:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Configuration conf =
> > >> > HBaseConfiguration.create();
> > >> > > >>>>>> > > > > > > > >    Job job = new Job(conf,
> > "testMapreduceJob");
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >>  TableMapReduceUtil.initTableMapperJob("table",
> > >> > > >>>>>> scan,
> > >> > > >>>>>> > > > > > > > >      RowCounterMapper.class,
> > >> > > >>>>>> ImmutableBytesWritable.class,
> > >> > > >>>>>> > > > > > > Result.class,
> > >> > > >>>>>> > > > > > > > > job);
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    // Substituting standard
> TableInputFormat
> > >> which
> > >> > > was
> > >> > > >>>>>> set in
> > >> > > >>>>>> > > > > > > > >    //
> > >> TableMapReduceUtil.initTableMapperJob(...)
> > >> > > >>>>>> > > > > > > > >
> > >> > >  job.setInputFormatClass(WdTableInputFormat.class);
> > >> > > >>>>>> > > > > > > > >
> > >>  keyDistributor.addInfo(job.getConfiguration());
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > >> > > >>>>>> > > > > > > > > -----------------------------------------
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
> > >> support
> > >> > > >>>>>> custom row
> > >> > > >>>>>> > > key
> > >> > > >>>>>> > > > > > > > > distribution
> > >> > > >>>>>> > > > > > > > > approaches. To define custom row key
> > >> distributing
> > >> > > >>>>>> logic just
> > >> > > >>>>>> > > > > > implement
> > >> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class
> > which
> > >> is
> > >> > > >>>>>> really very
> > >> > > >>>>>> > > > > simple:
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > > >    public abstract class
> > >> AbstractRowKeyDistributor
> > >> > > >>>>>> implements
> > >> > > >>>>>> > > > > > > > > Parametrizable {
> > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > >> > getDistributedKey(byte[]
> > >> > > >>>>>> > > > originalKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[]
> > >> getOriginalKey(byte[]
> > >> > > >>>>>> > adjustedKey);
> > >> > > >>>>>> > > > > > > > >      public abstract byte[][]
> > >> > > >>>>>> getAllDistributedKeys(byte[]
> > >> > > >>>>>> > > > > > > originalKey);
> > >> > > >>>>>> > > > > > > > >      ... // some utility methods
> > >> > > >>>>>> > > > > > > > >    }
> > >> > > >>>>>> > > > > > > > >
> > >> > > >>>>>> > > > > > > >
> > >> > > >>>>>> > > > > > >
> > >> > > >>>>>> > > > > >
> > >> > > >>>>>> > > > >
> > >> > > >>>>>> > > >
> > >> > > >>>>>> > >
> > >> > > >>>>>> >
> > >> > > >>>>>>
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>
> > >> > > >>>
> > >> > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
There are several options here. E.g.:

1) Given that you have the "original key" of the record, you can fetch the
stored record's key from HBase and use it to create a Put with updated (or
new) cells.

Currently you'll need to use a distributed scan for that; there's no
analogue for the Get operation yet (see
https://github.com/sematext/HBaseWD/issues/1).

Note: you need to first find out the real key of the stored record by
fetching data from HBase in case you use the RowKeyDistributorByOneBytePrefix
included in the current lib. Alternatively, see the next option:

2) You can create your own RowKeyDistributor implementation which creates
the "distributed key" based on the original key value, so that later, when
you have the original key and want to update the record, you can calculate
the distributed key without a roundtrip to HBase.

E.g. in your RowKeyDistributor implementation you can calculate a 1-byte
hash of the original key (https://github.com/sematext/HBaseWD/issues/2).

Either way, you don't need to delete a record in order to update some of its
cells or add new cells.
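A minimal sketch of option 2, as a hypothetical standalone class (not part of
HBaseWD: the class name and the non-extending form are assumptions, though
the method signatures mirror the AbstractRowKeyDistributor contract quoted
earlier in the thread). The bucket prefix is derived from a hash of the
original key, so the distributed key is a pure function of the original key
and no HBase roundtrip is needed before an update:

```java
import java.util.Arrays;

// Hypothetical sketch (not HBaseWD code): a distributor whose one-byte
// bucket prefix is a hash of the original key. getDistributedKey() is then
// deterministic, so updates can recompute the stored key without a read.
class HashedPrefixRowKeyDistributor {
  private final byte bucketsCount; // must be in 1..Byte.MAX_VALUE

  HashedPrefixRowKeyDistributor(byte bucketsCount) {
    this.bucketsCount = bucketsCount;
  }

  byte[] getDistributedKey(byte[] originalKey) {
    // Mask keeps the hash non-negative before taking the modulo.
    byte prefix = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
    byte[] key = new byte[originalKey.length + 1];
    key[0] = prefix;
    System.arraycopy(originalKey, 0, key, 1, originalKey.length);
    return key;
  }

  byte[] getOriginalKey(byte[] adjustedKey) {
    // Strip the one-byte prefix to recover the original key.
    return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
  }

  byte[][] getAllDistributedKeys(byte[] originalKey) {
    // Range scans still probe every bucket: different original keys in the
    // scanned range may hash to different prefixes.
    byte[][] keys = new byte[bucketsCount][];
    for (byte i = 0; i < bucketsCount; i++) {
      keys[i] = new byte[originalKey.length + 1];
      keys[i][0] = i;
      System.arraycopy(originalKey, 0, keys[i], 1, originalKey.length);
    }
    return keys;
  }
}
```

With such a distributor, overwriting a record is just
`hTable.put(new Put(distributor.getDistributedKey(originalKey)))` — the same
distributed key comes out every time.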

Please let me know if you have more Qs!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com> wrote:

> I have another question. For overwriting, do I need to delete the existing
> one before re-writing it?
>
> On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com> wrote:
>
> > Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
> >
> >
> > On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
> >
> >> Thanks for the interest!
> >>
> >> We are using it in production. It is simple and hence quite stable.
> >> Though some minor pieces are missing (like
> >> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
> >> stability and/or major functionality.
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>
> >> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com> wrote:
> >>
> >> > What's the status on this package? Is it mature enough?
> >> > I am using it in my project; I tried out the write method yesterday and
> >> > am going to incorporate it into the read method tomorrow.
> >> >
> >> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >> >
> >> > > > The start/end rows may be written twice.
> >> > >
> >> > > Yeah, I know. I meant that the size of the startRow+stopRow data is
> >> > > "bearable" in an attribute value no matter how long they (the keys)
> >> > > are, since we're already OK with transferring them initially (i.e. we
> >> > > should be OK with transferring 2x more).
> >> > >
> >> > > So, what about the suggestion of the sourceScan attribute value I
> >> > > mentioned? If you can tell why it isn't sufficient in your case, I'd
> >> > > have more info to think about a better suggestion ;)
> >> > >
> >> > > > It is Okay to keep all versions of your patch in the JIRA.
> >> > > > Maybe the second should be named HBASE-3811-v2.patch
> >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> >> > >
> >> > > np. Can do that. Just thought that they (patches) can be sorted by
> >> > > date to find out the final one (aka "convention over naming-rules").
> >> > >
> >> > > Alex.
> >> > >
> >> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com> wrote:
> >> > >
> >> > > > >> Though it might be ok, since we anyways "transfer" start/stop
> >> > > > >> rows with Scan object.
> >> > > > In write() method, we now have:
> >> > > >     Bytes.writeByteArray(out, this.startRow);
> >> > > >     Bytes.writeByteArray(out, this.stopRow);
> >> > > > ...
> >> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> >> > > >         WritableUtils.writeString(out, attr.getKey());
> >> > > >         Bytes.writeByteArray(out, attr.getValue());
> >> > > >       }
> >> > > > The start/end rows may be written twice.
> >> > > >
> >> > > > Of course, you have full control over how to generate the unique ID
> >> > > > for the "sourceScan" attribute.
> >> > > >
> >> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> >> > > > second should be named HBASE-3811-v2.patch
> >> > > > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > >
> >> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >> > > >
> >> > > >> > Can you remove the first version ?
> >> > > >> Isn't it ok to keep it in the JIRA issue?
> >> > > >>
> >> > > >>
> >> > > >> > In HBaseWD, can you use reflection to detect whether Scan
> >> > > >> > supports setAttribute()? If it does, can you encode start row and
> >> > > >> > end row as a "sourceScan" attribute?
> >> > > >>
> >> > > >> Yeah, smth like this is going to be implemented. Though I'd still
> >> > > >> want to hear from the devs the story about Scan version.
> >> > > >>
> >> > > >>
> >> > > >> > One consideration is that start row or end row may be quite long.
> >> > > >>
> >> > > >> Yeah, that was my thought too at first. Though it might be ok,
> >> > > >> since we anyways "transfer" start/stop rows with the Scan object.
> >> > > >>
> >> > > >> > What do you think ?
> >> > > >>
> >> > > >> I'd love to hear from you whether the variant I mentioned is what
> >> > > >> we are looking at here:
> >> > > >>
> >> > > >>
> >> > > >> > From what I understand, you want to distinguish scans fired by
> >> > > >> > the same distributed scan, i.e. group scans which were fired by a
> >> > > >> > single distributed scan. If that's what you want, the distributed
> >> > > >> > scan can generate a unique ID and set, say, a "sourceScan"
> >> > > >> > attribute to its value. This way we'll have <# of distinct
> >> > > >> > "sourceScan" attribute values> = <number of distributed scans
> >> > > >> > invoked by client side>, and two scans on the server side will
> >> > > >> > have the same "sourceScan" attribute iff they "belong" to the
> >> > > >> > same distributed scan.
> >> > > >>
> >> > > >>
> >> > > >> Alex Baranau
> >> > > >> ----
> >> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > > >>
> >> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com> wrote:
> >> > > >>
> >> > > >>> Alex:
> >> > > >>> Your second patch looks good.
> >> > > >>> Can you remove the first version ?
> >> > > >>>
> >> > > >>> In HBaseWD, can you use reflection to detect whether Scan supports
> >> > > >>> setAttribute()? If it does, can you encode start row and end row
> >> > > >>> as a "sourceScan" attribute?
> >> > > >>>
> >> > > >>> One consideration is that start row or end row may be quite long.
> >> > > >>> Ideally we should store the hash code of the source Scan object as
> >> > > >>> the "sourceScan" attribute. But Scan doesn't implement hashCode().
> >> > > >>> We can add it; that would require running all Scan-related tests.
> >> > > >>>
> >> > > >>> What do you think ?
> >> > > >>>
> >> > > >>> Thanks
> >> > > >>>
> >> > > >>>
> >> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >> > > >>>
> >> > > >>>> Sorry for the delay in response (public holidays here).
> >> > > >>>>
> >> > > >>>> This depends on what info you are looking for on server side.
> >> > > >>>>
> >> > > >>>> From what I understand, you want to distinguish scans fired by
> >> > > >>>> the same distributed scan. I.e. group scans which were fired by a
> >> > > >>>> single distributed scan. If that's what you want, the distributed
> >> > > >>>> scan can generate a unique ID and set, say, a "sourceScan"
> >> > > >>>> attribute to its value. This way we'll have <# of distinct
> >> > > >>>> "sourceScan" attribute values> = <number of distributed scans
> >> > > >>>> invoked by client side>, and two scans on the server side will
> >> > > >>>> have the same "sourceScan" attribute iff they "belong" to the
> >> > > >>>> same distributed scan.
> >> > > >>>>
> >> > > >>>> Is this what you are looking for?
> >> > > >>>>
> >> > > >>>> Alex Baranau
> >> > > >>>>
> >> > > >>>> P.S. attached patch for HBASE-3811
> >> > > >>>> <https://issues.apache.org/jira/browse/HBASE-3811>.
> >> > > >>>> P.S-2. should this conversation be moved to the dev list?
> >> > > >>>>
> >> > > >>>> ----
> >> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > > >>>>
> >> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yu...@gmail.com> wrote:
> >> > > >>>>
> >> > > >>>>> Alex:
> >> > > >>>>> What type of identification should we put in the map of the Scan
> >> > > >>>>> object? I am thinking of using the Id of RowKeyDistributor. But
> >> > > >>>>> the user can use the same distributor on multiple scans.
> >> > > >>>>>
> >> > > >>>>> Please share your thoughts.
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >> > > >>>>>
> >> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> >> > > >>>>>>
> >> > > >>>>>> Alex Baranau
> >> > > >>>>>> ----
> >> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > > >>>>>>
> >> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > >>>>>>
> >> > > >>>>>> > My plan was to make regions that have active scanners more
> >> > > >>>>>> > stable - trying not to move them when balancing.
> >> > > >>>>>> > I prefer the second approach - adding custom attribute(s) to
> >> > > >>>>>> > Scan so that the Scans created by the method below can be
> >> > > >>>>>> > 'grouped'.
> >> > > >>>>>> >
> >> > > >>>>>> > If you can file a JIRA, that would be great.
> >> > > >>>>>> >
> >> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >> > > >>>>>> >
> >> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or just
> >> > > >>>>>> > > differently) when determining the load?
> >> > > >>>>>> > >
> >> > > >>>>>> > > The current code looks like this:
> >> > > >>>>>> > >
> >> > > >>>>>> > > class DistributedScanner:
> >> > > >>>>>> > >  public static DistributedScanner create(HTable hTable, Scan original,
> >> > > >>>>>> > >      AbstractRowKeyDistributor keyDistributor) throws IOException {
> >> > > >>>>>> > >    byte[][] startKeys =
> >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStartRow());
> >> > > >>>>>> > >    byte[][] stopKeys =
> >> > > >>>>>> > >        keyDistributor.getAllDistributedKeys(original.getStopRow());
> >> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> >> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> >> > > >>>>>> > >      scans[i] = new Scan(original);
> >> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> >> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> >> > > >>>>>> > >    }
> >> > > >>>>>> > >
> >> > > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> >> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> >> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> >> > > >>>>>> > >    }
> >> > > >>>>>> > >
> >> > > >>>>>> > >    return new DistributedScanner(rss);
> >> > > >>>>>> > >  }
> >> > > >>>>>> > >
> >> > > >>>>>> > > This is client code. To make these scans "identifiable" we
> >> > > >>>>>> > > need to either use some different (derived from Scan) class
> >> > > >>>>>> > > or add some attribute to them. There's no API for doing the
> >> > > >>>>>> > > latter. We can do the former, but I don't really like the
> >> > > >>>>>> > > idea of creating an extra class (with no extra
> >> > > >>>>>> > > functionality) just to distinguish it from the base one.
> >> > > >>>>>> > >
> >> > > >>>>>> > > If you can share why/how you want to treat them differently
> >> > > >>>>>> > > on the server side, that would be helpful.
> >> > > >>>>>> > >
> >> > > >>>>>> > > Alex Baranau
> >> > > >>>>>> > > ----
> >> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > > >>>>>> > >
> >> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > >>>>>> > >
> >> > > >>>>>> > > > My request would be to make the distributed scan
> >> > > >>>>>> > > > identifiable from server side.
> >> > > >>>>>> > > > :-)
> >> > > >>>>>> > > >
> >> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >> > > >>>>>> > > >
> >> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
> >> > > >>>>>> > > > > > regions for the underlying table.
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > > > True: e.g. when there's only one region that holds data
> >> > > >>>>>> > > > > for the whole table (not many records in the table
> >> > > >>>>>> > > > > yet), a distributed scan will fire N scans against the
> >> > > >>>>>> > > > > same region. On the other hand, in case there are a
> >> > > >>>>>> > > > > huge number of regions for a single table, each scan
> >> > > >>>>>> > > > > can span over multiple regions.
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> >> > > >>>>>> > > > > > scan" at server side.
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > > > With the current implementation a "distributed" scan
> >> > > >>>>>> > > > > won't be recognized as something special on the server
> >> > > >>>>>> > > > > side. It will be an ordinary scan. Though the number of
> >> > > >>>>>> > > > > scans will increase, given that the typical situation
> >> > > >>>>>> > > > > is "many regions for a single table", the scans of the
> >> > > >>>>>> > > > > same "distributed scan" are likely not to hit the same
> >> > > >>>>>> > > > > region.
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > > > Not sure if I answered your questions here. Feel free
> >> > > >>>>>> > > > > to ask more ;)
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > > > Alex Baranau
> >> > > >>>>>> > > > > ----
> >> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > > > > Alex:
> >> > > >>>>>> > > > > > If you read this, you would know why I asked:
> >> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> >> > > >>>>>> > > > > >
> >> > > >>>>>> > > > > > I need to deal with normal scan and "distributed
> >> > > >>>>>> > > > > > scan" at the server side. Basically bucketsCount may
> >> > > >>>>>> > > > > > not equal the number of regions for the underlying
> >> > > >>>>>> > > > > > table.
> >> > > >>>>>> > > > > >
> >> > > >>>>>> > > > > > Cheers
> >> > > >>>>>> > > > > >
> >> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> >> > > >>>>>> > > > alex.baranov.v@gmail.com
> >> > > >>>>>> > > > > > >wrote:
> >> > > >>>>>> > > > > >
> >> > > >>>>>> > > > > > > Hi Ted,
> >> > > >>>>>> > > > > > >
> >> > > >>>>>> > > > > > > We currently use this tool in the scenario where
> >> data
> >> > is
> >> > > >>>>>> consumed
> >> > > >>>>>> > > by
> >> > > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the
> >> performance
> >> > of
> >> > > >>>>>> pure
> >> > > >>>>>> > > > > "distributed
> >> > > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I expect
> >> it
> >> > to
> >> > > be
> >> > > >>>>>> close
> >> > > >>>>>> > to
> >> > > >>>>>> > > > > > simple
> >> > > >>>>>> > > > > > > scan performance, or may be sometimes even faster
> >> > > >>>>>> depending on
> >> > > >>>>>> > your
> >> > > >>>>>> > > > > data
> >> > > >>>>>> > > > > > > access patterns. E.g. in case you write
> timeseries data (sequential) which is written into the single region at a
> time, then e.g. if you access delta for further processing/analysis (esp. if
> from not single client) these scans are likely to hit the same region or
> couple of regions at a time, which may perform worse comparing to many scans
> hitting data that is much better spread over region servers.
>
> As for map-reduce job the approach should not affect reading performance at
> all: it's just that there are bucketsCount times more splits and hence
> bucketsCount times more Map tasks. In many cases this even improves overall
> performance of the MR job since work is better distributed over cluster
> (esp. in situation when the aim is to constantly process the coming delta
> which usually resides in one or just couple of regions depending on
> processing frequency).
>
> If you can share details on your case, that will help to understand what
> effect(s) to expect from using this approach.
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
There are several options here. E.g.:

1) Given that you have the "original key" of the record, you can fetch the
stored record's key from HBase and use it to create a Put with updated (or
new) cells.

Currently you'll need to use a distributed scan for that; there's no analogue
of the Get operation yet (see https://github.com/sematext/HBaseWD/issues/1).
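Until that Get analogue lands, the same effect can be had by probing every candidate distributed key, since the bucket prefix is a single byte from a known, small set. Below is a self-contained illustration in plain Java: the HashMap stands in for the HBase table, and all class and method names (DistributedGetSketch, distributedGet, etc.) are invented for the example — they are not part of HBaseWD.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch of a "distributed Get": the one-byte bucket prefix can only take
// bucketsCount values, so we can enumerate every candidate distributed key
// for a given original key and probe each one. A HashMap keyed by the
// encoded row key stands in for the HBase table.
class DistributedGetSketch {
    static final int BUCKETS_COUNT = 32;

    // One-byte bucket prefix prepended to the original key (the scheme that
    // the name RowKeyDistributorByOneBytePrefix suggests).
    static byte[] distributedKey(byte bucket, byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        key[0] = bucket;
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    // Probe each bucket; at most one of the candidate keys can exist,
    // because a record is stored under exactly one bucket.
    static String distributedGet(Map<String, String> table, byte[] originalKey) {
        for (int b = 0; b < BUCKETS_COUNT; b++) {
            String value = table.get(Arrays.toString(distributedKey((byte) b, originalKey)));
            if (value != null) {
                return value;
            }
        }
        return null; // no bucket holds this original key
    }
}
```

Against a real table the probes would translate into up to bucketsCount point Gets instead of bucketsCount scanners, which is the idea tracked in issue #1.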

Note: you need to first find out the real key of the stored record by
fetching data from HBase only if you use the RowKeyDistributorByOneBytePrefix
included in the current lib. Alternatively, see the next option:

2) You can create your own RowKeyDistributor implementation which derives the
"distributed key" from the original key's value, so that later, when you have
the original key and want to update the record, you can compute the
distributed key without a round trip to HBase.

E.g. your RowKeyDistributor implementation can calculate a 1-byte hash of the
original key (https://github.com/sematext/HBaseWD/issues/2).
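A minimal sketch of such a hash-based distributor: the class below only mirrors the three abstract methods of HBaseWD's AbstractRowKeyDistributor rather than extending it (the Parametrizable part and utility methods are omitted), and the class name and the particular bucket-selection hash are assumptions made for illustration.

```java
import java.util.Arrays;

// Hash-based variant of option 2: the one-byte bucket prefix is derived
// from the original key itself, so the distributed key is recomputable
// from the original key alone, with no read from HBase.
class HashPrefixKeyDistributor {
    private final byte bucketsCount;

    HashPrefixKeyDistributor(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    byte[] getDistributedKey(byte[] originalKey) {
        byte[] key = new byte[originalKey.length + 1];
        // Bucket is a pure function of the key bytes, hence stable across calls;
        // floorMod keeps the result non-negative for negative hash codes.
        key[0] = (byte) Math.floorMod(Arrays.hashCode(originalKey), bucketsCount);
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    byte[] getOriginalKey(byte[] adjustedKey) {
        // Strip the one-byte bucket prefix.
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    byte[][] getAllDistributedKeys(byte[] originalKey) {
        // Used by distributed scans: one candidate key per bucket.
        byte[][] keys = new byte[bucketsCount][];
        for (byte i = 0; i < bucketsCount; i++) {
            keys[i] = new byte[originalKey.length + 1];
            keys[i][0] = i;
            System.arraycopy(originalKey, 0, keys[i], 1, originalKey.length);
        }
        return keys;
    }
}
```

Because the bucket byte is a pure function of the original key, getDistributedKey(originalKey) can be recomputed at update time without touching HBase, which is exactly what option 2 needs.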



Either way, you don't need to delete the record in order to update some of
its cells or add new ones.

Please let me know if you have more Qs!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, May 18, 2011 at 1:19 AM, Weishung Chung <we...@gmail.com> wrote:

> I have another question. For overwriting, do I need to delete the existing
> one before re-writing it?
>
> On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com> wrote:
>
>> Yes, it's simple yet useful. I am integrating it. Thanks alot :)
>>
>> On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>
>>> Thanks for the interest!
>>>
>>> We are using it in production. It is simple and hence quite stable. Though
>>> some minor pieces are missing (like https://github.com/sematext/HBaseWD/issues/1)
>>> this doesn't affect stability and/or major functionality.
>>>
>>> Alex Baranau
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>
>>> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com> wrote:
>>>
>>>> What's the status on this package? Is it mature enough? I am using it in
>>>> my project, tried out the write method yesterday and going to incorporate
>>>> into read method tomorrow.
>>>>
>>>> On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>> > The start/end rows may be written twice.
>>>>>
>>>>> Yeah, I know. I meant that the size of the startRow+stopRow data is
>>>>> "bearable" in an attribute value no matter how long they (keys) are, since
>>>>> we are already OK with transferring them initially (i.e. we should be OK
>>>>> with transferring 2x times more).
>>>>>
>>>>> So, what about the suggestion of the sourceScan attribute value I
>>>>> mentioned? If you can tell why it isn't sufficient in your case, I'd have
>>>>> more info to think about a better suggestion ;)
>>>>>
>>>>> > It is Okay to keep all versions of your patch in the JIRA.
>>>>> > Maybe the second should be named HBASE-3811-v2.patch
>>>>> > <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
>>>>>
>>>>> np. Can do that. Just thought that they (patches) can be sorted by date to
>>>>> find out the final one (aka "convention over naming-rules").
>>>>>
>>>>> Alex.
>>>>>
>>>>> On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com> wrote:
>>>>>
>>>>>> >> Though it might be ok, since we anyways "transfer" start/stop rows
>>>>>> >> with Scan object.
>>>>>>
>>>>>> In write() method, we now have:
>>>>>>
>>>>>>     Bytes.writeByteArray(out, this.startRow);
>>>>>>     Bytes.writeByteArray(out, this.stopRow);
>>>>>>     ...
>>>>>>       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
>>>>>>         WritableUtils.writeString(out, attr.getKey());
>>>>>>         Bytes.writeByteArray(out, attr.getValue());
>>>>>>       }
>>>>>>
>>>>>> The start/end rows may be written twice.
>>>>>>
>>>>>> Of course, you have full control over how to generate the unique ID for
>>>>>> the "sourceScan" attribute.
>>>>>>
>>>>>> It is Okay to keep all versions of your patch in the JIRA. Maybe the
>>>>>> second should be named HBASE-3811-v2.patch
>>>>>> <https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>>>> > Can you remove the first version ?
>>>>>>>
>>>>>>> Isn't it ok to keep it in the JIRA issue?
>>>>>>>
>>>>>>> > In HBaseWD, can you use reflection to detect whether Scan supports
>>>>>>> > setAttribute() ?
>>>>>>> > If it does, can you encode start row and end row as "sourceScan"
>>>>>>> > attribute ?
>>>>>>>
>>>>>>> Yeah, smth like this is going to be implemented. Though I'd still want
>>>>>>> to hear from the devs the story about the Scan version.
>>>>>>>
>>>>>>> > One consideration is that start row or end row may be quite long.
>>>>>>>
>>>>>>> Yeah, that was my thought too at first. Though it might be ok, since we
>>>>>>> anyways "transfer" start/stop rows with the Scan object.
>>>>>>>
>>>>>>> > What do you think ?
>>>>>>>
>>>>>>> I'd love to hear from you whether the variant I mentioned is what we are
>>>>>>> looking at here:
>>>>>>>
>>>>>>> > From what I understand, you want to distinguish scans fired by the same
>>>>>>> > distributed scan. I.e. group scans which were fired by single
>>>>>>> > distributed scan. If that's what you want, distributed scan can
>>>>>>> > generate unique ID and set, say, "sourceScan" attribute to its value.
>>>>>>> > This way we'll have <# of distinct "sourceScan" attribute values> =
>>>>>>> > <number of distributed scans invoked by client side> and two scans on
>>>>>>> > server side will have the same "sourceScan" attribute iff they
>>>>>>> > "belong" to same distributed scan.
>>>>>>>
>>>>>>> Alex Baranau
>>>>>>> ----
>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>
>>>>>>> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com> wrote:
>>>>>>>> Alex:
>>>>>>>> Your second patch looks good.
>>>>>>>> Can you remove the first version ?
>>>>>>>>
>>>>>>>> In HBaseWD, can you use reflection to detect whether Scan supports
>>>>>>>> setAttribute() ?
>>>>>>>> If it does, can you encode start row and end row as "sourceScan"
>>>>>>>> attribute ?
>>>>>>>>
>>>>>>>> One consideration is that start row or end row may be quite long.
>>>>>>>> Ideally we should store the hash code of the source Scan object as the
>>>>>>>> "sourceScan" attribute. But Scan doesn't implement hashCode(). We can
>>>>>>>> add it, but that would require running all Scan related tests.
>>>>>>>>
>>>>>>>> What do you think ?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sorry for the delay in response (public holidays here).
>>>>>>>>>
>>>>>>>>> This depends on what info you are looking for on the server side.
>>>>>>>>>
>>>>>>>>> From what I understand, you want to distinguish scans fired by the same
>>>>>>>>> distributed scan. I.e. group scans which were fired by single
>>>>>>>>> distributed scan. If that's what you want, distributed scan can
>>>>>>>>> generate unique ID and set, say, "sourceScan" attribute to its value.
>>>>>>>>> This way we'll have <# of distinct "sourceScan" attribute values> =
>>>>>>>>> <number of distributed scans invoked by client side> and two scans on
>>>>>>>>> server side will have the same "sourceScan" attribute iff they "belong"
>>>>>>>>> to same distributed scan.
>>>>>>>>>
>>>>>>>>> Is this what you are looking for?
>>>>>>>>>
>>>>>>>>> Alex Baranau
>>>>>>>>>
>>>>>>>>> P.S. attached patch for HBASE-3811
>>>>>>>>> <https://issues.apache.org/jira/browse/HBASE-3811>.
>>>>>>>>> P.S-2. should this conversation be moved to the dev list?
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yu...@gmail.com> wrote:
>>>>>>>>>> Alex:
>>>>>>>>>> What type of identification should we put in the map of the Scan
>>>>>>>>>> object? I am thinking of using the Id of RowKeyDistributor. But the
>>>>>>>>>> user can use same distributor on multiple scans.
>>>>>>>>>>
>>>>>>>>>> Please share your thought.
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> https://issues.apache.org/jira/browse/HBASE-3811
>>>>>>>>>>>
>>>>>>>>>>> Alex Baranau
>>>>>>>>>>> ----
>>>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> My plan was to make regions that have active scanners more stable -
>>>>>>>>>>>> trying not to move them when balancing.
>>>>>>>>>>>> I prefer second approach - adding custom attribute(s) to Scan so
>>>>>>>>>>>> that the Scans created by the method below can be 'grouped'.
>>>>>>>>>>>>
>>>>>>>>>>>> If you can file a JIRA, that would be great.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>>>>>>>>>> Aha, so you want to "count" it as single scan (or just differently)
>>>>>>>>>>>>> when determining the load?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The current code looks like this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> class DistributedScanner:
>>>>>>>>>>>>>   public static DistributedScanner create(HTable hTable, Scan original,
>>>>>>>>>>>>>       AbstractRowKeyDistributor keyDistributor) throws IOException {
>>>>>>>>>>>>>     byte[][] startKeys =
>>>>>>>>>>>>>         keyDistributor.getAllDistributedKeys(original.getStartRow());
>>>>>>>>>>>>>     byte[][] stopKeys =
>>>>>>>>>>>>>         keyDistributor.getAllDistributedKeys(original.getStopRow());
>>>>>>>>>>>>>     Scan[] scans = new Scan[startKeys.length];
>>>>>>>>>>>>>     for (byte i = 0; i < startKeys.length; i++) {
>>>>>>>>>>>>>       scans[i] = new Scan(original);
>>>>>>>>>>>>>       scans[i].setStartRow(startKeys[i]);
>>>>>>>>>>>>>       scans[i].setStopRow(stopKeys[i]);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     ResultScanner[] rss = new ResultScanner[startKeys.length];
>>>>>>>>>>>>>     for (byte i = 0; i < scans.length; i++) {
>>>>>>>>>>>>>       rss[i] = hTable.getScanner(scans[i]);
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     return new DistributedScanner(rss);
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is client code. To make these scans "identifiable" we need to
>>>>>>>>>>>>> either use some different (derived from Scan) class or add some
>>>>>>>>>>>>> attribute to them. There's no API for doing the latter. But we can do
>>>>>>>>>>>>> the former, but I don't really like the idea of creating extra class
>>>>>>>>>>>>> (with no extra functionality) just to distinguish it from the base one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you can share why/how do you want to treat them differently on
>>>>>>>>>>>>> server side, that would be helpful.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alex Baranau
>>>>>>>>>>>>> ----
>>>>>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> My request would be to make the distributed scan identifiable from
>>>>>>>>>>>>>> server side. :-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>>>>>>>>>>>> > Basically bucketsCount may not equal number of regions for the
>>>>>>>>>>>>>>> > underlying table.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> True: e.g. when there's only one region that holds data for the
>>>>>>>>>>>>>>> whole table (not many records in table yet), distributed scan will
>>>>>>>>>>>>>>> fire N scans against the same region. On the other hand, in case
>>>>>>>>>>>>>>> there are a huge number of regions for a single table, each scan can
>>>>>>>>>>>>>>> span over multiple regions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > I need to deal with normal scan and "distributed scan" at server side.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With the current implementation a "distributed" scan won't be
>>>>>>>>>>>>>>> recognized as something special on the server side. It will be an
>>>>>>>>>>>>>>> ordinary scan. Though the number of scans will increase, given that
>>>>>>>>>>>>>>> the typical situation is "many regions for single table", the scans
>>>>>>>>>>>>>>> of the same "distributed scan" are likely not to hit the same region.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Not sure if I answered your questions here. Feel free to ask more ;)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alex Baranau
>>>>>>>>>>>>>>> ----
>>>>>>>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Alex:
>>>>>>>>>>>>>>>> If you read this, you would know why I asked:
>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HBASE-3679
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I need to deal with normal scan and "distributed scan" at server
>>>>>>>>>>>>>>>> side. Basically bucketsCount may not equal number of regions for
>>>>>>>>>>>>>>>> the underlying table.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi Ted,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We currently use this tool in the scenario where data is consumed
>>>>>>>>>>>>>>>>> by MapReduce jobs, so we haven't tested the performance of pure
>>>>>>>>>>>>>>>>> "distributed scan" (i.e. N scans instead of 1) a lot. I expect it
>>>>>>>>>>>>>>>>> to be close to simple scan performance, or may be sometimes even
>>>>>>>>>>>>>>>>> faster depending on your data access patterns. E.g. in case you
>>>>>>>>>>>>>>>>> write timeseries data (sequential) which is written into the
>>>>>>>>>>>>>>>>> single region at a time, then e.g. if you access delta for further
>>>>>>>>>>>>>>>>> processing/analysis (esp. if from not single client) these scans
>>>>>>>>>>>>>>>>> are likely to hit the same region or couple of regions at a time,
>>>>>>>>>>>>>>>>> which may perform worse comparing to many scans hitting data that
>>>>>>>>>>>>>>>>> is much better spread over region servers.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As for map-reduce job the approach should not affect reading
>>>>>>>>>>>>>>>>> performance at all: it's just that there are bucketsCount times
>>>>>>>>>>>>>>>>> more splits and hence bucketsCount times more Map tasks. In many
>>>>>>>>>>>>>>>>> cases this even improves overall performance of the MR job since
>>>>>>>>>>>>>>>>> work is better distributed over cluster (esp. in situation when
>>>>>>>>>>>>>>>>> the aim is to constantly process the coming delta which usually
>>>>>>>>>>>>>>>>> resides in one or just couple of regions depending on processing
>>>>>>>>>>>>>>>>> frequency).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you can share details on your case, that will help to
>>>>>>>>>>>>>>>>> understand what effect(s) to expect from using this approach.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Alex Baranau
>>>>>>>>>>>>>>>>> ----
>>>>>>>>>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>>>>>>>>> On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Interesting project, Alex.
>>>>>>>>>>>>>>>>>> Since there're bucketsCount scanners compared to one scanner
>>>>>>>>>>>>>>>>>> originally, have you performed load testing to see the impact ?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello guys,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like to introduce a new small java project/lib around
>>>>>>>>>>>>>>>>>>> HBase: HBaseWD. It is aimed to help with distribution of the
>>>>>>>>>>>>>>>>>>> load (across regionservers) when writing sequential (because of
>>>>>>>>>>>>>>>>>>> the row key nature) records. It implements the solution which
>>>>>>>>>>>>>>>>>>> was discussed several times on this mailing list (e.g. here:
>>>>>>>>>>>>>>>>>>> http://search-hadoop.com/m/gNRA82No5Wk).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please find the sources at https://github.com/sematext/HBaseWD
>>>>>>>>>>>>>>>>>>> (there's also a jar of current version for convenience). It is
>>>>>>>>>>>>>>>>>>> very easy to make use of it: e.g. I added it to one existing
>>>>>>>>>>>>>>>>>>> project with 1+2 lines of code (one where I write to HBase and
>>>>>>>>>>>>>>>>>>> 2 for configuring MapReduce job).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Any feedback is highly appreciated!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please find below the short intro to the lib [1].
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Alex Baranau
>>>>>>>>>>>>>>>>>>> ----
>>>>>>>>>>>>>>>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Description:
> >> > > >>>>>> > > > > > > > > ------------
> >> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential)
> >> > Writes.
> >> > > >>>>>> It was
> >> > > >>>>>> > > > > inspired
> >> > > >>>>>> > > > > > by
> >> > > >>>>>> > > > > > > > > discussions on HBase mailing lists around the
> >> > > problem
> >> > > >>>>>> of
> >> > > >>>>>> > > choosing
> >> > > >>>>>> > > > > > > > between:
> >> > > >>>>>> > > > > > > > > * writing records with sequential row keys
> >> (e.g.
> >> > > >>>>>> time-series
> >> > > >>>>>> > > data
> >> > > >>>>>> > > > > > with
> >> > > >>>>>> > > > > > > > row
> >> > > >>>>>> > > > > > > > > key
> >> > > >>>>>> > > > > > > > >  built based on ts)
> >> > > >>>>>> > > > > > > > > * using random unique IDs for records
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > First approach makes possible to perform fast
> >> > range
> >> > > >>>>>> scans
> >> > > >>>>>> > with
> >> > > >>>>>> > > > help
> >> > > >>>>>> > > > > > of
> >> > > >>>>>> > > > > > > > > setting
> >> > > >>>>>> > > > > > > > > start/stop keys on Scanner, but creates
> single
> >> > > region
> >> > > >>>>>> server
> >> > > >>>>>> > > > > > > hot-spotting
> >> > > >>>>>> > > > > > > > > problem upon writing data (as row keys go in
> >> > > sequence
> >> > > >>>>>> all
> >> > > >>>>>> > > records
> >> > > >>>>>> > > > > end
> >> > > >>>>>> > > > > > > up
> >> > > >>>>>> > > > > > > > > written into a single region at a time).
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > Second approach aims for fastest writing
> >> > performance
> >> > > >>>>>> by
> >> > > >>>>>> > > > > distributing
> >> > > >>>>>> > > > > > > new
> >> > > >>>>>> > > > > > > > > records over random regions but makes not
> >> possible
> >> > > >>>>>> doing fast
> >> > > >>>>>> > > > range
> >> > > >>>>>> > > > > > > scans
> >> > > >>>>>> > > > > > > > > against written data.
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > The suggested approach stays in the middle of
> >> the
> >> > > two
> >> > > >>>>>> above
> >> > > >>>>>> > and
> >> > > >>>>>> > > > > > proved
> >> > > >>>>>> > > > > > > to
> >> > > >>>>>> > > > > > > > > perform well by distributing records over the
> >> > > cluster
> >> > > >>>>>> during
> >> > > >>>>>> > > > > writing
> >> > > >>>>>> > > > > > > data
> >> > > >>>>>> > > > > > > > > while allowing range scans over it. HBaseWD
> >> > provides
> >> > > >>>>>> very
> >> > > >>>>>> > > simple
> >> > > >>>>>> > > > > API
> >> > > >>>>>> > > > > > to
> >> > > >>>>>> > > > > > > > > work with which makes it perfect to use with
> >> > > existing
> >> > > >>>>>> code.
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > Please refer to unit-tests for lib usage info
> >> as
> >> > > they
> >> > > >>>>>> aimed
> >> > > >>>>>> > to
> >> > > >>>>>> > > > act
> >> > > >>>>>> > > > > as
> >> > > >>>>>> > > > > > > > > example.
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> >> > > >>>>>> > > > > > > > > ----------------------------
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > Distributing records with sequential keys
> which
> >> > are
> >> > > >>>>>> being
> >> > > >>>>>> > > written
> >> > > >>>>>> > > > > in
> >> > > >>>>>> > > > > > up
> >> > > >>>>>> > > > > > > > to
> >> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; //
> >> distributing
> >> > > into
> >> > > >>>>>> 32
> >> > > >>>>>> > > buckets
> >> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> >> > > >>>>>> > > > > > > > >                           new
> >> > > >>>>>> > > > > > > > >
> RowKeyDistributorByOneBytePrefix(bucketsCount);
> >> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> >> > > >>>>>> > > > > > > > >      Put put = new
> >> > > >>>>>> > > > > > Put(keyDistributor.getDistributedKey(originalKey));
> >> > > >>>>>> > > > > > > > >      ... // add values
> >> > > >>>>>> > > > > > > > >      hTable.put(put);
> >> > > >>>>>> > > > > > > > >    }
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > Performing a range scan over written data
> >> > > (internally
> >> > > >>>>>> > > > > <bucketsCount>
> >> > > >>>>>> > > > > > > > > scanners
> >> > > >>>>>> > > > > > > > > executed):
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> >> > > >>>>>> > > > > > > > >    ResultScanner rs =
> >> > > >>>>>> DistributedScanner.create(hTable, scan,
> >> > > >>>>>> > > > > > > > > keyDistributor);
> >> > > >>>>>> > > > > > > > >    for (Result current : rs) {
> >> > > >>>>>> > > > > > > > >      ...
> >> > > >>>>>> > > > > > > > >    }
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > Performing mapreduce job over written data
> >> chunk
> >> > > >>>>>> specified by
> >> > > >>>>>> > > > Scan:
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >    Configuration conf =
> >> > HBaseConfiguration.create();
> >> > > >>>>>> > > > > > > > >    Job job = new Job(conf,
> "testMapreduceJob");
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >
> >>  TableMapReduceUtil.initTableMapperJob("table",
> >> > > >>>>>> scan,
> >> > > >>>>>> > > > > > > > >      RowCounterMapper.class,
> >> > > >>>>>> ImmutableBytesWritable.class,
> >> > > >>>>>> > > > > > > Result.class,
> >> > > >>>>>> > > > > > > > > job);
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >    // Substituting standard TableInputFormat
> >> which
> >> > > was
> >> > > >>>>>> set in
> >> > > >>>>>> > > > > > > > >    //
> >> TableMapReduceUtil.initTableMapperJob(...)
> >> > > >>>>>> > > > > > > > >
> >> > >  job.setInputFormatClass(WdTableInputFormat.class);
> >> > > >>>>>> > > > > > > > >
> >>  keyDistributor.addInfo(job.getConfiguration());
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> >> > > >>>>>> > > > > > > > > -----------------------------------------
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
> >> support
> >> > > >>>>>> custom row
> >> > > >>>>>> > > key
> >> > > >>>>>> > > > > > > > > distribution
> >> > > >>>>>> > > > > > > > > approaches. To define custom row key
> >> distributing
> >> > > >>>>>> logic just
> >> > > >>>>>> > > > > > implement
> >> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class
> which
> >> is
> >> > > >>>>>> really very
> >> > > >>>>>> > > > > simple:
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > > >    public abstract class
> >> AbstractRowKeyDistributor
> >> > > >>>>>> implements
> >> > > >>>>>> > > > > > > > > Parametrizable {
> >> > > >>>>>> > > > > > > > >      public abstract byte[]
> >> > getDistributedKey(byte[]
> >> > > >>>>>> > > > originalKey);
> >> > > >>>>>> > > > > > > > >      public abstract byte[]
> >> getOriginalKey(byte[]
> >> > > >>>>>> > adjustedKey);
> >> > > >>>>>> > > > > > > > >      public abstract byte[][]
> >> > > >>>>>> getAllDistributedKeys(byte[]
> >> > > >>>>>> > > > > > > originalKey);
> >> > > >>>>>> > > > > > > > >      ... // some utility methods
> >> > > >>>>>> > > > > > > > >    }
> >> > > >>>>>> > > > > > > > >
> >> > > >>>>>> > > > > > > >
> >> > > >>>>>> > > > > > >
> >> > > >>>>>> > > > > >
> >> > > >>>>>> > > > >
> >> > > >>>>>> > > >
> >> > > >>>>>> > >
> >> > > >>>>>> >
> >> > > >>>>>>
> >> > > >>>>>
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
I have another question. For overwriting, do I need to delete the existing
one before re-writing it?

On Sat, May 14, 2011 at 10:17 AM, Weishung Chung <we...@gmail.com> wrote:

> Yes, it's simple yet useful. I am integrating it. Thanks a lot :)
>
>
> On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <al...@gmail.com>wrote:
>
>> Thanks for the interest!
>>
>> We are using it in production. It is simple and hence quite stable. Though
>> some minor pieces are missing (like
>> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
>> stability and/or major functionality.
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> HBase
>>
>> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com>
>> wrote:
>>
>> > What's the status on this package? Is it mature enough?
>> >  I am using it in my project, tried out the write method yesterday, and
>> > am going to incorporate it into the read method tomorrow.
>> >
>> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <alex.baranov.v@gmail.com
>> > >wrote:
>> >
>> > > > The start/end rows may be written twice.
>> > >
>> > > > Yeah, I know. I meant that the size of startRow+stopRow data is
>> > > > "bearable" in an attribute value no matter how long the keys are,
>> > > > since we are already OK with transferring them initially (i.e. we
>> > > > should be OK with transferring 2x more).
>> > >
>> > > So, what about the sourceScan attribute value suggestion I mentioned?
>> > > If you can tell why it isn't sufficient in your case, I'd have more
>> > > info to think about a better suggestion ;)
>> > >
>> > > > It is Okay to keep all versions of your patch in the JIRA.
>> > > > Maybe the second should be named HBASE-3811-v2.patch
>> > > > (https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch)?
>> > >
>> > > np. Can do that. Just thought that the patches can be sorted by date
>> > > to find out the final one (aka "convention over naming-rules").
>> > >
>> > > Alex.
>> > >
>> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com> wrote:
>> > >
>> > > > >> Though it might be ok, since we anyways "transfer" start/stop
>> rows
>> > > with
>> > > > Scan object.
>> > > > In write() method, we now have:
>> > > >     Bytes.writeByteArray(out, this.startRow);
>> > > >     Bytes.writeByteArray(out, this.stopRow);
>> > > > ...
>> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
>> > > >         WritableUtils.writeString(out, attr.getKey());
>> > > >         Bytes.writeByteArray(out, attr.getValue());
>> > > >       }
>> > > > The start/end rows may be written twice.
>> > > >
>> > > > Of course, you have full control over how to generate the unique ID
>> > > > for the "sourceScan" attribute.
>> > > >
>> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe
>> > > > the second should be named HBASE-3811-v2.patch
>> > > > (https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch)?
>> > > >
>> > > > Thanks
>> > > >
>> > > >
>> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
>> > alex.baranov.v@gmail.com
>> > > >wrote:
>> > > >
>> > > >> > Can you remove the first version ?
>> > > >> Isn't it ok to keep it in JIRA issue?
>> > > >>
>> > > >>
>> > > >> > In HBaseWD, can you use reflection to detect whether Scan
>> > > >> > supports setAttribute()? If it does, can you encode start row and
>> > > >> > end row as a "sourceScan" attribute?
>> > > >>
>> > > >> Yeah, smth like this is going to be implemented. Though I'd still
>> > > >> want to hear from the devs the story about the Scan version.
>> > > >>
>> > > >>
>> > > >> > One consideration is that start row or end row may be quite long.
>> > > >>
>> > > >> Yeah, that was my thought too at first. Though it might be ok,
>> > > >> since we anyways "transfer" start/stop rows with the Scan object.
>> > > >>
>> > > >> > What do you think ?
>> > > >>
>> > > >> I'd love to hear from you whether the variant I mentioned is what
>> > > >> we are looking at here:
>> > > >>
>> > > >>
>> > > >> > From what I understand, you want to distinguish scans fired by the
>> > > >> > same distributed scan. I.e. group scans which were fired by a
>> > > >> > single distributed scan. If that's what you want, the distributed
>> > > >> > scan can generate a unique ID and set, say, a "sourceScan"
>> > > >> > attribute to its value. This way we'll have <# of distinct
>> > > >> > "sourceScan" attribute values> = <number of distributed scans
>> > > >> > invoked by the client side>, and two scans on the server side will
>> > > >> > have the same "sourceScan" attribute iff they "belong" to the same
>> > > >> > distributed scan.
>> > > >>
>> > > >>
>> > > >> Alex Baranau
>> > > >> ----
>> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > >>
>> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com>
>> wrote:
>> > > >>
>> > > >>> Alex:
>> > > >>> Your second patch looks good.
>> > > >>> Can you remove the first version ?
>> > > >>>
>> > > >>> In HBaseWD, can you use reflection to detect whether Scan supports
>> > > >>> setAttribute() ?
>> > > >>> If it does, can you encode start row and end row as "sourceScan"
>> > > >>> attribute ?
>> > > >>>
>> > > >>> One consideration is that start row or end row may be quite long.
>> > > >>> Ideally we should store the hash code of the source Scan object as
>> > > >>> the "sourceScan" attribute. But Scan doesn't implement hashCode().
>> > > >>> We can add it; that would require running all Scan-related tests.
>> > > >>>
>> > > >>> What do you think ?
>> > > >>>
>> > > >>> Thanks
>> > > >>>
>> > > >>>
>> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
>> > > alex.baranov.v@gmail.com>wrote:
>> > > >>>
>> > > >>>> Sorry for the delay in response (public holidays here).
>> > > >>>>
>> > > >>>> This depends on what info you are looking for on server side.
>> > > >>>>
>> > > >>>> From what I understand, you want to distinguish scans fired by the
>> > > >>>> same distributed scan, i.e. group scans which were fired by a
>> > > >>>> single distributed scan. If that's what you want, the distributed
>> > > >>>> scan can generate a unique ID and set, say, a "sourceScan"
>> > > >>>> attribute to its value. This way we'll have <# of distinct
>> > > >>>> "sourceScan" attribute values> = <number of distributed scans
>> > > >>>> invoked by the client side>, and two scans on the server side will
>> > > >>>> have the same "sourceScan" attribute iff they "belong" to the same
>> > > >>>> distributed scan.
>> > > >>>>
>> > > >>>> Is this what are you looking for?
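The grouping idea described above (one generated ID shared by all bucket scans of a single distributed scan) can be sketched without any HBase dependency. The attribute Map below is a hypothetical stand-in for the per-Scan attributes being discussed in this thread; the "sourceScan" name follows the discussion, not a shipped API:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class SourceScanGrouping {

    // Client side: one distributed scan fans out into bucketsCount scans,
    // all tagged with the same generated "sourceScan" ID.
    static List<Map<String, byte[]>> createDistributedScans(int bucketsCount) {
        byte[] sourceScanId =
            UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
        List<Map<String, byte[]>> scans = new ArrayList<>();
        for (int i = 0; i < bucketsCount; i++) {
            // Stand-in for scan.setAttribute("sourceScan", sourceScanId)
            Map<String, byte[]> scanAttributes = new HashMap<>();
            scanAttributes.put("sourceScan", sourceScanId);
            scans.add(scanAttributes);
        }
        return scans;
    }

    public static void main(String[] args) {
        // Two independent distributed scans, 32 bucket scans each.
        List<Map<String, byte[]>> first = createDistributedScans(32);
        List<Map<String, byte[]>> second = createDistributedScans(32);

        // Server side: grouping by the "sourceScan" attribute value
        // collapses 64 scans into exactly two groups, one per client call.
        Set<String> groups = new HashSet<>();
        for (Map<String, byte[]> s : first)
            groups.add(new String(s.get("sourceScan"), StandardCharsets.UTF_8));
        for (Map<String, byte[]> s : second)
            groups.add(new String(s.get("sourceScan"), StandardCharsets.UTF_8));
        System.out.println(groups.size()); // prints 2
    }
}
```

This preserves the invariant stated above (distinct "sourceScan" values equal the number of client-side distributed scans) regardless of how many regions each bucket scan happens to touch.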
>> > > >>>>
>> > > >>>> Alex Baranau
>> > > >>>>
>> > > >>>> P.S. attached patch for HBASE-3811
>> > > >>>> (https://issues.apache.org/jira/browse/HBASE-3811).
>> > > >>>> P.S-2. should this conversation be moved to dev list?
>> > > >>>>
>> > > >>>> ----
>> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > >>>>
>> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yu...@gmail.com>
>> > wrote:
>> > > >>>>
>> > > >>>>> Alex:
>> > > >>>>> What type of identification should we put in the map of the
>> > > >>>>> Scan object?
>> > > >>>>> I am thinking of using the Id of RowKeyDistributor. But the user
>> > > >>>>> can use the same distributor on multiple scans.
>> > > >>>>>
>> > > >>>>> Please share your thought.
>> > > >>>>>
>> > > >>>>>
>> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
>> > > >>>>> alex.baranov.v@gmail.com> wrote:
>> > > >>>>>
>> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
>> > > >>>>>>
>> > > >>>>>> Alex Baranau
>> > > >>>>>> ----
>> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > >>>>>>
>> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yu...@gmail.com>
>> > > wrote:
>> > > >>>>>>
>> > > >>>>>> > My plan was to make regions that have active scanners more
>> > > >>>>>> > stable - trying not to move them when balancing.
>> > > >>>>>> > I prefer the second approach - adding custom attribute(s) to
>> > > >>>>>> > Scan so that the Scans created by the method below can be
>> > > >>>>>> > 'grouped'.
>> > > >>>>>> >
>> > > >>>>>> > If you can file a JIRA, that would be great.
>> > > >>>>>> >
>> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
>> > > >>>>>> alex.baranov.v@gmail.com
>> > > >>>>>> > >wrote:
>> > > >>>>>> >
>> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or just
>> > > >>>>>> > > differently) when determining the load?
>> > > >>>>>> > >
>> > > >>>>>> > > The current code looks like this:
>> > > >>>>>> > >
>> > > >>>>>> > > class DistributedScanner:
>> > > >>>>>> > >   public static DistributedScanner create(HTable hTable,
>> > > >>>>>> > >       Scan original, AbstractRowKeyDistributor keyDistributor)
>> > > >>>>>> > >       throws IOException {
>> > > >>>>>> > >     byte[][] startKeys =
>> > > >>>>>> > >       keyDistributor.getAllDistributedKeys(original.getStartRow());
>> > > >>>>>> > >     byte[][] stopKeys =
>> > > >>>>>> > >       keyDistributor.getAllDistributedKeys(original.getStopRow());
>> > > >>>>>> > >     Scan[] scans = new Scan[startKeys.length];
>> > > >>>>>> > >     for (byte i = 0; i < startKeys.length; i++) {
>> > > >>>>>> > >       scans[i] = new Scan(original);
>> > > >>>>>> > >       scans[i].setStartRow(startKeys[i]);
>> > > >>>>>> > >       scans[i].setStopRow(stopKeys[i]);
>> > > >>>>>> > >     }
>> > > >>>>>> > >
>> > > >>>>>> > >     ResultScanner[] rss = new ResultScanner[startKeys.length];
>> > > >>>>>> > >     for (byte i = 0; i < scans.length; i++) {
>> > > >>>>>> > >       rss[i] = hTable.getScanner(scans[i]);
>> > > >>>>>> > >     }
>> > > >>>>>> > >
>> > > >>>>>> > >     return new DistributedScanner(rss);
>> > > >>>>>> > >   }
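For readers without the HBaseWD jar at hand, the key-manipulation side of the distributor that create() relies on can be sketched in plain Java. The method names mirror those quoted in this thread (getDistributedKey, getOriginalKey, getAllDistributedKeys); the bucket-choosing rule used here (hash modulo bucketsCount) is an assumption for illustration, not necessarily what RowKeyDistributorByOneBytePrefix actually does:

```java
import java.util.Arrays;

// Dependency-free sketch of the one-byte-prefix distribution idea.
public class OneBytePrefixDistributor {
    private final byte bucketsCount;

    public OneBytePrefixDistributor(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    // Prepend a bucket byte derived from the key's hash, so sequential
    // original keys spread over bucketsCount distinct key ranges.
    public byte[] getDistributedKey(byte[] originalKey) {
        byte bucket =
            (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] result = new byte[originalKey.length + 1];
        result[0] = bucket;
        System.arraycopy(originalKey, 0, result, 1, originalKey.length);
        return result;
    }

    // Strip the prefix byte to recover the original key.
    public byte[] getOriginalKey(byte[] adjustedKey) {
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    // All prefixed variants of a key, one per bucket; a range scan is
    // fanned out over these (as in the create() method above).
    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[bucketsCount][];
        for (byte i = 0; i < bucketsCount; i++) {
            keys[i] = new byte[originalKey.length + 1];
            keys[i][0] = i;
            System.arraycopy(originalKey, 0, keys[i], 1, originalKey.length);
        }
        return keys;
    }

    public static void main(String[] args) {
        OneBytePrefixDistributor d = new OneBytePrefixDistributor((byte) 32);
        byte[] key = "ts-0001".getBytes();
        // Round trip: stripping the prefix returns the original key.
        System.out.println(
            Arrays.equals(key, d.getOriginalKey(d.getDistributedKey(key)))); // prints true
        System.out.println(d.getAllDistributedKeys(key).length); // prints 32
    }
}
```

Note that getAllDistributedKeys is exactly why the fan-out in create() produces bucketsCount scanners: each bucket owns its own contiguous slice of the key space.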
>> > > >>>>>> > >
>> > > >>>>>> > > This is client code. To make these scans "identifiable" we
>> > > >>>>>> > > need to either use some different (derived from Scan) class
>> > > >>>>>> > > or add some attribute to them. There's no API for doing the
>> > > >>>>>> > > latter. We could do the former, but I don't really like the
>> > > >>>>>> > > idea of creating an extra class (with no extra functionality)
>> > > >>>>>> > > just to distinguish it from the base one.
>> > > >>>>>> > >
>> > > >>>>>> > > If you can share why/how you want to treat them differently
>> > > >>>>>> > > on the server side, that would be helpful.
>> > > >>>>>> > >
>> > > >>>>>> > > Alex Baranau
>> > > >>>>>> > > ----
>> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > >>>>>> > >
>> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
>> yuzhihong@gmail.com>
>> > > >>>>>> wrote:
>> > > >>>>>> > >
>> > > >>>>>> > > > My request would be to make the distributed scan
>> > > >>>>>> > > > identifiable from the server side.
>> > > >>>>>> > > > :-)
>> > > >>>>>> > > >
>> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
>> > > >>>>>> > alex.baranov.v@gmail.com
>> > > >>>>>> > > > >wrote:
>> > > >>>>>> > > >
>> > > >>>>>> > > > > > Basically bucketsCount may not equal the number of
>> > > >>>>>> > > > > > regions for the underlying table.
>> > > >>>>>> > > > >
>> > > >>>>>> > > > > True: e.g. when there's only one region that holds data
>> > > >>>>>> > > > > for the whole table (not many records in the table yet),
>> > > >>>>>> > > > > a distributed scan will fire N scans against the same
>> > > >>>>>> > > > > region. On the other hand, in case there is a huge number
>> > > >>>>>> > > > > of regions for a single table, each scan can span
>> > > >>>>>> > > > > multiple regions.
>> > > >>>>>> > > > >
>> > > >>>>>> > > > > > I need to deal with normal scan and "distributed scan"
>> > > >>>>>> > > > > > at the server side.
>> > > >>>>>> > > > >
>> > > >>>>>> > > > > With the current implementation a "distributed" scan
>> > > >>>>>> > > > > won't be recognized as something special on the server
>> > > >>>>>> > > > > side. It will be an ordinary scan. Though the number of
>> > > >>>>>> > > > > scans will increase, given that the typical situation is
>> > > >>>>>> > > > > "many regions for a single table", the scans of the same
>> > > >>>>>> > > > > "distributed scan" are likely not to hit the same region.
>> > > >>>>>> > > > >
>> > > >>>>>> > > > > Not sure if I answered your questions here. Feel free
>> > > >>>>>> > > > > to ask more ;)
>> > > >>>>>> > > > >
>> > > >>>>>> > > > > Alex Baranau
>> > > >>>>>> > > > > ----
>> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > >>>>>> > > > >
>> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
>> > > yuzhihong@gmail.com>
>> > > >>>>>> wrote:
>> > > >>>>>> > > > >
>> > > >>>>>> > > > > > Alex:
>> > > >>>>>> > > > > > If you read this, you would know why I asked:
>> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
>> > > >>>>>> > > > > >
>> > > >>>>>> > > > > > I need to deal with normal scan and "distributed scan"
>> > > >>>>>> > > > > > at the server side.
>> > > >>>>>> > > > > > Basically bucketsCount may not equal number of
>> regions
>> > for
>> > > >>>>>> the
>> > > >>>>>> > > > underlying
>> > > >>>>>> > > > > > table.
>> > > >>>>>> > > > > >
>> > > >>>>>> > > > > > Cheers
>> > > >>>>>> > > > > >
>> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
>> > > >>>>>> > > > alex.baranov.v@gmail.com
>> > > >>>>>> > > > > > >wrote:
>> > > >>>>>> > > > > >
>> > > >>>>>> > > > > > > Hi Ted,
>> > > >>>>>> > > > > > >
>> > > >>>>>> > > > > > > We currently use this tool in the scenario where data
>> > > >>>>>> > > > > > > is consumed by MapReduce jobs, so we haven't tested
>> > > >>>>>> > > > > > > the performance of pure "distributed scan" (i.e. N
>> > > >>>>>> > > > > > > scans instead of 1) a lot. I expect it to be close to
>> > > >>>>>> > > > > > > simple scan performance, or maybe sometimes even
>> > > >>>>>> > > > > > > faster depending on your data access patterns. E.g.
>> > > >>>>>> > > > > > > in case you write time-series (sequential) data,
>> > > >>>>>> > > > > > > which is written into a single region at a time, then
>> > > >>>>>> > > > > > > if you access the delta for further
>> > > >>>>>> > > > > > > processing/analysis (esp. from more than a single
>> > > >>>>>> > > > > > > client), these scans are likely to hit the same
>> > > >>>>>> > > > > > > region or a couple of regions at a time, which may
>> > > >>>>>> > > > > > > perform worse compared to many scans hitting data
>> > > >>>>>> > > > > > > that is much better spread over region servers.
>> > > >>>>>> > > > > > >
>> > > >>>>>> > > > > > > As for a MapReduce job, the approach should not
>> > > >>>>>> > > > > > > affect reading performance at all: it's just that
>> > > >>>>>> > > > > > > there are bucketsCount times more splits and hence
>> > > >>>>>> > > > > > > bucketsCount times more Map tasks. In many cases this
>> > > >>>>>> > > > > > > even improves the overall performance of the MR job,
>> > > >>>>>> > > > > > > since the work is better distributed over the cluster
>> > > >>>>>> > > > > > > (esp. in the situation when the aim is to constantly
>> > > >>>>>> > > > > > > process the incoming delta, which usually resides in
>> > > >>>>>> > > > > > > one or just a couple of regions depending on
>> > > >>>>>> > > > > > > processing frequency).
>> > > >>>>>> > > > > > >
>> > > >>>>>> > > > > > > If you can share details on your case, that will
>> > > >>>>>> > > > > > > help to understand what effect(s) to expect from
>> > > >>>>>> > > > > > > using this approach.
>> > > >>>>>> > > > > > >
>> > > >>>>>> > > > > > > Alex Baranau
>> > > >>>>>> > > > > > > ----
>> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > >>>>>> > > > > > >
>> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
>> > > >>>>>> yuzhihong@gmail.com>
>> > > >>>>>> > > wrote:
>> > > >>>>>> > > > > > >
>> > > >>>>>> > > > > > > > Interesting project, Alex.
>> > > >>>>>> > > > > > > > Since there are bucketsCount scanners compared to
>> > > >>>>>> > > > > > > > one scanner originally, have you performed load
>> > > >>>>>> > > > > > > > testing to see the impact?
>> > > >>>>>> > > > > > > >
>> > > >>>>>> > > > > > > > Thanks
>> > > >>>>>> > > > > > > >
>> > > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
>> > > >>>>>> > > > > > alex.baranov.v@gmail.com
>> > > >>>>>> > > > > > > > >wrote:
>> > > >>>>>> > > > > > > >
>> > > >>>>>> > > > > > > > > Hello guys,
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > I'd like to introduce a new small Java
>> > > >>>>>> > > > > > > > > project/lib around HBase: HBaseWD. It is aimed to
>> > > >>>>>> > > > > > > > > help with distribution of the load (across region
>> > > >>>>>> > > > > > > > > servers) when writing sequential (because of the
>> > > >>>>>> > > > > > > > > row key nature) records. It implements the
>> > > >>>>>> > > > > > > > > solution which was discussed several times on
>> > > >>>>>> > > > > > > > > this mailing list (e.g. here:
>> > > >>>>>> > > > > > > > > http://search-hadoop.com/m/gNRA82No5Wk).
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Please find the sources at
>> > > >>>>>> > > > > > > > > https://github.com/sematext/HBaseWD (there's also
>> > > >>>>>> > > > > > > > > a jar of the current version for convenience). It
>> > > >>>>>> > > > > > > > > is very easy to make use of it: e.g. I added it
>> > > >>>>>> > > > > > > > > to one existing project with 1+2 lines of code
>> > > >>>>>> > > > > > > > > (one where I write to HBase and 2 for configuring
>> > > >>>>>> > > > > > > > > the MapReduce job).
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Any feedback is highly appreciated!
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Please find below the short intro to the lib [1].
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Alex Baranau
>> > > >>>>>> > > > > > > > > ----
>> > > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > [1]
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Description:
>> > > >>>>>> > > > > > > > > ------------
>> > > >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential)
>> > > >>>>>> > > > > > > > > Writes. It was inspired by discussions on HBase
>> > > >>>>>> > > > > > > > > mailing lists around the problem of choosing
>> > > >>>>>> > > > > > > > > between:
>> > > >>>>>> > > > > > > > > * writing records with sequential row keys (e.g.
>> > > >>>>>> > > > > > > > >   time-series data with row key built based on ts)
>> > > >>>>>> > > > > > > > > * using random unique IDs for records
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > The first approach makes it possible to perform
>> > > >>>>>> > > > > > > > > fast range scans with the help of setting
>> > > >>>>>> > > > > > > > > start/stop keys on the Scanner, but creates a
>> > > >>>>>> > > > > > > > > single-region-server hot-spotting problem upon
>> > > >>>>>> > > > > > > > > writing data (as row keys go in sequence, all
>> > > >>>>>> > > > > > > > > records end up written into a single region at a
>> > > >>>>>> > > > > > > > > time).
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > The second approach aims for the fastest writing
>> > > >>>>>> > > > > > > > > performance by distributing new records over
>> > > >>>>>> > > > > > > > > random regions, but makes it impossible to do
>> > > >>>>>> > > > > > > > > fast range scans against the written data.
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > The suggested approach sits in the middle of the
>> > > >>>>>> > > > > > > > > two above and has proved to perform well by
>> > > >>>>>> > > > > > > > > distributing records over the cluster during
>> > > >>>>>> > > > > > > > > writing while allowing range scans over the data.
>> > > >>>>>> > > > > > > > > HBaseWD provides a very simple API to work with,
>> > > >>>>>> > > > > > > > > which makes it easy to use with existing code.
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Please refer to unit-tests for lib usage info
>> as
>> > > they
>> > > >>>>>> aimed
>> > > >>>>>> > to
>> > > >>>>>> > > > act
>> > > >>>>>> > > > > as
>> > > >>>>>> > > > > > > > > example.
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Brief Usage Info (Examples):
>> > > >>>>>> > > > > > > > > ----------------------------
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Distributing records with sequential keys which
>> > are
>> > > >>>>>> being
>> > > >>>>>> > > written
>> > > >>>>>> > > > > in
>> > > >>>>>> > > > > > up
>> > > >>>>>> > > > > > > > to
>> > > >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; //
>> distributing
>> > > into
>> > > >>>>>> 32
>> > > >>>>>> > > buckets
>> > > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
>> > > >>>>>> > > > > > > > >                           new
>> > > >>>>>> > > > > > > > > RowKeyDistributorByOneBytePrefix(bucketsCount);
>> > > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
>> > > >>>>>> > > > > > > > >      Put put = new
>> > > >>>>>> > > > > > Put(keyDistributor.getDistributedKey(originalKey));
>> > > >>>>>> > > > > > > > >      ... // add values
>> > > >>>>>> > > > > > > > >      hTable.put(put);
>> > > >>>>>> > > > > > > > >    }
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Performing a range scan over written data
>> > > (internally
>> > > >>>>>> > > > > <bucketsCount>
>> > > >>>>>> > > > > > > > > scanners
>> > > >>>>>> > > > > > > > > executed):
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>> > > >>>>>> > > > > > > > >    ResultScanner rs =
>> > > >>>>>> DistributedScanner.create(hTable, scan,
>> > > >>>>>> > > > > > > > > keyDistributor);
>> > > >>>>>> > > > > > > > >    for (Result current : rs) {
>> > > >>>>>> > > > > > > > >      ...
>> > > >>>>>> > > > > > > > >    }
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Performing mapreduce job over written data
>> chunk
>> > > >>>>>> specified by
>> > > >>>>>> > > > Scan:
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >    Configuration conf =
>> > HBaseConfiguration.create();
>> > > >>>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >
>>  TableMapReduceUtil.initTableMapperJob("table",
>> > > >>>>>> scan,
>> > > >>>>>> > > > > > > > >      RowCounterMapper.class,
>> > > >>>>>> ImmutableBytesWritable.class,
>> > > >>>>>> > > > > > > Result.class,
>> > > >>>>>> > > > > > > > > job);
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >    // Substituting standard TableInputFormat
>> which
>> > > was
>> > > >>>>>> set in
>> > > >>>>>> > > > > > > > >    //
>> TableMapReduceUtil.initTableMapperJob(...)
>> > > >>>>>> > > > > > > > >
>> > >  job.setInputFormatClass(WdTableInputFormat.class);
>> > > >>>>>> > > > > > > > >
>>  keyDistributor.addInfo(job.getConfiguration());
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
>> > > >>>>>> > > > > > > > > -----------------------------------------
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to
>> support
>> > > >>>>>> custom row
>> > > >>>>>> > > key
>> > > >>>>>> > > > > > > > > distribution
>> > > >>>>>> > > > > > > > > approaches. To define custom row key
>> distributing
>> > > >>>>>> logic just
>> > > >>>>>> > > > > > implement
>> > > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class which
>> is
>> > > >>>>>> really very
>> > > >>>>>> > > > > simple:
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > > >    public abstract class
>> AbstractRowKeyDistributor
>> > > >>>>>> implements
>> > > >>>>>> > > > > > > > > Parametrizable {
>> > > >>>>>> > > > > > > > >      public abstract byte[]
>> > getDistributedKey(byte[]
>> > > >>>>>> > > > originalKey);
>> > > >>>>>> > > > > > > > >      public abstract byte[]
>> getOriginalKey(byte[]
>> > > >>>>>> > adjustedKey);
>> > > >>>>>> > > > > > > > >      public abstract byte[][]
>> > > >>>>>> getAllDistributedKeys(byte[]
>> > > >>>>>> > > > > > > originalKey);
>> > > >>>>>> > > > > > > > >      ... // some utility methods
>> > > >>>>>> > > > > > > > >    }
>> > > >>>>>> > > > > > > > >
>> > > >>>>>> > > > > > > >
>> > > >>>>>> > > > > > >
>> > > >>>>>> > > > > >
>> > > >>>>>> > > > >
>> > > >>>>>> > > >
>> > > >>>>>> > >
>> > > >>>>>> >
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >>
>> > > >
>> > >
>> >
>>
>
>
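For illustration, a custom distributor might look like the sketch below. It is self-contained plain Java with no HBase dependency: Parametrizable and AbstractRowKeyDistributor are simplified stand-ins for the library's types, and HashPrefixDistributor is a hypothetical example class, not something shipped with HBaseWD.

```java
import java.util.Arrays;

// Stand-in for the library's Parametrizable interface (simplified here).
interface Parametrizable {}

// Simplified stand-in for HBaseWD's abstract class, reproduced from the
// announcement above.
abstract class AbstractRowKeyDistributor implements Parametrizable {
  public abstract byte[] getDistributedKey(byte[] originalKey);
  public abstract byte[] getOriginalKey(byte[] adjustedKey);
  public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
}

// Hypothetical custom distributor: prepends a one-byte bucket prefix derived
// from a hash of the original key, so the same key always lands in the same
// bucket and can be found again on reads.
class HashPrefixDistributor extends AbstractRowKeyDistributor {
  private final byte buckets;

  HashPrefixDistributor(byte buckets) { this.buckets = buckets; }

  @Override
  public byte[] getDistributedKey(byte[] originalKey) {
    byte prefix = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % buckets);
    byte[] key = new byte[originalKey.length + 1];
    key[0] = prefix;
    System.arraycopy(originalKey, 0, key, 1, originalKey.length);
    return key;
  }

  @Override
  public byte[] getOriginalKey(byte[] adjustedKey) {
    // Strip the one-byte prefix added above.
    return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
  }

  @Override
  public byte[][] getAllDistributedKeys(byte[] originalKey) {
    // Used for the scan fan-out: one candidate key per bucket.
    byte[][] keys = new byte[buckets][];
    for (byte b = 0; b < buckets; b++) {
      keys[b] = new byte[originalKey.length + 1];
      keys[b][0] = b;
      System.arraycopy(originalKey, 0, keys[b], 1, originalKey.length);
    }
    return keys;
  }
}

public class HashPrefixDemo {
  public static void main(String[] args) {
    HashPrefixDistributor d = new HashPrefixDistributor((byte) 32);
    byte[] original = "row-000123".getBytes();
    byte[] distributed = d.getDistributedKey(original);
    System.out.println(Arrays.equals(original, d.getOriginalKey(distributed))); // true
    System.out.println(d.getAllDistributedKeys(original).length); // 32
  }
}
```

Writes would then go through getDistributedKey() exactly as in the Put example above, and the scan fan-out would use getAllDistributedKeys() per bucket.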

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
Yes, it's simple yet useful. I am integrating it. Thanks a lot :)

On Fri, May 13, 2011 at 3:12 PM, Alex Baranau <al...@gmail.com>wrote:

> Thanks for the interest!
>
> We are using it in production. It is simple and hence quite stable. Though
> some minor pieces are missing (like
> https://github.com/sematext/HBaseWD/issues/1), this doesn't affect
> stability or major functionality.
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > What's the status on this package? Is it mature enough?
> > I am using it in my project: I tried out the write method yesterday and am
> > going to incorporate it into the read method tomorrow.
> >
> > On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <alex.baranov.v@gmail.com
> > >wrote:
> >
> > > > The start/end rows may be written twice.
> > >
> > > Yeah, I know. I meant that the size of the startRow+stopRow data in the
> > > attribute value is "bearable" no matter how long the keys are, since we
> > > are already OK with transferring them initially (i.e. we should be OK
> > > with transferring 2x as much).
> > >
> > > So, what about the sourceScan attribute value suggestion I mentioned? If
> > > you can tell why it isn't sufficient in your case, I'd have more info to
> > > think about a better suggestion ;)
> > >
> > > > It is Okay to keep all versions of your patch in the JIRA.
> > > > Maybe the second should be named HBASE-3811-v2.patch<
> > >
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > > >?
> > >
> > > np. Can do that. Just thought that the patches can be sorted by date to
> > > find the final one (aka "convention over naming rules").
> > >
> > > Alex.
> > >
> > > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > >> Though it might be ok, since we anyways "transfer" start/stop rows
> > > > >> with Scan object.
> > > > In write() method, we now have:
> > > >     Bytes.writeByteArray(out, this.startRow);
> > > >     Bytes.writeByteArray(out, this.stopRow);
> > > > ...
> > > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> > > >         WritableUtils.writeString(out, attr.getKey());
> > > >         Bytes.writeByteArray(out, attr.getValue());
> > > >       }
> > > > The start/end rows may be written twice.
> > > >
> > > > Of course, you have full control over how to generate the unique ID
> > > > for the "sourceScan" attribute.
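For instance, one way to keep that attribute small regardless of key length is a truncated digest of the start/stop rows. This is only a sketch (scanId is a hypothetical helper, not HBaseWD or HBase API), and note that a digest of the range alone cannot distinguish two distributed scans over the same range, unlike a random UUID:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ScanIdDemo {
  // Hypothetical helper: derive a short, deterministic ID from a scan's range.
  static byte[] scanId(byte[] startRow, byte[] stopRow) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      md.update(startRow);
      md.update((byte) 0); // separator, so ("ab","c") != ("a","bc")
      md.update(stopRow);
      return Arrays.copyOf(md.digest(), 8); // 8 bytes instead of two full row keys
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-1 is required on every JVM
    }
  }

  public static void main(String[] args) {
    byte[] id1 = scanId("row-a".getBytes(), "row-z".getBytes());
    byte[] id2 = scanId("row-a".getBytes(), "row-z".getBytes());
    System.out.println(id1.length); // 8
    System.out.println(Arrays.equals(id1, id2)); // true: deterministic
  }
}
```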
> > > >
> > > > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> > > second
> > > > should be named HBASE-3811-v2.patch<
> > >
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > > >?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> > alex.baranov.v@gmail.com
> > > >wrote:
> > > >
> > > >> > Can you remove the first version ?
> > > >> Isn't it OK to keep it in the JIRA issue?
> > > >>
> > > >>
> > > >> > In HBaseWD, can you use reflection to detect whether Scan supports
> > > >> setAttribute() ?
> > > >> > If it does, can you encode start row and end row as "sourceScan"
> > > >> attribute ?
> > > >>
> > > >> Yeah, something like this is going to be implemented. Though I'd still
> > > >> want to hear from the devs the story about the Scan version.
> > > >>
> > > >>
> > > >> > One consideration is that start row or end row may be quite long.
> > > >>
> > > >> Yeah, that was my thought too at first. Though it might be ok, since
> > > >> we anyways "transfer" start/stop rows with the Scan object.
> > > >>
> > > >> > What do you think ?
> > > >>
> > > >> I'd love to hear from you whether the variant I mentioned is what we
> > > >> are looking at here:
> > > >>
> > > >>
> > > >> > From what I understand, you want to distinguish scans fired by the
> > > same
> > > >> distributed scan.
> > > >> > I.e. group scans which were fired by single distributed scan. If
> > > that's
> > > >> what you want, distributed
> > > >> > scan can generate unique ID and set, say "sourceScan" attribute to
> > its
> > > >> value. This way we'll
> > > >> > have <# of distinct "sourceScan" attribute values> = <number of
> > > >> distributed scans invoked by
> > > >> > client side> and two scans on server side will have the same
> > > >> "sourceScan" attribute iff they
> > > >> > "belong" to same distributed scan.
> > > >>
> > > >>
> > > >> Alex Baranau
> > > >> ----
> > > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop
> -
> > > >> HBase
> > > >>
> > > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com>
> wrote:
> > > >>
> > > >>> Alex:
> > > >>> Your second patch looks good.
> > > >>> Can you remove the first version ?
> > > >>>
> > > >>> In HBaseWD, can you use reflection to detect whether Scan supports
> > > >>> setAttribute() ?
> > > >>> If it does, can you encode start row and end row as "sourceScan"
> > > >>> attribute ?
> > > >>>
> > > >>> One consideration is that start row or end row may be quite long.
> > > >>> Ideally we should store the hash code of the source Scan object as the
> > > >>> "sourceScan" attribute. But Scan doesn't implement hashCode(). We could
> > > >>> add it, but that would require running all Scan-related tests.
> > > >>>
> > > >>> What do you think ?
> > > >>>
> > > >>> Thanks
> > > >>>
> > > >>>
> > > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > > alex.baranov.v@gmail.com>wrote:
> > > >>>
> > > >>>> Sorry for the delay in response (public holidays here).
> > > >>>>
> > > >>>> This depends on what info you are looking for on server side.
> > > >>>>
> > > >>>>>> From what I understand, you want to distinguish scans fired by the
> > > >>>>>> same distributed scan, i.e. group scans which were fired by a single
> > > >>>>>> distributed scan. If that's what you want, the distributed scan can
> > > >>>>>> generate a unique ID and set, say, a "sourceScan" attribute to its
> > > >>>>>> value. This way we'll have <# of distinct "sourceScan" attribute
> > > >>>>>> values> = <number of distributed scans invoked by the client side>,
> > > >>>>>> and two scans on the server side will have the same "sourceScan"
> > > >>>>>> attribute iff they "belong" to the same distributed scan.
> > > >>>>
> > > >>>> Is this what are you looking for?
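Sketched in code, the scheme above reads like this. Scan here is a minimal stand-in for HBase's class (assuming an attribute-setting method is available, which is what the reflection question in this thread concerns), and tagFanOut is a hypothetical helper, not part of HBaseWD:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Minimal stand-in for HBase's Scan, assuming it supports attributes.
class Scan {
  private final Map<String, byte[]> attributes = new HashMap<>();
  void setAttribute(String name, byte[] value) { attributes.put(name, value); }
  byte[] getAttribute(String name) { return attributes.get(name); }
}

public class SourceScanDemo {
  // Hypothetical helper: tag every per-bucket scan of one distributed scan
  // with the same generated ID.
  static Scan[] tagFanOut(Scan[] scans) {
    byte[] id = UUID.randomUUID().toString().getBytes();
    for (Scan s : scans) {
      s.setAttribute("sourceScan", id);
    }
    return scans;
  }

  public static void main(String[] args) {
    Scan[] scans = { new Scan(), new Scan(), new Scan() };
    tagFanOut(scans);
    // All scans of this distributed scan now carry the same "sourceScan"
    // value, so the server side can group them; a second distributed scan
    // would get a different ID.
    System.out.println(scans[0].getAttribute("sourceScan") == scans[2].getAttribute("sourceScan")); // true
  }
}
```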
> > > >>>>
> > > >>>> Alex Baranau
> > > >>>>
> > > >>>> P.S. attached patch for HBASE-3811<
> > > https://issues.apache.org/jira/browse/HBASE-3811>
> > > >>>> .
> > > >>>> P.S-2. should this conversation be moved to dev list?
> > > >>>>
> > > >>>> ----
> > > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> Hadoop
> > -
> > > >>>> HBase
> > > >>>>
> > > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yu...@gmail.com>
> > wrote:
> > > >>>>
> > > >>>>> Alex:
> > > >>>>> What type of identification should we put in the map of the Scan
> > > object
> > > >>>>> ?
> > > >>>>> I am thinking of using the Id of the RowKeyDistributor. But the
> > > >>>>> user can use the same distributor on multiple scans.
> > > >>>>>
> > > >>>>> Please share your thought.
> > > >>>>>
> > > >>>>>
> > > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > > >>>>> alex.baranov.v@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > > >>>>>>
> > > >>>>>> Alex Baranau
> > > >>>>>> ----
> > > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > Hadoop
> > > -
> > > >>>>>> HBase
> > > >>>>>>
> > > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yu...@gmail.com>
> > > wrote:
> > > >>>>>>
> > > >>>>>> > My plan was to make regions that have active scanners more stable
> > > >>>>>> > - trying not to move them when balancing.
> > > >>>>>> > I prefer the second approach - adding custom attribute(s) to Scan
> > > >>>>>> > so that the Scans created by the method below can be 'grouped'.
> > > >>>>>> >
> > > >>>>>> > If you can file a JIRA, that would be great.
> > > >>>>>> >
> > > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > > >>>>>> alex.baranov.v@gmail.com
> > > >>>>>> > >wrote:
> > > >>>>>> >
> > > >>>>>> > > Aha, so you want to "count" it as a single scan (or just
> > > >>>>>> > > differently) when determining the load?
> > > >>>>>> > >
> > > >>>>>> > > The current code looks like this:
> > > >>>>>> > >
> > > >>>>>> > > class DistributedScanner:
> > > >>>>>> > >  public static DistributedScanner create(HTable hTable, Scan original,
> > > >>>>>> > >      AbstractRowKeyDistributor keyDistributor) throws IOException {
> > > >>>>>> > >    byte[][] startKeys = keyDistributor.getAllDistributedKeys(original.getStartRow());
> > > >>>>>> > >    byte[][] stopKeys = keyDistributor.getAllDistributedKeys(original.getStopRow());
> > > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > > >>>>>> > >      scans[i] = new Scan(original);
> > > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > > >>>>>> > >    }
> > > >>>>>> > >
> > > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> > > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > > >>>>>> > >    }
> > > >>>>>> > >
> > > >>>>>> > >    return new DistributedScanner(rss);
> > > >>>>>> > >  }
> > > >>>>>> > >
> > > >>>>>> > > This is client code. To make these scans "identifiable" we need
> > > >>>>>> > > to either use some different (derived from Scan) class or add
> > > >>>>>> > > some attribute to them. There's no API for the latter. We can do
> > > >>>>>> > > the former, but I don't really like the idea of creating an
> > > >>>>>> > > extra class (with no extra functionality) just to distinguish it
> > > >>>>>> > > from the base one.
> > > >>>>>> > >
> > > >>>>>> > > If you can share why/how you want to treat them differently on
> > > >>>>>> > > the server side, that would be helpful.
> > > >>>>>> > >
> > > >>>>>> > > Alex Baranau
> > > >>>>>> > > ----
> > > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > > >>>>>> Hadoop -
> > > >>>>>> > HBase
> > > >>>>>> > >
> > > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <
> yuzhihong@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>> > >
> > > >>>>>> > > > My request would be to make the distributed scan identifiable
> > > >>>>>> > > > from the server side.
> > > >>>>>> > > > :-)
> > > >>>>>> > > >
> > > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > > >>>>>> > alex.baranov.v@gmail.com
> > > >>>>>> > > > >wrote:
> > > >>>>>> > > >
> > > >>>>>> > > > > > Basically bucketsCount may not equal number of regions for
> > > >>>>>> > > > > > the underlying table.
> > > >>>>>> > > > >
> > > >>>>>> > > > > True: e.g. when there's only one region that holds the data
> > > >>>>>> > > > > for the whole table (not many records in the table yet), a
> > > >>>>>> > > > > distributed scan will fire N scans against that same region.
> > > >>>>>> > > > > On the other hand, when there is a huge number of regions
> > > >>>>>> > > > > for a single table, each scan can span multiple regions.
> > > >>>>>> > > > >
> > > >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at
> > > >>>>>> > > > > > server side.
> > > >>>>>> > > > >
> > > >>>>>> > > > > With the current implementation a "distributed" scan won't
> > > >>>>>> > > > > be recognized as anything special on the server side: it
> > > >>>>>> > > > > will be an ordinary scan. Though the number of scans will
> > > >>>>>> > > > > increase, given that the typical situation is "many regions
> > > >>>>>> > > > > for a single table", the scans of the same "distributed
> > > >>>>>> > > > > scan" are unlikely to hit the same region.
> > > >>>>>> > > > >
> > > >>>>>> > > > > Not sure if I answered your questions here. Feel free to
> > > >>>>>> > > > > ask more ;)
> > > >>>>>> > > > >
> > > >>>>>> > > > > Alex Baranau
> > > >>>>>> > > > > ----
> > > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> Nutch
> > -
> > > >>>>>> Hadoop -
> > > >>>>>> > > > HBase
> > > >>>>>> > > > >
> > > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > > yuzhihong@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>> > > > >
> > > >>>>>> > > > > > Alex:
> > > >>>>>> > > > > > If you read this, you would know why I asked:
> > > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > > >>>>>> > > > > >
> > > >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at
> > > >>>>>> > > > > > server side.
> > > >>>>>> > > > > > Basically bucketsCount may not equal number of regions for
> > > >>>>>> > > > > > the underlying table.
> > > >>>>>> > > > > >
> > > >>>>>> > > > > > Cheers
> > > >>>>>> > > > > >
> > > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> > > >>>>>> > > > alex.baranov.v@gmail.com
> > > >>>>>> > > > > > >wrote:
> > > >>>>>> > > > > >
> > > >>>>>> > > > > > > Hi Ted,
> > > >>>>>> > > > > > >
> > > >>>>>> > > > > > > We currently use this tool in a scenario where the data
> > > >>>>>> > > > > > > is consumed by MapReduce jobs, so we haven't tested the
> > > >>>>>> > > > > > > performance of a pure "distributed scan" (i.e. N scans
> > > >>>>>> > > > > > > instead of 1) a lot. I expect it to be close to simple
> > > >>>>>> > > > > > > scan performance, or maybe sometimes even faster,
> > > >>>>>> > > > > > > depending on your data access patterns. E.g. if you
> > > >>>>>> > > > > > > write (sequential) time-series data, which goes into a
> > > >>>>>> > > > > > > single region at a time, and then access the delta for
> > > >>>>>> > > > > > > further processing/analysis (esp. from more than one
> > > >>>>>> > > > > > > client), those scans are likely to hit the same region
> > > >>>>>> > > > > > > or a couple of regions at a time, which may perform
> > > >>>>>> > > > > > > worse than many scans hitting data that is much better
> > > >>>>>> > > > > > > spread over the region servers.
> > > >>>>>> > > > > > >
> > > >>>>>> > > > > > > As for a map-reduce job, the approach should not affect
> > > >>>>>> > > > > > > reading performance at all: there are just bucketsCount
> > > >>>>>> > > > > > > times more splits and hence bucketsCount times more Map
> > > >>>>>> > > > > > > tasks. In many cases this even improves the overall
> > > >>>>>> > > > > > > performance of the MR job, since the work is better
> > > >>>>>> > > > > > > distributed over the cluster (esp. when the aim is to
> > > >>>>>> > > > > > > continuously process the incoming delta, which usually
> > > >>>>>> > > > > > > resides in one or just a couple of regions, depending on
> > > >>>>>> > > > > > > processing frequency).
> > > >>>>>> > > > > > >
> > > >>>>>> > > > > > > If you can share details of your case, that will help to
> > > >>>>>> > > > > > > understand what effect(s) to expect from using this
> > > >>>>>> > > > > > > approach.
> > > >>>>>> > > > > > >
> > > >>>>>> > > > > > > Alex Baranau
> > > >>>>>> > > > > > > ----
> > > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > > Nutch
> > > >>>>>> -
> > > >>>>>> > Hadoop
> > > >>>>>> > > -
> > > >>>>>> > > > > > HBase
> > > >>>>>> > > > > > >
> > > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > > >>>>>> yuzhihong@gmail.com>
> > > >>>>>> > > wrote:
> > > >>>>>> > > > > > >
> > > >>>>>> > > > > > > > Interesting project, Alex.
> > > >>>>>> > > > > > > > Since there are bucketsCount scanners compared to the
> > > >>>>>> > > > > > > > one scanner originally, have you performed load
> > > >>>>>> > > > > > > > testing to see the impact?
> > > >>>>>> > > > > > > >
> > > >>>>>> > > > > > > > Thanks
> > > >>>>>> > > > > > > >

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
Thanks for the interest!

We are using it in production. It is simple and hence quite stable. Some
minor pieces are still missing (e.g.
https://github.com/sematext/HBaseWD/issues/1), but this doesn't affect
stability or major functionality.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Fri, May 13, 2011 at 10:45 AM, Weishung Chung <we...@gmail.com> wrote:

> What's the status on this package? Is it mature enough?
>  I am using it in my project, tried out the write method yesterday and
> going
> to incorporate into read method tomorrow.
>
> On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > > The start/end rows may be written twice.
> >
> > Yeah, I know. I meant that the size of the startRow+stopRow data is
> > "bearable" in an attribute value no matter how long the keys are, since we
> > are already OK with transferring them initially (i.e. we should be OK with
> > transferring 2x as much).
> >
> > So, what about the sourceScan attribute value suggestion I mentioned? If
> > you can tell why it isn't sufficient in your case, I'd have more info to
> > think about a better suggestion ;)
> >
> > > It is Okay to keep all versions of your patch in the JIRA.
> > > Maybe the second should be named HBASE-3811-v2.patch<
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > >?
> >
> > np. Can do that. Just thought that they (patches) can be sorted by date
> to
> > find out the final one (aka "convention over naming-rules").
> >
> > Alex.
> >
> > On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > >> Though it might be ok, since we anyways "transfer" start/stop rows
> > with
> > > Scan object.
> > > In write() method, we now have:
> > >     Bytes.writeByteArray(out, this.startRow);
> > >     Bytes.writeByteArray(out, this.stopRow);
> > > ...
> > >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet())
> {
> > >         WritableUtils.writeString(out, attr.getKey());
> > >         Bytes.writeByteArray(out, attr.getValue());
> > >       }
> > > The start/end rows may be written twice.
> > >
> > > Of course, you have full control over how to generate the unique ID for
> > > "sourceScan" attribute.
> > >
> > > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> > second
> > > should be named HBASE-3811-v2.patch<
> >
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> > >?
> > >
> > > Thanks
> > >
> > >
> > > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <
> alex.baranov.v@gmail.com
> > >wrote:
> > >
> > >> > Can you remove the first version ?
> > >> Isn't it ok to keep it in JIRA issue?
> > >>
> > >>
> > >> > In HBaseWD, can you use reflection to detect whether Scan supports
> > >> setAttribute() ?
> > >> > If it does, can you encode start row and end row as "sourceScan"
> > >> attribute ?
> > >>
> > >> Yeah, smth like this is going to be implemented. Though I'd still want
> > to
> > >> hear from the devs the story about Scan version.
> > >>
> > >>
> > >> > One consideration is that start row or end row may be quite long.
> > >>
> > >> Yeah, that was my thought too at first. Though it might be OK, since we
> > >> anyway "transfer" start/stop rows with the Scan object.
> > >>
> > >> > What do you think ?
> > >>
> > >> I'd love to hear from you whether this variant I mentioned is what we
> > >> are looking at here:
> > >>
> > >>
> > >> > From what I understand, you want to distinguish scans fired by the
> > same
> > >> distributed scan.
> > >> > I.e. group scans which were fired by single distributed scan. If
> > that's
> > >> what you want, distributed
> > >> > scan can generate unique ID and set, say "sourceScan" attribute to
> its
> > >> value. This way we'll
> > >> > have <# of distinct "sourceScan" attribute values> = <number of
> > >> distributed scans invoked by
> > >> > client side> and two scans on server side will have the same
> > >> "sourceScan" attribute iff they
> > >> > "belong" to same distributed scan.
> > >>
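As an illustration, the quoted "sourceScan" proposal can be sketched in plain Java. MiniScan here is a hypothetical stand-in for HBase's Scan with attribute support (whether and when Scan has setAttribute is exactly the open question in this thread), and the UUID-based ID is one possible way to generate the unique value:

```java
import java.util.*;

// Sketch of the "sourceScan" grouping idea: each distributed scan generates
// one unique ID and stamps it on every per-bucket scan it fans out, so the
// server side can group them. MiniScan is a hypothetical stand-in for Scan.
public class SourceScanGrouping {
    static class MiniScan {
        final Map<String, byte[]> attributes = new HashMap<>();
        void setAttribute(String name, byte[] value) { attributes.put(name, value); }
        byte[] getAttribute(String name) { return attributes.get(name); }
    }

    // Fan one logical scan out into `buckets` per-bucket scans sharing one ID.
    static List<MiniScan> distribute(int buckets) {
        byte[] id = UUID.randomUUID().toString().getBytes();
        List<MiniScan> scans = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            MiniScan s = new MiniScan();
            s.setAttribute("sourceScan", id);
            scans.add(s);
        }
        return scans;
    }

    // Server side: count distinct distributed scans by grouping on the attribute.
    static long distinctSourceScans(List<MiniScan> all) {
        Set<String> ids = new HashSet<>();
        for (MiniScan s : all) ids.add(new String(s.getAttribute("sourceScan")));
        return ids.size();
    }

    public static void main(String[] args) {
        List<MiniScan> all = new ArrayList<>();
        all.addAll(distribute(32)); // one distributed scan -> 32 bucket scans
        all.addAll(distribute(32)); // a second distributed scan
        System.out.println(distinctSourceScans(all)); // 2
    }
}
```

This gives exactly the invariant described above: two server-side scans carry the same "sourceScan" value iff they belong to the same distributed scan.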
> > >>
> > >> Alex Baranau
> > >> ----
> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > >> HBase
> > >>
> > >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com> wrote:
> > >>
> > >>> Alex:
> > >>> Your second patch looks good.
> > >>> Can you remove the first version ?
> > >>>
> > >>> In HBaseWD, can you use reflection to detect whether Scan supports
> > >>> setAttribute() ?
> > >>> If it does, can you encode start row and end row as "sourceScan"
> > >>> attribute ?
> > >>>
> > >>> One consideration is that start row or end row may be quite long.
> > >>> Ideally we should store hash code of source Scan object as
> "sourceScan"
> > >>> attribute. But Scan doesn't implement hashCode(). We can add it, that
> > would
> > >>> require running all Scan related tests.
> > >>>
> > >>> What do you think ?
> > >>>
> > >>> Thanks
> > >>>
> > >>>
> > >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> > alex.baranov.v@gmail.com>wrote:
> > >>>
> > >>>> Sorry for the delay in response (public holidays here).
> > >>>>
> > >>>> This depends on what info you are looking for on server side.
> > >>>>
> > >>>> From what I understand, you want to distinguish scans fired by the
> > same
> > >>>> distributed scan. I.e. group scans which were fired by single
> > distributed
> > >>>> scan. If that's what you want, distributed scan can generate unique
> ID
> > and
> > >>>> set, say "sourceScan" attribute to its value. This way we'll have <#
> > of
> > >>>> distinct "sourceScan" attribute values> = <number of distributed
> scans
> > >>>> invoked by client side> and two scans on server side will have the
> > same
> > >>>> "sourceScan" attribute iff they "belong" to same distributed scan.
> > >>>>
> > >>>> Is this what you are looking for?
> > >>>>
> > >>>> Alex Baranau
> > >>>>
> > >>>> P.S. attached patch for HBASE-3811<
> > https://issues.apache.org/jira/browse/HBASE-3811>
> > >>>> .
> > >>>> P.S-2. should this conversation be moved to dev list?
> > >>>>
> > >>>> ----
> > >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop
> -
> > >>>> HBase
> > >>>>
> > >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yu...@gmail.com>
> wrote:
> > >>>>
> > >>>>> Alex:
> > >>>>> What type of identification should we put in the map of the Scan
> > object
> > >>>>> ?
> > >>>>> I am thinking of using the Id of RowKeyDistributor. But the user
> can
> > >>>>> use same distributor on multiple scans.
> > >>>>>
> > >>>>> Please share your thought.
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> > >>>>> alex.baranov.v@gmail.com> wrote:
> > >>>>>
> > >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> > >>>>>>
> > >>>>>> Alex Baranau
> > >>>>>> ----
> > >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> Hadoop
> > -
> > >>>>>> HBase
> > >>>>>>
> > >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > >>>>>>
> > >>>>>> > My plan was to make regions that have active scanners more
> stable
> > -
> > >>>>>> trying
> > >>>>>> > not to move them when balancing.
> > >>>>>> > I prefer second approach - adding custom attribute(s) to Scan so
> > >>>>>> that the
> > >>>>>> > Scans created by the method below can be 'grouped'.
> > >>>>>> >
> > >>>>>> > If you can file a JIRA, that would be great.
> > >>>>>> >
> > >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> > >>>>>> alex.baranov.v@gmail.com
> > >>>>>> > >wrote:
> > >>>>>> >
> > >>>>>> > > Aha, so you want to "count" it as single scan (or just
> > >>>>>> differently) when
> > >>>>>> > > determining the load?
> > >>>>>> > >
> > >>>>>> > > The current code looks like this:
> > >>>>>> > >
> > >>>>>> > > class DistributedScanner:
> > >>>>>> > >  public static DistributedScanner create(HTable hTable, Scan
> > >>>>>> original,
> > >>>>>> > > AbstractRowKeyDistributor keyDistributor) throws IOException {
> > >>>>>> > >    byte[][] startKeys =
> > >>>>>> > > keyDistributor.getAllDistributedKeys(original.getStartRow());
> > >>>>>> > >    byte[][] stopKeys =
> > >>>>>> > > keyDistributor.getAllDistributedKeys(original.getStopRow());
> > >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> > >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> > >>>>>> > >      scans[i] = new Scan(original);
> > >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> > >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> > >>>>>> > >    }
> > >>>>>> > >
> > >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> > >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> > >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> > >>>>>> > >    }
> > >>>>>> > >
> > >>>>>> > >    return new DistributedScanner(rss);
> > >>>>>> > >  }
> > >>>>>> > >
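create() above only fans the scan out; the per-bucket results then have to be merged back into one stream. How DistributedScanner actually combines them is not shown in this thread, so the following is a hedged, self-contained sketch of one way the merge can work: a k-way merge that assumes each per-bucket scanner yields keys in ascending order:

```java
import java.util.*;

// K-way merge of per-bucket result streams back into one key-ordered stream.
// Whether HBaseWD's DistributedScanner orders results this way is an
// assumption; each bucket scanner yields keys in ascending order within its
// bucket, so merging by smallest current key restores a single ordered stream.
public class KWayMerge {
    public static List<String> merge(List<Iterator<String>> scanners) {
        // Priority queue of (current value, source iterator), smallest key first.
        PriorityQueue<Map.Entry<String, Iterator<String>>> pq =
            new PriorityQueue<>((a, b) -> a.getKey().compareTo(b.getKey()));
        for (Iterator<String> it : scanners) {
            if (it.hasNext()) pq.add(new AbstractMap.SimpleEntry<>(it.next(), it));
        }
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            Map.Entry<String, Iterator<String>> e = pq.poll();
            out.add(e.getKey());                      // emit smallest current key
            if (e.getValue().hasNext())               // refill from the same bucket
                pq.add(new AbstractMap.SimpleEntry<>(e.getValue().next(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<String>> buckets = Arrays.asList(
            Arrays.asList("a1", "c1").iterator(),
            Arrays.asList("b1", "d1").iterator());
        System.out.println(merge(buckets)); // [a1, b1, c1, d1]
    }
}
```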
> > >>>>>> > > This is client code. To make these scans "identifiable" we need
> > >>>>>> > > to either use some different class (derived from Scan) or add
> > >>>>>> > > some attribute to them. There's no API for doing the latter. We
> > >>>>>> > > can do the former, but I don't really like the idea of creating
> > >>>>>> > > an extra class (with no extra functionality) just to distinguish
> > >>>>>> > > it from the base one.
> > >>>>>> > >
> > >>>>>> > > If you can share why/how you want to treat them differently on
> > >>>>>> > > the server side, that would be helpful.
> > >>>>>> > >
> > >>>>>> > > Alex Baranau
> > >>>>>> > > ----
> > >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > >>>>>> Hadoop -
> > >>>>>> > HBase
> > >>>>>> > >
> > >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yu...@gmail.com>
> > >>>>>> wrote:
> > >>>>>> > >
> > >>>>>> > > > My request would be to make the distributed scan
> identifiable
> > >>>>>> from
> > >>>>>> > server
> > >>>>>> > > > side.
> > >>>>>> > > > :-)
> > >>>>>> > > >
> > >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> > >>>>>> > alex.baranov.v@gmail.com
> > >>>>>> > > > >wrote:
> > >>>>>> > > >
> > >>>>>> > > > > > Basically bucketsCount may not equal number of regions
> for
> > >>>>>> the
> > >>>>>> > > > underlying
> > >>>>>> > > > > > table.
> > >>>>>> > > > >
> > >>>>>> > > > > True: e.g. when there's only one region that holds data
> for
> > >>>>>> the whole
> > >>>>>> > > > table
> > >>>>>> > > > > (not many records in table yet), distributed scan will
> fire
> > N
> > >>>>>> scans
> > >>>>>> > > > against
> > >>>>>> > > > > the same region.
> > >>>>>> > > > > On the other hand, in case there are huge number of
> regions
> > >>>>>> for
> > >>>>>> > single
> > >>>>>> > > > > table, each scan can span over multiple regions.
> > >>>>>> > > > >
> > >>>>>> > > > > > I need to deal with normal scan and "distributed scan"
> at
> > >>>>>> server
> > >>>>>> > > side.
> > >>>>>> > > > >
> > >>>>>> > > > > With the current implementation a "distributed" scan won't
> > >>>>>> > > > > be recognized as something special on the server side; it
> > >>>>>> > > > > will be ordinary scans. Though the number of scans will
> > >>>>>> > > > > increase, given that the typical situation is "many regions
> > >>>>>> > > > > for a single table", the scans of the same "distributed
> > >>>>>> > > > > scan" are likely not to hit the same region.
> > >>>>>> > > > >
> > >>>>>> > > > > Not sure if I answered your questions here. Feel free to
> ask
> > >>>>>> more ;)
> > >>>>>> > > > >
> > >>>>>> > > > > Alex Baranau
> > >>>>>> > > > > ----
> > >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> -
> > >>>>>> Hadoop -
> > >>>>>> > > > HBase
> > >>>>>> > > > >
> > >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> > yuzhihong@gmail.com>
> > >>>>>> wrote:
> > >>>>>> > > > >
> > >>>>>> > > > > > Alex:
> > >>>>>> > > > > > If you read this, you would know why I asked:
> > >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > >>>>>> > > > > >
> > >>>>>> > > > > > I need to deal with normal scan and "distributed scan"
> at
> > >>>>>> server
> > >>>>>> > > side.
> > >>>>>> > > > > > Basically bucketsCount may not equal number of regions
> for
> > >>>>>> the
> > >>>>>> > > > underlying
> > >>>>>> > > > > > table.
> > >>>>>> > > > > >
> > >>>>>> > > > > > Cheers
> > >>>>>> > > > > >
> > >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> > >>>>>> > > > alex.baranov.v@gmail.com
> > >>>>>> > > > > > >wrote:
> > >>>>>> > > > > >
> > >>>>>> > > > > > > Hi Ted,
> > >>>>>> > > > > > >
> > >>>>>> > > > > > > We currently use this tool in the scenario where data
> is
> > >>>>>> consumed
> > >>>>>> > > by
> > >>>>>> > > > > > > MapReduce jobs, so we haven't tested the performance
> of
> > >>>>>> pure
> > >>>>>> > > > > "distributed
> > >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I expect it
> to
> > be
> > >>>>>> close
> > >>>>>> > to
> > >>>>>> > > > > > simple
> > >>>>>> > > > > > > scan performance, or may be sometimes even faster
> > >>>>>> depending on
> > >>>>>> > your
> > >>>>>> > > > > data
> > >>>>>> > > > > > > access patterns. E.g. in case you write timeseries
> data
> > >>>>>> > > (sequential)
> > >>>>>> > > > > > which
> > >>>>>> > > > > > > is written into the single region at a time, then e.g.
> > if
> > >>>>>> you
> > >>>>>> > > access
> > >>>>>> > > > > > delta
> > >>>>>> > > > > > > for further processing/analysis (esp. if from not
> single
> > >>>>>> client)
> > >>>>>> > > > these
> > >>>>>> > > > > > > scans
> > >>>>>> > > > > > > are likely to hit the same region or couple of regions
> > at
> > >>>>>> a time,
> > >>>>>> > > > which
> > >>>>>> > > > > > may
> > >>>>>> > > > > > > perform worse comparing to many scans hitting data
> that
> > is
> > >>>>>> much
> > >>>>>> > > > better
> > >>>>>> > > > > > > spread over region servers.
> > >>>>>> > > > > > >
> > >>>>>> > > > > > > As for map-reduce job the approach should not affect
> > >>>>>> reading
> > >>>>>> > > > > performance
> > >>>>>> > > > > > at
> > >>>>>> > > > > > > all: it's just that there are bucketsCount times more
> > >>>>>> splits and
> > >>>>>> > > > hence
> > >>>>>> > > > > > > bucketsCount times more Map tasks. In many cases this
> > even
> > >>>>>> > improves
> > >>>>>> > > > > > overall
> > >>>>>> > > > > > > performance of the MR job since work is better
> > distributed
> > >>>>>> over
> > >>>>>> > > > cluster
> > >>>>>> > > > > > > (esp. in situation when the aim is to constantly
> process
> > >>>>>> the
> > >>>>>> > coming
> > >>>>>> > > > > delta
> > >>>>>> > > > > > > which usually resides in one or just couple of regions
> > >>>>>> depending
> > >>>>>> > on
> > >>>>>> > > > > > > processing frequency).
> > >>>>>> > > > > > >
> > >>>>>> > > > > > > If you can share details on your case, that will help
> to
> > >>>>>> > understand
> > >>>>>> > > > > what
> > >>>>>> > > > > > > effect(s) to expect from using this approach.
> > >>>>>> > > > > > >
> > >>>>>> > > > > > > Alex Baranau
> > >>>>>> > > > > > > ----
> > >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> > Nutch
> > >>>>>> -
> > >>>>>> > Hadoop
> > >>>>>> > > -
> > >>>>>> > > > > > HBase
> > >>>>>> > > > > > >
> > >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> > >>>>>> yuzhihong@gmail.com>
> > >>>>>> > > wrote:
> > >>>>>> > > > > > >
> > >>>>>> > > > > > > > Interesting project, Alex.
> > >>>>>> > > > > > > > Since there're bucketsCount scanners compared to one
> > >>>>>> scanner
> > >>>>>> > > > > > originally,
> > >>>>>> > > > > > > > have you performed load testing to see the impact ?
> > >>>>>> > > > > > > >
> > >>>>>> > > > > > > > Thanks
> > >>>>>> > > > > > > >
> > >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
> > >>>>>> > > > > > alex.baranov.v@gmail.com
> > >>>>>> > > > > > > > >wrote:
> > >>>>>> > > > > > > >
> > >>>>>> > > > > > > > > Hello guys,
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > I'd like to introduce a new small Java project/lib
> > >>>>>> > > > > > > > > around HBase: HBaseWD. It is aimed to help with
> > >>>>>> > > > > > > > > distribution of the load (across regionservers) when
> > >>>>>> > > > > > > > > writing sequential (because of the row key nature)
> > >>>>>> > > > > > > > > records. It implements the solution which was
> > >>>>>> > > > > > > > > discussed several times on this mailing list (e.g.
> > >>>>>> > > > > > > > > here: http://search-hadoop.com/m/gNRA82No5Wk).
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Please find the sources at
> > >>>>>> > > > > > > > > https://github.com/sematext/HBaseWD (there's also
> > >>>>>> > > > > > > > > a jar of the current version for convenience). It is
> > >>>>>> > > > > > > > > very easy to make use of it: e.g. I added it to one
> > >>>>>> > > > > > > > > existing project with 1+2 lines of code (one where I
> > >>>>>> > > > > > > > > write to HBase and 2 for configuring the MapReduce job).
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Any feedback is highly appreciated!
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Please find below the short intro to the lib [1].
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Alex Baranau
> > >>>>>> > > > > > > > > ----
> > >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene
> -
> > >>>>>> Nutch -
> > >>>>>> > > > Hadoop
> > >>>>>> > > > > -
> > >>>>>> > > > > > > > HBase
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > [1]
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Description:
> > >>>>>> > > > > > > > > ------------
> > >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential) Writes.
> > >>>>>> > > > > > > > > It was inspired by discussions on HBase mailing lists
> > >>>>>> > > > > > > > > around the problem of choosing between:
> > >>>>>> > > > > > > > > * writing records with sequential row keys (e.g.
> > >>>>>> > > > > > > > >   time-series data with row key built based on ts)
> > >>>>>> > > > > > > > > * using random unique IDs for records
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > The first approach makes it possible to perform fast
> > >>>>>> > > > > > > > > range scans by setting start/stop keys on the Scanner,
> > >>>>>> > > > > > > > > but creates a single-region-server hot-spotting
> > >>>>>> > > > > > > > > problem upon writing data (as row keys go in sequence,
> > >>>>>> > > > > > > > > all records end up written into a single region at a
> > >>>>>> > > > > > > > > time).
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > The second approach aims for the fastest writing
> > >>>>>> > > > > > > > > performance by distributing new records over random
> > >>>>>> > > > > > > > > regions, but makes fast range scans over the written
> > >>>>>> > > > > > > > > data impossible.
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > The suggested approach sits between the two above and
> > >>>>>> > > > > > > > > has proved to perform well, distributing records over
> > >>>>>> > > > > > > > > the cluster during writes while still allowing range
> > >>>>>> > > > > > > > > scans over the data. HBaseWD provides a very simple
> > >>>>>> > > > > > > > > API, which makes it easy to use with existing code.
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Please refer to the unit tests for lib usage info, as
> > >>>>>> > > > > > > > > they are meant to act as examples.
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Brief Usage Info (Examples):
> > >>>>>> > > > > > > > > ----------------------------
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Distributing records with sequential keys which are
> > >>>>>> > > > > > > > > being written into up to Byte.MAX_VALUE buckets:
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> > >>>>>> > > > > > > > >        new RowKeyDistributorByOneBytePrefix(bucketsCount);
> > >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> > >>>>>> > > > > > > > >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
> > >>>>>> > > > > > > > >      ... // add values
> > >>>>>> > > > > > > > >      hTable.put(put);
> > >>>>>> > > > > > > > >    }
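For reference, here is a self-contained sketch of what a one-byte-prefix distributor like the one used above can look like. The bucket choice via a hash of the original key is an assumption for illustration; the real RowKeyDistributorByOneBytePrefix may derive the prefix differently:

```java
import java.util.Arrays;

// Sketch of a one-byte-prefix key distributor. The bucket prefix is derived
// deterministically from the original key (here: via its hash - an assumed
// scheme), so the original key can always be recovered by stripping the first
// byte, and all bucket variants of a key can be enumerated for scanning.
public class OneBytePrefixDistributor {
    private final byte bucketsCount;

    public OneBytePrefixDistributor(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    public byte[] getDistributedKey(byte[] originalKey) {
        // Non-negative hash -> bucket in [0, bucketsCount)
        byte bucket = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] result = new byte[originalKey.length + 1];
        result[0] = bucket;                                         // one-byte prefix
        System.arraycopy(originalKey, 0, result, 1, originalKey.length);
        return result;
    }

    public byte[] getOriginalKey(byte[] adjustedKey) {
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length); // strip prefix
    }

    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[bucketsCount][];
        for (byte b = 0; b < bucketsCount; b++) {
            keys[b] = new byte[originalKey.length + 1];
            keys[b][0] = b;
            System.arraycopy(originalKey, 0, keys[b], 1, originalKey.length);
        }
        return keys;
    }

    public static void main(String[] args) {
        OneBytePrefixDistributor d = new OneBytePrefixDistributor((byte) 32);
        byte[] key = "1303480800-event42".getBytes(); // hypothetical ts-based key
        System.out.println(Arrays.equals(key, d.getOriginalKey(d.getDistributedKey(key)))); // true
        System.out.println(d.getAllDistributedKeys(key).length); // 32
    }
}
```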
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Performing a range scan over written data (internally
> > >>>>>> > > > > > > > > <bucketsCount> scanners are executed):
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >>>>>> > > > > > > > >    ResultScanner rs = DistributedScanner.create(hTable,
> > >>>>>> > > > > > > > >        scan, keyDistributor);
> > >>>>>> > > > > > > > >    for (Result current : rs) {
> > >>>>>> > > > > > > > >      ...
> > >>>>>> > > > > > > > >    }
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Performing a MapReduce job over a chunk of written
> > >>>>>> > > > > > > > > data specified by a Scan:
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >    Configuration conf = HBaseConfiguration.create();
> > >>>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > >>>>>> > > > > > > > >      RowCounterMapper.class, ImmutableBytesWritable.class,
> > >>>>>> > > > > > > > >      Result.class, job);
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >    // Substituting standard TableInputFormat which was set in
> > >>>>>> > > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
> > >>>>>> > > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
> > >>>>>> > > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
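One way an input format like WdTableInputFormat can achieve this (an assumption about the mechanism, not its actual code): for every original split, emit one split per bucket with the bucket prefix prepended to the original start/stop rows, so the job gets bucketsCount times more map tasks:

```java
import java.util.*;

// Sketch of the split fan-out idea behind a bucket-aware input format. Each
// logical key range is expanded into one range per bucket by prepending the
// bucket prefix ("0".."3" here as toy prefixes), so each map task scans one
// bucket's slice of the original range.
public class SplitFanOut {
    static class KeyRange {
        final String start, stop;
        KeyRange(String start, String stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + ", " + stop + ")"; }
    }

    // Expand one logical range into per-bucket ranges.
    static List<KeyRange> expand(KeyRange original, int bucketsCount) {
        List<KeyRange> result = new ArrayList<>();
        for (int b = 0; b < bucketsCount; b++) {
            String prefix = Integer.toString(b); // stand-in for the one-byte prefix
            result.add(new KeyRange(prefix + original.start, prefix + original.stop));
        }
        return result;
    }

    public static void main(String[] args) {
        List<KeyRange> splits = expand(new KeyRange("1303480800", "1303567200"), 4);
        System.out.println(splits.size()); // 4
        for (KeyRange r : splits) System.out.println(r);
    }
}
```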
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> > >>>>>> > > > > > > > > -----------------------------------------
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to support
> > >>>>>> > > > > > > > > custom row key distribution approaches. To define
> > >>>>>> > > > > > > > > custom row key distributing logic, just implement the
> > >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class, which is
> > >>>>>> > > > > > > > > really very simple:
> > >>>>>> > > > > > > > >
> > >>>>>> > > > > > > > >    public abstract class AbstractRowKeyDistributor
> > >>>>>> > > > > > > > >        implements Parametrizable {
> > >>>>>> > > > > > > > >      public abstract byte[] getDistributedKey(byte[] originalKey);
> > >>>>>> > > > > > > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> > >>>>>> > > > > > > > >      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
> > >>>>>> > > > > > > > >      ... // some utility methods
> > >>>>>> > > > > > > > >    }
> > >>>>>> > > > > > > > >

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
What's the status of this package? Is it mature enough?
I am using it in my project; I tried out the write method yesterday and am
going to incorporate it into the read method tomorrow.

On Wed, May 11, 2011 at 3:41 PM, Alex Baranau <al...@gmail.com>wrote:

> > The start/end rows may be written twice.
>
> Yeah, I know. I meant that the size of the startRow+stopRow data is
> "bearable" in an attribute value no matter how long the keys are, since we
> are already OK with transferring them initially (i.e. we should be OK with
> transferring 2x as much).
>
> So, what about the sourceScan attribute value suggestion I mentioned? If
> you can tell why it isn't sufficient in your case, I'd have more info to
> think about a better suggestion ;)
>
> > It is Okay to keep all versions of your patch in the JIRA.
> > Maybe the second should be named HBASE-3811-v2.patch<
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> >?
>
> np. Can do that. Just thought that they (patches) can be sorted by date to
> find out the final one (aka "convention over naming-rules").
>
> Alex.
>
> On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > >> Though it might be ok, since we anyways "transfer" start/stop rows
> with
> > Scan object.
> > In write() method, we now have:
> >     Bytes.writeByteArray(out, this.startRow);
> >     Bytes.writeByteArray(out, this.stopRow);
> > ...
> >       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
> >         WritableUtils.writeString(out, attr.getKey());
> >         Bytes.writeByteArray(out, attr.getValue());
> >       }
> > The start/end rows may be written twice.
> >
> > Of course, you have full control over how to generate the unique ID for
> > "sourceScan" attribute.
> >
> > It is Okay to keep all versions of your patch in the JIRA. Maybe the
> second
> > should be named HBASE-3811-v2.patch<
> https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch
> >?
> >
> > Thanks
> >
> >
> > On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
> >
> >> > Can you remove the first version ?
> >> Isn't it ok to keep it in JIRA issue?
> >>
> >>
> >> > In HBaseWD, can you use reflection to detect whether Scan supports
> >> setAttribute() ?
> >> > If it does, can you encode start row and end row as "sourceScan"
> >> attribute ?
> >>
> >> Yeah, smth like this is going to be implemented. Though I'd still want
> to
> >> hear from the devs the story about Scan version.
> >>
> >>
> >> > One consideration is that start row or end row may be quite long.
> >>
> >> Yeah, that was my thought too at first. Though it might be OK, since we
> >> anyway "transfer" start/stop rows with the Scan object.
> >>
> >> > What do you think ?
> >>
> >> I'd love to hear from you whether this variant I mentioned is what we
> >> are looking at here:
> >>
> >>
> >> > From what I understand, you want to distinguish scans fired by the
> same
> >> distributed scan.
> >> > I.e. group scans which were fired by single distributed scan. If
> that's
> >> what you want, distributed
> >> > scan can generate unique ID and set, say "sourceScan" attribute to its
> >> value. This way we'll
> >> > have <# of distinct "sourceScan" attribute values> = <number of
> >> distributed scans invoked by
> >> > client side> and two scans on server side will have the same
> >> "sourceScan" attribute iff they
> >> > "belong" to same distributed scan.
> >>
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> >> HBase
> >>
> >> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com> wrote:
> >>
> >>> Alex:
> >>> Your second patch looks good.
> >>> Can you remove the first version ?
> >>>
> >>> In HBaseWD, can you use reflection to detect whether Scan supports
> >>> setAttribute() ?
> >>> If it does, can you encode start row and end row as "sourceScan"
> >>> attribute ?
> >>>
> >>> One consideration is that start row or end row may be quite long.
> >>> Ideally we should store hash code of source Scan object as "sourceScan"
> >>> attribute. But Scan doesn't implement hashCode(). We can add it, that
> would
> >>> require running all Scan related tests.
> >>>
> >>> What do you think ?
> >>>
> >>> Thanks
> >>>
> >>>
> >>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <
> alex.baranov.v@gmail.com>wrote:
> >>>
> >>>> Sorry for the delay in response (public holidays here).
> >>>>
> >>>> This depends on what info you are looking for on server side.
> >>>>
> >>>> From what I understand, you want to distinguish scans fired by the
> same
> >>>> distributed scan. I.e. group scans which were fired by single
> distributed
> >>>> scan. If that's what you want, distributed scan can generate unique ID
> and
> >>>> set, say "sourceScan" attribute to its value. This way we'll have <#
> of
> >>>> distinct "sourceScan" attribute values> = <number of distributed scans
> >>>> invoked by client side> and two scans on server side will have the
> same
> >>>> "sourceScan" attribute iff they "belong" to same distributed scan.
> >>>>
> >>>> Is this what you are looking for?
> >>>>
> >>>> Alex Baranau
> >>>>
> >>>> P.S. attached patch for HBASE-3811<
> https://issues.apache.org/jira/browse/HBASE-3811>
> >>>> .
> >>>> P.S-2. should this conversation be moved to dev list?
> >>>>
> >>>> ----
> >>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> >>>> HBase
> >>>>
> >>>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yu...@gmail.com> wrote:
> >>>>
> >>>>> Alex:
> >>>>> What type of identification should we put in the map of the Scan
> object
> >>>>> ?
> >>>>> I am thinking of using the Id of RowKeyDistributor. But the user can
> >>>>> use same distributor on multiple scans.
> >>>>>
> >>>>> Please share your thought.
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <
> >>>>> alex.baranov.v@gmail.com> wrote:
> >>>>>
> >>>>>> https://issues.apache.org/jira/browse/HBASE-3811
> >>>>>>
> >>>>>> Alex Baranau
> >>>>>> ----
> >>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop
> -
> >>>>>> HBase
> >>>>>>
> >>>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yu...@gmail.com>
> wrote:
> >>>>>>
> >>>>>> > My plan was to make regions that have active scanners more stable
> -
> >>>>>> trying
> >>>>>> > not to move them when balancing.
> >>>>>> > I prefer second approach - adding custom attribute(s) to Scan so
> >>>>>> that the
> >>>>>> > Scans created by the method below can be 'grouped'.
> >>>>>> >
> >>>>>> > If you can file a JIRA, that would be great.
> >>>>>> >
> >>>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
> >>>>>> alex.baranov.v@gmail.com
> >>>>>> > >wrote:
> >>>>>> >
> >>>>>> > > Aha, so you want to "count" it as single scan (or just
> >>>>>> differently) when
> >>>>>> > > determining the load?
> >>>>>> > >
> >>>>>> > > The current code looks like this:
> >>>>>> > >
> >>>>>> > > class DistributedScanner:
> >>>>>> > >  public static DistributedScanner create(HTable hTable, Scan
> >>>>>> original,
> >>>>>> > > AbstractRowKeyDistributor keyDistributor) throws IOException {
> >>>>>> > >    byte[][] startKeys =
> >>>>>> > > keyDistributor.getAllDistributedKeys(original.getStartRow());
> >>>>>> > >    byte[][] stopKeys =
> >>>>>> > > keyDistributor.getAllDistributedKeys(original.getStopRow());
> >>>>>> > >    Scan[] scans = new Scan[startKeys.length];
> >>>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
> >>>>>> > >      scans[i] = new Scan(original);
> >>>>>> > >      scans[i].setStartRow(startKeys[i]);
> >>>>>> > >      scans[i].setStopRow(stopKeys[i]);
> >>>>>> > >    }
> >>>>>> > >
> >>>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> >>>>>> > >    for (byte i = 0; i < scans.length; i++) {
> >>>>>> > >      rss[i] = hTable.getScanner(scans[i]);
> >>>>>> > >    }
> >>>>>> > >
> >>>>>> > >    return new DistributedScanner(rss);
> >>>>>> > >  }
> >>>>>> > >
> >>>>>> > > This is client code. To make these scans "identifiable" we need
> to
> >>>>>> either
> >>>>>> > > use some different (derived from Scan) class or add some
> attribute
> >>>>>> to
> >>>>>> > them.
> >>>>>> > > There's no API for doing the latter. But we can do the former,
> but
> >>>>>> I
> >>>>>> > don't
> >>>>>> > > really like the idea of creating extra class (with no extra
> >>>>>> > functionality)
> >>>>>> > > just to distinguish it from the base one.
> >>>>>> > >
> >>>>>> > > If you can share why/how do you want to treat them differently
> on
> >>>>>> server
> >>>>>> > > side, that would be helpful.
> >>>>>> > >
> >>>>>> > > Alex Baranau
> >>>>>> > > ----
> >>>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> >>>>>> Hadoop -
> >>>>>> > HBase
> >>>>>> > >
> >>>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yu...@gmail.com>
> >>>>>> wrote:
> >>>>>> > >
> >>>>>> > > > My request would be to make the distributed scan identifiable
> >>>>>> from
> >>>>>> > server
> >>>>>> > > > side.
> >>>>>> > > > :-)
> >>>>>> > > >
> >>>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> >>>>>> > alex.baranov.v@gmail.com
> >>>>>> > > > >wrote:
> >>>>>> > > >
> >>>>>> > > > > > Basically bucketsCount may not equal number of regions for
> >>>>>> the
> >>>>>> > > > underlying
> >>>>>> > > > > > table.
> >>>>>> > > > >
> >>>>>> > > > > True: e.g. when there's only one region that holds data for
> >>>>>> the whole
> >>>>>> > > > table
> >>>>>> > > > > (not many records in table yet), distributed scan will fire
> N
> >>>>>> scans
> >>>>>> > > > against
> >>>>>> > > > > the same region.
> >>>>>> > > > > On the other hand, in case there are huge number of regions
> >>>>>> for
> >>>>>> > single
> >>>>>> > > > > table, each scan can span over multiple regions.
> >>>>>> > > > >
> >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at
> >>>>>> server
> >>>>>> > > side.
> >>>>>> > > > >
> >>>>>> > > > > With current implementation "distributed" scan won't be
> >>>>>> recognized as
> >>>>>> > > > > something special on the server side. It will be an ordinary
> >>>>>> scan.
> >>>>>> > > Though
> >>>>>> > > > > the number of scan will increase, given that the typical
> >>>>>> situation is
> >>>>>> > > > "many
> >>>>>> > > > > regions for single table", the scans of the same
> "distributed
> >>>>>> scan"
> >>>>>> > are
> >>>>>> > > > > likely not to hit the same region.
> >>>>>> > > > >
> >>>>>> > > > > Not sure if I answered your questions here. Feel free to ask
> >>>>>> more ;)
> >>>>>> > > > >
> >>>>>> > > > > Alex Baranau
> >>>>>> > > > > ----
> >>>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> >>>>>> Hadoop -
> >>>>>> > > > HBase
> >>>>>> > > > >
> >>>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <
> yuzhihong@gmail.com>
> >>>>>> wrote:
> >>>>>> > > > >
> >>>>>> > > > > > Alex:
> >>>>>> > > > > > If you read this, you would know why I asked:
> >>>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
> >>>>>> > > > > >
> >>>>>> > > > > > I need to deal with normal scan and "distributed scan" at
> >>>>>> server
> >>>>>> > > side.
> >>>>>> > > > > > Basically bucketsCount may not equal number of regions for
> >>>>>> the
> >>>>>> > > > underlying
> >>>>>> > > > > > table.
> >>>>>> > > > > >
> >>>>>> > > > > > Cheers
> >>>>>> > > > > >
> >>>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> >>>>>> > > > alex.baranov.v@gmail.com
> >>>>>> > > > > > >wrote:
> >>>>>> > > > > >
> >>>>>> > > > > > > Hi Ted,
> >>>>>> > > > > > >
> >>>>>> > > > > > > We currently use this tool in the scenario where data is
> >>>>>> consumed
> >>>>>> > > by
> >>>>>> > > > > > > MapReduce jobs, so we haven't tested the performance of
> >>>>>> pure
> >>>>>> > > > > "distributed
> >>>>>> > > > > > > scan" (i.e. N scans instead of 1) a lot. I expect it to
> be
> >>>>>> close
> >>>>>> > to
> >>>>>> > > > > > simple
> >>>>>> > > > > > > scan performance, or may be sometimes even faster
> >>>>>> depending on
> >>>>>> > your
> >>>>>> > > > > data
> >>>>>> > > > > > > access patterns. E.g. in case you write timeseries data
> >>>>>> > > (sequential)
> >>>>>> > > > > > which
> >>>>>> > > > > > > is written into the single region at a time, then e.g.
> if
> >>>>>> you
> >>>>>> > > access
> >>>>>> > > > > > delta
> >>>>>> > > > > > > for further processing/analysis (esp. if from not single
> >>>>>> client)
> >>>>>> > > > these
> >>>>>> > > > > > > scans
> >>>>>> > > > > > > are likely to hit the same region or couple of regions
> at
> >>>>>> a time,
> >>>>>> > > > which
> >>>>>> > > > > > may
> >>>>>> > > > > > > perform worse comparing to many scans hitting data that
> is
> >>>>>> much
> >>>>>> > > > better
> >>>>>> > > > > > > spread over region servers.
> >>>>>> > > > > > >
> >>>>>> > > > > > > As for map-reduce job the approach should not affect
> >>>>>> reading
> >>>>>> > > > > performance
> >>>>>> > > > > > at
> >>>>>> > > > > > > all: it's just that there are bucketsCount times more
> >>>>>> splits and
> >>>>>> > > > hence
> >>>>>> > > > > > > bucketsCount times more Map tasks. In many cases this
> even
> >>>>>> > improves
> >>>>>> > > > > > overall
> >>>>>> > > > > > > performance of the MR job since work is better
> distributed
> >>>>>> over
> >>>>>> > > > cluster
> >>>>>> > > > > > > (esp. in situation when the aim is to constantly process
> >>>>>> the
> >>>>>> > coming
> >>>>>> > > > > delta
> >>>>>> > > > > > > which usually resides in one or just couple of regions
> >>>>>> depending
> >>>>>> > on
> >>>>>> > > > > > > processing frequency).
> >>>>>> > > > > > >
> >>>>>> > > > > > > If you can share details on your case, that will help to
> >>>>>> > understand
> >>>>>> > > > > what
> >>>>>> > > > > > > effect(s) to expect from using this approach.
> >>>>>> > > > > > >
> >>>>>> > > > > > > Alex Baranau
> >>>>>> > > > > > > ----
> >>>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> Nutch
> >>>>>> -
> >>>>>> > Hadoop
> >>>>>> > > -
> >>>>>> > > > > > HBase
> >>>>>> > > > > > >
> >>>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
> >>>>>> yuzhihong@gmail.com>
> >>>>>> > > wrote:
> >>>>>> > > > > > >
> >>>>>> > > > > > > > Interesting project, Alex.
> >>>>>> > > > > > > > Since there're bucketsCount scanners compared to one
> >>>>>> scanner
> >>>>>> > > > > > originally,
> >>>>>> > > > > > > > have you performed load testing to see the impact ?
> >>>>>> > > > > > > >
> >>>>>> > > > > > > > Thanks
> >>>>>> > > > > > > >
> >>>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
> >>>>>> > > > > > alex.baranov.v@gmail.com
> >>>>>> > > > > > > > >wrote:
> >>>>>> > > > > > > >
> >>>>>> > > > > > > > > Hello guys,
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > I'd like to introduce a new small java project/lib
> >>>>>> around
> >>>>>> > > HBase:
> >>>>>> > > > > > > HBaseWD.
> >>>>>> > > > > > > > > It
> >>>>>> > > > > > > > > is aimed to help with distribution of the load
> (across
> >>>>>> > > > > regionservers)
> >>>>>> > > > > > > > when
> >>>>>> > > > > > > > > writing sequential (because of the row key nature)
> >>>>>> records.
> >>>>>> > It
> >>>>>> > > > > > > implements
> >>>>>> > > > > > > > > the solution which was discussed several times on
> this
> >>>>>> > mailing
> >>>>>> > > > list
> >>>>>> > > > > > > (e.g.
> >>>>>> > > > > > > > > here: http://search-hadoop.com/m/gNRA82No5Wk).
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Please find the sources at
> >>>>>> > > > > > https://github.com/sematext/HBaseWD (there's
> >>>>>> > > > > > > > > also
> >>>>>> > > > > > > > > a jar of current version for convenience). It is
> very
> >>>>>> easy to
> >>>>>> > > > make
> >>>>>> > > > > > use
> >>>>>> > > > > > > of
> >>>>>> > > > > > > > > it: e.g. I added it to one existing project with 1+2
> >>>>>> lines of
> >>>>>> > > > code
> >>>>>> > > > > > (one
> >>>>>> > > > > > > > > where I write to HBase and 2 for configuring
> MapReduce
> >>>>>> job).
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Any feedback is highly appreciated!
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Please find below the short intro to the lib [1].
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Alex Baranau
> >>>>>> > > > > > > > > ----
> >>>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
> >>>>>> Nutch -
> >>>>>> > > > Hadoop
> >>>>>> > > > > -
> >>>>>> > > > > > > > HBase
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > [1]
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Description:
> >>>>>> > > > > > > > > ------------
> >>>>>> > > > > > > > > HBaseWD stands for Distributing (sequential) Writes.
> >>>>>> It was
> >>>>>> > > > > inspired
> >>>>>> > > > > > by
> >>>>>> > > > > > > > > discussions on HBase mailing lists around the
> problem
> >>>>>> of
> >>>>>> > > choosing
> >>>>>> > > > > > > > between:
> >>>>>> > > > > > > > > * writing records with sequential row keys (e.g.
> >>>>>> time-series
> >>>>>> > > data
> >>>>>> > > > > > with
> >>>>>> > > > > > > > row
> >>>>>> > > > > > > > > key
> >>>>>> > > > > > > > >  built based on ts)
> >>>>>> > > > > > > > > * using random unique IDs for records
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > First approach makes possible to perform fast range
> >>>>>> scans
> >>>>>> > with
> >>>>>> > > > help
> >>>>>> > > > > > of
> >>>>>> > > > > > > > > setting
> >>>>>> > > > > > > > > start/stop keys on Scanner, but creates single
> region
> >>>>>> server
> >>>>>> > > > > > > hot-spotting
> >>>>>> > > > > > > > > problem upon writing data (as row keys go in
> sequence
> >>>>>> all
> >>>>>> > > records
> >>>>>> > > > > end
> >>>>>> > > > > > > up
> >>>>>> > > > > > > > > written into a single region at a time).
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Second approach aims for fastest writing performance
> >>>>>> by
> >>>>>> > > > > distributing
> >>>>>> > > > > > > new
> >>>>>> > > > > > > > > records over random regions but makes not possible
> >>>>>> doing fast
> >>>>>> > > > range
> >>>>>> > > > > > > scans
> >>>>>> > > > > > > > > against written data.
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > The suggested approach stays in the middle of the
> two
> >>>>>> above
> >>>>>> > and
> >>>>>> > > > > > proved
> >>>>>> > > > > > > to
> >>>>>> > > > > > > > > perform well by distributing records over the
> cluster
> >>>>>> during
> >>>>>> > > > > writing
> >>>>>> > > > > > > data
> >>>>>> > > > > > > > > while allowing range scans over it. HBaseWD provides
> >>>>>> very
> >>>>>> > > simple
> >>>>>> > > > > API
> >>>>>> > > > > > to
> >>>>>> > > > > > > > > work with which makes it perfect to use with
> existing
> >>>>>> code.
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Please refer to unit-tests for lib usage info as
> they
> >>>>>> aimed
> >>>>>> > to
> >>>>>> > > > act
> >>>>>> > > > > as
> >>>>>> > > > > > > > > example.
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Brief Usage Info (Examples):
> >>>>>> > > > > > > > > ----------------------------
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Distributing records with sequential keys which are
> >>>>>> being
> >>>>>> > > written
> >>>>>> > > > > in
> >>>>>> > > > > > up
> >>>>>> > > > > > > > to
> >>>>>> > > > > > > > > Byte.MAX_VALUE buckets:
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    byte bucketsCount = (byte) 32; // distributing
> into
> >>>>>> 32
> >>>>>> > > buckets
> >>>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
> >>>>>> > > > > > > > >                           new
> >>>>>> > > > > > > > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> >>>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
> >>>>>> > > > > > > > >      Put put = new
> >>>>>> > > > > > Put(keyDistributor.getDistributedKey(originalKey));
> >>>>>> > > > > > > > >      ... // add values
> >>>>>> > > > > > > > >      hTable.put(put);
> >>>>>> > > > > > > > >    }
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Performing a range scan over written data
> (internally
> >>>>>> > > > > <bucketsCount>
> >>>>>> > > > > > > > > scanners
> >>>>>> > > > > > > > > executed):
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> >>>>>> > > > > > > > >    ResultScanner rs =
> >>>>>> DistributedScanner.create(hTable, scan,
> >>>>>> > > > > > > > > keyDistributor);
> >>>>>> > > > > > > > >    for (Result current : rs) {
> >>>>>> > > > > > > > >      ...
> >>>>>> > > > > > > > >    }
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Performing mapreduce job over written data chunk
> >>>>>> specified by
> >>>>>> > > > Scan:
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    Configuration conf = HBaseConfiguration.create();
> >>>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    TableMapReduceUtil.initTableMapperJob("table",
> >>>>>> scan,
> >>>>>> > > > > > > > >      RowCounterMapper.class,
> >>>>>> ImmutableBytesWritable.class,
> >>>>>> > > > > > > Result.class,
> >>>>>> > > > > > > > > job);
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    // Substituting standard TableInputFormat which
> was
> >>>>>> set in
> >>>>>> > > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
> >>>>>> > > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
> >>>>>> > > > > > > > > -----------------------------------------
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > > HBaseWD is designed to be flexible and to support
> >>>>>> custom row
> >>>>>> > > key
> >>>>>> > > > > > > > > distribution
> >>>>>> > > > > > > > > approaches. To define custom row key distributing
> >>>>>> logic just
> >>>>>> > > > > > implement
> >>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class which is
> >>>>>> really very
> >>>>>> > > > > simple:
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > > >    public abstract class AbstractRowKeyDistributor
> >>>>>> implements
> >>>>>> > > > > > > > > Parametrizable {
> >>>>>> > > > > > > > >      public abstract byte[] getDistributedKey(byte[]
> >>>>>> > > > originalKey);
> >>>>>> > > > > > > > >      public abstract byte[] getOriginalKey(byte[]
> >>>>>> > adjustedKey);
> >>>>>> > > > > > > > >      public abstract byte[][]
> >>>>>> getAllDistributedKeys(byte[]
> >>>>>> > > > > > > originalKey);
> >>>>>> > > > > > > > >      ... // some utility methods
> >>>>>> > > > > > > > >    }
> >>>>>> > > > > > > > >
> >>>>>> > > > > > > >
> >>>>>> > > > > > >
> >>>>>> > > > > >
> >>>>>> > > > >
> >>>>>> > > >
> >>>>>> > >
> >>>>>> >
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
> The start/end rows may be written twice.

Yeah, I know. I meant that the size of the startRow+stopRow data is "bearable"
in an attribute value no matter how long the keys are, since we are already OK
with transferring them initially (i.e. we should be OK with transferring 2x as
much).

So, what about the sourceScan attribute value I suggested? If you can tell me
why it isn't sufficient in your case, I'd have more info to think about a
better suggestion ;)
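To make the idea concrete, here is a rough sketch of the tagging. It is only a
minimal model: `FakeScan`, `SourceScanTagging` and `createBucketScans` are
illustrative names, not HBaseWD or HBase API; `FakeScan` stands in for HBase's
`Scan`, whose `setAttribute(String, byte[])` is what the HBASE-3811 patch
proposes to add.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Stand-in for org.apache.hadoop.hbase.client.Scan: only the attribute
// map that the HBASE-3811 patch proposes to add.
class FakeScan {
    private final Map<String, byte[]> attributes = new HashMap<String, byte[]>();
    public void setAttribute(String name, byte[] value) { attributes.put(name, value); }
    public byte[] getAttribute(String name) { return attributes.get(name); }
}

public class SourceScanTagging {
    // One distributed scan invocation = one generated id, stamped on
    // every per-bucket scan it fires.
    public static FakeScan[] createBucketScans(int bucketsCount) {
        byte[] sourceScanId =
            UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
        FakeScan[] scans = new FakeScan[bucketsCount];
        for (int i = 0; i < bucketsCount; i++) {
            scans[i] = new FakeScan();
            scans[i].setAttribute("sourceScan", sourceScanId);
        }
        return scans;
    }

    public static void main(String[] args) {
        FakeScan[] a = createBucketScans(32);
        FakeScan[] b = createBucketScans(32);
        // Bucket scans of one distributed scan share the same id.
        System.out.println(Arrays.equals(a[0].getAttribute("sourceScan"),
                                         a[31].getAttribute("sourceScan")));
        // Distinct distributed scans get distinct ids.
        System.out.println(Arrays.equals(a[0].getAttribute("sourceScan"),
                                         b[0].getAttribute("sourceScan")));
    }
}
```

This way the server can group scans purely by comparing "sourceScan" values,
without any subclass of Scan.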

> It is Okay to keep all versions of your patch in the JIRA.
> Maybe the second should be named HBASE-3811-v2.patch<https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?

np. Can do that. Just thought that the patches can be sorted by date to find
the final one (aka "convention over naming rules").

Alex.

On Wed, May 11, 2011 at 11:13 PM, Ted Yu <yu...@gmail.com> wrote:

> >> Though it might be ok, since we anyways "transfer" start/stop rows with
> Scan object.
> In write() method, we now have:
>     Bytes.writeByteArray(out, this.startRow);
>     Bytes.writeByteArray(out, this.stopRow);
> ...
>       for (Map.Entry<String, byte[]> attr : this.attributes.entrySet()) {
>         WritableUtils.writeString(out, attr.getKey());
>         Bytes.writeByteArray(out, attr.getValue());
>       }
> The start/end rows may be written twice.
>
> Of course, you have full control over how to generate the unique ID for
> "sourceScan" attribute.
>
> It is Okay to keep all versions of your patch in the JIRA. Maybe the second
> should be named HBASE-3811-v2.patch<https://issues.apache.org/jira/secure/attachment/12478694/HBASE-3811.patch>?
>
> Thanks
>
>
> On Wed, May 11, 2011 at 1:01 PM, Alex Baranau <al...@gmail.com>wrote:
>
>> > Can you remove the first version ?
>> Isn't it ok to keep it in JIRA issue?
>>
>>
>> > In HBaseWD, can you use reflection to detect whether Scan supports
>> setAttribute() ?
>> > If it does, can you encode start row and end row as "sourceScan"
>> attribute ?
>>
>> Yeah, smth like this is going to be implemented. Though I'd still want to
>> hear from the devs the story about Scan version.
>>
>>
>> > One consideration is that start row or end row may be quite long.
>>
>> Yeah, that is was my though too at first. Though it might be ok, since we
>> anyways "transfer" start/stop rows with Scan object.
>>
>> > What do you think ?
>>
>> I'd love to hear from you is this variant I mentioned is what we are
>> looking at here:
>>
>>
>> > From what I understand, you want to distinguish scans fired by the same
>> distributed scan.
>> > I.e. group scans which were fired by single distributed scan. If that's
>> what you want, distributed
>> > scan can generate unique ID and set, say "sourceScan" attribute to its
>> value. This way we'll
>> > have <# of distinct "sourceScan" attribute values> = <number of
>> distributed scans invoked by
>> > client side> and two scans on server side will have the same
>> "sourceScan" attribute iff they
>> > "belong" to same distributed scan.
>>
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> HBase
>>
>> On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>>> Alex:
>>> Your second patch looks good.
>>> Can you remove the first version ?
>>>
>>> In HBaseWD, can you use reflection to detect whether Scan supports
>>> setAttribute() ?
>>> If it does, can you encode start row and end row as "sourceScan"
>>> attribute ?
>>>
>>> One consideration is that start row or end row may be quite long.
>>> Ideally we should store hash code of source Scan object as "sourceScan"
>>> attribute. But Scan doesn't implement hashCode(). We can add it, that would
>>> require running all Scan related tests.
>>>
>>> What do you think ?
>>>
>>> Thanks
>>>
>>>
>>> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <al...@gmail.com>wrote:
>>>
>>>> Sorry for the delay in response (public holidays here).
>>>>
>>>> This depends on what info you are looking for on server side.
>>>>
>>>> From what I understand, you want to distinguish scans fired by the same
>>>> distributed scan. I.e. group scans which were fired by single distributed
>>>> scan. If that's what you want, distributed scan can generate unique ID and
>>>> set, say "sourceScan" attribute to its value. This way we'll have <# of
>>>> distinct "sourceScan" attribute values> = <number of distributed scans
>>>> invoked by client side> and two scans on server side will have the same
>>>> "sourceScan" attribute iff they "belong" to same distributed scan.
>>>>
>>>> Is this what are you looking for?
>>>>
>>>> Alex Baranau
>>>>
>>>> P.S. attached patch for HBASE-3811<https://issues.apache.org/jira/browse/HBASE-3811>
>>>> .
>>>> P.S-2. should this conversation be moved to dev list?
>>>>
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>>>> HBase
>>>>
>>>>>> > > > > > > > >      ... // add values
>>>>>> > > > > > > > >      hTable.put(put);
>>>>>> > > > > > > > >    }
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > Performing a range scan over written data (internally
>>>>>> > > > > <bucketsCount>
>>>>>> > > > > > > > > scanners
>>>>>> > > > > > > > > executed):
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>>>>>> > > > > > > > >    ResultScanner rs =
>>>>>> DistributedScanner.create(hTable, scan,
>>>>>> > > > > > > > > keyDistributor);
>>>>>> > > > > > > > >    for (Result current : rs) {
>>>>>> > > > > > > > >      ...
>>>>>> > > > > > > > >    }
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > Performing mapreduce job over written data chunk
>>>>>> specified by
>>>>>> > > > Scan:
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >    Configuration conf = HBaseConfiguration.create();
>>>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >    TableMapReduceUtil.initTableMapperJob("table",
>>>>>> scan,
>>>>>> > > > > > > > >      RowCounterMapper.class,
>>>>>> ImmutableBytesWritable.class,
>>>>>> > > > > > > Result.class,
>>>>>> > > > > > > > > job);
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >    // Substituting standard TableInputFormat which was
>>>>>> set in
>>>>>> > > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
>>>>>> > > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
>>>>>> > > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
>>>>>> > > > > > > > > -----------------------------------------
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > HBaseWD is designed to be flexible and to support
>>>>>> custom row
>>>>>> > > key
>>>>>> > > > > > > > > distribution
>>>>>> > > > > > > > > approaches. To define custom row key distributing
>>>>>> logic just
>>>>>> > > > > > implement
>>>>>> > > > > > > > > AbstractRowKeyDistributor abstract class which is
>>>>>> really very
>>>>>> > > > > simple:
>>>>>> > > > > > > > >
>>>>>> > > > > > > > >    public abstract class AbstractRowKeyDistributor
>>>>>> implements
>>>>>> > > > > > > > > Parametrizable {
>>>>>> > > > > > > > >      public abstract byte[] getDistributedKey(byte[]
>>>>>> > > > originalKey);
>>>>>> > > > > > > > >      public abstract byte[] getOriginalKey(byte[]
>>>>>> > adjustedKey);
>>>>>> > > > > > > > >      public abstract byte[][]
>>>>>> getAllDistributedKeys(byte[]
>>>>>> > > > > > > originalKey);
>>>>>> > > > > > > > >      ... // some utility methods
>>>>>> > > > > > > > >    }
>>>>>> > > > > > > > >

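The announcement above describes the RowKeyDistributor contract (getDistributedKey / getOriginalKey / getAllDistributedKeys) and a one-byte-prefix implementation. Below is a minimal self-contained sketch of that contract, assuming a hash-based bucket choice; this is not HBaseWD's actual implementation (the real RowKeyDistributorByOneBytePrefix may pick buckets differently), and the class name here is illustrative:

```java
import java.util.Arrays;

// Illustrative sketch of the one-byte-prefix distribution contract described
// above; NOT HBaseWD's actual implementation. Bucket choice here is a hash
// of the original key -- an assumption, the real library may differ.
public class OneBytePrefixSketch {
    private final byte bucketsCount;

    public OneBytePrefixSketch(byte bucketsCount) {
        if (bucketsCount <= 0) {
            throw new IllegalArgumentException("bucketsCount must be positive");
        }
        this.bucketsCount = bucketsCount;
    }

    /** Distributed key = one bucket byte + the original key. */
    public byte[] getDistributedKey(byte[] originalKey) {
        byte bucket = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] key = new byte[originalKey.length + 1];
        key[0] = bucket;
        System.arraycopy(originalKey, 0, key, 1, originalKey.length);
        return key;
    }

    /** Strips the bucket byte back off. */
    public byte[] getOriginalKey(byte[] adjustedKey) {
        return Arrays.copyOfRange(adjustedKey, 1, adjustedKey.length);
    }

    /** One distributed key per bucket -- this is what lets a range scan be
     *  fanned out into bucketsCount per-bucket scans. */
    public byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[bucketsCount][];
        for (byte b = 0; b < bucketsCount; b++) {
            keys[b] = new byte[originalKey.length + 1];
            keys[b][0] = b;
            System.arraycopy(originalKey, 0, keys[b], 1, originalKey.length);
        }
        return keys;
    }
}
```

Note the round-trip property the library relies on: getOriginalKey(getDistributedKey(k)) equals k, and every key in getAllDistributedKeys(k) strips back to k as well.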
Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
> Can you remove the first version ?
Isn't it OK to keep it in the JIRA issue?

> In HBaseWD, can you use reflection to detect whether Scan supports
> setAttribute()?
> If it does, can you encode start row and end row as "sourceScan"
> attribute?

Yeah, something like this is going to be implemented. Though I'd still want
to hear from the devs about the Scan versioning story.

> One consideration is that start row or end row may be quite long.

Yeah, that was my thought too at first. Though it might be OK, since we
transfer start/stop rows with the Scan object anyway.

> What do you think ?

I'd love to hear from you whether the variant I mentioned is what we are
looking for here:

> From what I understand, you want to distinguish scans fired by the same
distributed scan.
> I.e. group scans which were fired by single distributed scan. If that's
what you want, distributed
> scan can generate unique ID and set, say "sourceScan" attribute to its
value. This way we'll
> have <# of distinct "sourceScan" attribute values> = <number of
distributed scans invoked by
> client side> and two scans on server side will have the same "sourceScan"
attribute iff they
> "belong" to same distributed scan.


Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
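The idea floated in this exchange -- use reflection to detect whether Scan supports setAttribute(), then tag every scan of one distributed scan with the same generated "sourceScan" id -- could be sketched roughly as below. ScanStub is a hypothetical stand-in for HBase's Scan, used only to keep the example self-contained; with HBase on the classpath, tagScans() would be handed real Scan instances instead.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical stand-in for org.apache.hadoop.hbase.client.Scan, used only
// so this sketch compiles without HBase on the classpath.
class ScanStub {
    private final java.util.Map<String, byte[]> attrs = new java.util.HashMap<>();
    public void setAttribute(String name, byte[] value) { attrs.put(name, value); }
    public byte[] getAttribute(String name) { return attrs.get(name); }
}

public class SourceScanTagger {
    /** Reflection probe: returns setAttribute(String, byte[]) if the class
     *  has it (newer Scan versions), or null if it doesn't (older ones). */
    static Method findSetAttribute(Class<?> scanClass) {
        try {
            return scanClass.getMethod("setAttribute", String.class, byte[].class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Tags all scans fired by one distributed scan with the same unique id,
     *  so the server side can group them; silently no-op when unsupported. */
    static String tagScans(Object[] scans) throws Exception {
        String id = UUID.randomUUID().toString();
        for (Object scan : scans) {
            Method m = findSetAttribute(scan.getClass());
            if (m != null) {
                m.invoke(scan, "sourceScan", id.getBytes(StandardCharsets.UTF_8));
            }
        }
        return id;
    }
}
```

Two scans then carry the same "sourceScan" value iff they belong to the same distributed scan, which matches the grouping semantics discussed above.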

On Wed, May 11, 2011 at 5:15 PM, Ted Yu <yu...@gmail.com> wrote:

> Alex:
> Your second patch looks good.
> Can you remove the first version ?
>
> In HBaseWD, can you use reflection to detect whether Scan supports
> setAttribute() ?
> If it does, can you encode start row and end row as "sourceScan"
> attribute?
>
> One consideration is that start row or end row may be quite long.
> Ideally we should store hash code of source Scan object as "sourceScan"
> attribute. But Scan doesn't implement hashCode(). We can add it, but that
> would require running all Scan-related tests.
>
> What do you think ?
>
> Thanks
>
>
> On Tue, May 10, 2011 at 5:46 AM, Alex Baranau <al...@gmail.com>wrote:
>
>> Sorry for the delay in response (public holidays here).
>>
>> This depends on what info you are looking for on server side.
>>
>> From what I understand, you want to distinguish scans fired by the same
>> distributed scan. I.e. group scans which were fired by single distributed
>> scan. If that's what you want, distributed scan can generate unique ID and
>> set, say "sourceScan" attribute to its value. This way we'll have <# of
>> distinct "sourceScan" attribute values> = <number of distributed scans
>> invoked by client side> and two scans on server side will have the same
>> "sourceScan" attribute iff they "belong" to same distributed scan.
>>
>> Is this what you are looking for?
>>
>> Alex Baranau
>>
>> P.S. attached patch for HBASE-3811
>> <https://issues.apache.org/jira/browse/HBASE-3811>.
>> P.S-2. should this conversation be moved to dev list?
>>
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> HBase
>>
>> On Fri, May 6, 2011 at 12:06 AM, Ted Yu <yu...@gmail.com> wrote:
>>
>>> Alex:
>>> What type of identification should we put in the map of the Scan object ?
>>> I am thinking of using the Id of RowKeyDistributor. But the user can use
>>> the same distributor on multiple scans.
>>>
>>> Please share your thoughts.
>>>
>>>
>>> On Thu, Apr 21, 2011 at 8:32 AM, Alex Baranau <al...@gmail.com>wrote:
>>>
>>>> https://issues.apache.org/jira/browse/HBASE-3811
>>>>
>>>> Alex Baranau
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>>>> HBase
>>>>
>>>> On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yu...@gmail.com> wrote:
>>>>
>>>> > My plan was to make regions that have active scanners more stable -
>>>> trying
>>>> > not to move them when balancing.
>>>> > I prefer second approach - adding custom attribute(s) to Scan so that
>>>> the
>>>> > Scans created by the method below can be 'grouped'.
>>>> >
>>>> > If you can file a JIRA, that would be great.
>>>> >
>>>> > On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <
>>>> alex.baranov.v@gmail.com
>>>> > >wrote:
>>>> >
>>>> > > Aha, so you want to "count" it as single scan (or just differently)
>>>> when
>>>> > > determining the load?
>>>> > >
>>>> > > The current code looks like this:
>>>> > >
>>>> > > class DistributedScanner:
>>>> > >  public static DistributedScanner create(HTable hTable, Scan
>>>> original,
>>>> > > AbstractRowKeyDistributor keyDistributor) throws IOException {
>>>> > >    byte[][] startKeys =
>>>> > > keyDistributor.getAllDistributedKeys(original.getStartRow());
>>>> > >    byte[][] stopKeys =
>>>> > > keyDistributor.getAllDistributedKeys(original.getStopRow());
>>>> > >    Scan[] scans = new Scan[startKeys.length];
>>>> > >    for (byte i = 0; i < startKeys.length; i++) {
>>>> > >      scans[i] = new Scan(original);
>>>> > >      scans[i].setStartRow(startKeys[i]);
>>>> > >      scans[i].setStopRow(stopKeys[i]);
>>>> > >    }
>>>> > >
>>>> > >    ResultScanner[] rss = new ResultScanner[startKeys.length];
>>>> > >    for (byte i = 0; i < scans.length; i++) {
>>>> > >      rss[i] = hTable.getScanner(scans[i]);
>>>> > >    }
>>>> > >
>>>> > >    return new DistributedScanner(rss);
>>>> > >  }
>>>> > >
>>>> > > This is client code. To make these scans "identifiable" we need to
>>>> > > either use some different (derived from Scan) class or add some
>>>> > > attribute to them. There's no API for doing the latter. We could do
>>>> > > the former, but I don't really like the idea of creating an extra
>>>> > > class (with no extra functionality) just to distinguish it from the
>>>> > > base one.
>>>> > >
>>>> > > If you can share why/how you want to treat them differently on the
>>>> > > server side, that would be helpful.
>>>> > >
>>>> > > Alex Baranau
>>>> > > ----
>>>> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop
>>>> -
>>>> > HBase
>>>> > >
>>>> > > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yu...@gmail.com>
>>>> wrote:
>>>> > >
>>>> > > > My request would be to make the distributed scan identifiable from
>>>> > server
>>>> > > > side.
>>>> > > > :-)
>>>> > > >
>>>> > > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
>>>> > alex.baranov.v@gmail.com
>>>> > > > >wrote:
>>>> > > >
>>>> > > > > > Basically bucketsCount may not equal number of regions for the
>>>> > > > underlying
>>>> > > > > > table.
>>>> > > > >
>>>> > > > > True: e.g. when there's only one region that holds data for the
>>>> > > > > whole table (not many records in the table yet), a distributed
>>>> > > > > scan will fire N scans against the same region.
>>>> > > > > On the other hand, when there is a huge number of regions for a
>>>> > > > > single table, each scan can span multiple regions.
>>>> > > > >
>>>> > > > > > I need to deal with normal scan and "distributed scan" at
>>>> server
>>>> > > side.
>>>> > > > >
>>>> > > > > With the current implementation a "distributed" scan won't be
>>>> > > > > recognized as anything special on the server side; it will be an
>>>> > > > > ordinary scan. Though the number of scans will increase, given
>>>> > > > > that the typical situation is "many regions for a single table",
>>>> > > > > the scans of the same "distributed scan" are not likely to hit the
>>>> > > > > same region.
>>>> > > > >
>>>> > > > > Not sure if I answered your questions here. Feel free to ask
>>>> more ;)
>>>> > > > >
>>>> > > > > Alex Baranau
>>>> > > > > ----
>>>> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>>>> Hadoop -
>>>> > > > HBase
>>>> > > > >
>>>> > > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yu...@gmail.com>
>>>> wrote:
>>>> > > > >
>>>> > > > > > Alex:
>>>> > > > > > If you read this, you would know why I asked:
>>>> > > > > > https://issues.apache.org/jira/browse/HBASE-3679
>>>> > > > > >
>>>> > > > > > I need to deal with normal scan and "distributed scan" at
>>>> server
>>>> > > side.
>>>> > > > > > Basically bucketsCount may not equal number of regions for the
>>>> > > > underlying
>>>> > > > > > table.
>>>> > > > > >
>>>> > > > > > Cheers
>>>> > > > > >
>>>> > > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
>>>> > > > alex.baranov.v@gmail.com
>>>> > > > > > >wrote:
>>>> > > > > >
>>>> > > > > > > Hi Ted,
>>>> > > > > > >
>>>> > > > > > > We currently use this tool in a scenario where data is
>>>> > > > > > > consumed by MapReduce jobs, so we haven't tested the
>>>> > > > > > > performance of a pure "distributed scan" (i.e. N scans instead
>>>> > > > > > > of 1) a lot. I expect it to be close to simple scan
>>>> > > > > > > performance, or maybe sometimes even faster depending on your
>>>> > > > > > > data access patterns. E.g. if you write (sequential)
>>>> > > > > > > time-series data, which goes into a single region at a time,
>>>> > > > > > > and you access the delta for further processing/analysis
>>>> > > > > > > (esp. if not from a single client), these scans are likely to
>>>> > > > > > > hit the same region or a couple of regions at a time, which
>>>> > > > > > > may perform worse compared to many scans hitting data that is
>>>> > > > > > > much better spread over region servers.
>>>> > > > > > >
>>>> > > > > > > As for a map-reduce job, the approach should not affect
>>>> > > > > > > reading performance at all: it's just that there are
>>>> > > > > > > bucketsCount times more splits and hence bucketsCount times
>>>> > > > > > > more Map tasks. In many cases this even improves the overall
>>>> > > > > > > performance of the MR job, since work is better distributed
>>>> > > > > > > over the cluster (esp. when the aim is to constantly process
>>>> > > > > > > the incoming delta, which usually resides in one or just a
>>>> > > > > > > couple of regions depending on processing frequency).
>>>> > > > > > >
>>>> > > > > > > If you can share details on your case, that will help to
>>>> > > > > > > understand what effect(s) to expect from using this approach.
>>>> > > > > > >
>>>> > > > > > > Alex Baranau
>>>> > > > > > > ----
>>>> > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
>>>> > Hadoop
>>>> > > -
>>>> > > > > > HBase
>>>> > > > > > >
>>>> > > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <
>>>> yuzhihong@gmail.com>
>>>> > > wrote:
>>>> > > > > > >
>>>> > > > > > > > Interesting project, Alex.
>>>> > > > > > > > Since there're bucketsCount scanners compared to one
>>>> scanner
>>>> > > > > > originally,
>>>> > > > > > > > have you performed load testing to see the impact ?
>>>> > > > > > > >
>>>> > > > > > > > Thanks
>>>> > > > > > > >
>>>> > > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
>>>> > > > > > alex.baranov.v@gmail.com
>>>> > > > > > > > >wrote:
>>>> > > > > > > >
>>>> > > > > > > > > Hello guys,
>>>> > > > > > > > >
>>>> > > > > > > > > I'd like to introduce a new small java project/lib around
>>>> > > > > > > > > HBase: HBaseWD. It is aimed to help with distribution of
>>>> > > > > > > > > the load (across region servers) when writing sequential
>>>> > > > > > > > > (because of the row key nature) records. It implements the
>>>> > > > > > > > > solution which was discussed several times on this mailing
>>>> > > > > > > > > list (e.g. here: http://search-hadoop.com/m/gNRA82No5Wk).
>>>> > > > > > > > >
>>>> > > > > > > > > Please find the sources at
>>>> > > > > > > > > https://github.com/sematext/HBaseWD (there's also a jar of
>>>> > > > > > > > > the current version for convenience). It is very easy to
>>>> > > > > > > > > make use of it: e.g. I added it to one existing project
>>>> > > > > > > > > with 1+2 lines of code (one where I write to HBase and 2
>>>> > > > > > > > > for configuring the MapReduce job).
>>>> > > > > > > > >
>>>> > > > > > > > > Any feedback is highly appreciated!
>>>> > > > > > > > >
>>>> > > > > > > > > Please find below the short intro to the lib [1].
>>>> > > > > > > > >
>>>> > > > > > > > > Alex Baranau
>>>> > > > > > > > > ----
>>>> > > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene -
>>>> > > > > > > > > Nutch - Hadoop - HBase
>>>> > > > > > > > >
>>>> > > > > > > > > [1]
>>>> > > > > > > > >
>>>> > > > > > > > > Description:
>>>> > > > > > > > > ------------
>>>> > > > > > > > > HBaseWD stands for Distributing (sequential) Writes. It
>>>> > > > > > > > > was inspired by discussions on HBase mailing lists around
>>>> > > > > > > > > the problem of choosing between:
>>>> > > > > > > > > * writing records with sequential row keys (e.g.
>>>> > > > > > > > >   time-series data with row key built based on ts)
>>>> > > > > > > > > * using random unique IDs for records
>>>> > > > > > > > >
>>>> > > > > > > > > The first approach makes it possible to perform fast range
>>>> > > > > > > > > scans by setting start/stop keys on a Scanner, but it
>>>> > > > > > > > > creates a single region server hot-spotting problem when
>>>> > > > > > > > > writing data (as row keys go in sequence, all records end
>>>> > > > > > > > > up written into a single region at a time).
>>>> > > > > > > > >
>>>> > > > > > > > > The second approach aims for the fastest writing
>>>> > > > > > > > > performance by distributing new records over random
>>>> > > > > > > > > regions, but it makes fast range scans over the written
>>>> > > > > > > > > data impossible.
>>>> > > > > > > > >
>>>> > > > > > > > > The suggested approach sits between the two above and has
>>>> > > > > > > > > proved to perform well: it distributes records over the
>>>> > > > > > > > > cluster during writes while still allowing range scans
>>>> > > > > > > > > over them. HBaseWD provides a very simple API to work
>>>> > > > > > > > > with, which makes it easy to use with existing code.
>>>> > > > > > > > >
>>>> > > > > > > > > Please refer to the unit tests for lib usage info, as they
>>>> > > > > > > > > are aimed to act as examples.
>>>> > > > > > > > >
>>>> > > > > > > > > Brief Usage Info (Examples):
>>>> > > > > > > > > ----------------------------
>>>> > > > > > > > >
>>>> > > > > > > > > Distributing records with sequential keys, written into up
>>>> > > > > > > > > to Byte.MAX_VALUE buckets:
>>>> > > > > > > > >
>>>> > > > > > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
>>>> > > > > > > > >    RowKeyDistributor keyDistributor =
>>>> > > > > > > > >                           new RowKeyDistributorByOneBytePrefix(bucketsCount);
>>>> > > > > > > > >    for (int i = 0; i < 100; i++) {
>>>> > > > > > > > >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
>>>> > > > > > > > >      ... // add values
>>>> > > > > > > > >      hTable.put(put);
>>>> > > > > > > > >    }
>>>> > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > > > Performing a range scan over written data (internally
>>>> > > > > > > > > <bucketsCount> scanners are executed):
>>>> > > > > > > > >
>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>>>> > > > > > > > >    ResultScanner rs = DistributedScanner.create(hTable, scan, keyDistributor);
>>>> > > > > > > > >    for (Result current : rs) {
>>>> > > > > > > > >      ...
>>>> > > > > > > > >    }
>>>> > > > > > > > >
>>>> > > > > > > > > Performing a mapreduce job over the written data chunk
>>>> > > > > > > > > specified by a Scan:
>>>> > > > > > > > >
>>>> > > > > > > > >    Configuration conf = HBaseConfiguration.create();
>>>> > > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
>>>> > > > > > > > >
>>>> > > > > > > > >    Scan scan = new Scan(startKey, stopKey);
>>>> > > > > > > > >
>>>> > > > > > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
>>>> > > > > > > > >      RowCounterMapper.class, ImmutableBytesWritable.class, Result.class, job);
>>>> > > > > > > > >
>>>> > > > > > > > >    // Substituting the standard TableInputFormat which was set in
>>>> > > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
>>>> > > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
>>>> > > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
>>>> > > > > > > > >
>>>> > > > > > > > >
>>>> > > > > > > > > Extending Row Keys Distributing Patterns:
>>>> > > > > > > > > -----------------------------------------
>>>> > > > > > > > >
>>>> > > > > > > > > HBaseWD is designed to be flexible and to support custom
>>>> > > > > > > > > row key distribution approaches. To define custom row key
>>>> > > > > > > > > distribution logic, just implement the
>>>> > > > > > > > > AbstractRowKeyDistributor abstract class, which is really
>>>> > > > > > > > > very simple:
>>>> > > > > > > > >
>>>> > > > > > > > >    public abstract class AbstractRowKeyDistributor implements Parametrizable {
>>>> > > > > > > > >      public abstract byte[] getDistributedKey(byte[] originalKey);
>>>> > > > > > > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
>>>> > > > > > > > >      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
>>>> > > > > > > > >      ... // some utility methods
>>>> > > > > > > > >    }
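For context on the DistributedScanner.create code quoted earlier in this thread: it opens one scanner per bucket, and the client then has to merge the N per-bucket streams back into original-key order. Below is a self-contained illustration of that merge, over plain sorted lists of distributed keys instead of real ResultScanners; stripPrefix is a hypothetical stand-in for keyDistributor.getOriginalKey, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of merging per-bucket result streams back into original-key
 *  order, as a DistributedScanner-style client must do. Each bucket's keys
 *  are already sorted by distributed key (prefix + original key), hence
 *  also sorted by original key within the bucket. */
public class BucketMergeSketch {
    /** Strips the one-byte bucket prefix (stand-in for getOriginalKey). */
    static byte[] stripPrefix(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    /** Unsigned lexicographic byte[] comparison, as HBase orders row keys. */
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /** K-way merge of N sorted per-bucket key lists into one list sorted by
     *  original key, using a priority queue of {bucket, position} cursors. */
    static List<byte[]> merge(List<List<byte[]>> buckets) {
        PriorityQueue<int[]> pq = new PriorityQueue<>((x, y) ->
            compareBytes(stripPrefix(buckets.get(x[0]).get(x[1])),
                         stripPrefix(buckets.get(y[0]).get(y[1]))));
        for (int b = 0; b < buckets.size(); b++) {
            if (!buckets.get(b).isEmpty()) pq.add(new int[]{b, 0});
        }
        List<byte[]> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            out.add(stripPrefix(buckets.get(top[0]).get(top[1])));
            if (top[1] + 1 < buckets.get(top[0]).size()) {
                pq.add(new int[]{top[0], top[1] + 1});
            }
        }
        return out;
    }
}
```

This mirrors the behavior a caller of DistributedScanner observes: results come back in original-key order even though they were read from several bucket-prefixed key ranges.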

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
https://issues.apache.org/jira/browse/HBASE-3811

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu <yu...@gmail.com> wrote:

> My plan was to make regions that have active scanners more stable - trying
> not to move them when balancing.
> I prefer second approach - adding custom attribute(s) to Scan so that the
> Scans created by the method below can be 'grouped'.
>
> If you can file a JIRA, that would be great.
>
> On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > Aha, so you want to "count" it as single scan (or just differently) when
> > determining the load?
> >
> > The current code looks like this:
> >
> > class DistributedScanner:
> >  public static DistributedScanner create(HTable hTable, Scan original,
> > AbstractRowKeyDistributor keyDistributor) throws IOException {
> >    byte[][] startKeys =
> > keyDistributor.getAllDistributedKeys(original.getStartRow());
> >    byte[][] stopKeys =
> > keyDistributor.getAllDistributedKeys(original.getStopRow());
> >    Scan[] scans = new Scan[startKeys.length];
> >    for (byte i = 0; i < startKeys.length; i++) {
> >      scans[i] = new Scan(original);
> >      scans[i].setStartRow(startKeys[i]);
> >      scans[i].setStopRow(stopKeys[i]);
> >    }
> >
> >    ResultScanner[] rss = new ResultScanner[startKeys.length];
> >    for (byte i = 0; i < scans.length; i++) {
> >      rss[i] = hTable.getScanner(scans[i]);
> >    }
> >
> >    return new DistributedScanner(rss);
> >  }
> >
> > This is client code. To make these scans "identifiable" we need to either
> > use some different (derived from Scan) class or add some attribute to
> > them. There's no API for doing the latter. We could do the former, but I
> > don't really like the idea of creating an extra class (with no extra
> > functionality) just to distinguish it from the base one.
> >
> > If you can share why/how you want to treat them differently on the
> > server side, that would be helpful.
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> HBase
> >
> > On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > My request would be to make the distributed scan identifiable from
> server
> > > side.
> > > :-)
> > >
> > > On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <
> alex.baranov.v@gmail.com
> > > >wrote:
> > >
> > > > > Basically bucketsCount may not equal number of regions for the
> > > underlying
> > > > > table.
> > > >
> > > > True: e.g. when there's only one region that holds data for the whole
> > > > table (not many records in the table yet), a distributed scan will
> > > > fire N scans against the same region.
> > > > On the other hand, when there is a huge number of regions for a single
> > > > table, each scan can span multiple regions.
> > > >
> > > > > I need to deal with normal scan and "distributed scan" at server
> > side.
> > > >
> > > > With the current implementation a "distributed" scan won't be
> > > > recognized as anything special on the server side; it will be an
> > > > ordinary scan. Though the number of scans will increase, given that
> > > > the typical situation is "many regions for a single table", the scans
> > > > of the same "distributed scan" are not likely to hit the same region.
> > > >
> > > > Not sure if I answered your questions here. Feel free to ask more ;)
> > > >
> > > > Alex Baranau
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > > HBase
> > > >
> > > > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > Alex:
> > > > > If you read this, you would know why I asked:
> > > > > https://issues.apache.org/jira/browse/HBASE-3679
> > > > >
> > > > > I need to deal with normal scan and "distributed scan" at server
> > side.
> > > > > Basically bucketsCount may not equal number of regions for the
> > > underlying
> > > > > table.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> > > alex.baranov.v@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Hi Ted,
> > > > > >
> > > > > > We currently use this tool in a scenario where data is consumed by
> > > > > > MapReduce jobs, so we haven't tested the performance of a pure
> > > > > > "distributed scan" (i.e. N scans instead of 1) a lot. I expect it
> > > > > > to be close to simple scan performance, or maybe sometimes even
> > > > > > faster depending on your data access patterns. E.g. if you write
> > > > > > (sequential) time-series data, which goes into a single region at
> > > > > > a time, and you access the delta for further processing/analysis
> > > > > > (esp. if not from a single client), these scans are likely to hit
> > > > > > the same region or a couple of regions at a time, which may
> > > > > > perform worse compared to many scans hitting data that is much
> > > > > > better spread over region servers.
> > > > > >
> > > > > > As for a map-reduce job, the approach should not affect reading
> > > > > > performance at all: it's just that there are bucketsCount times
> > > > > > more splits and hence bucketsCount times more Map tasks. In many
> > > > > > cases this even improves the overall performance of the MR job,
> > > > > > since work is better distributed over the cluster (esp. when the
> > > > > > aim is to constantly process the incoming delta, which usually
> > > > > > resides in one or just a couple of regions depending on processing
> > > > > > frequency).
> > > > > >
> > > > > > If you can share details on your case, that will help to
> > > > > > understand what effect(s) to expect from using this approach.
> > > > > >
> > > > > > Alex Baranau
> > > > > > ----
> > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> Hadoop
> > -
> > > > > HBase
> > > > > >
> > > > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yu...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Interesting project, Alex.
> > > > > > > Since there're bucketsCount scanners compared to one scanner
> > > > > originally,
> > > > > > > have you performed load testing to see the impact ?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
> > > > > alex.baranov.v@gmail.com
> > > > > > > >wrote:
> > > > > > >
> > > > > > > > Hello guys,
> > > > > > > >
> > > > > > > > I'd like to introduce a new small java project/lib around
> > > > > > > > HBase: HBaseWD. It is aimed to help with distribution of the
> > > > > > > > load (across region servers) when writing sequential (because
> > > > > > > > of the row key nature) records. It implements the solution
> > > > > > > > which was discussed several times on this mailing list (e.g.
> > > > > > > > here: http://search-hadoop.com/m/gNRA82No5Wk).
> > > > > > > >
> > > > > > > > Please find the sources at
> > > > > > > > https://github.com/sematext/HBaseWD (there's also a jar of the
> > > > > > > > current version for convenience). It is very easy to make use
> > > > > > > > of it: e.g. I added it to one existing project with 1+2 lines
> > > > > > > > of code (one where I write to HBase and 2 for configuring the
> > > > > > > > MapReduce job).
> > > > > > > >
> > > > > > > > Any feedback is highly appreciated!
> > > > > > > >
> > > > > > > > Please find below the short intro to the lib [1].
> > > > > > > >
> > > > > > > > Alex Baranau
> > > > > > > > ----
> > > > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> > > > > > > > Hadoop - HBase
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > > Description:
> > > > > > > > ------------
> > > > > > > > HBaseWD stands for Distributing (sequential) Writes. It was
> > > > > > > > inspired by discussions on HBase mailing lists around the
> > > > > > > > problem of choosing between:
> > > > > > > > * writing records with sequential row keys (e.g. time-series
> > > > > > > >   data with row key built based on ts)
> > > > > > > > * using random unique IDs for records
> > > > > > > >
> > > > > > > > The first approach makes it possible to perform fast range
> > > > > > > > scans by setting start/stop keys on a Scanner, but it creates
> > > > > > > > a single region server hot-spotting problem when writing data
> > > > > > > > (as row keys go in sequence, all records end up written into a
> > > > > > > > single region at a time).
> > > > > > > >
> > > > > > > > The second approach aims for the fastest writing performance
> > > > > > > > by distributing new records over random regions, but it makes
> > > > > > > > fast range scans over the written data impossible.
> > > > > > > >
> > > > > > > > The suggested approach sits between the two above and has
> > > > > > > > proved to perform well: it distributes records over the
> > > > > > > > cluster during writes while still allowing range scans over
> > > > > > > > them. HBaseWD provides a very simple API to work with, which
> > > > > > > > makes it easy to use with existing code.
> > > > > > > >
> > > > > > > > Please refer to the unit tests for lib usage info, as they are
> > > > > > > > aimed to act as examples.
> > > > > > > >
> > > > > > > > Brief Usage Info (Examples):
> > > > > > > > ----------------------------
> > > > > > > >
> > > > > > > > Distributing records with sequential keys, written into up to
> > > > > > > > Byte.MAX_VALUE buckets:
> > > > > > > >
> > > > > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > > > > > > >    RowKeyDistributor keyDistributor =
> > > > > > > >                           new RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > > > > > >    for (int i = 0; i < 100; i++) {
> > > > > > > >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
> > > > > > > >      ... // add values
> > > > > > > >      hTable.put(put);
> > > > > > > >    }
> > > > > > > >
> > > > > > > >
> > > > > > > > Performing a range scan over written data (internally
> > > > <bucketsCount>
> > > > > > > > scanners
> > > > > > > > executed):
> > > > > > > >
> > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > > > > > > >    ResultScanner rs = DistributedScanner.create(hTable, scan,
> > > > > > > > keyDistributor);
> > > > > > > >    for (Result current : rs) {
> > > > > > > >      ...
> > > > > > > >    }
> > > > > > > >
> > > > > > > > Performing mapreduce job over written data chunk specified by
> > > Scan:
> > > > > > > >
> > > > > > > >    Configuration conf = HBaseConfiguration.create();
> > > > > > > >    Job job = new Job(conf, "testMapreduceJob");
> > > > > > > >
> > > > > > > >    Scan scan = new Scan(startKey, stopKey);
> > > > > > > >
> > > > > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > > > > > > >      RowCounterMapper.class, ImmutableBytesWritable.class,
> > > > > > Result.class,
> > > > > > > > job);
> > > > > > > >
> > > > > > > >    // Substituting standard TableInputFormat which was set in
> > > > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
> > > > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
> > > > > > > >    keyDistributor.addInfo(job.getConfiguration());
> > > > > > > >
> > > > > > > >
> > > > > > > > Extending Row Keys Distributing Patterns:
> > > > > > > > -----------------------------------------
> > > > > > > >
> > > > > > > > HBaseWD is designed to be flexible and to support custom row
> > key
> > > > > > > > distribution
> > > > > > > > approaches. To define custom row key distributing logic just
> > > > > implement
> > > > > > > > AbstractRowKeyDistributor abstract class which is really very
> > > > simple:
> > > > > > > >
> > > > > > > >    public abstract class AbstractRowKeyDistributor implements
> > > > > > > > Parametrizable {
> > > > > > > >      public abstract byte[] getDistributedKey(byte[]
> > > originalKey);
> > > > > > > >      public abstract byte[] getOriginalKey(byte[]
> adjustedKey);
> > > > > > > >      public abstract byte[][] getAllDistributedKeys(byte[]
> > > > > > originalKey);
> > > > > > > >      ... // some utility methods
> > > > > > > >    }
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Ted Yu <yu...@gmail.com>.
My plan was to make regions that have active scanners more stable by trying
not to move them when balancing.
I prefer the second approach: adding custom attribute(s) to Scan so that the
Scans created by the method below can be 'grouped'.

If you can file a JIRA, that would be great.
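To illustrate the "grouping" idea: if each Scan of a distributed scan carried a shared group-id attribute, the server side could bucket them together when accounting for load. Below is a self-contained sketch using a hypothetical `TaggedScan` stand-in for an attribute-capable Scan (plain Java, not HBase API; later HBase versions did gain `Scan#setAttribute` for this kind of tagging, but it did not exist at the time of this thread):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Scan that can carry named attributes.
class TaggedScan {
    private final Map<String, byte[]> attributes = new HashMap<>();
    final String startRow;

    TaggedScan(String startRow) { this.startRow = startRow; }

    void setAttribute(String name, byte[] value) { attributes.put(name, value); }
    byte[] getAttribute(String name) { return attributes.get(name); }
}

public class ScanGrouping {
    static final String GROUP_ATTR = "wd.scan.group"; // attribute name is illustrative

    // Server side could bucket scans that share the same group id,
    // counting each group as one logical "distributed scan".
    static Map<String, List<TaggedScan>> groupById(List<TaggedScan> scans) {
        Map<String, List<TaggedScan>> groups = new HashMap<>();
        for (TaggedScan s : scans) {
            byte[] id = s.getAttribute(GROUP_ATTR);
            String key = (id == null) ? "<untagged>" : new String(id, StandardCharsets.UTF_8);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    public static void main(String[] args) {
        String groupId = "distributed-scan-42"; // one id per logical distributed scan
        List<TaggedScan> scans = new ArrayList<>();
        for (int bucket = 0; bucket < 3; bucket++) {
            TaggedScan s = new TaggedScan(bucket + "|rowkey");
            s.setAttribute(GROUP_ATTR, groupId.getBytes(StandardCharsets.UTF_8));
            scans.add(s);
        }
        scans.add(new TaggedScan("plain-scan")); // an ordinary, untagged scan

        Map<String, List<TaggedScan>> groups = groupById(scans);
        System.out.println(groups.get(groupId).size());       // 3
        System.out.println(groups.get("<untagged>").size());  // 1
    }
}
```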

On Thu, Apr 21, 2011 at 7:23 AM, Alex Baranau <al...@gmail.com> wrote:

> Aha, so you want to "count" it as single scan (or just differently) when
> determining the load?
>
> The current code looks like this:
>
> class DistributedScanner:
>   public static DistributedScanner create(HTable hTable, Scan original,
>       AbstractRowKeyDistributor keyDistributor) throws IOException {
>     byte[][] startKeys = keyDistributor.getAllDistributedKeys(original.getStartRow());
>     byte[][] stopKeys = keyDistributor.getAllDistributedKeys(original.getStopRow());
>     Scan[] scans = new Scan[startKeys.length];
>     for (byte i = 0; i < startKeys.length; i++) {
>       scans[i] = new Scan(original);
>       scans[i].setStartRow(startKeys[i]);
>       scans[i].setStopRow(stopKeys[i]);
>     }
>
>     ResultScanner[] rss = new ResultScanner[startKeys.length];
>     for (byte i = 0; i < scans.length; i++) {
>       rss[i] = hTable.getScanner(scans[i]);
>     }
>
>     return new DistributedScanner(rss);
>   }
>
> This is client code. To make these scans "identifiable" we need to either
> use some different (derived from Scan) class or add some attribute to them.
> There's no API for doing the latter. But we can do the former, but I don't
> really like the idea of creating extra class (with no extra functionality)
> just to distinguish it from the base one.
>
> If you can share why/how do you want to treat them differently on server
> side, that would be helpful.
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Weishung Chung <we...@gmail.com>.
Awesome, I need to try it out :) Thank you!

On Thu, Apr 21, 2011 at 9:23 AM, Alex Baranau <al...@gmail.com> wrote:


Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
Aha, so you want to "count" it as a single scan (or just treat it differently)
when determining the load?

The current code looks like this:

class DistributedScanner:
  public static DistributedScanner create(HTable hTable, Scan original,
      AbstractRowKeyDistributor keyDistributor) throws IOException {
    byte[][] startKeys = keyDistributor.getAllDistributedKeys(original.getStartRow());
    byte[][] stopKeys = keyDistributor.getAllDistributedKeys(original.getStopRow());
    Scan[] scans = new Scan[startKeys.length];
    for (byte i = 0; i < startKeys.length; i++) {
      scans[i] = new Scan(original);
      scans[i].setStartRow(startKeys[i]);
      scans[i].setStopRow(stopKeys[i]);
    }

    ResultScanner[] rss = new ResultScanner[startKeys.length];
    for (byte i = 0; i < scans.length; i++) {
      rss[i] = hTable.getScanner(scans[i]);
    }

    return new DistributedScanner(rss);
  }
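To make the behavior of this factory concrete: the resulting DistributedScanner has to merge the per-bucket result streams back into original-key order. Here is a self-contained sketch of that merge using plain strings instead of HBase Result objects, assuming the one-byte-prefix scheme; this is an illustration, not the library's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeByOriginalKey {

    // Strip the one-byte bucket prefix to recover the original key
    // (assumes the one-byte-prefix distribution scheme).
    static String originalKey(String distributedKey) {
        return distributedKey.substring(1);
    }

    // Merge per-bucket result streams (each already sorted) into one
    // stream ordered by the *original* key, the way a distributed
    // scanner must when serving a range scan.
    static List<String> merge(List<Iterator<String>> buckets) {
        // Heap entry: {current row, bucket index}, ordered by original key.
        PriorityQueue<Object[]> heap = new PriorityQueue<>(
            (a, b) -> originalKey((String) a[0]).compareTo(originalKey((String) b[0])));
        for (int i = 0; i < buckets.size(); i++) {
            if (buckets.get(i).hasNext()) {
                heap.add(new Object[] {buckets.get(i).next(), i});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Object[] top = heap.poll();
            out.add((String) top[0]);
            int bucket = (int) top[1];
            // Refill from the bucket we just consumed.
            if (buckets.get(bucket).hasNext()) {
                heap.add(new Object[] {buckets.get(bucket).next(), bucket});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two buckets (prefixes '0' and '1') holding original keys 001..004.
        List<Iterator<String>> buckets = Arrays.asList(
            Arrays.asList("0001", "0003").iterator(),
            Arrays.asList("1002", "1004").iterator());
        System.out.println(merge(buckets)); // [0001, 1002, 0003, 1004]
    }
}
```

A priority queue keyed on the prefix-stripped key keeps the merge cheap: one heap operation per returned row regardless of bucket count.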

This is client code. To make these scans "identifiable" we would need to either
use a different class (derived from Scan) or add some attribute to them.
There's no API for doing the latter. We could do the former, but I don't
really like the idea of creating an extra class (with no extra functionality)
just to distinguish it from the base one.

If you can share why/how you want to treat them differently on the server
side, that would be helpful.
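For reference, the keyDistributor the code above relies on can be sketched in a self-contained way. This is a simplified illustration of the one-byte-prefix idea in the spirit of RowKeyDistributorByOneBytePrefix, not HBaseWD's exact code; in particular, the hash-based prefix choice is an assumption:

```java
import java.util.Arrays;

// Simplified sketch of a one-byte-prefix row key distributor.
public class OneBytePrefixDistributor {
    private final byte bucketsCount;

    OneBytePrefixDistributor(byte bucketsCount) {
        this.bucketsCount = bucketsCount;
    }

    // Prepend a bucket byte so sequential keys spread over bucketsCount ranges.
    byte[] getDistributedKey(byte[] originalKey) {
        byte prefix = (byte) ((Arrays.hashCode(originalKey) & 0x7fffffff) % bucketsCount);
        byte[] result = new byte[originalKey.length + 1];
        result[0] = prefix;
        System.arraycopy(originalKey, 0, result, 1, originalKey.length);
        return result;
    }

    // Drop the prefix byte to recover the original key.
    byte[] getOriginalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }

    // One prefixed variant per bucket; a range scan fires one Scan per entry.
    byte[][] getAllDistributedKeys(byte[] originalKey) {
        byte[][] keys = new byte[bucketsCount][];
        for (byte i = 0; i < bucketsCount; i++) {
            keys[i] = new byte[originalKey.length + 1];
            keys[i][0] = i;
            System.arraycopy(originalKey, 0, keys[i], 1, originalKey.length);
        }
        return keys;
    }

    public static void main(String[] args) {
        OneBytePrefixDistributor d = new OneBytePrefixDistributor((byte) 32);
        byte[] key = "event-0001".getBytes();
        // Round trip preserves the original key.
        System.out.println(Arrays.equals(key, d.getOriginalKey(d.getDistributedKey(key)))); // true
        System.out.println(d.getAllDistributedKeys(key).length); // 32
    }
}
```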

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Apr 21, 2011 at 4:58 PM, Ted Yu <yu...@gmail.com> wrote:

> My request would be to make the distributed scan identifiable from server
> side.
> :-)
>
> On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > > Basically bucketsCount may not equal number of regions for the
> underlying
> > > table.
> >
> > True: e.g. when there's only one region that holds data for the whole
> table
> > (not many records in table yet), distributed scan will fire N scans
> against
> > the same region.
> > On the other hand, in case there are huge number of regions for single
> > table, each scan can span over multiple regions.
> >
> > > I need to deal with normal scan and "distributed scan" at server side.
> >
> > With current implementation "distributed" scan won't be recognized as
> > something special on the server side. It will be an ordinary scan. Though
> > the number of scan will increase, given that the typical situation is
> "many
> > regions for single table", the scans of the same "distributed scan" are
> > likely not to hit the same region.
> >
> > Not sure if I answered your questions here. Feel free to ask more ;)
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> HBase
> >
> > On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Alex:
> > > If you read this, you would know why I asked:
> > > https://issues.apache.org/jira/browse/HBASE-3679
> > >
> > > I need to deal with normal scan and "distributed scan" at server side.
> > > Basically bucketsCount may not equal number of regions for the
> underlying
> > > table.
> > >
> > > Cheers
> > >
> > > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <
> alex.baranov.v@gmail.com
> > > >wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > We currently use this tool in the scenario where data is consumed by
> > > > MapReduce jobs, so we haven't tested the performance of pure
> > "distributed
> > > > scan" (i.e. N scans instead of 1) a lot. I expect it to be close to
> > > simple
> > > > scan performance, or may be sometimes even faster depending on your
> > data
> > > > access patterns. E.g. in case you write timeseries data (sequential)
> > > which
> > > > is written into the single region at a time, then e.g. if you access
> > > delta
> > > > for further processing/analysis (esp. if from not single client)
> these
> > > > scans
> > > > are likely to hit the same region or couple of regions at a time,
> which
> > > may
> > > > perform worse comparing to many scans hitting data that is much
> better
> > > > spread over region servers.
> > > >
> > > > As for map-reduce job the approach should not affect reading
> > performance
> > > at
> > > > all: it's just that there are bucketsCount times more splits and
> hence
> > > > bucketsCount times more Map tasks. In many cases this even improves
> > > overall
> > > > performance of the MR job since work is better distributed over
> cluster
> > > > (esp. in situation when the aim is to constantly process the coming
> > delta
> > > > which usually resides in one or just couple of regions depending on
> > > > processing frequency).
> > > >
> > > > If you can share details on your case, that will help to understand
> > what
> > > > effect(s) to expect from using this approach.
> > > >
> > > > Alex Baranau
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > > HBase
> > > >
> > > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > Interesting project, Alex.
> > > > > Since there're bucketsCount scanners compared to one scanner
> > > originally,
> > > > > have you performed load testing to see the impact ?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
> > > alex.baranov.v@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Hello guys,
> > > > > >
> > > > > > I'd like to introduce a new small java project/lib around HBase:
> > > > HBaseWD.
> > > > > > It
> > > > > > is aimed to help with distribution of the load (across
> > regionservers)
> > > > > when
> > > > > > writing sequential (because of the row key nature) records. It
> > > > implements
> > > > > > the solution which was discussed several times on this mailing
> list
> > > > (e.g.
> > > > > > here: http://search-hadoop.com/m/gNRA82No5Wk).
> > > > > >
> > > > > > Please find the sources at
> > > https://github.com/sematext/HBaseWD (there's
> > > > > > also
> > > > > > a jar of current version for convenience). It is very easy to
> make
> > > use
> > > > of
> > > > > > it: e.g. I added it to one existing project with 1+2 lines of
> code
> > > (one
> > > > > > where I write to HBase and 2 for configuring MapReduce job).
> > > > > >
> > > > > > Any feedback is highly appreciated!
> > > > > >
> > > > > > Please find below the short intro to the lib [1].
> > > > > >
> > > > > > Alex Baranau
> > > > > > ----
> > > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
> Hadoop
> > -
> > > > > HBase
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > > Description:
> > > > > > ------------
> > > > > > HBaseWD stands for Distributing (sequential) Writes. It was
> > inspired
> > > by
> > > > > > discussions on HBase mailing lists around the problem of choosing
> > > > > between:
> > > > > > * writing records with sequential row keys (e.g. time-series data
> > > with
> > > > > row
> > > > > > key
> > > > > >  built based on ts)
> > > > > > * using random unique IDs for records
> > > > > >
> > > > > > First approach makes possible to perform fast range scans with
> help
> > > of
> > > > > > setting
> > > > > > start/stop keys on Scanner, but creates single region server
> > > > hot-spotting
> > > > > > problem upon writing data (as row keys go in sequence all records
> > end
> > > > up
> > > > > > written into a single region at a time).
> > > > > >
> > > > > > Second approach aims for fastest writing performance by
> > distributing
> > > > new
> > > > > > records over random regions but makes not possible doing fast
> range
> > > > scans
> > > > > > against written data.
> > > > > >
> > > > > > The suggested approach stays in the middle of the two above and
> > > proved
> > > > to
> > > > > > perform well by distributing records over the cluster during
> > writing
> > > > data
> > > > > > while allowing range scans over it. HBaseWD provides very simple
> > API
> > > to
> > > > > > work with which makes it perfect to use with existing code.
> > > > > >
> > > > > > Please refer to unit-tests for lib usage info as they aimed to
> act
> > as
> > > > > > example.
> > > > > >
> > > > > > Brief Usage Info (Examples):
> > > > > > ----------------------------
> > > > > >
> > > > > > Distributing records with sequential keys which are being written
> > in
> > > up
> > > > > to
> > > > > > Byte.MAX_VALUE buckets:
> > > > > >
> > > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > > > > >    RowKeyDistributor keyDistributor =
> > > > > >                           new
> > > > > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > > > >    for (int i = 0; i < 100; i++) {
> > > > > >      Put put = new
> > > Put(keyDistributor.getDistributedKey(originalKey));
> > > > > >      ... // add values
> > > > > >      hTable.put(put);
> > > > > >    }
> > > > > >
> > > > > >
> > > > > > Performing a range scan over written data (internally
> > <bucketsCount>
> > > > > > scanners
> > > > > > executed):
> > > > > >
> > > > > >    Scan scan = new Scan(startKey, stopKey);
> > > > > >    ResultScanner rs = DistributedScanner.create(hTable, scan,
> > > > > > keyDistributor);
> > > > > >    for (Result current : rs) {
> > > > > >      ...
> > > > > >    }
> > > > > >
> > > > > > Performing mapreduce job over written data chunk specified by
> Scan:
> > > > > >
> > > > > >    Configuration conf = HBaseConfiguration.create();
> > > > > >    Job job = new Job(conf, "testMapreduceJob");
> > > > > >
> > > > > >    Scan scan = new Scan(startKey, stopKey);
> > > > > >
> > > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > > > > >      RowCounterMapper.class, ImmutableBytesWritable.class,
> > > > Result.class,
> > > > > > job);
> > > > > >
> > > > > >    // Substituting standard TableInputFormat which was set in
> > > > > >    // TableMapReduceUtil.initTableMapperJob(...)
> > > > > >    job.setInputFormatClass(WdTableInputFormat.class);
> > > > > >    keyDistributor.addInfo(job.getConfiguration());
> > > > > >
> > > > > >
> > > > > > Extending Row Keys Distributing Patterns:
> > > > > > -----------------------------------------
> > > > > >
> > > > > > HBaseWD is designed to be flexible and to support custom row key
> > > > > > distribution
> > > > > > approaches. To define custom row key distributing logic just
> > > implement
> > > > > > AbstractRowKeyDistributor abstract class which is really very
> > simple:
> > > > > >
> > > > > >    public abstract class AbstractRowKeyDistributor implements
> > > > > > Parametrizable {
> > > > > >      public abstract byte[] getDistributedKey(byte[]
> originalKey);
> > > > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> > > > > >      public abstract byte[][] getAllDistributedKeys(byte[]
> > > > originalKey);
> > > > > >      ... // some utility methods
> > > > > >    }
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Ted Yu <yu...@gmail.com>.
My request would be to make the distributed scan identifiable from the server
side.
:-)

On Thu, Apr 21, 2011 at 5:45 AM, Alex Baranau <al...@gmail.com>wrote:

> > Basically bucketsCount may not equal number of regions for the underlying
> > table.
>
> True: e.g. when there's only one region that holds data for the whole table
> (not many records in table yet), distributed scan will fire N scans against
> the same region.
> On the other hand, in case there are huge number of regions for single
> table, each scan can span over multiple regions.
>
> > I need to deal with normal scan and "distributed scan" at server side.
>
> With current implementation "distributed" scan won't be recognized as
> something special on the server side. It will be an ordinary scan. Though
> the number of scan will increase, given that the typical situation is "many
> regions for single table", the scans of the same "distributed scan" are
> likely not to hit the same region.
>
> Not sure if I answered your questions here. Feel free to ask more ;)
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Alex:
> > If you read this, you would know why I asked:
> > https://issues.apache.org/jira/browse/HBASE-3679
> >
> > I need to deal with normal scan and "distributed scan" at server side.
> > Basically bucketsCount may not equal number of regions for the underlying
> > table.
> >
> > Cheers
> >
> > On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <alex.baranov.v@gmail.com
> > >wrote:
> >
> > > Hi Ted,
> > >
> > > We currently use this tool in the scenario where data is consumed by
> > > MapReduce jobs, so we haven't tested the performance of pure
> "distributed
> > > scan" (i.e. N scans instead of 1) a lot. I expect it to be close to
> > simple
> > > scan performance, or may be sometimes even faster depending on your
> data
> > > access patterns. E.g. in case you write timeseries data (sequential)
> > which
> > > is written into the single region at a time, then e.g. if you access
> > delta
> > > for further processing/analysis (esp. if from not single client) these
> > > scans
> > > are likely to hit the same region or couple of regions at a time, which
> > may
> > > perform worse comparing to many scans hitting data that is much better
> > > spread over region servers.
> > >
> > > As for map-reduce job the approach should not affect reading
> performance
> > at
> > > all: it's just that there are bucketsCount times more splits and hence
> > > bucketsCount times more Map tasks. In many cases this even improves
> > overall
> > > performance of the MR job since work is better distributed over cluster
> > > (esp. in situation when the aim is to constantly process the coming
> delta
> > > which usually resides in one or just couple of regions depending on
> > > processing frequency).
> > >
> > > If you can share details on your case, that will help to understand
> what
> > > effect(s) to expect from using this approach.
> > >
> > > Alex Baranau
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > HBase
> > >
> > > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > Interesting project, Alex.
> > > > Since there're bucketsCount scanners compared to one scanner
> > originally,
> > > > have you performed load testing to see the impact ?
> > > >
> > > > Thanks
> > > >
> > > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
> > alex.baranov.v@gmail.com
> > > > >wrote:
> > > >
> > > > > Hello guys,
> > > > >
> > > > > I'd like to introduce a new small java project/lib around HBase:
> > > HBaseWD.
> > > > > It
> > > > > is aimed to help with distribution of the load (across
> regionservers)
> > > > when
> > > > > writing sequential (because of the row key nature) records. It
> > > implements
> > > > > the solution which was discussed several times on this mailing list
> > > (e.g.
> > > > > here: http://search-hadoop.com/m/gNRA82No5Wk).
> > > > >
> > > > > Please find the sources at
> > https://github.com/sematext/HBaseWD (there's
> > > > > also
> > > > > a jar of current version for convenience). It is very easy to make
> > use
> > > of
> > > > > it: e.g. I added it to one existing project with 1+2 lines of code
> > (one
> > > > > where I write to HBase and 2 for configuring MapReduce job).
> > > > >
> > > > > Any feedback is highly appreciated!
> > > > >
> > > > > Please find below the short intro to the lib [1].
> > > > >
> > > > > Alex Baranau
> > > > > ----
> > > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop
> -
> > > > HBase
> > > > >
> > > > > [1]
> > > > >
> > > > > Description:
> > > > > ------------
> > > > > HBaseWD stands for Distributing (sequential) Writes. It was
> inspired
> > by
> > > > > discussions on HBase mailing lists around the problem of choosing
> > > > between:
> > > > > * writing records with sequential row keys (e.g. time-series data
> > with
> > > > row
> > > > > key
> > > > >  built based on ts)
> > > > > * using random unique IDs for records
> > > > >
> > > > > First approach makes possible to perform fast range scans with help
> > of
> > > > > setting
> > > > > start/stop keys on Scanner, but creates single region server
> > > hot-spotting
> > > > > problem upon writing data (as row keys go in sequence all records
> end
> > > up
> > > > > written into a single region at a time).
> > > > >
> > > > > Second approach aims for fastest writing performance by
> distributing
> > > new
> > > > > records over random regions but makes not possible doing fast range
> > > scans
> > > > > against written data.
> > > > >
> > > > > The suggested approach stays in the middle of the two above and
> > proved
> > > to
> > > > > perform well by distributing records over the cluster during
> writing
> > > data
> > > > > while allowing range scans over it. HBaseWD provides very simple
> API
> > to
> > > > > work with which makes it perfect to use with existing code.
> > > > >
> > > > > Please refer to unit-tests for lib usage info as they aimed to act
> as
> > > > > example.
> > > > >
> > > > > Brief Usage Info (Examples):
> > > > > ----------------------------
> > > > >
> > > > > Distributing records with sequential keys which are being written
> in
> > up
> > > > to
> > > > > Byte.MAX_VALUE buckets:
> > > > >
> > > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > > > >    RowKeyDistributor keyDistributor =
> > > > >                           new
> > > > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > > >    for (int i = 0; i < 100; i++) {
> > > > >      Put put = new
> > Put(keyDistributor.getDistributedKey(originalKey));
> > > > >      ... // add values
> > > > >      hTable.put(put);
> > > > >    }
> > > > >
> > > > >
> > > > > Performing a range scan over written data (internally
> <bucketsCount>
> > > > > scanners
> > > > > executed):
> > > > >
> > > > >    Scan scan = new Scan(startKey, stopKey);
> > > > >    ResultScanner rs = DistributedScanner.create(hTable, scan,
> > > > > keyDistributor);
> > > > >    for (Result current : rs) {
> > > > >      ...
> > > > >    }
> > > > >
> > > > > Performing mapreduce job over written data chunk specified by Scan:
> > > > >
> > > > >    Configuration conf = HBaseConfiguration.create();
> > > > >    Job job = new Job(conf, "testMapreduceJob");
> > > > >
> > > > >    Scan scan = new Scan(startKey, stopKey);
> > > > >
> > > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > > > >      RowCounterMapper.class, ImmutableBytesWritable.class,
> > > Result.class,
> > > > > job);
> > > > >
> > > > >    // Substituting standard TableInputFormat which was set in
> > > > >    // TableMapReduceUtil.initTableMapperJob(...)
> > > > >    job.setInputFormatClass(WdTableInputFormat.class);
> > > > >    keyDistributor.addInfo(job.getConfiguration());
> > > > >
> > > > >
> > > > > Extending Row Keys Distributing Patterns:
> > > > > -----------------------------------------
> > > > >
> > > > > HBaseWD is designed to be flexible and to support custom row key
> > > > > distribution
> > > > > approaches. To define custom row key distributing logic just
> > implement
> > > > > AbstractRowKeyDistributor abstract class which is really very
> simple:
> > > > >
> > > > >    public abstract class AbstractRowKeyDistributor implements
> > > > > Parametrizable {
> > > > >      public abstract byte[] getDistributedKey(byte[] originalKey);
> > > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> > > > >      public abstract byte[][] getAllDistributedKeys(byte[]
> > > originalKey);
> > > > >      ... // some utility methods
> > > > >    }
> > > > >
> > > >
> > >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
> Basically bucketsCount may not equal number of regions for the underlying
> table.

True: e.g. when there's only one region holding the data for the whole table
(not many records in the table yet), a distributed scan will fire N scans
against the same region.
On the other hand, when there is a huge number of regions for a single
table, each scan can span multiple regions.

> I need to deal with normal scan and "distributed scan" at server side.

With the current implementation a "distributed" scan won't be recognized as
anything special on the server side: it is just a set of ordinary scans.
Though the number of scans will increase, given that the typical situation is
"many regions per table", the scans belonging to the same "distributed scan"
are unlikely to hit the same region.
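
To make the "set of ordinary scans" idea concrete, here is a hypothetical,
simplified sketch (plain Java, not HBaseWD's actual source) of how the results
of N per-bucket scanners can be merged back into row-key order on the client:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, NOT HBaseWD's actual code: it only illustrates
// that a "distributed scan" is N ordinary scans (one per bucket prefix)
// whose results are merged back into key order on the client side.
public class DistributedScanSketch {

    // Each inner list stands for the results of one ordinary per-bucket
    // scan, already sorted by original (prefix-stripped) row key.
    public static List<String> merge(List<List<String>> perBucketResults) {
        int[] pos = new int[perBucketResults.size()]; // next unread index per bucket
        List<String> merged = new ArrayList<>();
        while (true) {
            int best = -1; // bucket whose next row key is smallest
            for (int i = 0; i < perBucketResults.size(); i++) {
                if (pos[i] < perBucketResults.get(i).size()) {
                    String candidate = perBucketResults.get(i).get(pos[i]);
                    if (best < 0 || candidate.compareTo(
                            perBucketResults.get(best).get(pos[best])) < 0) {
                        best = i;
                    }
                }
            }
            if (best < 0) {
                break; // all per-bucket "scanners" are exhausted
            }
            merged.add(perBucketResults.get(best).get(pos[best]++));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Rows written with a 3-bucket distributor end up spread like this:
        List<List<String>> buckets = Arrays.asList(
            Arrays.asList("row-001", "row-004"),
            Arrays.asList("row-002", "row-005"),
            Arrays.asList("row-003"));
        System.out.println(merge(buckets)); // globally ordered result
    }
}
```

A real DistributedScanner would pull from N live ResultScanners instead of
in-memory lists, but the core is the same k-way merge by original key.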

Not sure if I answered your questions here. Feel free to ask more ;)

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, Apr 20, 2011 at 2:10 PM, Ted Yu <yu...@gmail.com> wrote:

> Alex:
> If you read this, you would know why I asked:
> https://issues.apache.org/jira/browse/HBASE-3679
>
> I need to deal with normal scan and "distributed scan" at server side.
> Basically bucketsCount may not equal number of regions for the underlying
> table.
>
> Cheers
>
> On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > Hi Ted,
> >
> > We currently use this tool in the scenario where data is consumed by
> > MapReduce jobs, so we haven't tested the performance of pure "distributed
> > scan" (i.e. N scans instead of 1) a lot. I expect it to be close to
> simple
> > scan performance, or may be sometimes even faster depending on your data
> > access patterns. E.g. in case you write timeseries data (sequential)
> which
> > is written into the single region at a time, then e.g. if you access
> delta
> > for further processing/analysis (esp. if from not single client) these
> > scans
> > are likely to hit the same region or couple of regions at a time, which
> may
> > perform worse comparing to many scans hitting data that is much better
> > spread over region servers.
> >
> > As for map-reduce job the approach should not affect reading performance
> at
> > all: it's just that there are bucketsCount times more splits and hence
> > bucketsCount times more Map tasks. In many cases this even improves
> overall
> > performance of the MR job since work is better distributed over cluster
> > (esp. in situation when the aim is to constantly process the coming delta
> > which usually resides in one or just couple of regions depending on
> > processing frequency).
> >
> > If you can share details on your case, that will help to understand what
> > effect(s) to expect from using this approach.
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> HBase
> >
> > On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Interesting project, Alex.
> > > Since there're bucketsCount scanners compared to one scanner
> originally,
> > > have you performed load testing to see the impact ?
> > >
> > > Thanks
> > >
> > > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <
> alex.baranov.v@gmail.com
> > > >wrote:
> > >
> > > > Hello guys,
> > > >
> > > > I'd like to introduce a new small java project/lib around HBase:
> > HBaseWD.
> > > > It
> > > > is aimed to help with distribution of the load (across regionservers)
> > > when
> > > > writing sequential (because of the row key nature) records. It
> > implements
> > > > the solution which was discussed several times on this mailing list
> > (e.g.
> > > > here: http://search-hadoop.com/m/gNRA82No5Wk).
> > > >
> > > > Please find the sources at
> https://github.com/sematext/HBaseWD (there's
> > > > also
> > > > a jar of current version for convenience). It is very easy to make
> use
> > of
> > > > it: e.g. I added it to one existing project with 1+2 lines of code
> (one
> > > > where I write to HBase and 2 for configuring MapReduce job).
> > > >
> > > > Any feedback is highly appreciated!
> > > >
> > > > Please find below the short intro to the lib [1].
> > > >
> > > > Alex Baranau
> > > > ----
> > > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > > HBase
> > > >
> > > > [1]
> > > >
> > > > Description:
> > > > ------------
> > > > HBaseWD stands for Distributing (sequential) Writes. It was inspired
> by
> > > > discussions on HBase mailing lists around the problem of choosing
> > > between:
> > > > * writing records with sequential row keys (e.g. time-series data
> with
> > > row
> > > > key
> > > >  built based on ts)
> > > > * using random unique IDs for records
> > > >
> > > > First approach makes possible to perform fast range scans with help
> of
> > > > setting
> > > > start/stop keys on Scanner, but creates single region server
> > hot-spotting
> > > > problem upon writing data (as row keys go in sequence all records end
> > up
> > > > written into a single region at a time).
> > > >
> > > > Second approach aims for fastest writing performance by distributing
> > new
> > > > records over random regions but makes not possible doing fast range
> > scans
> > > > against written data.
> > > >
> > > > The suggested approach stays in the middle of the two above and
> proved
> > to
> > > > perform well by distributing records over the cluster during writing
> > data
> > > > while allowing range scans over it. HBaseWD provides very simple API
> to
> > > > work with which makes it perfect to use with existing code.
> > > >
> > > > Please refer to unit-tests for lib usage info as they aimed to act as
> > > > example.
> > > >
> > > > Brief Usage Info (Examples):
> > > > ----------------------------
> > > >
> > > > Distributing records with sequential keys which are being written in
> up
> > > to
> > > > Byte.MAX_VALUE buckets:
> > > >
> > > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > > >    RowKeyDistributor keyDistributor =
> > > >                           new
> > > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > > >    for (int i = 0; i < 100; i++) {
> > > >      Put put = new
> Put(keyDistributor.getDistributedKey(originalKey));
> > > >      ... // add values
> > > >      hTable.put(put);
> > > >    }
> > > >
> > > >
> > > > Performing a range scan over written data (internally <bucketsCount>
> > > > scanners
> > > > executed):
> > > >
> > > >    Scan scan = new Scan(startKey, stopKey);
> > > >    ResultScanner rs = DistributedScanner.create(hTable, scan,
> > > > keyDistributor);
> > > >    for (Result current : rs) {
> > > >      ...
> > > >    }
> > > >
> > > > Performing mapreduce job over written data chunk specified by Scan:
> > > >
> > > >    Configuration conf = HBaseConfiguration.create();
> > > >    Job job = new Job(conf, "testMapreduceJob");
> > > >
> > > >    Scan scan = new Scan(startKey, stopKey);
> > > >
> > > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > > >      RowCounterMapper.class, ImmutableBytesWritable.class,
> > Result.class,
> > > > job);
> > > >
> > > >    // Substituting standard TableInputFormat which was set in
> > > >    // TableMapReduceUtil.initTableMapperJob(...)
> > > >    job.setInputFormatClass(WdTableInputFormat.class);
> > > >    keyDistributor.addInfo(job.getConfiguration());
> > > >
> > > >
> > > > Extending Row Keys Distributing Patterns:
> > > > -----------------------------------------
> > > >
> > > > HBaseWD is designed to be flexible and to support custom row key
> > > > distribution
> > > > approaches. To define custom row key distributing logic just
> implement
> > > > AbstractRowKeyDistributor abstract class which is really very simple:
> > > >
> > > >    public abstract class AbstractRowKeyDistributor implements
> > > > Parametrizable {
> > > >      public abstract byte[] getDistributedKey(byte[] originalKey);
> > > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> > > >      public abstract byte[][] getAllDistributedKeys(byte[]
> > originalKey);
> > > >      ... // some utility methods
> > > >    }
> > > >
> > >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Ted Yu <yu...@gmail.com>.
Alex:
If you read this, you would know why I asked:
https://issues.apache.org/jira/browse/HBASE-3679

I need to deal with normal scans and "distributed scans" at the server side.
Basically, bucketsCount may not equal the number of regions of the underlying
table.

Cheers

On Tue, Apr 19, 2011 at 11:11 PM, Alex Baranau <al...@gmail.com>wrote:

> Hi Ted,
>
> We currently use this tool in the scenario where data is consumed by
> MapReduce jobs, so we haven't tested the performance of pure "distributed
> scan" (i.e. N scans instead of 1) a lot. I expect it to be close to simple
> scan performance, or may be sometimes even faster depending on your data
> access patterns. E.g. in case you write timeseries data (sequential) which
> is written into the single region at a time, then e.g. if you access delta
> for further processing/analysis (esp. if from not single client) these
> scans
> are likely to hit the same region or couple of regions at a time, which may
> perform worse comparing to many scans hitting data that is much better
> spread over region servers.
>
> As for map-reduce job the approach should not affect reading performance at
> all: it's just that there are bucketsCount times more splits and hence
> bucketsCount times more Map tasks. In many cases this even improves overall
> performance of the MR job since work is better distributed over cluster
> (esp. in situation when the aim is to constantly process the coming delta
> which usually resides in one or just couple of regions depending on
> processing frequency).
>
> If you can share details on your case, that will help to understand what
> effect(s) to expect from using this approach.
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Interesting project, Alex.
> > Since there're bucketsCount scanners compared to one scanner originally,
> > have you performed load testing to see the impact ?
> >
> > Thanks
> >
> > On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <alex.baranov.v@gmail.com
> > >wrote:
> >
> > > Hello guys,
> > >
> > > I'd like to introduce a new small java project/lib around HBase:
> HBaseWD.
> > > It
> > > is aimed to help with distribution of the load (across regionservers)
> > when
> > > writing sequential (because of the row key nature) records. It
> implements
> > > the solution which was discussed several times on this mailing list
> (e.g.
> > > here: http://search-hadoop.com/m/gNRA82No5Wk).
> > >
> > > Please find the sources at https://github.com/sematext/HBaseWD (there's
> > > also
> > > a jar of current version for convenience). It is very easy to make use
> of
> > > it: e.g. I added it to one existing project with 1+2 lines of code (one
> > > where I write to HBase and 2 for configuring MapReduce job).
> > >
> > > Any feedback is highly appreciated!
> > >
> > > Please find below the short intro to the lib [1].
> > >
> > > Alex Baranau
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> > HBase
> > >
> > > [1]
> > >
> > > Description:
> > > ------------
> > > HBaseWD stands for Distributing (sequential) Writes. It was inspired by
> > > discussions on HBase mailing lists around the problem of choosing
> > between:
> > > * writing records with sequential row keys (e.g. time-series data with
> > row
> > > key
> > >  built based on ts)
> > > * using random unique IDs for records
> > >
> > > First approach makes possible to perform fast range scans with help of
> > > setting
> > > start/stop keys on Scanner, but creates single region server
> hot-spotting
> > > problem upon writing data (as row keys go in sequence all records end
> up
> > > written into a single region at a time).
> > >
> > > Second approach aims for fastest writing performance by distributing
> new
> > > records over random regions but makes not possible doing fast range
> scans
> > > against written data.
> > >
> > > The suggested approach stays in the middle of the two above and proved
> to
> > > perform well by distributing records over the cluster during writing
> data
> > > while allowing range scans over it. HBaseWD provides very simple API to
> > > work with which makes it perfect to use with existing code.
> > >
> > > Please refer to unit-tests for lib usage info as they aimed to act as
> > > example.
> > >
> > > Brief Usage Info (Examples):
> > > ----------------------------
> > >
> > > Distributing records with sequential keys which are being written in up
> > to
> > > Byte.MAX_VALUE buckets:
> > >
> > >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> > >    RowKeyDistributor keyDistributor =
> > >                           new
> > > RowKeyDistributorByOneBytePrefix(bucketsCount);
> > >    for (int i = 0; i < 100; i++) {
> > >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
> > >      ... // add values
> > >      hTable.put(put);
> > >    }
> > >
> > >
> > > Performing a range scan over written data (internally <bucketsCount>
> > > scanners
> > > executed):
> > >
> > >    Scan scan = new Scan(startKey, stopKey);
> > >    ResultScanner rs = DistributedScanner.create(hTable, scan,
> > > keyDistributor);
> > >    for (Result current : rs) {
> > >      ...
> > >    }
> > >
> > > Performing mapreduce job over written data chunk specified by Scan:
> > >
> > >    Configuration conf = HBaseConfiguration.create();
> > >    Job job = new Job(conf, "testMapreduceJob");
> > >
> > >    Scan scan = new Scan(startKey, stopKey);
> > >
> > >    TableMapReduceUtil.initTableMapperJob("table", scan,
> > >      RowCounterMapper.class, ImmutableBytesWritable.class,
> Result.class,
> > > job);
> > >
> > >    // Substituting standard TableInputFormat which was set in
> > >    // TableMapReduceUtil.initTableMapperJob(...)
> > >    job.setInputFormatClass(WdTableInputFormat.class);
> > >    keyDistributor.addInfo(job.getConfiguration());
> > >
> > >
> > > Extending Row Keys Distributing Patterns:
> > > -----------------------------------------
> > >
> > > HBaseWD is designed to be flexible and to support custom row key
> > > distribution
> > > approaches. To define custom row key distributing logic just implement
> > > AbstractRowKeyDistributor abstract class which is really very simple:
> > >
> > >    public abstract class AbstractRowKeyDistributor implements
> > > Parametrizable {
> > >      public abstract byte[] getDistributedKey(byte[] originalKey);
> > >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> > >      public abstract byte[][] getAllDistributedKeys(byte[]
> originalKey);
> > >      ... // some utility methods
> > >    }
> > >
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Alex Baranau <al...@gmail.com>.
Hi Ted,

We currently use this tool in a scenario where the data is consumed by
MapReduce jobs, so we haven't tested the performance of a pure "distributed
scan" (i.e. N scans instead of 1) a lot. I expect it to be close to simple
scan performance, or maybe sometimes even faster depending on your data
access patterns. E.g. if you write time-series data (sequential), which goes
into a single region at a time, and you then scan the recent delta for
further processing/analysis (esp. from more than one client), those scans
are likely to hit the same region or couple of regions at a time, which may
perform worse compared to many scans hitting data that is much better spread
over the region servers.

As for MapReduce jobs, the approach should not affect reading performance
at all: there are simply bucketsCount times more splits and hence
bucketsCount times more Map tasks. In many cases this even improves the
overall performance of the MR job, since work is better distributed over
the cluster (esp. when the aim is to continuously process the incoming
delta, which usually resides in one or just a couple of regions, depending
on processing frequency).
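The "bucketsCount times more splits" can be sketched as follows. This is a conceptual, standalone example (the names here are illustrative, not the HBaseWD API): with a one-byte prefix over N buckets, a single original [startKey, stopKey) range expands into N bucket-local ranges, and each of those becomes its own input split, hence its own Map task.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: expand one key range into per-bucket ranges.
public class BucketRangesSketch {

    // Prepend a bucket byte to a key (the one-byte-prefix scheme).
    static byte[] withPrefix(byte bucket, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = bucket;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    // One original [start, stop) range becomes one range per bucket;
    // an MR job then gets one split (one Map task) per such range.
    public static List<byte[][]> bucketRanges(byte[] start, byte[] stop, byte bucketsCount) {
        List<byte[][]> ranges = new ArrayList<>();
        for (byte b = 0; b < bucketsCount; b++) {
            ranges.add(new byte[][] { withPrefix(b, start), withPrefix(b, stop) });
        }
        return ranges;
    }
}
```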

If you can share details of your use case, that will help us understand
what effect(s) to expect from this approach.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu <yu...@gmail.com> wrote:

> Interesting project, Alex.
> Since there are bucketsCount scanners compared to one scanner originally,
> have you performed load testing to see the impact?
>
> Thanks
>
> On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > Hello guys,
> >
> > I'd like to introduce a new small Java project/lib around HBase:
> > HBaseWD. It aims to help distribute load (across region servers) when
> > writing records whose row keys are sequential by nature. It implements
> > the solution which was discussed several times on this mailing list
> > (e.g. here: http://search-hadoop.com/m/gNRA82No5Wk).
> >
> > Please find the sources at https://github.com/sematext/HBaseWD (there's
> > also a jar of the current version for convenience). It is very easy to
> > make use of: e.g. I added it to one existing project with 1+2 lines of
> > code (one where I write to HBase and 2 for configuring the MapReduce
> > job).
> >
> > Any feedback is highly appreciated!
> >
> > Please find below the short intro to the lib [1].
> >
> > Alex Baranau
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
> HBase
> >
> > [1]
> >
> > Description:
> > ------------
> > HBaseWD stands for Distributing (sequential) Writes. It was inspired by
> > discussions on HBase mailing lists around the problem of choosing
> between:
> > * writing records with sequential row keys (e.g. time-series data with
> >   row key built based on ts)
> > * using random unique IDs for records
> >
> > The first approach makes it possible to perform fast range scans by
> > setting start/stop keys on the Scanner, but it creates a single region
> > server hot-spotting problem when writing data (as row keys go in
> > sequence, all records end up written into a single region at a time).
> >
> > The second approach aims for the fastest writing performance by
> > distributing new records over random regions, but it makes fast range
> > scans over the written data impossible.
> >
> > The suggested approach sits between the two above and has proven to
> > perform well by distributing records over the cluster during writes
> > while still allowing range scans over the data. HBaseWD provides a very
> > simple API to work with, which makes it easy to plug into existing code.
> >
> > Please refer to the unit tests for library usage info, as they are
> > intended to serve as examples.
> >
> > Brief Usage Info (Examples):
> > ----------------------------
> >
> > Distributing records with sequential keys, written into up to
> > Byte.MAX_VALUE buckets:
> >
> >    byte bucketsCount = (byte) 32; // distributing into 32 buckets
> >    RowKeyDistributor keyDistributor =
> >        new RowKeyDistributorByOneBytePrefix(bucketsCount);
> >    for (int i = 0; i < 100; i++) {
> >      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
> >      ... // add values
> >      hTable.put(put);
> >    }
> >
> >
> > Performing a range scan over written data (internally <bucketsCount>
> > scanners are executed):
> >
> >    Scan scan = new Scan(startKey, stopKey);
> >    ResultScanner rs =
> >        DistributedScanner.create(hTable, scan, keyDistributor);
> >    for (Result current : rs) {
> >      ...
> >    }
> >
> > Performing a MapReduce job over the data chunk specified by Scan:
> >
> >    Configuration conf = HBaseConfiguration.create();
> >    Job job = new Job(conf, "testMapreduceJob");
> >
> >    Scan scan = new Scan(startKey, stopKey);
> >
> >    TableMapReduceUtil.initTableMapperJob("table", scan,
> >        RowCounterMapper.class, ImmutableBytesWritable.class,
> >        Result.class, job);
> >
> >    // Substituting standard TableInputFormat which was set in
> >    // TableMapReduceUtil.initTableMapperJob(...)
> >    job.setInputFormatClass(WdTableInputFormat.class);
> >    keyDistributor.addInfo(job.getConfiguration());
> >
> >
> > Extending Row Keys Distributing Patterns:
> > -----------------------------------------
> >
> > HBaseWD is designed to be flexible and to support custom row key
> > distribution approaches. To define custom row key distribution logic,
> > just extend the AbstractRowKeyDistributor abstract class, which is
> > really very simple:
> >
> >    public abstract class AbstractRowKeyDistributor
> >        implements Parametrizable {
> >      public abstract byte[] getDistributedKey(byte[] originalKey);
> >      public abstract byte[] getOriginalKey(byte[] adjustedKey);
> >      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
> >      ... // some utility methods
> >    }
> >
>

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

Posted by Ted Yu <yu...@gmail.com>.
Interesting project, Alex.
Since there are bucketsCount scanners compared to one scanner originally,
have you performed load testing to see the impact?

Thanks

On Tue, Apr 19, 2011 at 10:25 AM, Alex Baranau <al...@gmail.com>wrote:

> Hello guys,
>
> I'd like to introduce a new small Java project/lib around HBase: HBaseWD.
> It aims to help distribute load (across region servers) when writing
> records whose row keys are sequential by nature. It implements the
> solution which was discussed several times on this mailing list (e.g.
> here: http://search-hadoop.com/m/gNRA82No5Wk).
>
> Please find the sources at https://github.com/sematext/HBaseWD (there's
> also a jar of the current version for convenience). It is very easy to
> make use of: e.g. I added it to one existing project with 1+2 lines of
> code (one where I write to HBase and 2 for configuring the MapReduce job).
>
> Any feedback is highly appreciated!
>
> Please find below the short intro to the lib [1].
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> [1]
>
> Description:
> ------------
> HBaseWD stands for Distributing (sequential) Writes. It was inspired by
> discussions on HBase mailing lists around the problem of choosing between:
> * writing records with sequential row keys (e.g. time-series data with
>   row key built based on ts)
> * using random unique IDs for records
>
> The first approach makes it possible to perform fast range scans by
> setting start/stop keys on the Scanner, but it creates a single region
> server hot-spotting problem when writing data (as row keys go in
> sequence, all records end up written into a single region at a time).
>
> The second approach aims for the fastest writing performance by
> distributing new records over random regions, but it makes fast range
> scans over the written data impossible.
>
> The suggested approach sits between the two above and has proven to
> perform well by distributing records over the cluster during writes
> while still allowing range scans over the data. HBaseWD provides a very
> simple API to work with, which makes it easy to plug into existing code.
>
> Please refer to the unit tests for library usage info, as they are
> intended to serve as examples.
>
> Brief Usage Info (Examples):
> ----------------------------
>
> Distributing records with sequential keys, written into up to
> Byte.MAX_VALUE buckets:
>
>    byte bucketsCount = (byte) 32; // distributing into 32 buckets
>    RowKeyDistributor keyDistributor =
>        new RowKeyDistributorByOneBytePrefix(bucketsCount);
>    for (int i = 0; i < 100; i++) {
>      Put put = new Put(keyDistributor.getDistributedKey(originalKey));
>      ... // add values
>      hTable.put(put);
>    }
>
>
> Performing a range scan over written data (internally <bucketsCount>
> scanners are executed):
>
>    Scan scan = new Scan(startKey, stopKey);
>    ResultScanner rs =
>        DistributedScanner.create(hTable, scan, keyDistributor);
>    for (Result current : rs) {
>      ...
>    }
>
> Performing a MapReduce job over the data chunk specified by Scan:
>
>    Configuration conf = HBaseConfiguration.create();
>    Job job = new Job(conf, "testMapreduceJob");
>
>    Scan scan = new Scan(startKey, stopKey);
>
>    TableMapReduceUtil.initTableMapperJob("table", scan,
>        RowCounterMapper.class, ImmutableBytesWritable.class,
>        Result.class, job);
>
>    // Substituting standard TableInputFormat which was set in
>    // TableMapReduceUtil.initTableMapperJob(...)
>    job.setInputFormatClass(WdTableInputFormat.class);
>    keyDistributor.addInfo(job.getConfiguration());
>
>
> Extending Row Keys Distributing Patterns:
> -----------------------------------------
>
> HBaseWD is designed to be flexible and to support custom row key
> distribution approaches. To define custom row key distribution logic,
> just extend the AbstractRowKeyDistributor abstract class, which is
> really very simple:
>
>    public abstract class AbstractRowKeyDistributor
>        implements Parametrizable {
>      public abstract byte[] getDistributedKey(byte[] originalKey);
>      public abstract byte[] getOriginalKey(byte[] adjustedKey);
>      public abstract byte[][] getAllDistributedKeys(byte[] originalKey);
>      ... // some utility methods
>    }
>