Posted to user@phoenix.apache.org by Kristoffer Sjögren <st...@gmail.com> on 2015/12/10 16:49:52 UTC

Copy table between hbase clusters

Hi

We're in the process of upgrading from Phoenix 2.2.3 / HBase 0.96 to
Phoenix 4.4.0 / HBase 1.1.2 and wanted to know the simplest/easiest
way to copy data from the old table to the new one.

The tables contain only a few hundred million rows so it's OK to
export locally and then upsert.

Cheers,
-Kristoffer

Re: Copy table between hbase clusters

Posted by Kristoffer Sjögren <st...@gmail.com>.
Great!

I realized that my confusion was that I'd forgotten about the salt byte
at the first position of the key.

Here's the working code, in case someone needs to do the same thing.

Thanks again, James.

    HBaseClient client = new HBaseClient("zookeeper");
    Scanner scanner = client.newScanner("T1");
    ArrayList<ArrayList<KeyValue>> rows;
    while (null != (rows = scanner.nextRows(100).join())) {
      for (ArrayList<KeyValue> list : rows) {
        for (KeyValue kv : list) {
          byte[] key = kv.key();
          // skip the salt byte at position 0
          int offset = 1;
          // ints: flip the sign bit back
          int c1 = Bytes.getInt(key, offset) ^ (1 << 31);
          offset += 4;
          int c2 = Bytes.getInt(key, offset) ^ (1 << 31);
          offset += 4;
          // C3 is stored in 4 bytes in this table, so read it as an int
          long c3 = Bytes.getInt(key, offset) ^ (1 << 31);
          offset += 4;
          // long: flip the sign bit back
          long c4 = Bytes.getLong(key, offset) ^ (1L << 63);
          offset += 8;
          // the remaining 2 bytes are the trailing CHAR(2) column
          System.out.println("c1=" + c1 + ",c2=" + c2 + ",c3=" + c3
              + ",c4=" + c4 + " " + new String(key, offset, 2));
        }
      }
    }
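For anyone following along, here's a standalone sketch (plain JDK, no asynchbase dependency) that decodes the 23-byte example key posted elsewhere in this thread, using the same layout the code above assumes: 1 salt byte, two 4-byte ints, a 4-byte value for C3, an 8-byte long, then CHAR(2). The class and helper names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class KeyDecoder {
    // big-endian int with Phoenix's flipped sign bit undone
    public static int readInt(byte[] key, int offset) {
        return ByteBuffer.wrap(key, offset, 4).getInt() ^ (1 << 31);
    }

    // big-endian long with Phoenix's flipped sign bit undone
    public static long readLong(byte[] key, int offset) {
        return ByteBuffer.wrap(key, offset, 8).getLong() ^ (1L << 63);
    }

    public static void main(String[] args) {
        // the 23-byte key from the thread
        byte[] key = {0, -128, 0, 0, 0, -44, 4, 123, -32, -128, 0, 0, 10,
                      -128, 0, 0, 0, 0, 0, 0, 0, 32, 32};
        int offset = 1;                        // skip the salt byte
        int c1 = readInt(key, offset);  offset += 4;
        int c2 = readInt(key, offset);  offset += 4;
        long c3 = readInt(key, offset); offset += 4;  // C3 stored as 4 bytes
        long c4 = readLong(key, offset); offset += 8;
        String c5 = new String(key, offset, 2, StandardCharsets.US_ASCII);
        System.out.println("c1=" + c1 + ",c2=" + c2 + ",c3=" + c3
                + ",c4=" + c4 + ",c5='" + c5 + "'");
    }
}
```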


Re: Copy table between hbase clusters

Posted by James Taylor <ja...@apache.org>.
My mistake - I thought you were going to do an in-place upgrade. No, no
post processing will be required if the data is UPSERT'ed through the
standard Phoenix APIs.

I don't know how/why your row keys aren't the correct number of bytes.
Maybe your row key is different from what you think?

Thanks,
James


Re: Copy table between hbase clusters

Posted by Kristoffer Sjögren <st...@gmail.com>.
My concern regarding the row key is that it is 23 bytes in reality
instead of the expected 26. Hmm. I guess I could write a value manually
using SQL and inspect it afterwards with asynchbase to find where those
3 missing bytes went.

Not sure I understand the post procedure. I'm not trying to do an
in-place upgrade. The table will be created from scratch in Phoenix
4.4.0 / HBase 1.1.2. The data will be read from the old installation
using asynchbase and manually UPSERT'ed from that raw data using SQL.

The reason I do it this way is to avoid such manual post procedures.
So just to be clear, will this approach still require those post
procedures?


Thanks James.


Re: Copy table between hbase clusters

Posted by James Taylor <ja...@apache.org>.
Your analysis of the row key structure is correct. Those are all fixed
types (4 + 4 + 8 + 8 + 2 = 26 bytes for the key).

If you're going from 0.94 to 0.98, there's stuff you need to do to get your
data into the new format. Best to ask about this on the HBase user list or
look it up in the reference docs.

Once you get your data moved over, I'd recommend reissuing your DDL,
specifying a CURRENT_SCN at connection time with a timestamp in millis prior
to any of your cell timestamps (so that Phoenix doesn't set empty key value
markers for every row). If you're using any date/time types, your DDL
should specify them as their UNSIGNED equivalents. If you have DESC row key
declarations for variable-length types or BINARY, there are some post
processing steps you'll need to do, but otherwise you should be good to go
(FWIW, we had an internal team successfully go through this about six
months back).
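As a rough sketch of that connection setup: "CurrentSCN" is the Phoenix JDBC property that carries the CURRENT_SCN timestamp. The class name, JDBC URL, and the timestamp value below are placeholders; the commented-out part is where you'd actually connect and reissue the DDL.

```java
import java.util.Properties;

public class RecreateTable {
    // build connection properties carrying the CURRENT_SCN timestamp
    public static Properties scnProps(long scnMillis) {
        Properties props = new Properties();
        props.setProperty("CurrentSCN", Long.toString(scnMillis));
        return props;
    }

    public static void main(String[] args) {
        // a millisecond timestamp known to precede every cell in the table
        Properties props = scnProps(1400000000000L);
        // try (Connection conn = DriverManager.getConnection(
        //         "jdbc:phoenix:your-zookeeper-quorum", props);
        //      Statement stmt = conn.createStatement()) {
        //     stmt.execute("CREATE TABLE T1 (...)");  // same DDL as before
        // }
        System.out.println("CurrentSCN=" + props.getProperty("CurrentSCN"));
    }
}
```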

Thanks,
James


Re: Copy table between hbase clusters

Posted by Kristoffer Sjögren <st...@gmail.com>.
My plan is to try using asynchbase to read the raw data and then upsert
it using Phoenix SQL.

However, when I read the old table, the data types for the row key
don't add up.

CREATE TABLE T1 (C1 INTEGER NOT NULL, C2 INTEGER NOT NULL, C3 BIGINT
NOT NULL, C4 BIGINT NOT NULL, C5 CHAR(2) NOT NULL, V BIGINT CONSTRAINT
PK PRIMARY KEY ( C1, C2, C3, C4, C5 ))

That's 4 + 4 + 8 + 8 + 2 = 26 bytes for the key. But the actual key
that I read from HBase is only 23 bytes.

[0, -128, 0, 0, 0, -44, 4, 123, -32, -128, 0, 0, 10, -128, 0, 0, 0, 0,
0, 0, 0, 32, 32]

Maybe the data type definitions described on the Phoenix site have
changed since version 2.2.3? Or maybe some data type is variable in
size?
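The write side of that plan could look something like the sketch below: rows read from the old cluster get re-written through a Phoenix PreparedStatement, so the new table gets the 4.4.0 key encoding (salt byte included) for free. The class name and connection handling are placeholders, not actual project code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class CopyUpsert {
    static final String UPSERT =
        "UPSERT INTO T1 (C1, C2, C3, C4, C5, V) VALUES (?, ?, ?, ?, ?, ?)";

    // write one decoded row into the new cluster through Phoenix
    public static void upsertRow(Connection conn, int c1, int c2, long c3,
                                 long c4, String c5, long v) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPSERT)) {
            ps.setInt(1, c1);
            ps.setInt(2, c2);
            ps.setLong(3, c3);
            ps.setLong(4, c4);
            ps.setString(5, c5);
            ps.setLong(6, v);
            ps.executeUpdate();
        }
        // in the real loop, commit in batches of a few thousand rows
        // rather than per row
    }

    public static void main(String[] args) {
        System.out.println(UPSERT);
    }
}
```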

