Posted to user@phoenix.apache.org by ankit beohar <an...@gmail.com> on 2016/08/24 07:40:56 UTC

Phoenix and HBase data type serialization issue

Hi All,

I have a table in HBase and am putting data into it. I then created a Phoenix
view with date, bigint, etc. data types, but when I query from Phoenix it
gives me wrong values. I also tried the unsigned data types, but that did not
work either. The steps are below:

==========Hbase===========
hbase(main):057:0> create 'CEHCK_DT','0'
0 row(s) in 2.2650 seconds

=> Hbase::Table - CEHCK_DT
hbase(main):058:0> put 'CEHCK_DT','row1','0:dates','2016-08-11'
0 row(s) in 0.0080 seconds

hbase(main):059:0> scan 'CEHCK_DT'
ROW                                                          COLUMN+CELL
 row1                                                        column=0:dates, timestamp=1471930977145, value=2016-08-11
1 row(s) in 0.0100 seconds


=========Phoenix=============
0: jdbc:phoenix:localhost:2181> create table "CEHCK_DT"(pk varchar primary
key,"0"."dates" date,"0"."SALARY" bigint);
No rows affected (0.347 seconds)
0: jdbc:phoenix:localhost:2181> select "0"."dates" from "CEHCK_DT";
+-------------------------------+
|             dates             |
+-------------------------------+
| 177670840-04-13 05:44:22.317  |
| 177670840-04-13 05:44:22.317  |
| 177670840-04-13 05:44:22.317  |
+-------------------------------+



Best Regards,
ANKIT BEOHAR

Re: Phoenix and HBase data type serialization issue

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Ryan,

There isn't a published javadoc (at least not that I'm aware of), so your best bet is to pull down the source[1] from git and either build the javadoc (mvn javadoc:javadoc should do it), or just look directly at the source. 

- Gabriel

[1] http://phoenix.apache.org/source.html

> On 24 Aug 2016, at 18:25, Ryan Templeton <rt...@hortonworks.com> wrote:
> 
> Gabriel, the PDataType subclass you mention, can I read more about this in a Javadoc somewhere?
> 
> HBase provides the convenient static Bytes functions for performing the encoding and decoding. I’m guessing the PDataType is some kind of equivalent and that this is part of the Phoenix JDBC (fat) driver?
> 
> Thanks,
> Ryan

Re: Phoenix and HBase data type serialization issue

Posted by Ryan Templeton <rt...@hortonworks.com>.
Gabriel, the PDataType subclass you mention, can I read more about this in a Javadoc somewhere?

HBase provides the convenient static Bytes functions for performing the encoding and decoding. I’m guessing the PDataType is some kind of equivalent and that this is part of the Phoenix JDBC (fat) driver?

Thanks,
Ryan




On 8/24/16, 3:01 AM, "Gabriel Reid" <ga...@gmail.com> wrote:

>Hi Ankit,
>
>All data stored in HBase is stored in the form of byte arrays. The
>conversion from richer types (e.g. date) to byte arrays is one of the
>(many) functionalities included in Phoenix.
>
>When you add a date value in the form of a string to HBase directly
>(bypassing Phoenix), you're simply saving the byte representation of
>that string to HBase. Phoenix uses an encoded long value to store
>dates in HBase, so when you try to read your date value from HBase via
>Phoenix, it's simply interpreting the bytes as a long, which leads to
>the unexpected date value that you're getting.
>
>There are two options to do what you're doing: either (1) use Phoenix
>for both reading and writing data, or (2) use the PDataType subclasses
>(e.g. PDate, PLong, etc) to encode data before storing it to HBase.
>
>- Gabriel

Re: Phoenix and HBase data type serialization issue

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Ankit,

I'm not sure what the options are for encoding data via Talend -- if
you can convert data to the correct byte representation expected by
Phoenix via another way in Talend then it might also work. There is
information on the binary representation of various types on
http://phoenix.apache.org/language/datatypes.html.
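
As a rough illustration of what "the correct byte representation" means, here is a minimal Python sketch comparing two layouts for a long. It assumes, based on my reading of the datatypes page rather than anything stated in this thread, that Phoenix BIGINT is an 8-byte big-endian long with the sign bit flipped, while HBase's Bytes.toBytes(long) writes plain big-endian two's complement. The function names are made up for this example.

```python
import struct


def hbase_long_bytes(value):
    """Plain 8-byte big-endian two's complement (the Bytes.toBytes(long) layout)."""
    return struct.pack(">q", value)


def phoenix_bigint_bytes(value):
    """Assumed Phoenix BIGINT layout: the same 8 bytes with the sign bit flipped."""
    encoded = bytearray(struct.pack(">q", value))
    encoded[0] ^= 0x80  # flip the sign bit so negatives sort before positives
    return bytes(encoded)


if __name__ == "__main__":
    for value in (-1, 0, 1, 50000):
        print(value, hbase_long_bytes(value).hex(), phoenix_bigint_bytes(value).hex())
```

The two layouts differ only in the top bit of the first byte. With the flipped bit, comparing the raw bytes gives the same order as comparing the numbers, which is what a sorted store like HBase needs. A tool that writes the plain layout would therefore need either that extra bit flip or a Phoenix column type that expects the plain layout.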

Another option would be to go via JDBC if Talend has support for
writing data via JDBC.

A last (bad) option might be to store everything as strings, but that
will probably result in a bloated HBase and poor performance.
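
A small Python sketch of the size difference, using a made-up salary value. It only compares raw value lengths and byte ordering; it does not measure HBase itself.

```python
# Hypothetical value for a BIGINT-style column.
salary = 1234567890123

as_text = str(salary).encode("ascii")            # one byte per digit
as_long = salary.to_bytes(8, "big", signed=True)  # fixed 8 bytes

if __name__ == "__main__":
    print(len(as_text), len(as_long))
    # Text also sorts byte-wise, so "10" comes before "9".
    print(sorted([b"9", b"10"]))
```

Besides the extra bytes per cell, numeric strings do not sort in numeric order, so range filters on such columns cannot rely on byte ordering.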

- Gabriel

On Wed, Aug 24, 2016 at 10:27 AM, ankit beohar <an...@gmail.com> wrote:
> Hi Gabriel,
>
> Thanks for your quick reply. In my case I am using the Talend ETL tool to
> ingest the data into HBase, so I cannot use the PDataType subclasses, and
> a Phoenix connector is not available in Talend.
> Is there any other way?
>
> Best Regards,
> ANKIT BEOHAR

Re: Phoenix and HBase data type serialization issue

Posted by ankit beohar <an...@gmail.com>.
Hi Gabriel,

Thanks for your quick reply. In my case I am using the Talend ETL tool to
ingest the data into HBase, so I cannot use the PDataType subclasses, and
a Phoenix connector is not available in Talend.
Is there any other way?

Best Regards,
ANKIT BEOHAR


On Wed, Aug 24, 2016 at 1:31 PM, Gabriel Reid <ga...@gmail.com>
wrote:

> Hi Ankit,
>
> All data stored in HBase is stored in the form of byte arrays. The
> conversion from richer types (e.g. date) to byte arrays is one of the
> (many) functionalities included in Phoenix.
>
> When you add a date value in the form of a string to HBase directly
> (bypassing Phoenix), you're simply saving the byte representation of
> that string to HBase. Phoenix uses an encoded long value to store
> dates in HBase, so when you try to read your date value from HBase via
> Phoenix, it's simply interpreting the bytes as a long, which leads to
> the unexpected date value that you're getting.
>
> There are two options to do what you're doing: either (1) use Phoenix
> for both reading and writing data, or (2) use the PDataType subclasses
> (e.g. PDate, PLong, etc) to encode data before storing it to HBase.
>
> - Gabriel

Re: Phoenix and HBase data type serialization issue

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Ankit,

All data stored in HBase is stored in the form of byte arrays. The
conversion from richer types (e.g. date) to byte arrays is one of the
(many) functionalities included in Phoenix.

When you add a date value in the form of a string to HBase directly
(bypassing Phoenix), you're simply saving the byte representation of
that string to HBase. Phoenix uses an encoded long value to store
dates in HBase, so when you try to read your date value from HBase via
Phoenix, it's simply interpreting the bytes as a long, which leads to
the unexpected date value that you're getting.
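
To make the misreading concrete, here is a minimal Python sketch of what happens to the ten ASCII bytes written from the HBase shell. It assumes (this is not spelled out in the thread) that a Phoenix DATE is decoded from the first 8 bytes of the cell as a big-endian long with the sign bit flipped, holding milliseconds since the epoch. The variable names are illustrative only.

```python
import struct

# The cell value as written by the HBase shell: ten ASCII characters.
raw = "2016-08-11".encode("ascii")

# Assumed decoding: take the first 8 bytes as an unsigned big-endian long...
(unsigned,) = struct.unpack(">Q", raw[:8])
# ...flip the sign bit...
flipped = unsigned ^ (1 << 63)
# ...and reinterpret the result as a signed 64-bit value.
millis = flipped - (1 << 64) if flipped >= (1 << 63) else flipped

MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

if __name__ == "__main__":
    print(raw[:8].hex())
    print(millis)
    print(round(millis / MS_PER_YEAR))
```

Under that assumption the decoded "milliseconds" value is negative and its magnitude is on the order of a hundred million years, which is consistent with the nonsensical year in the query output. The last two characters of the string are simply ignored.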

There are two options to do what you're doing: either (1) use Phoenix
for both reading and writing data, or (2) use the PDataType subclasses
(e.g. PDate, PLong, etc) to encode data before storing it to HBase.
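
For option (2), when the Phoenix classes cannot be called directly, the same bytes could in principle be produced by hand. The Python sketch below is not the PDate implementation. It assumes the DATE layout is an 8-byte big-endian count of milliseconds since the epoch (UTC) with the sign bit flipped, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SIGN_BIT = 1 << 63
MASK_64 = (1 << 64) - 1


def encode_phoenix_date(moment):
    """Hypothetical helper: epoch millis as 8 big-endian bytes, sign bit flipped."""
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return ((millis & MASK_64) ^ SIGN_BIT).to_bytes(8, "big")


def decode_phoenix_date(encoded):
    """Inverse of encode_phoenix_date, under the same assumed layout."""
    value = int.from_bytes(encoded[:8], "big") ^ SIGN_BIT
    if value >= SIGN_BIT:  # reinterpret as signed 64-bit
        value -= 1 << 64
    return EPOCH + timedelta(milliseconds=value)


if __name__ == "__main__":
    cell = encode_phoenix_date(datetime(2016, 8, 11, tzinfo=timezone.utc))
    print(cell.hex())
    print(decode_phoenix_date(cell))
```

Bytes produced this way would replace the string '2016-08-11' as the cell value. It is worth checking the layout against a value written through Phoenix itself before relying on it.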

- Gabriel

On Wed, Aug 24, 2016 at 9:40 AM, ankit beohar <an...@gmail.com> wrote:
> Hi All,
>
> I have a table in HBase and am putting data into it. I then created a Phoenix
> view with date, bigint, etc. data types, but when I query from Phoenix it
> gives me wrong values. I also tried the unsigned data types, but that did not
> work either. The steps are below:
>
> ==========Hbase===========
> hbase(main):057:0> create 'CEHCK_DT','0'
> 0 row(s) in 2.2650 seconds
>
> => Hbase::Table - CEHCK_DT
> hbase(main):058:0> put 'CEHCK_DT','row1','0:dates','2016-08-11'
> 0 row(s) in 0.0080 seconds
>
> hbase(main):059:0> scan 'CEHCK_DT'
> ROW                                                          COLUMN+CELL
>  row1                                                        column=0:dates,
> timestamp=1471930977145, value=2016-08-11
> 1 row(s) in 0.0100 seconds
>
>
> =========Phoenix=============
> 0: jdbc:phoenix:localhost:2181> create table "CEHCK_DT"(pk varchar primary
> key,"0"."dates" date,"0"."SALARY" bigint);
> No rows affected (0.347 seconds)
> 0: jdbc:phoenix:localhost:2181> select "0"."dates" from "CEHCK_DT";
> +-------------------------------+
> |             dates             |
> +-------------------------------+
> | 177670840-04-13 05:44:22.317  |
> | 177670840-04-13 05:44:22.317  |
> | 177670840-04-13 05:44:22.317  |
> +-------------------------------+
>
>
>
> Best Regards,
> ANKIT BEOHAR
>