You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Alex Ott <al...@gmail.com> on 2015/07/22 13:22:47 UTC

Several questions...

Hello

I'm starting to play with Apache Drill & try to use it with HBase.

I have following questions:
- I have HBase table, where some columns have minus sign ('-') in the name,
like, 'raw-url', etc.  How I can query this table & do conversion of the
the corresponding columns? I tried to use single quotes around name, but in
this case the name of column itself is returned:

0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
hbase.urls WHERE row_key = 'AZ.OC.ICR';
+------------+-----------------+
|    key     |       url       |
+------------+-----------------+
| AZ.OC.ICR  | urls.u.raw-url  |
+------------+-----------------+

Use of backquotes or double quotes leads to the error.

- Another question is about data conversion - I have 'status' column that
holds integer value (as binary), but when I'm trying to convert it from
binary to INT, then I get value different from stored in the DB:

For example, for this row I have status field equal to 200 (0xC8)

hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
COLUMN
CELL

 u:check-td                     timestamp=1422651539493,
value=2015-01-30T07:53:17Z
 u:checked                      timestamp=1422651539493,
value=\xFF
 u:imp-td                       timestamp=1414402209086,
value=2014-09-11T06:51:41Z
 u:raw-url                      timestamp=1411476725886, value=
http://RCI.CO.ZA
 u:status                       timestamp=1411476725886,
value=\x00\x00\x00\xC8
5 row(s) in 0.0300 seconds

But when I do query from Drill, I get some big negative number:

0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE row_key
= 'AZ.OC.ICR'
. . . . . . . . . . . > ;
+------------+-------------+
|    key     |   status    |
+------------+-------------+
| AZ.OC.ICR  | -939524096  |
+------------+-------------+

What is the correct way of converting binary data into corresponding
representation?

Thank you

-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott

Re: Several questions...

Posted by Kristine Hahn <kh...@maprtech.com>.
Sounds great. The docs are written in markdown and stored in github-pages.
You can contribute to the docs using a pull request. Click the pencil icon
on the top right side of the page, and go for it. Thanks much, really
appreciate your feedback and help.

Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn


On Thu, Jul 23, 2015 at 6:56 AM, Alex Ott <al...@gmail.com> wrote:

> Thank you for pointing me to this section - somehow I missed it.
>
> How do you maintain this documentation? Maybe I have time to add more
> examples, so it will be easier for other people to start to work with
> HBase/Drill combo.
>
> On Thu, Jul 23, 2015 at 3:38 PM, Kristine Hahn <kh...@maprtech.com> wrote:
>
> > These data types are listed
> >
> >
> http://drill.apache.org/docs/supported-data-types/#convert_to-and-convert_from
> > ,
> > but need to be easier to find and include useful examples as Ted pointed
> > out. Sorry you had a problem. We'll add links to the types from strategic
> > places.
> >
> > On Thursday, July 23, 2015, Alex Ott <al...@gmail.com> wrote:
> >
> > > Thank you Jacques
> > >
> > > The INT_BE made the trick - now I'm getting status 200 instead of the
> > > negative number.  The problem is that I haven't seen any mention of
> this
> > > type anywhere in the documentation - maybe the corresponding section of
> > the
> > > conversions should be expanded, because it refers only to standard
> types
> > > that mention only INT, LONG, etc,, without endian-specific variants
> > >
> > > Another point is ease to use - if you need to make relatively complex
> > query
> > > against HBase, your code is cluttered with all these CONVERT_FROMs -
> > maybe
> > > there could be a way to "pre-register" HBase schema with data types,
> > etc.,
> > > and after that - use provided information.  Otherwise, it's completely
> > > unusable, especially if the project targets analysts, not developers -
> > they
> > > all will be confused by conversions, big endian types, etc.
> > >
> > > On Thu, Jul 23, 2015 at 1:51 AM, Jacques Nadeau <jacques@dremio.com
> > > <javascript:;>> wrote:
> > >
> > > > It is easier to understand using the BINARY_STRING and STRING_BINARY
> > > > functions that Aditya so kindly added.  In general, CONVERT_TO and
> > > > CONVERT_FROM are converting to binary and from binary.  The encoding
> > > > defines the translation.
> > > >
> > > > SELECT CONVERT_FROM(BINARY_STRING('\x00\x00\x00\xC8'), 'INT_BE') as
> > cnvrt
> > > > from (VALUES (1));
> > > > +--------+
> > > > | cnvrt  |
> > > > +--------+
> > > > | 200    |
> > > > +--------+
> > > >
> > > > You can read this expression as
> > > > 1. Start with string literal \x00\x00\x00\xC8
> > > > 2. Decode that string literal into a VARBINARY with those four
> octets.
> > > > 3. Decode that VARBINARY understanding the data is encoded using big
> > > endian
> > > > four byte integer encoding.
> > > > 4. Return that value.
> > > >
> > > > So 000000C8 = 200 if we're dealing with a big endian integer.
> > > >
> > > > Other examples:
> > > > SELECT CONVERT_FROM(BINARY_STRING('\x17\x04\x00\x00'), 'INT') as
> cnvrt
> > > from
> > > > (VALUES (1));
> > > > +--------+
> > > > | cnvrt  |
> > > > +--------+
> > > > | 1047   |
> > > > +--------+
> > > >
> > > > SELECT
> > > >   STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
> > > >   STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
> > > >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
> > > >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
> > > >   STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
> > > > from (VALUES (1));
> > > >
> > > >
> > >
> >
> -------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > > > |         i         |       i_be        |                 l
> > > > |               l_be                | l_be0  |
> > > >
> > > >
> > >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > > > | \x01\x00\x00\x00  | \x00\x00\x00\x01  |
> > > \x01\x00\x00\x00\x00\x00\x00\x00
> > > >  | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
> > > >
> > > >
> > >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > > >
> > > > SELECT
> > > >   STRING_BINARY(CONVERT_TO('hello', 'UTF8')) u8,
> > > >   STRING_BINARY(CONVERT_TO('hello', 'UTF16')) u16
> > > > from (VALUES (1));
> > > > +--------+------------------------------------+
> > > > |   u8   |                u16                 |
> > > > +--------+------------------------------------+
> > > > | hello  | \xFE\xFF\x00h\x00e\x00l\x00l\x00o  |
> > > > +--------+------------------------------------+
> > > >
> > > > You can see a bunch of examples in the tests here [1].
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java
> > > >
> > > > Now specifically to your three examples:
> > > >
> > > > > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
> > > > sys.version;
> > > >
> > > > File a bug.  This should work.
> > > >
> > > > > select cast(x as BINARY(10)) foo from (values 1000)tbl(x);
> > > >
> > > > I'm pretty sure that SQL doesn't allow a cast from integer to
> > varbinary,
> > > > thus a correct failure message.
> > > >
> > > > >select convert_to(x, 'INT') from (values 1000) tbl(x);
> > > >
> > > > The problem here is that Drill treats all number literals as BIGINT.
> > > > You're trying to convert to a four byte encoding.  This doesn't work
> > > since
> > > > your input type requires eight bytes of precision.  You either need
> to
> > > cast
> > > > to lower precision or change your encoding to BIGINT or some other
> > large
> > > > precision encoding such as BIGINT_HADOOPV or BIGINT_BE.
> > > >
> > >
> > >
> > >
> > > --
> > > With best wishes,                    Alex Ott
> > > http://alexott.net/
> > > Twitter: alexott_en (English), alexott (Russian)
> > > Skype: alex.ott
> > >
> >
> >
> > --
> > Kristine Hahn
> > Sr. Technical Writer
> > 415-497-8107 @krishahn
> >
>
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
>

Re: Several questions...

Posted by Alex Ott <al...@gmail.com>.
Thank you for pointing me to this section - somehow I missed it.

How do you maintain this documentation? Maybe I have time to add more
examples, so it will be easier for other people to start to work with
HBase/Drill combo.

On Thu, Jul 23, 2015 at 3:38 PM, Kristine Hahn <kh...@maprtech.com> wrote:

> These data types are listed
>
> http://drill.apache.org/docs/supported-data-types/#convert_to-and-convert_from
> ,
> but need to be easier to find and include useful examples as Ted pointed
> out. Sorry you had a problem. We'll add links to the types from strategic
> places.
>
> On Thursday, July 23, 2015, Alex Ott <al...@gmail.com> wrote:
>
> > Thank you Jacques
> >
> > The INT_BE made the trick - now I'm getting status 200 instead of the
> > negative number.  The problem is that I haven't seen any mention of this
> > type anywhere in the documentation - maybe the corresponding section of
> the
> > conversions should be expanded, because it refers only to standard types
> > that mention only INT, LONG, etc,, without endian-specific variants
> >
> > Another point is ease to use - if you need to make relatively complex
> query
> > against HBase, your code is cluttered with all these CONVERT_FROMs -
> maybe
> > there could be a way to "pre-register" HBase schema with data types,
> etc.,
> > and after that - use provided information.  Otherwise, it's completely
> > unusable, especially if the project targets analysts, not developers -
> they
> > all will be confused by conversions, big endian types, etc.
> >
> > On Thu, Jul 23, 2015 at 1:51 AM, Jacques Nadeau <jacques@dremio.com
> > <javascript:;>> wrote:
> >
> > > It is easier to understand using the BINARY_STRING and STRING_BINARY
> > > functions that Aditya so kindly added.  In general, CONVERT_TO and
> > > CONVERT_FROM are converting to binary and from binary.  The encoding
> > > defines the translation.
> > >
> > > SELECT CONVERT_FROM(BINARY_STRING('\x00\x00\x00\xC8'), 'INT_BE') as
> cnvrt
> > > from (VALUES (1));
> > > +--------+
> > > | cnvrt  |
> > > +--------+
> > > | 200    |
> > > +--------+
> > >
> > > You can read this expression as
> > > 1. Start with string literal \x00\x00\x00\xC8
> > > 2. Decode that string literal into a VARBINARY with those four octets.
> > > 3. Decode that VARBINARY understanding the data is encoded using big
> > endian
> > > four byte integer encoding.
> > > 4. Return that value.
> > >
> > > So 000000C8 = 200 if we're dealing with a big endian integer.
> > >
> > > Other examples:
> > > SELECT CONVERT_FROM(BINARY_STRING('\x17\x04\x00\x00'), 'INT') as cnvrt
> > from
> > > (VALUES (1));
> > > +--------+
> > > | cnvrt  |
> > > +--------+
> > > | 1047   |
> > > +--------+
> > >
> > > SELECT
> > >   STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
> > >   STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
> > >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
> > >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
> > >   STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
> > > from (VALUES (1));
> > >
> > >
> >
> -------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > > |         i         |       i_be        |                 l
> > > |               l_be                | l_be0  |
> > >
> > >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > > | \x01\x00\x00\x00  | \x00\x00\x00\x01  |
> > \x01\x00\x00\x00\x00\x00\x00\x00
> > >  | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
> > >
> > >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > >
> > > SELECT
> > >   STRING_BINARY(CONVERT_TO('hello', 'UTF8')) u8,
> > >   STRING_BINARY(CONVERT_TO('hello', 'UTF16')) u16
> > > from (VALUES (1));
> > > +--------+------------------------------------+
> > > |   u8   |                u16                 |
> > > +--------+------------------------------------+
> > > | hello  | \xFE\xFF\x00h\x00e\x00l\x00l\x00o  |
> > > +--------+------------------------------------+
> > >
> > > You can see a bunch of examples in the tests here [1].
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java
> > >
> > > Now specifically to your three examples:
> > >
> > > > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
> > > sys.version;
> > >
> > > File a bug.  This should work.
> > >
> > > > select cast(x as BINARY(10)) foo from (values 1000)tbl(x);
> > >
> > > I'm pretty sure that SQL doesn't allow a cast from integer to
> varbinary,
> > > thus a correct failure message.
> > >
> > > >select convert_to(x, 'INT') from (values 1000) tbl(x);
> > >
> > > The problem here is that Drill treats all number literals as BIGINT.
> > > You're trying to convert to a four byte encoding.  This doesn't work
> > since
> > > your input type requires eight bytes of precision.  You either need to
> > cast
> > > to lower precision or change your encoding to BIGINT or some other
> large
> > > precision encoding such as BIGINT_HADOOPV or BIGINT_BE.
> > >
> >
> >
> >
> > --
> > With best wishes,                    Alex Ott
> > http://alexott.net/
> > Twitter: alexott_en (English), alexott (Russian)
> > Skype: alex.ott
> >
>
>
> --
> Kristine Hahn
> Sr. Technical Writer
> 415-497-8107 @krishahn
>



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott

Re: Several questions...

Posted by Kristine Hahn <kh...@maprtech.com>.
These data types are listed
http://drill.apache.org/docs/supported-data-types/#convert_to-and-convert_from,
but need to be easier to find and include useful examples as Ted pointed
out. Sorry you had a problem. We'll add links to the types from strategic
places.

On Thursday, July 23, 2015, Alex Ott <al...@gmail.com> wrote:

> Thank you Jacques
>
> The INT_BE made the trick - now I'm getting status 200 instead of the
> negative number.  The problem is that I haven't seen any mention of this
> type anywhere in the documentation - maybe the corresponding section of the
> conversions should be expanded, because it refers only to standard types
> that mention only INT, LONG, etc,, without endian-specific variants
>
> Another point is ease to use - if you need to make relatively complex query
> against HBase, your code is cluttered with all these CONVERT_FROMs - maybe
> there could be a way to "pre-register" HBase schema with data types, etc.,
> and after that - use provided information.  Otherwise, it's completely
> unusable, especially if the project targets analysts, not developers - they
> all will be confused by conversions, big endian types, etc.
>
> On Thu, Jul 23, 2015 at 1:51 AM, Jacques Nadeau <jacques@dremio.com
> <javascript:;>> wrote:
>
> > It is easier to understand using the BINARY_STRING and STRING_BINARY
> > functions that Aditya so kindly added.  In general, CONVERT_TO and
> > CONVERT_FROM are converting to binary and from binary.  The encoding
> > defines the translation.
> >
> > SELECT CONVERT_FROM(BINARY_STRING('\x00\x00\x00\xC8'), 'INT_BE') as cnvrt
> > from (VALUES (1));
> > +--------+
> > | cnvrt  |
> > +--------+
> > | 200    |
> > +--------+
> >
> > You can read this expression as
> > 1. Start with string literal \x00\x00\x00\xC8
> > 2. Decode that string literal into a VARBINARY with those four octets.
> > 3. Decode that VARBINARY understanding the data is encoded using big
> endian
> > four byte integer encoding.
> > 4. Return that value.
> >
> > So 000000C8 = 200 if we're dealing with a big endian integer.
> >
> > Other examples:
> > SELECT CONVERT_FROM(BINARY_STRING('\x17\x04\x00\x00'), 'INT') as cnvrt
> from
> > (VALUES (1));
> > +--------+
> > | cnvrt  |
> > +--------+
> > | 1047   |
> > +--------+
> >
> > SELECT
> >   STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
> >   STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
> >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
> >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
> >   STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
> > from (VALUES (1));
> >
> >
> -------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > |         i         |       i_be        |                 l
> > |               l_be                | l_be0  |
> >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > | \x01\x00\x00\x00  | \x00\x00\x00\x01  |
> \x01\x00\x00\x00\x00\x00\x00\x00
> >  | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
> >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> >
> > SELECT
> >   STRING_BINARY(CONVERT_TO('hello', 'UTF8')) u8,
> >   STRING_BINARY(CONVERT_TO('hello', 'UTF16')) u16
> > from (VALUES (1));
> > +--------+------------------------------------+
> > |   u8   |                u16                 |
> > +--------+------------------------------------+
> > | hello  | \xFE\xFF\x00h\x00e\x00l\x00l\x00o  |
> > +--------+------------------------------------+
> >
> > You can see a bunch of examples in the tests here [1].
> >
> > [1]
> >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java
> >
> > Now specifically to your three examples:
> >
> > > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
> > sys.version;
> >
> > File a bug.  This should work.
> >
> > > select cast(x as BINARY(10)) foo from (values 1000)tbl(x);
> >
> > I'm pretty sure that SQL doesn't allow a cast from integer to varbinary,
> > thus a correct failure message.
> >
> > >select convert_to(x, 'INT') from (values 1000) tbl(x);
> >
> > The problem here is that Drill treats all number literals as BIGINT.
> > You're trying to convert to a four byte encoding.  This doesn't work
> since
> > your input type requires eight bytes of precision.  You either need to
> cast
> > to lower precision or change your encoding to BIGINT or some other large
> > precision encoding such as BIGINT_HADOOPV or BIGINT_BE.
> >
>
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
>


-- 
Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Jul 23, 2015 at 8:18 AM, Jacques Nadeau <ja...@dremio.com> wrote:

> The good news is, Drill does provide a nice simple way to abstract these
> details away.  You simply create a view on top of HBase [1].  The view can
> contain the physical conversions.  Then users can interact with the view
> rather than the underlying table.
>
> [1] http://drill.apache.org/docs/create-view-command/
>

This is sooo powerful.

Re: Several questions...

Posted by Jacques Nadeau <ja...@dremio.com>.
Unfortunately, HBase hasn't embraced embedded schema last I checked.  There
are definitely tools on top of HBase that do provide this.  For example I
believe Wibi and Cask both provide a more structured approach on top of
HBase.  Someone could extend the plugin to support these systems.

The good news is, Drill does provide a nice simple way to abstract these
details away.  You simply create a view on top of HBase [1].  The view can
contain the physical conversions.  Then users can interact with the view
rather than the underlying table.

[1] http://drill.apache.org/docs/create-view-command/

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Jul 23, 2015 at 12:06 AM, Alex Ott <al...@gmail.com> wrote:

> Thank you Jacques
>
> The INT_BE made the trick - now I'm getting status 200 instead of the
> negative number.  The problem is that I haven't seen any mention of this
> type anywhere in the documentation - maybe the corresponding section of the
> conversions should be expanded, because it refers only to standard types
> that mention only INT, LONG, etc,, without endian-specific variants
>
> Another point is ease to use - if you need to make relatively complex query
> against HBase, your code is cluttered with all these CONVERT_FROMs - maybe
> there could be a way to "pre-register" HBase schema with data types, etc.,
> and after that - use provided information.  Otherwise, it's completely
> unusable, especially if the project targets analysts, not developers - they
> all will be confused by conversions, big endian types, etc.
>
> On Thu, Jul 23, 2015 at 1:51 AM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > It is easier to understand using the BINARY_STRING and STRING_BINARY
> > functions that Aditya so kindly added.  In general, CONVERT_TO and
> > CONVERT_FROM are converting to binary and from binary.  The encoding
> > defines the translation.
> >
> > SELECT CONVERT_FROM(BINARY_STRING('\x00\x00\x00\xC8'), 'INT_BE') as cnvrt
> > from (VALUES (1));
> > +--------+
> > | cnvrt  |
> > +--------+
> > | 200    |
> > +--------+
> >
> > You can read this expression as
> > 1. Start with string literal \x00\x00\x00\xC8
> > 2. Decode that string literal into a VARBINARY with those four octets.
> > 3. Decode that VARBINARY understanding the data is encoded using big
> endian
> > four byte integer encoding.
> > 4. Return that value.
> >
> > So 000000C8 = 200 if we're dealing with a big endian integer.
> >
> > Other examples:
> > SELECT CONVERT_FROM(BINARY_STRING('\x17\x04\x00\x00'), 'INT') as cnvrt
> from
> > (VALUES (1));
> > +--------+
> > | cnvrt  |
> > +--------+
> > | 1047   |
> > +--------+
> >
> > SELECT
> >   STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
> >   STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
> >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
> >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
> >   STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
> > from (VALUES (1));
> >
> >
> -------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > |         i         |       i_be        |                 l
> > |               l_be                | l_be0  |
> >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > | \x01\x00\x00\x00  | \x00\x00\x00\x01  |
> \x01\x00\x00\x00\x00\x00\x00\x00
> >  | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
> >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> >
> > SELECT
> >   STRING_BINARY(CONVERT_TO('hello', 'UTF8')) u8,
> >   STRING_BINARY(CONVERT_TO('hello', 'UTF16')) u16
> > from (VALUES (1));
> > +--------+------------------------------------+
> > |   u8   |                u16                 |
> > +--------+------------------------------------+
> > | hello  | \xFE\xFF\x00h\x00e\x00l\x00l\x00o  |
> > +--------+------------------------------------+
> >
> > You can see a bunch of examples in the tests here [1].
> >
> > [1]
> >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java
> >
> > Now specifically to your three examples:
> >
> > > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
> > sys.version;
> >
> > File a bug.  This should work.
> >
> > > select cast(x as BINARY(10)) foo from (values 1000)tbl(x);
> >
> > I'm pretty sure that SQL doesn't allow a cast from integer to varbinary,
> > thus a correct failure message.
> >
> > >select convert_to(x, 'INT') from (values 1000) tbl(x);
> >
> > The problem here is that Drill treats all number literals as BIGINT.
> > You're trying to convert to a four byte encoding.  This doesn't work
> since
> > your input type requires eight bytes of precision.  You either need to
> cast
> > to lower precision or change your encoding to BIGINT or some other large
> > precision encoding such as BIGINT_HADOOPV or BIGINT_BE.
> >
>
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
>

Re: Several questions...

Posted by Alex Ott <al...@gmail.com>.
Thank you Jacques

The INT_BE made the trick - now I'm getting status 200 instead of the
negative number.  The problem is that I haven't seen any mention of this
type anywhere in the documentation - maybe the corresponding section of the
conversions should be expanded, because it refers only to standard types
that mention only INT, LONG, etc,, without endian-specific variants

Another point is ease to use - if you need to make relatively complex query
against HBase, your code is cluttered with all these CONVERT_FROMs - maybe
there could be a way to "pre-register" HBase schema with data types, etc.,
and after that - use provided information.  Otherwise, it's completely
unusable, especially if the project targets analysts, not developers - they
all will be confused by conversions, big endian types, etc.

On Thu, Jul 23, 2015 at 1:51 AM, Jacques Nadeau <ja...@dremio.com> wrote:

> It is easier to understand using the BINARY_STRING and STRING_BINARY
> functions that Aditya so kindly added.  In general, CONVERT_TO and
> CONVERT_FROM are converting to binary and from binary.  The encoding
> defines the translation.
>
> SELECT CONVERT_FROM(BINARY_STRING('\x00\x00\x00\xC8'), 'INT_BE') as cnvrt
> from (VALUES (1));
> +--------+
> | cnvrt  |
> +--------+
> | 200    |
> +--------+
>
> You can read this expression as
> 1. Start with string literal \x00\x00\x00\xC8
> 2. Decode that string literal into a VARBINARY with those four octets.
> 3. Decode that VARBINARY understanding the data is encoded using big endian
> four byte integer encoding.
> 4. Return that value.
>
> So 000000C8 = 200 if we're dealing with a big endian integer.
>
> Other examples:
> SELECT CONVERT_FROM(BINARY_STRING('\x17\x04\x00\x00'), 'INT') as cnvrt from
> (VALUES (1));
> +--------+
> | cnvrt  |
> +--------+
> | 1047   |
> +--------+
>
> SELECT
>   STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
>   STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
>   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
>   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
>   STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
> from (VALUES (1));
>
> -------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> |         i         |       i_be        |                 l
> |               l_be                | l_be0  |
>
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> | \x01\x00\x00\x00  | \x00\x00\x00\x01  | \x01\x00\x00\x00\x00\x00\x00\x00
>  | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
>
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
>
> SELECT
>   STRING_BINARY(CONVERT_TO('hello', 'UTF8')) u8,
>   STRING_BINARY(CONVERT_TO('hello', 'UTF16')) u16
> from (VALUES (1));
> +--------+------------------------------------+
> |   u8   |                u16                 |
> +--------+------------------------------------+
> | hello  | \xFE\xFF\x00h\x00e\x00l\x00l\x00o  |
> +--------+------------------------------------+
>
> You can see a bunch of examples in the tests here [1].
>
> [1]
>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java
>
> Now specifically to your three examples:
>
> > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
> sys.version;
>
> File a bug.  This should work.
>
> > select cast(x as BINARY(10)) foo from (values 1000)tbl(x);
>
> I'm pretty sure that SQL doesn't allow a cast from integer to varbinary,
> thus a correct failure message.
>
> >select convert_to(x, 'INT') from (values 1000) tbl(x);
>
> The problem here is that Drill treats all number literals as BIGINT.
> You're trying to convert to a four byte encoding.  This doesn't work since
> your input type requires eight bytes of precision.  You either need to cast
> to lower precision or change your encoding to BIGINT or some other large
> precision encoding such as BIGINT_HADOOPV or BIGINT_BE.
>



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Jul 22, 2015 at 5:44 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Good point.  It is because in case the expression is evaluated after the
> data is materialized (yours) and the other the expression is evaluated at
> the same time the data is materialized (mine).  In the case that they are
> evaluated simultaneously, we're treating as an INT until the data gets to
> big.  As such, my query works, yours doesn't, and neither does mine once
> the value gets beyond the data type capacity:
>
> SELECT STRING_BINARY(CONVERT_TO(2147483648, 'INT')) as i from (VALUES
> (1));
>
> I'm not sure if this is a bug or a feature.
>

It is confusing, for sure.

Re: Several questions...

Posted by Jacques Nadeau <ja...@dremio.com>.
Good point.  It is because in case the expression is evaluated after the
data is materialized (yours) and the other the expression is evaluated at
the same time the data is materialized (mine).  In the case that they are
evaluated simultaneously, we're treating as an INT until the data gets to
big.  As such, my query works, yours doesn't, and neither does mine once
the value gets beyond the data type capacity:

SELECT STRING_BINARY(CONVERT_TO(2147483648, 'INT')) as i from (VALUES (1));

I'm not sure if this is a bug or a feature.

On Wed, Jul 22, 2015 at 5:00 PM, Ted Dunning <te...@gmail.com> wrote:

> On Wed, Jul 22, 2015 at 4:51 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > SELECT
> >   STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
> >   STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
> >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
> >   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
> >   STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
> > from (VALUES (1));
> >
> >
> -------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > |         i         |       i_be        |                 l
> > |               l_be                | l_be0  |
> >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> > | \x01\x00\x00\x00  | \x00\x00\x00\x01  |
> \x01\x00\x00\x00\x00\x00\x00\x00
> >  | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
> >
> >
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> >
> > ...
> >
> > >select convert_to(x, 'INT') from (values 1000) tbl(x);
> >
> > The problem here is that Drill treats all number literals as BIGINT.
> > You're trying to convert to a four byte encoding.  This doesn't work
> since
> > your input type requires eight bytes of precision.  You either need to
> cast
> > to lower precision or change your encoding to BIGINT or some other large
> > precision encoding such as BIGINT_HADOOPV or BIGINT_BE.
> >
>
> So why does your convert_to(1,'int') work and my convert_to(x,'int') fail?
>

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Jul 22, 2015 at 4:51 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> SELECT
>   STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
>   STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
>   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
>   STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
>   STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
> from (VALUES (1));
>
> -------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> |         i         |       i_be        |                 l
> |               l_be                | l_be0  |
>
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
> | \x01\x00\x00\x00  | \x00\x00\x00\x01  | \x01\x00\x00\x00\x00\x00\x00\x00
>  | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
>
> +-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
>
> ...
>
> >select convert_to(x, 'INT') from (values 1000) tbl(x);
>
> The problem here is that Drill treats all number literals as BIGINT.
> You're trying to convert to a four byte encoding.  This doesn't work since
> your input type requires eight bytes of precision.  You either need to cast
> to lower precision or change your encoding to BIGINT or some other large
> precision encoding such as BIGINT_HADOOPV or BIGINT_BE.
>

So why does your convert_to(1,'int') work and my convert_to(x,'int') fail?

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
Cool.

On Wed, Jul 22, 2015 at 6:54 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> I'm sorry I wasn't clearer.  The fact that the error is incomprehensible
> has already been fixed by Parth and will be part of 1.2
>
> On Wed, Jul 22, 2015 at 6:42 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > On Wed, Jul 22, 2015 at 5:35 PM, Jacques Nadeau <ja...@dremio.com>
> > wrote:
> >
> > > So this works:
> > >
> > > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]' ,'UTF8') AS MYCOL1 FROM
> > > sys.version;
> > > +--------------+
> > > |    MYCOL1    |
> > > +--------------+
> > > | [B@7e308c04  |
> > > +--------------+
> > >
> >
> > OK.  So the difference here is that I had UTF-8 instead of UTF8.
> >
> > I will file a bug on this anyway since the error is incomprehensible.
> >
>

Re: Several questions...

Posted by Jacques Nadeau <ja...@dremio.com>.
I'm sorry I wasn't clearer.  The fact that the error is incomprehensible
has already been fixed by Parth and will be part of 1.2

On Wed, Jul 22, 2015 at 6:42 PM, Ted Dunning <te...@gmail.com> wrote:

> On Wed, Jul 22, 2015 at 5:35 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > So this works:
> >
> > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]' ,'UTF8') AS MYCOL1 FROM
> > sys.version;
> > +--------------+
> > |    MYCOL1    |
> > +--------------+
> > | [B@7e308c04  |
> > +--------------+
> >
>
> OK.  So the difference here is that I had UTF-8 instead of UTF8.
>
> I will file a bug on this anyway since the error is incomprehensible.
>

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Jul 22, 2015 at 5:35 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> So this works:
>
> SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]' ,'UTF8') AS MYCOL1 FROM
> sys.version;
> +--------------+
> |    MYCOL1    |
> +--------------+
> | [B@7e308c04  |
> +--------------+
>

OK.  So the difference here is that I had UTF-8 instead of UTF8.

I will file a bug on this anyway since the error is incomprehensible.

Re: Several questions...

Posted by Jacques Nadeau <ja...@dremio.com>.
It should return VARBINARY value encoded in UTF8 that matches the binary
encoding one would expect.  But I just realized there is actually an error
in what you wrote.  The correct encoding is UTF8, not UTF-8.  A recent fix
makes the error message much better here and will be included in 1.2.

So this works:

SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]' ,'UTF8') AS MYCOL1 FROM
sys.version;
+--------------+
|    MYCOL1    |
+--------------+
| [B@7e308c04  |
+--------------+






On Wed, Jul 22, 2015 at 4:57 PM, Ted Dunning <te...@gmail.com> wrote:

> On Wed, Jul 22, 2015 at 4:51 PM, Jacques Nadeau <ja...@dremio.com>
> wrote:
>
> > > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
> > sys.version;
> >
> > File a bug.  This should work.
>
>
> I would love to but I don't know what it should do.
>
> Note that the error messages in all of these cases were essentially
> incomprehensible.
>

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Jul 22, 2015 at 4:51 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> > SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
> sys.version;
>
> File a bug.  This should work.


I would love to but I don't know what it should do.

Note that the error messages in all of these cases were essentially
incomprehensible.

Re: Several questions...

Posted by Jacques Nadeau <ja...@dremio.com>.
It is easier to understand using the BINARY_STRING and STRING_BINARY
functions that Aditya so kindly added.  In general, CONVERT_TO and
CONVERT_FROM are converting to binary and from binary.  The encoding
defines the translation.

SELECT CONVERT_FROM(BINARY_STRING('\x00\x00\x00\xC8'), 'INT_BE') as cnvrt
from (VALUES (1));
+--------+
| cnvrt  |
+--------+
| 200    |
+--------+

You can read this expression as
1. Start with string literal \x00\x00\x00\xC8
2. Decode that string literal into a VARBINARY with those four octets.
3. Decode that VARBINARY understanding the data is encoded using big endian
four byte integer encoding.
4. Return that value.

So 000000C8 = 200 if we're dealing with a big endian integer.

Other examples:
SELECT CONVERT_FROM(BINARY_STRING('\x17\x04\x00\x00'), 'INT') as cnvrt from
(VALUES (1));
+--------+
| cnvrt  |
+--------+
| 1047   |
+--------+

SELECT
  STRING_BINARY(CONVERT_TO(1, 'INT')) as i,
  STRING_BINARY(CONVERT_TO(1, 'INT_BE')) as i_be,
  STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l,
  STRING_BINARY(CONVERT_TO(1, 'BIGINT')) as l_be,
  STRING_BINARY(CONVERT_TO(1, 'INT_HADOOPV')) as l_be
from (VALUES (1));
-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
|         i         |       i_be        |                 l
|               l_be                | l_be0  |
+-------------------+-------------------+-----------------------------------+-----------------------------------+--------+
| \x01\x00\x00\x00  | \x00\x00\x00\x01  | \x01\x00\x00\x00\x00\x00\x00\x00
 | \x01\x00\x00\x00\x00\x00\x00\x00  | \x01   |
+-------------------+-------------------+-----------------------------------+-----------------------------------+--------+

SELECT
  STRING_BINARY(CONVERT_TO('hello', 'UTF8')) u8,
  STRING_BINARY(CONVERT_TO('hello', 'UTF16')) u16
from (VALUES (1));
+--------+------------------------------------+
|   u8   |                u16                 |
+--------+------------------------------------+
| hello  | \xFE\xFF\x00h\x00e\x00l\x00l\x00o  |
+--------+------------------------------------+

You can see a bunch of examples in the tests here [1].

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestConvertFunctions.java

Now specifically to your three examples:

> SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]','UTF-8') AS MYCOL1 FROM
sys.version;

File a bug.  This should work.

> select cast(x as BINARY(10)) foo from (values 1000)tbl(x);

I'm pretty sure that SQL doesn't allow a cast from integer to varbinary,
thus a correct failure message.

>select convert_to(x, 'INT') from (values 1000) tbl(x);

The problem here is that Drill treats all number literals as BIGINT.
You're trying to convert to a four byte encoding.  This doesn't work since
your input type requires eight bytes of precision.  You either need to cast
to lower precision or change your encoding to BIGINT or some other large
precision encoding such as BIGINT_HADOOPV or BIGINT_BE.

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
Jacques,

I just spent an hour or more trying to read the docs on convert_from/to.  I
had no success.

There are plenty of examples of converting to or from UTF-8, but none
describing conversions to do with integers.

In doing (lots of) experiments, I have failed to

1) create a constant of binary type using values or values + convert or
values + cast

2) create a value utf-8 format

3) get any interesting convert_from or convert_to to work

Can you help with an example that illustrates using CONVERT_FROM on a
big-endian integer?

Typical queries include:

0: jdbc:drill:zk=local> SELECT CONVERT_TO('[ [1, 2], [3, 4], [5]]'
,'UTF-8') AS MYCOL1 FROM sys.version;

*Error: SYSTEM ERROR: org.apache.drill.exec.work.foreman.ForemanException:
Unexpected exception during fragment initialization: null*

0: jdbc:drill:zk=local> select cast(x as BINARY(10)) foo from (values
1000)tbl(x);

Jul 22, 2015 4:15:45 PM
org.apache.calcite.sql.validate.SqlValidatorException <init>

SEVERE: org.apache.calcite.sql.validate.SqlValidatorException: Cast
function cannot convert value of type INTEGER to type BINARY(10)

Jul 22, 2015 4:15:45 PM org.apache.calcite.runtime.CalciteException <init>

SEVERE: org.apache.calcite.runtime.CalciteContextException: From line 1,
column 8 to line 1, column 28: Cast function cannot convert value of type
INTEGER to type BINARY(10)

*Error: PARSE ERROR: From line 1, column 8 to line 1, column 28: Cast
function cannot convert value of type INTEGER to type BINARY(10)*

*[Error Id: 20732209-b06e-4ff0-8371-4f04eb0b7a12 on 172.16.0.61:31010
<http://172.16.0.61:31010>] (state=,code=0)*
 0: jdbc:drill:zk=local> select convert_to(x, 'INT') from (values 1000)
tbl(x) ;

*Error: SYSTEM ERROR:
org.apache.drill.exec.exception.SchemaChangeException: Failure while trying
to materialize incoming schema.  Errors*

*Error in expression at index -1.  Error: Missing function implementation:
[convert_toint(BIGINT-OPTIONAL)].  Full expression: --UNKNOWN
EXPRESSION--..*




On Wed, Jul 22, 2015 at 2:40 PM, Jacques Nadeau <ja...@dremio.com> wrote:

> Let me clarify this a bit.
>
> If the data is encoded as text (UTF8), then cast is what you want to use.
> If the data is encoded in a binary representation (such as 4 byte little or
> big endian integer), then you want to use CONVERT_FROM.  CONVERT_FROM is
> about converting from a binary representation to a particular data type.
> CAST is about converting between known data types.
>
> On Wed, Jul 22, 2015 at 5:29 AM, Alex Ott <al...@gmail.com> wrote:
>
> > Hmmm, what I get when I using the CAST:
> >
> >
> > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > CAST(urls.u.status AS INT) AS status FROM hbase.urls WHERE row_key =
> > 'AZ.OC.ICR';
> > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> > NumberFormatException: �
> >
> >
> > In the documentation I see following:
> >
> > "Use CONVERT_TO and CONVERT_FROM instead of the CAST function for
> > converting binary data types with one exception: When converting an INT
> or
> > BIGINT number, having a byte count in the destination/source that does
> not
> > match the byte count of the number in the VARBINARY source/destination,
> use
> > CAST."
> >
> > But I have 4 bytes in the corresponding cell of the database...
> >
> >
> > On Wed, Jul 22, 2015 at 2:09 PM, Nathaniel Auvil <
> > nathaniel.auvil@gmail.com>
> > wrote:
> >
> > > to convert data, use the CAST function as in:
> > >
> > > Select CAST(hbase.urls as VARCHAR(64)) as url from ...
> > >
> > > On Wed, Jul 22, 2015 at 7:22 AM, Alex Ott <al...@gmail.com> wrote:
> > >
> > > > Hello
> > > >
> > > > I'm starting to play with Apache Drill & try to use it with HBase.
> > > >
> > > > I have following questions:
> > > > - I have HBase table, where some columns have minus sign ('-') in the
> > > name,
> > > > like, 'raw-url', etc.  How I can query this table & do conversion of
> > the
> > > > the corresponding columns? I tried to use single quotes around name,
> > but
> > > in
> > > > this case the name of column itself is returned:
> > > >
> > > > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > > > CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
> > > > hbase.urls WHERE row_key = 'AZ.OC.ICR';
> > > > +------------+-----------------+
> > > > |    key     |       url       |
> > > > +------------+-----------------+
> > > > | AZ.OC.ICR  | urls.u.raw-url  |
> > > > +------------+-----------------+
> > > >
> > > > Use of backquotes or double quotes leads to the error.
> > > >
> > > > - Another question is about data conversion - I have 'status' column
> > that
> > > > holds integer value (as binary), but when I'm trying to convert it
> from
> > > > binary to INT, then I get value different from stored in the DB:
> > > >
> > > > For example, for this row I have status field equal to 200 (0xC8)
> > > >
> > > > hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
> > > > COLUMN
> > > > CELL
> > > >
> > > >  u:check-td                     timestamp=1422651539493,
> > > > value=2015-01-30T07:53:17Z
> > > >  u:checked                      timestamp=1422651539493,
> > > > value=\xFF
> > > >  u:imp-td                       timestamp=1414402209086,
> > > > value=2014-09-11T06:51:41Z
> > > >  u:raw-url                      timestamp=1411476725886, value=
> > > > http://RCI.CO.ZA
> > > >  u:status                       timestamp=1411476725886,
> > > > value=\x00\x00\x00\xC8
> > > > 5 row(s) in 0.0300 seconds
> > > >
> > > > But when I do query from Drill, I get some big negative number:
> > > >
> > > > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > > > CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE
> > > row_key
> > > > = 'AZ.OC.ICR'
> > > > . . . . . . . . . . . > ;
> > > > +------------+-------------+
> > > > |    key     |   status    |
> > > > +------------+-------------+
> > > > | AZ.OC.ICR  | -939524096  |
> > > > +------------+-------------+
> > > >
> > > > What is the correct way of converting binary data into corresponding
> > > > representation?
> > > >
> > > > Thank you
> > > >
> > > > --
> > > > With best wishes,                    Alex Ott
> > > > http://alexott.net/
> > > > Twitter: alexott_en (English), alexott (Russian)
> > > > Skype: alex.ott
> > > >
> > >
> >
> >
> >
> > --
> > With best wishes,                    Alex Ott
> > http://alexott.net/
> > Twitter: alexott_en (English), alexott (Russian)
> > Skype: alex.ott
> >
>

Re: Several questions...

Posted by Jacques Nadeau <ja...@dremio.com>.
Let me clarify this a bit.

If the data is encoded as text (UTF8), then cast is what you want to use.
If the data is encoded in a binary representation (such as 4 byte little or
big endian integer), then you want to use CONVERT_FROM.  CONVERT_FROM is
about converting from a binary representation to a particular data type.
CAST is about converting between known data types.

On Wed, Jul 22, 2015 at 5:29 AM, Alex Ott <al...@gmail.com> wrote:

> Hmmm, what I get when I using the CAST:
>
>
> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> CAST(urls.u.status AS INT) AS status FROM hbase.urls WHERE row_key =
> 'AZ.OC.ICR';
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
> NumberFormatException: �
>
>
> In the documentation I see following:
>
> "Use CONVERT_TO and CONVERT_FROM instead of the CAST function for
> converting binary data types with one exception: When converting an INT or
> BIGINT number, having a byte count in the destination/source that does not
> match the byte count of the number in the VARBINARY source/destination, use
> CAST."
>
> But I have 4 bytes in the corresponding cell of the database...
>
>
> On Wed, Jul 22, 2015 at 2:09 PM, Nathaniel Auvil <
> nathaniel.auvil@gmail.com>
> wrote:
>
> > to convert data, use the CAST function as in:
> >
> > Select CAST(hbase.urls as VARCHAR(64)) as url from ...
> >
> > On Wed, Jul 22, 2015 at 7:22 AM, Alex Ott <al...@gmail.com> wrote:
> >
> > > Hello
> > >
> > > I'm starting to play with Apache Drill & try to use it with HBase.
> > >
> > > I have following questions:
> > > - I have HBase table, where some columns have minus sign ('-') in the
> > name,
> > > like, 'raw-url', etc.  How I can query this table & do conversion of
> the
> > > the corresponding columns? I tried to use single quotes around name,
> but
> > in
> > > this case the name of column itself is returned:
> > >
> > > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > > CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
> > > hbase.urls WHERE row_key = 'AZ.OC.ICR';
> > > +------------+-----------------+
> > > |    key     |       url       |
> > > +------------+-----------------+
> > > | AZ.OC.ICR  | urls.u.raw-url  |
> > > +------------+-----------------+
> > >
> > > Use of backquotes or double quotes leads to the error.
> > >
> > > - Another question is about data conversion - I have 'status' column
> that
> > > holds integer value (as binary), but when I'm trying to convert it from
> > > binary to INT, then I get value different from stored in the DB:
> > >
> > > For example, for this row I have status field equal to 200 (0xC8)
> > >
> > > hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
> > > COLUMN
> > > CELL
> > >
> > >  u:check-td                     timestamp=1422651539493,
> > > value=2015-01-30T07:53:17Z
> > >  u:checked                      timestamp=1422651539493,
> > > value=\xFF
> > >  u:imp-td                       timestamp=1414402209086,
> > > value=2014-09-11T06:51:41Z
> > >  u:raw-url                      timestamp=1411476725886, value=
> > > http://RCI.CO.ZA
> > >  u:status                       timestamp=1411476725886,
> > > value=\x00\x00\x00\xC8
> > > 5 row(s) in 0.0300 seconds
> > >
> > > But when I do query from Drill, I get some big negative number:
> > >
> > > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > > CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE
> > row_key
> > > = 'AZ.OC.ICR'
> > > . . . . . . . . . . . > ;
> > > +------------+-------------+
> > > |    key     |   status    |
> > > +------------+-------------+
> > > | AZ.OC.ICR  | -939524096  |
> > > +------------+-------------+
> > >
> > > What is the correct way of converting binary data into corresponding
> > > representation?
> > >
> > > Thank you
> > >
> > > --
> > > With best wishes,                    Alex Ott
> > > http://alexott.net/
> > > Twitter: alexott_en (English), alexott (Russian)
> > > Skype: alex.ott
> > >
> >
>
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
>

Re: Several questions...

Posted by Alex Ott <al...@gmail.com>.
Hmmm, what I get when I using the CAST:


0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
CAST(urls.u.status AS INT) AS status FROM hbase.urls WHERE row_key =
'AZ.OC.ICR';
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
NumberFormatException: �


In the documentation I see following:

"Use CONVERT_TO and CONVERT_FROM instead of the CAST function for
converting binary data types with one exception: When converting an INT or
BIGINT number, having a byte count in the destination/source that does not
match the byte count of the number in the VARBINARY source/destination, use
CAST."

But I have 4 bytes in the corresponding cell of the database...


On Wed, Jul 22, 2015 at 2:09 PM, Nathaniel Auvil <na...@gmail.com>
wrote:

> to convert data, use the CAST function as in:
>
> Select CAST(hbase.urls as VARCHAR(64)) as url from ...
>
> On Wed, Jul 22, 2015 at 7:22 AM, Alex Ott <al...@gmail.com> wrote:
>
> > Hello
> >
> > I'm starting to play with Apache Drill & try to use it with HBase.
> >
> > I have following questions:
> > - I have HBase table, where some columns have minus sign ('-') in the
> name,
> > like, 'raw-url', etc.  How I can query this table & do conversion of the
> > the corresponding columns? I tried to use single quotes around name, but
> in
> > this case the name of column itself is returned:
> >
> > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
> > hbase.urls WHERE row_key = 'AZ.OC.ICR';
> > +------------+-----------------+
> > |    key     |       url       |
> > +------------+-----------------+
> > | AZ.OC.ICR  | urls.u.raw-url  |
> > +------------+-----------------+
> >
> > Use of backquotes or double quotes leads to the error.
> >
> > - Another question is about data conversion - I have 'status' column that
> > holds integer value (as binary), but when I'm trying to convert it from
> > binary to INT, then I get value different from stored in the DB:
> >
> > For example, for this row I have status field equal to 200 (0xC8)
> >
> > hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
> > COLUMN
> > CELL
> >
> >  u:check-td                     timestamp=1422651539493,
> > value=2015-01-30T07:53:17Z
> >  u:checked                      timestamp=1422651539493,
> > value=\xFF
> >  u:imp-td                       timestamp=1414402209086,
> > value=2014-09-11T06:51:41Z
> >  u:raw-url                      timestamp=1411476725886, value=
> > http://RCI.CO.ZA
> >  u:status                       timestamp=1411476725886,
> > value=\x00\x00\x00\xC8
> > 5 row(s) in 0.0300 seconds
> >
> > But when I do query from Drill, I get some big negative number:
> >
> > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE
> row_key
> > = 'AZ.OC.ICR'
> > . . . . . . . . . . . > ;
> > +------------+-------------+
> > |    key     |   status    |
> > +------------+-------------+
> > | AZ.OC.ICR  | -939524096  |
> > +------------+-------------+
> >
> > What is the correct way of converting binary data into corresponding
> > representation?
> >
> > Thank you
> >
> > --
> > With best wishes,                    Alex Ott
> > http://alexott.net/
> > Twitter: alexott_en (English), alexott (Russian)
> > Skype: alex.ott
> >
>



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott

Re: Several questions...

Posted by Nathaniel Auvil <na...@gmail.com>.
to convert data, use the CAST function as in:

Select CAST(hbase.urls as VARCHAR(64)) as url from ...

On Wed, Jul 22, 2015 at 7:22 AM, Alex Ott <al...@gmail.com> wrote:

> Hello
>
> I'm starting to play with Apache Drill & try to use it with HBase.
>
> I have following questions:
> - I have HBase table, where some columns have minus sign ('-') in the name,
> like, 'raw-url', etc.  How I can query this table & do conversion of the
> the corresponding columns? I tried to use single quotes around name, but in
> this case the name of column itself is returned:
>
> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
> hbase.urls WHERE row_key = 'AZ.OC.ICR';
> +------------+-----------------+
> |    key     |       url       |
> +------------+-----------------+
> | AZ.OC.ICR  | urls.u.raw-url  |
> +------------+-----------------+
>
> Use of backquotes or double quotes leads to the error.
>
> - Another question is about data conversion - I have 'status' column that
> holds integer value (as binary), but when I'm trying to convert it from
> binary to INT, then I get value different from stored in the DB:
>
> For example, for this row I have status field equal to 200 (0xC8)
>
> hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
> COLUMN
> CELL
>
>  u:check-td                     timestamp=1422651539493,
> value=2015-01-30T07:53:17Z
>  u:checked                      timestamp=1422651539493,
> value=\xFF
>  u:imp-td                       timestamp=1414402209086,
> value=2014-09-11T06:51:41Z
>  u:raw-url                      timestamp=1411476725886, value=
> http://RCI.CO.ZA
>  u:status                       timestamp=1411476725886,
> value=\x00\x00\x00\xC8
> 5 row(s) in 0.0300 seconds
>
> But when I do query from Drill, I get some big negative number:
>
> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE row_key
> = 'AZ.OC.ICR'
> . . . . . . . . . . . > ;
> +------------+-------------+
> |    key     |   status    |
> +------------+-------------+
> | AZ.OC.ICR  | -939524096  |
> +------------+-------------+
>
> What is the correct way of converting binary data into corresponding
> representation?
>
> Thank you
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
>

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
Alex,

I am sure that there is a better answer in the large sense, but as a
quick answer, I wrote a UDF that you can use (I think) to do this
conversion.  I haven't tested it yet, however, and would be interested
if it just solves your issue before pushing forward with making it all
nice.

You can find this UDF in https://github.com/mapr-demos/simple-drill-functions

To use it, [mvn package; cp target/*.jar
~/tmp/apache-drill-1.1.0/jars/3rdparty/].  Of course, you need to
insert your own path names into the location of your drill install.

Then in a SQL query, you should be able to do this:

select CONVERT_FROM(row_key, 'UTF8') AS key,
bytes2int(urls.u.`status`) AS status FROM hbase.urls
WHERE row_key = 'AZ.OC.ICR';

Does that work for you?


On Wed, Jul 22, 2015 at 11:06 AM, Alex Ott <al...@gmail.com> wrote:
> Here is what hbase-shell returns for this field:
>
> hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
> COLUMN
> CELL
>
>  u:status                       timestamp=1411476725886,
> value=\x00\x00\x00\xC8
>
> The database is populated via Java/Clojure (using the
> clojure-hbase-schemas) applications that use Bytes.toBytes(int) to convert
> HTTP status into byte array.  I also have no problem converting value back
> via Bytes/toInt.
>
>
> On Wed, Jul 22, 2015 at 7:00 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> Yes.  Just right on that.
>>
>> Regarding the integer conversion, can you saw what format your data is
>> in?  Is it exactly 4 bytes, big endian?
>>
>>
>>
>> On Wed, Jul 22, 2015 at 5:34 AM, Alex Ott <al...@gmail.com> wrote:
>> > Ok, answering my first question - I need to take the only the column name
>> > into the backquotes, instead of taking the complete coordinates:
>> >
>> > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
>> > CONVERT_FROM(urls.u.`raw-url`, 'UTF8') AS url FROM hbase.urls WHERE
>> row_key
>> > = 'AZ.OC.ICR';
>> > +------------+-------------------+
>> > |    key     |        url        |
>> > +------------+-------------------+
>> > | AZ.OC.ICR  | http://RCI.CO.ZA  |
>> > +------------+-------------------+
>> >
>> >
>> > On Wed, Jul 22, 2015 at 1:22 PM, Alex Ott <al...@gmail.com> wrote:
>> >
>> >> Hello
>> >>
>> >> I'm starting to play with Apache Drill & try to use it with HBase.
>> >>
>> >> I have following questions:
>> >> - I have HBase table, where some columns have minus sign ('-') in the
>> >> name, like, 'raw-url', etc.  How I can query this table & do conversion
>> of
>> >> the the corresponding columns? I tried to use single quotes around name,
>> >> but in this case the name of column itself is returned:
>> >>
>> >> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
>> >> CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
>> >> hbase.urls WHERE row_key = 'AZ.OC.ICR';
>> >> +------------+-----------------+
>> >> |    key     |       url       |
>> >> +------------+-----------------+
>> >> | AZ.OC.ICR  | urls.u.raw-url  |
>> >> +------------+-----------------+
>> >>
>> >> Use of backquotes or double quotes leads to the error.
>> >>
>> >> - Another question is about data conversion - I have 'status' column
>> that
>> >> holds integer value (as binary), but when I'm trying to convert it from
>> >> binary to INT, then I get value different from stored in the DB:
>> >>
>> >> For example, for this row I have status field equal to 200 (0xC8)
>> >>
>> >> hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
>> >> COLUMN
>> >> CELL
>> >>
>> >>  u:check-td                     timestamp=1422651539493,
>> >> value=2015-01-30T07:53:17Z
>> >>  u:checked                      timestamp=1422651539493,
>> >> value=\xFF
>> >>  u:imp-td                       timestamp=1414402209086,
>> >> value=2014-09-11T06:51:41Z
>> >>  u:raw-url                      timestamp=1411476725886, value=
>> >> http://RCI.CO.ZA
>> >>  u:status                       timestamp=1411476725886,
>> >> value=\x00\x00\x00\xC8
>> >> 5 row(s) in 0.0300 seconds
>> >>
>> >> But when I do query from Drill, I get some big negative number:
>> >>
>> >> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
>> >> CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE
>> row_key
>> >> = 'AZ.OC.ICR'
>> >> . . . . . . . . . . . > ;
>> >> +------------+-------------+
>> >> |    key     |   status    |
>> >> +------------+-------------+
>> >> | AZ.OC.ICR  | -939524096  |
>> >> +------------+-------------+
>> >>
>> >> What is the correct way of converting binary data into corresponding
>> >> representation?
>> >>
>> >> Thank you
>> >>
>> >> --
>> >> With best wishes,                    Alex Ott
>> >> http://alexott.net/
>> >> Twitter: alexott_en (English), alexott (Russian)
>> >> Skype: alex.ott
>> >>
>> >
>> >
>> >
>> > --
>> > With best wishes,                    Alex Ott
>> > http://alexott.net/
>> > Twitter: alexott_en (English), alexott (Russian)
>> > Skype: alex.ott
>>
>
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott

Re: Several questions...

Posted by Alex Ott <al...@gmail.com>.
Here is what hbase-shell returns for this field:

hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
COLUMN
CELL

 u:status                       timestamp=1411476725886,
value=\x00\x00\x00\xC8

The database is populated via Java/Clojure (using the
clojure-hbase-schemas) applications that use Bytes.toBytes(int) to convert
HTTP status into byte array.  I also have no problem converting value back
via Bytes/toInt.


On Wed, Jul 22, 2015 at 7:00 PM, Ted Dunning <te...@gmail.com> wrote:

> Yes.  Just right on that.
>
> Regarding the integer conversion, can you saw what format your data is
> in?  Is it exactly 4 bytes, big endian?
>
>
>
> On Wed, Jul 22, 2015 at 5:34 AM, Alex Ott <al...@gmail.com> wrote:
> > Ok, answering my first question - I need to take the only the column name
> > into the backquotes, instead of taking the complete coordinates:
> >
> > 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> > CONVERT_FROM(urls.u.`raw-url`, 'UTF8') AS url FROM hbase.urls WHERE
> row_key
> > = 'AZ.OC.ICR';
> > +------------+-------------------+
> > |    key     |        url        |
> > +------------+-------------------+
> > | AZ.OC.ICR  | http://RCI.CO.ZA  |
> > +------------+-------------------+
> >
> >
> > On Wed, Jul 22, 2015 at 1:22 PM, Alex Ott <al...@gmail.com> wrote:
> >
> >> Hello
> >>
> >> I'm starting to play with Apache Drill & try to use it with HBase.
> >>
> >> I have following questions:
> >> - I have HBase table, where some columns have minus sign ('-') in the
> >> name, like, 'raw-url', etc.  How I can query this table & do conversion
> of
> >> the the corresponding columns? I tried to use single quotes around name,
> >> but in this case the name of column itself is returned:
> >>
> >> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> >> CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
> >> hbase.urls WHERE row_key = 'AZ.OC.ICR';
> >> +------------+-----------------+
> >> |    key     |       url       |
> >> +------------+-----------------+
> >> | AZ.OC.ICR  | urls.u.raw-url  |
> >> +------------+-----------------+
> >>
> >> Use of backquotes or double quotes leads to the error.
> >>
> >> - Another question is about data conversion - I have 'status' column
> that
> >> holds integer value (as binary), but when I'm trying to convert it from
> >> binary to INT, then I get value different from stored in the DB:
> >>
> >> For example, for this row I have status field equal to 200 (0xC8)
> >>
> >> hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
> >> COLUMN
> >> CELL
> >>
> >>  u:check-td                     timestamp=1422651539493,
> >> value=2015-01-30T07:53:17Z
> >>  u:checked                      timestamp=1422651539493,
> >> value=\xFF
> >>  u:imp-td                       timestamp=1414402209086,
> >> value=2014-09-11T06:51:41Z
> >>  u:raw-url                      timestamp=1411476725886, value=
> >> http://RCI.CO.ZA
> >>  u:status                       timestamp=1411476725886,
> >> value=\x00\x00\x00\xC8
> >> 5 row(s) in 0.0300 seconds
> >>
> >> But when I do query from Drill, I get some big negative number:
> >>
> >> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> >> CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE
> row_key
> >> = 'AZ.OC.ICR'
> >> . . . . . . . . . . . > ;
> >> +------------+-------------+
> >> |    key     |   status    |
> >> +------------+-------------+
> >> | AZ.OC.ICR  | -939524096  |
> >> +------------+-------------+
> >>
> >> What is the correct way of converting binary data into corresponding
> >> representation?
> >>
> >> Thank you
> >>
> >> --
> >> With best wishes,                    Alex Ott
> >> http://alexott.net/
> >> Twitter: alexott_en (English), alexott (Russian)
> >> Skype: alex.ott
> >>
> >
> >
> >
> > --
> > With best wishes,                    Alex Ott
> > http://alexott.net/
> > Twitter: alexott_en (English), alexott (Russian)
> > Skype: alex.ott
>



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott

Re: Several questions...

Posted by Ted Dunning <te...@gmail.com>.
Yes.  Just right on that.

Regarding the integer conversion, can you saw what format your data is
in?  Is it exactly 4 bytes, big endian?



On Wed, Jul 22, 2015 at 5:34 AM, Alex Ott <al...@gmail.com> wrote:
> Ok, answering my first question - I need to take the only the column name
> into the backquotes, instead of taking the complete coordinates:
>
> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> CONVERT_FROM(urls.u.`raw-url`, 'UTF8') AS url FROM hbase.urls WHERE row_key
> = 'AZ.OC.ICR';
> +------------+-------------------+
> |    key     |        url        |
> +------------+-------------------+
> | AZ.OC.ICR  | http://RCI.CO.ZA  |
> +------------+-------------------+
>
>
> On Wed, Jul 22, 2015 at 1:22 PM, Alex Ott <al...@gmail.com> wrote:
>
>> Hello
>>
>> I'm starting to play with Apache Drill & try to use it with HBase.
>>
>> I have following questions:
>> - I have HBase table, where some columns have minus sign ('-') in the
>> name, like, 'raw-url', etc.  How I can query this table & do conversion of
>> the the corresponding columns? I tried to use single quotes around name,
>> but in this case the name of column itself is returned:
>>
>> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
>> CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
>> hbase.urls WHERE row_key = 'AZ.OC.ICR';
>> +------------+-----------------+
>> |    key     |       url       |
>> +------------+-----------------+
>> | AZ.OC.ICR  | urls.u.raw-url  |
>> +------------+-----------------+
>>
>> Use of backquotes or double quotes leads to the error.
>>
>> - Another question is about data conversion - I have 'status' column that
>> holds integer value (as binary), but when I'm trying to convert it from
>> binary to INT, then I get value different from stored in the DB:
>>
>> For example, for this row I have status field equal to 200 (0xC8)
>>
>> hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
>> COLUMN
>> CELL
>>
>>  u:check-td                     timestamp=1422651539493,
>> value=2015-01-30T07:53:17Z
>>  u:checked                      timestamp=1422651539493,
>> value=\xFF
>>  u:imp-td                       timestamp=1414402209086,
>> value=2014-09-11T06:51:41Z
>>  u:raw-url                      timestamp=1411476725886, value=
>> http://RCI.CO.ZA
>>  u:status                       timestamp=1411476725886,
>> value=\x00\x00\x00\xC8
>> 5 row(s) in 0.0300 seconds
>>
>> But when I do query from Drill, I get some big negative number:
>>
>> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
>> CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE row_key
>> = 'AZ.OC.ICR'
>> . . . . . . . . . . . > ;
>> +------------+-------------+
>> |    key     |   status    |
>> +------------+-------------+
>> | AZ.OC.ICR  | -939524096  |
>> +------------+-------------+
>>
>> What is the correct way of converting binary data into corresponding
>> representation?
>>
>> Thank you
>>
>> --
>> With best wishes,                    Alex Ott
>> http://alexott.net/
>> Twitter: alexott_en (English), alexott (Russian)
>> Skype: alex.ott
>>
>
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott

Re: Several questions...

Posted by Alex Ott <al...@gmail.com>.
Ok, answering my first question - I need to take the only the column name
into the backquotes, instead of taking the complete coordinates:

0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
CONVERT_FROM(urls.u.`raw-url`, 'UTF8') AS url FROM hbase.urls WHERE row_key
= 'AZ.OC.ICR';
+------------+-------------------+
|    key     |        url        |
+------------+-------------------+
| AZ.OC.ICR  | http://RCI.CO.ZA  |
+------------+-------------------+


On Wed, Jul 22, 2015 at 1:22 PM, Alex Ott <al...@gmail.com> wrote:

> Hello
>
> I'm starting to play with Apache Drill & try to use it with HBase.
>
> I have following questions:
> - I have HBase table, where some columns have minus sign ('-') in the
> name, like, 'raw-url', etc.  How I can query this table & do conversion of
> the the corresponding columns? I tried to use single quotes around name,
> but in this case the name of column itself is returned:
>
> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> CONVERT_FROM('urls.u.raw-url', 'UTF8') AS url FROM
> hbase.urls WHERE row_key = 'AZ.OC.ICR';
> +------------+-----------------+
> |    key     |       url       |
> +------------+-----------------+
> | AZ.OC.ICR  | urls.u.raw-url  |
> +------------+-----------------+
>
> Use of backquotes or double quotes leads to the error.
>
> - Another question is about data conversion - I have 'status' column that
> holds integer value (as binary), but when I'm trying to convert it from
> binary to INT, then I get value different from stored in the DB:
>
> For example, for this row I have status field equal to 200 (0xC8)
>
> hbase(main):004:0> get 'urls', 'AZ.OC.ICR', {COLUMN => 'u'}
> COLUMN
> CELL
>
>  u:check-td                     timestamp=1422651539493,
> value=2015-01-30T07:53:17Z
>  u:checked                      timestamp=1422651539493,
> value=\xFF
>  u:imp-td                       timestamp=1414402209086,
> value=2014-09-11T06:51:41Z
>  u:raw-url                      timestamp=1411476725886, value=
> http://RCI.CO.ZA
>  u:status                       timestamp=1411476725886,
> value=\x00\x00\x00\xC8
> 5 row(s) in 0.0300 seconds
>
> But when I do query from Drill, I get some big negative number:
>
> 0: jdbc:drill:zk=local> select CONVERT_FROM(row_key, 'UTF8') as key,
> CONVERT_FROM(urls.u.status, 'INT') AS status FROM hbase.urls WHERE row_key
> = 'AZ.OC.ICR'
> . . . . . . . . . . . > ;
> +------------+-------------+
> |    key     |   status    |
> +------------+-------------+
> | AZ.OC.ICR  | -939524096  |
> +------------+-------------+
>
> What is the correct way of converting binary data into corresponding
> representation?
>
> Thank you
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
>



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott