You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@drill.apache.org by Christopher Matta <cm...@mapr.com> on 2015/04/21 17:38:55 UTC

Querying OpenTDSB data stored in HBase

I’m trying to use Drill to query time-series data stored in OpenTSDB
<http://opentsdb.net/>. The row keys are supposed to be byte array encoded
according to this schema:
http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html

When trying to do a simple CONVERT_FROM I get the following results:

0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
+------------+
|   EXPR$0   |
+------------+
| U5p |
| U5p! |
| U5p" |
| U5p% |
| U5p& |
| U5p* |
| U5"� |
| U5"�! |

Which is a long way off of what’s described in the opentsdb schema page.

Any tips for properly querying this kind of data?

Chris Matta
cmatta@mapr.com
215-701-3146

Re: Querying OpenTDSB data stored in HBase

Posted by Jacques Nadeau <ja...@apache.org>.

If you want pushdown, you'd either need to add a new optimizer rule that
understood your function or use a function that support udf.

On Tue, Apr 21, 2015 at 9:24 AM, Christopher Matta <cm...@mapr.com> wrote:

> Thanks Jacques, I'll take a look. In this case would predicate queries end
> up doing full scans of the data?
>
> Chris Matta
> cmatta@mapr.com
> 215-701-3146
>
> On Tue, Apr 21, 2015 at 11:55 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > It doesn't look like the key has any UTF8 data in it.  I recommend you
> > create a UDF that breaks the bytes apart into the separate sections
> using a
> > complex output.  This way you could write something like this:
> >
> > select row.metric, row.tag[0].key, row.tag[0].value (
> > SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
> > )x
> >
> > The OPENTSDB_ROW function would take in a VarBinary and return a
> > ComplexWriter.  The code would then map the bytes to a logical structure
> > like this:
> >
> > {
> >   metric: xxx,
> >   time: xxx,
> >   tags: [
> >     {key: xxx, value xxx},
> >     {key: xxx, value xxx}
> >   ]
> > }
> >
> > On Tue, Apr 21, 2015 at 8:38 AM, Christopher Matta <cm...@mapr.com>
> > wrote:
> >
> > > I’m trying to use Drill to query time-series data stored in OpenTSDB
> > > <http://opentsdb.net/>. The row keys are supposed to be byte array
> > encoded
> > > according to this schema:
> > > http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
> > >
> > > When trying to do a simple CONVERT_FROM I get the following results:
> > >
> > > 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
> > > CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | U5p |
> > > | U5p! |
> > > | U5p" |
> > > | U5p% |
> > > | U5p& |
> > > | U5p* |
> > > | U5"� |
> > > | U5"�! |
> > >
> > > Which is a long way off of what’s described in the opentsdb schema
> page.
> > >
> > > Any tips for properly querying this kind of data?
> > >
> > > Chris Matta
> > > cmatta@mapr.com
> > > 215-701-3146
> > > 
> > >
> >
>

Re: Querying OpenTDSB data stored in HBase

Posted by Ted Dunning <te...@gmail.com>.

IF you blow apart the data as a list, you have scanned the data.
Flattening from there will give you sample per row representation.

There won't be any pushdown of filtering into the UDF, but this should be
really, really fast anyway.


On Tue, Apr 21, 2015 at 12:24 PM, Christopher Matta <cm...@mapr.com> wrote:

> Thanks Jacques, I'll take a look. In this case would predicate queries end
> up doing full scans of the data?
>
> Chris Matta
> cmatta@mapr.com
> 215-701-3146
>
> On Tue, Apr 21, 2015 at 11:55 AM, Jacques Nadeau <ja...@apache.org>
> wrote:
>
> > It doesn't look like the key has any UTF8 data in it.  I recommend you
> > create a UDF that breaks the bytes apart into the separate sections
> using a
> > complex output.  This way you could write something like this:
> >
> > select row.metric, row.tag[0].key, row.tag[0].value (
> > SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
> > )x
> >
> > The OPENTSDB_ROW function would take in a VarBinary and return a
> > ComplexWriter.  The code would then map the bytes to a logical structure
> > like this:
> >
> > {
> >   metric: xxx,
> >   time: xxx,
> >   tags: [
> >     {key: xxx, value xxx},
> >     {key: xxx, value xxx}
> >   ]
> > }
> >
> > On Tue, Apr 21, 2015 at 8:38 AM, Christopher Matta <cm...@mapr.com>
> > wrote:
> >
> > > I’m trying to use Drill to query time-series data stored in OpenTSDB
> > > <http://opentsdb.net/>. The row keys are supposed to be byte array
> > encoded
> > > according to this schema:
> > > http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
> > >
> > > When trying to do a simple CONVERT_FROM I get the following results:
> > >
> > > 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
> > > CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | U5p |
> > > | U5p! |
> > > | U5p" |
> > > | U5p% |
> > > | U5p& |
> > > | U5p* |
> > > | U5"� |
> > > | U5"�! |
> > >
> > > Which is a long way off of what’s described in the opentsdb schema
> page.
> > >
> > > Any tips for properly querying this kind of data?
> > >
> > > Chris Matta
> > > cmatta@mapr.com
> > > 215-701-3146
> > > 
> > >
> >
>

Re: Querying OpenTDSB data stored in HBase

Posted by Christopher Matta <cm...@mapr.com>.

Thanks Jacques, I'll take a look. In this case would predicate queries end
up doing full scans of the data?

Chris Matta
cmatta@mapr.com
215-701-3146

On Tue, Apr 21, 2015 at 11:55 AM, Jacques Nadeau <ja...@apache.org> wrote:

> It doesn't look like the key has any UTF8 data in it.  I recommend you
> create a UDF that breaks the bytes apart into the separate sections using a
> complex output.  This way you could write something like this:
>
> select row.metric, row.tag[0].key, row.tag[0].value (
> SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
> )x
>
> The OPENTSDB_ROW function would take in a VarBinary and return a
> ComplexWriter.  The code would then map the bytes to a logical structure
> like this:
>
> {
>   metric: xxx,
>   time: xxx,
>   tags: [
>     {key: xxx, value xxx},
>     {key: xxx, value xxx}
>   ]
> }
>
> On Tue, Apr 21, 2015 at 8:38 AM, Christopher Matta <cm...@mapr.com>
> wrote:
>
> > I’m trying to use Drill to query time-series data stored in OpenTSDB
> > <http://opentsdb.net/>. The row keys are supposed to be byte array
> encoded
> > according to this schema:
> > http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
> >
> > When trying to do a simple CONVERT_FROM I get the following results:
> >
> > 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
> > CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | U5p |
> > | U5p! |
> > | U5p" |
> > | U5p% |
> > | U5p& |
> > | U5p* |
> > | U5"� |
> > | U5"�! |
> >
> > Which is a long way off of what’s described in the opentsdb schema page.
> >
> > Any tips for properly querying this kind of data?
> >
> > Chris Matta
> > cmatta@mapr.com
> > 215-701-3146
> > 
> >
>

Re: Querying OpenTDSB data stored in HBase

Posted by Andries Engelbrecht <ae...@maprtech.com>.

Maybe look at using BYTE_SUBSTR to deconstruct the row_key in the elements of openTSDB, and then convert each element as appropriate.

http://drill.apache.org/docs/string-manipulation/#byte_substr


—Andries




On Apr 21, 2015, at 8:56 AM, Jacques Nadeau <ja...@apache.org> wrote:

> Quick edit, my query should have used "tags" instead of "tag".  Corrected
> below:
> 
> SELECT row.metric, row.tags[0].key, row.tags[0].value (
>  SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
> )x
> 
> On Tue, Apr 21, 2015 at 8:55 AM, Jacques Nadeau <ja...@apache.org> wrote:
> 
>> It doesn't look like the key has any UTF8 data in it.  I recommend you
>> create a UDF that breaks the bytes apart into the separate sections using a
>> complex output.  This way you could write something like this:
>> 
>> select row.metric, row.tag[0].key, row.tag[0].value (
>> SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
>> )x
>> 
>> The OPENTSDB_ROW function would take in a VarBinary and return a
>> ComplexWriter.  The code would then map the bytes to a logical structure
>> like this:
>> 
>> {
>>  metric: xxx,
>>  time: xxx,
>>  tags: [
>>    {key: xxx, value xxx},
>>    {key: xxx, value xxx}
>>  ]
>> }
>> 
>> On Tue, Apr 21, 2015 at 8:38 AM, Christopher Matta <cm...@mapr.com>
>> wrote:
>> 
>>> I’m trying to use Drill to query time-series data stored in OpenTSDB
>>> <http://opentsdb.net/>. The row keys are supposed to be byte array
>>> encoded
>>> according to this schema:
>>> http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
>>> 
>>> When trying to do a simple CONVERT_FROM I get the following results:
>>> 
>>> 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
>>> CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
>>> +------------+
>>> |   EXPR$0   |
>>> +------------+
>>> | U5p |
>>> | U5p! |
>>> | U5p" |
>>> | U5p% |
>>> | U5p& |
>>> | U5p* |
>>> | U5"� |
>>> | U5"�! |
>>> 
>>> Which is a long way off of what’s described in the opentsdb schema page.
>>> 
>>> Any tips for properly querying this kind of data?
>>> 
>>> Chris Matta
>>> cmatta@mapr.com
>>> 215-701-3146
>>> 
>>> 
>> 
>>

Re: Querying OpenTDSB data stored in HBase

Posted by Jacques Nadeau <ja...@apache.org>.

Quick edit, my query should have used "tags" instead of "tag".  Corrected
below:

SELECT row.metric, row.tags[0].key, row.tags[0].value (
  SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
)x

On Tue, Apr 21, 2015 at 8:55 AM, Jacques Nadeau <ja...@apache.org> wrote:

> It doesn't look like the key has any UTF8 data in it.  I recommend you
> create a UDF that breaks the bytes apart into the separate sections using a
> complex output.  This way you could write something like this:
>
> select row.metric, row.tag[0].key, row.tag[0].value (
> SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
> )x
>
> The OPENTSDB_ROW function would take in a VarBinary and return a
> ComplexWriter.  The code would then map the bytes to a logical structure
> like this:
>
> {
>   metric: xxx,
>   time: xxx,
>   tags: [
>     {key: xxx, value xxx},
>     {key: xxx, value xxx}
>   ]
> }
>
> On Tue, Apr 21, 2015 at 8:38 AM, Christopher Matta <cm...@mapr.com>
> wrote:
>
>> I’m trying to use Drill to query time-series data stored in OpenTSDB
>> <http://opentsdb.net/>. The row keys are supposed to be byte array
>> encoded
>> according to this schema:
>> http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
>>
>> When trying to do a simple CONVERT_FROM I get the following results:
>>
>> 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
>> CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | U5p |
>> | U5p! |
>> | U5p" |
>> | U5p% |
>> | U5p& |
>> | U5p* |
>> | U5"� |
>> | U5"�! |
>>
>> Which is a long way off of what’s described in the opentsdb schema page.
>>
>> Any tips for properly querying this kind of data?
>>
>> Chris Matta
>> cmatta@mapr.com
>> 215-701-3146
>> 
>>
>
>

Re: Querying OpenTDSB data stored in HBase

Posted by Jacques Nadeau <ja...@apache.org>.

It doesn't look like the key has any UTF8 data in it.  I recommend you
create a UDF that breaks the bytes apart into the separate sections using a
complex output.  This way you could write something like this:

select row.metric, row.tag[0].key, row.tag[0].value (
SELECT OPENTSDB_ROW(rowkey) as row from hbase.t1
)x

The OPENTSDB_ROW function would take in a VarBinary and return a
ComplexWriter.  The code would then map the bytes to a logical structure
like this:

{
  metric: xxx,
  time: xxx,
  tags: [
    {key: xxx, value xxx},
    {key: xxx, value xxx}
  ]
}

On Tue, Apr 21, 2015 at 8:38 AM, Christopher Matta <cm...@mapr.com> wrote:

> I’m trying to use Drill to query time-series data stored in OpenTSDB
> <http://opentsdb.net/>. The row keys are supposed to be byte array encoded
> according to this schema:
> http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
>
> When trying to do a simple CONVERT_FROM I get the following results:
>
> 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
> CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
> +------------+
> |   EXPR$0   |
> +------------+
> | U5p |
> | U5p! |
> | U5p" |
> | U5p% |
> | U5p& |
> | U5p* |
> | U5"� |
> | U5"�! |
>
> Which is a long way off of what’s described in the opentsdb schema page.
>
> Any tips for properly querying this kind of data?
>
> Chris Matta
> cmatta@mapr.com
> 215-701-3146
> 
>

Re: Querying OpenTDSB data stored in HBase

Posted by Carol McDonald <cm...@maprtech.com>.

You can use the CONVERT_TO and CONVERT_FROM functions to encode and decode
data that is binary or complex. For example, HBase stores data as encoded
VARBINARY data. To read HBase data in Drill, convert every column of an
HBase table *from* binary to an SQL data type while selecting the data.
http://drill.apache.org/docs/data-types/#convert_to-and-convert_from-data-types


On Tue, Apr 21, 2015 at 11:46 AM, Carol McDonald <cm...@maprtech.com>
wrote:

> I don't have the ostb table to test , but for byte number conversion
> something like
>
> convert_from(row_key , 'BIGINT_BE')
>
> https://cwiki.apache.org/confluence/display/DRILL/SQL+Functions
>
> The following table provides the data types that you use with the
> CONVERT_TO and CONVERT_FROM functions:
> *Type**Input Type**Output Type*BOOLEAN_BYTEbytes(1)booleanTINYINT_BE
> bytes(1)tinyintTINYINTbytes(1)tinyintSMALLINT_BEbytes(2)smallintSMALLINT
> bytes(2)smallintINT_BEbytes(4)intINTbytes(4)intBIGINT_BEbytes(8)bigint
> BIGINTbytes(8)bigintFLOATbytes(4)float (float4)DOUBLEbytes(8)double
> (float8)INT_HADOOPVbytes(1-9)intBIGINT_HADOOPVbytes(1-9)bigint
> DATE_EPOCH_BEbytes(8)dateDATE_EPOCHbytes(8)dateTIME_EPOCH_BEbytes(8)time
> TIME_EPOCHbytes(8)timeUTF8bytesvarcharUTF16bytesvar16charUINT8bytes(8)
> uint8
>
>
> On Tue, Apr 21, 2015 at 11:38 AM, Christopher Matta <cm...@mapr.com>
> wrote:
>
>> I’m trying to use Drill to query time-series data stored in OpenTSDB
>> <http://opentsdb.net/>. The row keys are supposed to be byte array
>> encoded
>> according to this schema:
>> http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
>>
>> When trying to do a simple CONVERT_FROM I get the following results:
>>
>> 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
>> CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | U5p |
>> | U5p! |
>> | U5p" |
>> | U5p% |
>> | U5p& |
>> | U5p* |
>> | U5"� |
>> | U5"�! |
>>
>> Which is a long way off of what’s described in the opentsdb schema page.
>>
>> Any tips for properly querying this kind of data?
>>
>> Chris Matta
>> cmatta@mapr.com
>> 215-701-3146
>> 
>>
>
>

Re: Querying OpenTDSB data stored in HBase

Posted by Carol McDonald <cm...@maprtech.com>.

I don't have the ostb table to test , but for byte number conversion
something like

convert_from(row_key , 'BIGINT_BE')

https://cwiki.apache.org/confluence/display/DRILL/SQL+Functions

The following table provides the data types that you use with the
CONVERT_TO and CONVERT_FROM functions:
*Type**Input Type**Output Type*BOOLEAN_BYTEbytes(1)booleanTINYINT_BEbytes(1)
tinyintTINYINTbytes(1)tinyintSMALLINT_BEbytes(2)smallintSMALLINTbytes(2)
smallintINT_BEbytes(4)intINTbytes(4)intBIGINT_BEbytes(8)bigintBIGINTbytes(8)
bigintFLOATbytes(4)float (float4)DOUBLEbytes(8)double (float8)INT_HADOOPV
bytes(1-9)intBIGINT_HADOOPVbytes(1-9)bigintDATE_EPOCH_BEbytes(8)date
DATE_EPOCHbytes(8)dateTIME_EPOCH_BEbytes(8)timeTIME_EPOCHbytes(8)timeUTF8
bytesvarcharUTF16bytesvar16charUINT8bytes(8)uint8


On Tue, Apr 21, 2015 at 11:38 AM, Christopher Matta <cm...@mapr.com> wrote:

> I’m trying to use Drill to query time-series data stored in OpenTSDB
> <http://opentsdb.net/>. The row keys are supposed to be byte array encoded
> according to this schema:
> http://opentsdb.net/docs/build/html/user_guide/backends/hbase.html
>
> When trying to do a simple CONVERT_FROM I get the following results:
>
> 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select
> CONVERT_FROM(row_key, 'UTF8') from tsdb limit 100;
> +------------+
> |   EXPR$0   |
> +------------+
> | U5p |
> | U5p! |
> | U5p" |
> | U5p% |
> | U5p& |
> | U5p* |
> | U5"� |
> | U5"�! |
>
> Which is a long way off of what’s described in the opentsdb schema page.
>
> Any tips for properly querying this kind of data?
>
> Chris Matta
> cmatta@mapr.com
> 215-701-3146
> 
>