Posted to user@drill.apache.org by Kirankumar Gowdru <kg...@salesforce.com> on 2015/10/20 20:33:35 UTC

Apache Drill to query OpenTSDB

I'm new to Apache Drill and have a few questions.

1. Can we use Apache Drill to query OpenTSDB? If so, what is the best approach?
2. Can we use Drill to query multiple HBase clusters?

Thanks,
Kiran

Re: Apache Drill to query OpenTSDB

Posted by Jim Scott <js...@maprtech.com>.
Here is what I put in the other thread:
You will have a problem querying a table that holds OpenTSDB-formatted
data, because the table can contain non-compacted data, compacted data,
or a mix of both at the same time.

Drill should be able to query the non-compacted form (still binary data).
Drill cannot query the compacted form out of the box, and would require a
special adapter to decode the blobs. The mixed form would require an
adapter to recognize both formats at the same time.

This is not a simple task.
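
As a rough illustration of what such an adapter would have to do at the
qualifier level, here is a minimal Python sketch. It assumes the OpenTSDB 2.x
data-point qualifier layout (2 bytes for second precision, 4 bytes starting
with a 0xF nibble for millisecond precision) and that a compacted column's
qualifier is the concatenation of the original qualifiers. The function names
are hypothetical, and splitting the matching value blob is not covered.

```python
def split_compacted_qualifier(qualifier):
    """Split a column qualifier into its individual data-point qualifiers.

    Assumption: only data-point qualifiers are present (no annotations or
    other special columns). A second-precision offset never exceeds 3599,
    so its leading nibble is never 0xF; that nibble marks a 4-byte
    millisecond qualifier.
    """
    parts = []
    i = 0
    while i < len(qualifier):
        width = 4 if (qualifier[i] & 0xF0) == 0xF0 else 2
        if i + width > len(qualifier):
            raise ValueError("truncated qualifier")
        parts.append(qualifier[i:i + width])
        i += width
    return parts


def is_compacted(qualifier):
    """True when the qualifier holds more than one data point."""
    return len(split_compacted_qualifier(qualifier)) > 1
```

A reader that handles the mixed form would call something like
is_compacted() per column and branch on the result.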


-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>


Re: Apache Drill to query OpenTSDB

Posted by Jim Scott <js...@maprtech.com>.
Another thread on this user list, started a couple of days ago, asked this
EXACT same question, and I responded to it there.


Re: Apache Drill to query OpenTSDB

Posted by Steven Phillips <st...@dremio.com>.
1. You might be able to run a query against OpenTSDB, but I'm not sure
you will be able to do anything useful easily right now. Every
column qualifier in an HBase table results in a column in Drill. In the
OpenTSDB format, the column qualifiers are simply time offsets from the
base timestamp, which is encoded in the row key. I believe this offset
can be in either seconds or milliseconds, and a single row holds the data
for an entire hour, so there could very easily be thousands of columns.
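
To make the layout described above concrete, here is a minimal Python sketch
that recovers a data point's timestamp from a row key and a single
non-compacted qualifier. It rests on assumptions not stated in this thread:
the default 3-byte metric UID with no salt prefix, a 4-byte big-endian base
timestamp after it, and the OpenTSDB 2.x qualifier bit layout. All function
names are hypothetical.

```python
import struct

METRIC_UID_WIDTH = 3  # assumption: default UID width, no salt prefix


def base_timestamp(row_key):
    """Hour-aligned base timestamp (Unix seconds) stored in the row key."""
    (ts,) = struct.unpack_from(">I", row_key, METRIC_UID_WIDTH)
    return ts


def decode_qualifier(qualifier):
    """Return (offset_in_milliseconds, flags) for one non-compacted qualifier."""
    if len(qualifier) == 2:
        # Second precision: 12 bits of offset, then 4 bits of flags.
        (raw,) = struct.unpack(">H", qualifier)
        return (raw >> 4) * 1000, raw & 0x0F
    if len(qualifier) == 4 and (qualifier[0] & 0xF0) == 0xF0:
        # Millisecond precision: 0xF marker, 22 bits of offset,
        # 2 reserved bits, then 4 bits of flags.
        (raw,) = struct.unpack(">I", qualifier)
        return (raw & 0x0FFFFFC0) >> 6, raw & 0x0F
    raise ValueError("not a single data-point qualifier")


def point_time_ms(row_key, qualifier):
    """Absolute timestamp of the data point, in Unix milliseconds."""
    offset_ms, _flags = decode_qualifier(qualifier)
    return base_timestamp(row_key) * 1000 + offset_ms
```

At second precision one row can hold up to 3600 distinct qualifiers, which is
where the thousands of columns come from.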

Another potential issue is that the column qualifiers are not Strings, but
encoded integers. I am not sure whether Drill allows non-printable column
names, or how it will handle non-UTF8 column qualifiers.

If we can get past those potential issues, I think you would want to use
KVGEN and FLATTEN. Once you've done this, you could filter based on the
rowkey and the offset in order to return the data within a time range.
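
The KVGEN and FLATTEN idea can be sketched outside Drill. The Python below
only emulates what those two functions do to a wide row (turn a map into
key/value pairs, then emit one record per pair) and then filters by time
range. It is not Drill code: the row shape, the field names base_ts, cells
and kv, and the assumption that offsets are already decoded to seconds are
all hypothetical.

```python
def kvgen(mapping):
    """Emulate KVGEN: turn a map into a list of {"key", "value"} records."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(rows, column):
    """Emulate FLATTEN: emit one record per element of a repeated column."""
    for row in rows:
        for element in row[column]:
            out = dict(row)
            out[column] = element
            yield out


def points_in_range(rows, start, end):
    """Yield (timestamp, value) pairs with start <= timestamp < end.

    Each input row is assumed to look like
    {"base_ts": <seconds>, "cells": {<offset_seconds>: <value>}}.
    """
    expanded = ({"base_ts": r["base_ts"], "kv": kvgen(r["cells"])} for r in rows)
    for rec in flatten(expanded, "kv"):
        ts = rec["base_ts"] + rec["kv"]["key"]
        if start <= ts < end:
            yield ts, rec["kv"]["value"]
```

In Drill the same shape would come from applying FLATTEN to the KVGEN of the
column family map, provided the binary qualifiers do not get in the way.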

2. Yes, you can query multiple clusters. Just configure a separate HBase
storage plugin for each cluster in the Storage panel of the Web UI.
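
As a sketch of what two such plugin definitions might look like, the Python
below prints one JSON configuration per cluster, ready to paste into the
Storage panel. The plugin names hbase_a and hbase_b and the ZooKeeper host
names are made up; the JSON keys follow the HBase storage plugin layout as I
understand it, so check them against your Drill version.

```python
import json


def hbase_plugin_config(zookeeper_quorum, client_port="2181"):
    """Build an HBase storage plugin definition for one cluster."""
    return {
        "type": "hbase",
        "config": {
            "hbase.zookeeper.quorum": zookeeper_quorum,
            "hbase.zookeeper.property.clientPort": client_port,
        },
        "size.calculator.enabled": False,
        "enabled": True,
    }


# Hypothetical plugin names and ZooKeeper hosts, one entry per cluster.
plugins = {
    "hbase_a": hbase_plugin_config("zk1.cluster-a.example.com"),
    "hbase_b": hbase_plugin_config("zk1.cluster-b.example.com"),
}

for name, config in plugins.items():
    print(name)
    print(json.dumps(config, indent=2))
```

Each plugin name then acts as its own schema in queries, so tables on the two
clusters are addressed through hbase_a and hbase_b respectively.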
