Posted to dev@drill.apache.org by "Nicholas Iacobucci (Jira)" <ji...@apache.org> on 2019/08/31 14:11:00 UTC
[jira] [Created] (DRILL-7363) OpenTSDB Storage Plugin - Speed Up Query Planning

Nicholas Iacobucci created DRILL-7363:
-----------------------------------------

             Summary: OpenTSDB Storage Plugin - Speed Up Query Planning
                 Key: DRILL-7363
                 URL: https://issues.apache.org/jira/browse/DRILL-7363
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Other
            Reporter: Nicholas Iacobucci


In the current implementation of the OpenTSDB storage plugin, simple queries that should return within 100ms spend 90 to 120 seconds in planning alone.

While Drill is planning the query, before execution starts, the OpenTSDB incoming query log shows many inefficient queries. For example, there are often 20 to 30 queries asking for all metric data from 47 years ago onward, even though the original query passed to Drill specifies a much more recent start time. Each of these queries takes 2-3 seconds to complete with our current small dataset.
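A back-of-the-envelope estimate using only the figures above, and assuming these full-history queries run one after another, suggests they could account for much of the planning delay:

```latex
\underbrace{20 \times 2\,\mathrm{s}}_{\text{fewest, fastest}} = 40\,\mathrm{s}
\quad\text{to}\quad
\underbrace{30 \times 3\,\mathrm{s}}_{\text{most, slowest}} = 90\,\mathrm{s}
```

That is roughly 40 to 90 seconds of the observed 90 to 120 seconds of planning time.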

From what I can tell, this is related to how the storage plugin prepares the output columns: it needs to resolve all tags so that it can include them as columns. This can be seen in the *setupStructure()* method of the *Schema* class, invoked from its constructor.
(contrib\storage-opentsdb\src\main\java\org\apache\drill\exec\store\openTSDB\client\Schema.java)

I believe the storage plugin fetches every data point in the requested metric so that it can be confident every tag has a corresponding SQL column.
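To make the suspected cost concrete, here is a minimal Python sketch of what I believe the current schema discovery amounts to. It is not the plugin's actual Java code: the function names are hypothetical, and the "47y-ago" start string and the "sum" aggregator are assumptions. The point it illustrates is that every tag key can only be discovered by pulling data points for the metric's whole history. Because an aggregated /api/query result reports tags shared by all series under "tags" and the remaining tag keys under "aggregateTags", both are collected.

```python
# Assumed relative start time matching the "47 years ago" queries seen in the log.
FULL_SCAN_START = "47y-ago"


def full_scan_query(metric, aggregator="sum"):
    """Hypothetical: build an /api/query body requesting every data point of a metric."""
    return {
        "start": FULL_SCAN_START,
        "queries": [{"aggregator": aggregator, "metric": metric}],
    }


def tag_columns_from_results(results):
    """Union the tag keys of every returned series into a sorted column list.

    `results` is the decoded JSON array returned by /api/query. Tags common
    to all aggregated series appear under "tags"; tag keys that were
    aggregated away appear under "aggregateTags".
    """
    keys = set()
    for series in results:
        keys.update(series.get("tags", {}).keys())
        keys.update(series.get("aggregateTags", []))
    return sorted(keys)
```

With a response such as `[{"metric": "sys.cpu.user", "tags": {"host": "web01"}, "aggregateTags": ["cpu"], "dps": {...}}]`, the derived columns would be `["cpu", "host"]`, but only after the full history has been transferred.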

I propose modifying the storage plugin to enumerate all tags within a metric in a different way, using the OpenTSDB metadata tables. It should be possible to query the metadata for a given metric name and have OpenTSDB return all tags and values that exist for that metric.

The API endpoint is /api/search/lookup: [http://opentsdb.net/docs/build/html/api_http/search/lookup.html]

This will require the OpenTSDB server either to have 'realtime ts tracking/incrementing' enabled or to run the command 'tsdb uid metasync' on a schedule. Either option keeps OpenTSDB's metadata tables up to date.
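A minimal Python sketch of the proposed discovery step, assuming the request and response shapes described in the /api/search/lookup documentation linked above: the metric name goes in the `m` query-string parameter, and each entry in the response's `results` array carries a `tags` object. The function names are hypothetical, and the HTTP call itself is left out so the parsing can be checked offline.

```python
from urllib.parse import urlencode


def lookup_url(base_url, metric):
    """Build a GET URL for /api/search/lookup matching every series of `metric`."""
    # Only the metric is given; tag constraints are not needed for discovery.
    return f"{base_url.rstrip('/')}/api/search/lookup?{urlencode({'m': metric})}"


def columns_from_lookup(response):
    """Map each tag key to the sorted list of values seen in a lookup response.

    Assumes a decoded JSON object with a "results" array whose entries have
    a "tags" object of tag key/value pairs.
    """
    values_by_key = {}
    for entry in response.get("results", []):
        for key, value in entry.get("tags", {}).items():
            values_by_key.setdefault(key, set()).add(value)
    return {key: sorted(vals) for key, vals in sorted(values_by_key.items())}
```

For example, `lookup_url("http://localhost:4242/", "sys.cpu.user")` yields `http://localhost:4242/api/search/lookup?m=sys.cpu.user`. The design choice is that one metadata request per metric replaces the repeated full-history data queries, and it returns tag values as well as keys.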

 

Further, it may be possible to push tag filters from the Drill SQL query down to OpenTSDB, which could improve query speed even more. Currently, if the end user knows which tag they want to filter on and uses an SQL WHERE <tag> = <value> clause, the filtering happens inside Drill after it obtains the unfiltered dataset from OpenTSDB, even though OpenTSDB could do the filtering itself.
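As a rough illustration of what such a pushdown could send, the following Python sketch turns equality predicates into the `filters` array of an /api/query request body, using OpenTSDB's `literal_or` tag filter (tag filters were introduced in OpenTSDB 2.2). The function name, the dict-of-predicates input and the `sum` aggregator are assumptions for illustration; extracting the predicates from Drill's query plan is not shown.

```python
def pushdown_query(metric, start, equality_predicates, aggregator="sum"):
    """Hypothetical: build an /api/query body that lets OpenTSDB apply tag filters.

    `equality_predicates` maps tag keys to values taken from AND-ed
    WHERE <tag> = <value> clauses. Each becomes a single-value literal_or
    filter; a series must match every filter to be returned.
    """
    filters = [
        # groupBy=False: matching series are aggregated rather than split by
        # this tag (with a single literal value the distinction is minor).
        {"type": "literal_or", "tagk": tag, "filter": value, "groupBy": False}
        for tag, value in equality_predicates.items()
    ]
    return {
        "start": start,  # the user's start time, not a full-history default
        "queries": [{"aggregator": aggregator, "metric": metric, "filters": filters}],
    }
```

For example, `pushdown_query("sys.cpu.user", "1h-ago", {"host": "web01"})` asks OpenTSDB for only the last hour of `host=web01` data instead of handing Drill the unfiltered dataset.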

 

I will open a pull request once I have a base implementation ready, though I am interested in any comments, feedback or discussion.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)