Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/01/18 22:25:26 UTC
[jira] [Updated] (CALCITE-1591) Druid adapter: Use "groupBy" query with extractionFn for time dimension

     [ https://issues.apache.org/jira/browse/CALCITE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julian Hyde updated CALCITE-1591:
---------------------------------
    Component/s: druid

> Druid adapter: Use "groupBy" query with extractionFn for time dimension
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-1591
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1591
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>
> For queries that aggregate on the time dimension, or a function of it such as {{FLOOR(__time TO DAY)}}, as of the fix for CALCITE-1579 we generate a "groupBy" query that does not sort or apply a limit. It would be better (in the sense that Druid does more of the work and Hive does less) if we used an extractionFn to create a dimension that we can sort on.
> In CALCITE-1578, [~nishantbangarwa] gives the following example query:
> {code}
> {
>   "queryType": "groupBy",
>   "dataSource": "druid_tpcds_ss_sold_time_subset",
>   "granularity": "ALL",
>   "dimensions": [
>     "i_brand_id",
>     {
>       "type" : "extraction",
>       "dimension" : "__time",
>       "outputName" :  "year",
>       "extractionFn" : {
>         "type" : "timeFormat",
>         "granularity" : "YEAR"
>       }
>     }
>   ],
>   "limitSpec": {
>     "type": "default",
>     "limit": 10,
>     "columns": [
>       {
>         "dimension": "$f3",
>         "direction": "ascending"
>       }
>     ]
>   },
>   "aggregations": [
>     {
>       "type": "longMax",
>       "name": "$f2",
>       "fieldName": "ss_quantity"
>     },
>     {
>       "type": "doubleSum",
>       "name": "$f3",
>       "fieldName": "ss_wholesale_cost"
>     }
>   ],
>   "intervals": [
>     "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
>   ]
> }
> {code}
> and for {{DruidAdapterIT.testGroupByDaySortDescLimit}}, [~bslim] suggests:
> {code}
> {
>   "queryType": "groupBy",
>   "dataSource": "foodmart",
>   "granularity": "all",
>   "dimensions": [
>     "brand_name",
>     {
>       "type": "extraction",
>       "dimension": "__time",
>       "outputName": "day",
>       "extractionFn": {
>         "type": "timeFormat",
>         "granularity": "DAY"
>       }
>     }
>   ],
>   "aggregations": [
>     {
>       "type": "longSum",
>       "name": "S",
>       "fieldName": "unit_sales"
>     }
>   ],
>   "limitSpec": {
>     "type": "default",
>     "limit": 30,
>     "columns": [
>       {
>         "dimension": "S",
>         "direction": "ascending"
>       }
>     ]
>   }
> }
> {code}
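
As a rough illustration of the shape both suggested queries share, here is a minimal Python sketch that assembles a "groupBy" query with a timeFormat extraction dimension and a limitSpec. The helper names (time_extraction_dimension, group_by_query) are hypothetical and are not part of Calcite or Druid; the JSON field names are taken only from the two examples quoted above, and "intervals" is left out unless the caller supplies it.

```python
import json


def time_extraction_dimension(output_name, granularity):
    """Build a dimension spec that buckets __time with a timeFormat extractionFn.

    Hypothetical helper; field names mirror the examples quoted in the issue.
    """
    return {
        "type": "extraction",
        "dimension": "__time",
        "outputName": output_name,
        "extractionFn": {"type": "timeFormat", "granularity": granularity},
    }


def group_by_query(data_source, dimensions, aggregations,
                   sort_column, limit, direction="ascending", intervals=None):
    """Assemble a groupBy query whose sort and limit are done by Druid.

    Hypothetical helper, not the adapter's actual code generation.
    """
    query = {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",
        "dimensions": dimensions,
        "aggregations": aggregations,
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [{"dimension": sort_column, "direction": direction}],
        },
    }
    # Only emit "intervals" when the caller provides them.
    if intervals is not None:
        query["intervals"] = intervals
    return query


# Rebuild the second example from the issue (foodmart, grouped by day).
query = group_by_query(
    "foodmart",
    ["brand_name", time_extraction_dimension("day", "DAY")],
    [{"type": "longSum", "name": "S", "fieldName": "unit_sales"}],
    sort_column="S",
    limit=30,
)
print(json.dumps(query, indent=2))
```

Because the extracted time bucket has its own outputName, passing sort_column="day" instead of "S" would order the result by the extracted dimension, which is the case the description is concerned with.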


