Posted to issues@calcite.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2018/04/18 18:49:00 UTC

[jira] [Commented] (CALCITE-1591) Druid adapter: Use "groupBy" query with extractionFn for time dimension

    [ https://issues.apache.org/jira/browse/CALCITE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443028#comment-16443028 ] 

slim bouguerra commented on CALCITE-1591:
-----------------------------------------

I think this one can be closed too; we are now using extraction functions to project expressions on top of Druid columns.

> Druid adapter: Use "groupBy" query with extractionFn for time dimension
> -----------------------------------------------------------------------
>
>                 Key: CALCITE-1591
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1591
>             Project: Calcite
>          Issue Type: Bug
>          Components: druid
>            Reporter: Julian Hyde
>            Assignee: Julian Hyde
>            Priority: Major
>
> For queries that aggregate on the time dimension, or on a function of it such as {{FLOOR(__time TO DAY)}}, as of the fix for CALCITE-1579 we generate a "groupBy" query that neither sorts nor applies a limit. It would be better (in the sense that Druid does more of the work and Hive does less) if we used an extractionFn to create a dimension that we can sort on.
> In CALCITE-1578, [~nishantbangarwa] gives the following example query:
> {code}
> {
>   "queryType": "groupBy",
>   "dataSource": "druid_tpcds_ss_sold_time_subset",
>   "granularity": "ALL",
>   "dimensions": [
>     "i_brand_id",
>     {
>       "type" : "extraction",
>       "dimension" : "__time",
>       "outputName" :  "year",
>       "extractionFn" : {
>         "type" : "timeFormat",
>         "granularity" : "YEAR"
>       }
>     }
>   ],
>   "limitSpec": {
>     "type": "default",
>     "limit": 10,
>     "columns": [
>       {
>         "dimension": "$f3",
>         "direction": "ascending"
>       }
>     ]
>   },
>   "aggregations": [
>     {
>       "type": "longMax",
>       "name": "$f2",
>       "fieldName": "ss_quantity"
>     },
>     {
>       "type": "doubleSum",
>       "name": "$f3",
>       "fieldName": "ss_wholesale_cost"
>     }
>   ],
>   "intervals": [
>     "1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
>   ]
> }
> {code}
> and for {{DruidAdapterIT.testGroupByDaySortDescLimit}}, [~bslim] suggests:
> {code}
> {
>   "queryType": "groupBy",
>   "dataSource": "foodmart",
>   "granularity": "all",
>   "dimensions": [
>     "brand_name",
>     {
>       "type": "extraction",
>       "dimension": "__time",
>       "outputName": "day",
>       "extractionFn": {
>         "type": "timeFormat",
>         "granularity": "DAY"
>       }
>     }
>   ],
>   "aggregations": [
>     {
>       "type": "longSum",
>       "name": "S",
>       "fieldName": "unit_sales"
>     }
>   ],
>   "limitSpec": {
>     "type": "default",
>     "limit": 30,
>     "columns": [
>       {
>         "dimension": "S",
>         "direction": "ascending"
>       }
>     ]
>   }
> }
> {code}
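
For illustration only, here is a minimal Python sketch of how a query of the shape quoted above could be assembled, with the time extraction dimension used as the sort key. The helper names (time_extraction_dimension, group_by_query) are hypothetical and are not part of Calcite or Druid. The JSON field names are taken from the quoted queries; the "descending" direction and the reuse of the interval from the first query are assumptions made for the example.

```python
import json


def time_extraction_dimension(output_name, granularity):
    """Hypothetical helper: an 'extraction' dimension over __time that
    uses a timeFormat extractionFn, as in the queries quoted above."""
    return {
        "type": "extraction",
        "dimension": "__time",
        "outputName": output_name,
        "extractionFn": {"type": "timeFormat", "granularity": granularity},
    }


def group_by_query(data_source, dimensions, aggregations,
                   order_by, direction, limit, intervals):
    """Hypothetical helper: a groupBy query whose limitSpec sorts on one
    output column, so the sort and limit can be evaluated by Druid."""
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",
        "dimensions": dimensions,
        "aggregations": aggregations,
        "limitSpec": {
            "type": "default",
            "limit": limit,
            "columns": [{"dimension": order_by, "direction": direction}],
        },
        "intervals": intervals,
    }


# Assumed example: group by brand and day, sort on the extracted "day" column.
query = group_by_query(
    data_source="foodmart",
    dimensions=["brand_name", time_extraction_dimension("day", "DAY")],
    aggregations=[{"type": "longSum", "name": "S", "fieldName": "unit_sales"}],
    order_by="day",
    direction="descending",
    limit=30,
    intervals=["1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"],
)

print(json.dumps(query, indent=2))
```

Because "day" is an ordinary output column of the query, the limitSpec can refer to it by name in the same way it refers to an aggregation such as "S".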


