Posted to issues@hive.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/07/11 00:51:00 UTC
[jira] [Work logged] (HIVE-21934) Materialized view on top of Druid not pushing everything

     [ https://issues.apache.org/jira/browse/HIVE-21934?focusedWorklogId=275122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-275122 ]

ASF GitHub Bot logged work on HIVE-21934:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Jul/19 00:50
            Start Date: 11/Jul/19 00:50
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on pull request #717: HIVE-21934
URL: https://github.com/apache/hive/pull/717
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 275122)
            Time Spent: 10m
    Remaining Estimate: 0h

> Materialized view on top of Druid not pushing everything
> --------------------------------------------------------
>
>                 Key: HIVE-21934
>                 URL: https://issues.apache.org/jira/browse/HIVE-21934
>             Project: Hive
>          Issue Type: Improvement
>          Components: Druid integration, Materialized views
>            Reporter: slim bouguerra
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The title is not very informative, but hopefully the examples are.
> This is the plan when the query is rewritten to use the materialized view:
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); Time taken: 0.005 seconds
> INFO : OK
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
> | |
> +----------------------------------------------------+
>  
> {code}
> If I run the equivalent query against a plain Druid table, without the MV, the whole aggregation is pushed to Druid:
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__time`);
> {code}
> {code}
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Select Operator [SEL_1] |
> | Output:["_col0","_col1","_col2","_col3"] |
> | TableScan [TS_0] |
> | Output:["extract_month","vc","$f3","extract_year"],properties:{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), 'LONG')\",\"outputType\":\"LONG\"}],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"$f3\",\"expression\":\"1\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"} |
> | |
> +----------------------------------------------------+
> {code}
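
The difference between the two plans above can be read from the "druid.query.json" property of each TableScan: the MV plan sends Druid a "scan" query and aggregates in Hive, while the plain-table plan sends a "groupBy" query that carries the dimensions and the aggregation. The following is a minimal Python sketch of that comparison. The JSON strings are abbreviated copies of the ones in the plans (intervals, virtual columns and extraction functions are omitted), and the helper names summarize_pushdown and aggregation_pushed are hypothetical, not part of Hive or Druid.

```python
import json

# Abbreviated "druid.query.json" values taken from the two plans above.
# Fields not needed for the comparison have been left out.
MV_QUERY = (
    '{"queryType":"scan",'
    '"dataSource":"mv_ssb_100_scale.ssb_mv_druid_100",'
    '"columns":["vc"],"resultFormat":"compactedList"}'
)
TABLE_QUERY = (
    '{"queryType":"groupBy",'
    '"dataSource":"druid_ssb.ssb_druid_100","granularity":"all",'
    '"dimensions":[{"outputName":"extract_month"},'
    '{"outputName":"vc"},{"outputName":"extract_year"}],'
    '"aggregations":[{"type":"longSum","name":"$f3","expression":"1"}]}'
)


def summarize_pushdown(query_json):
    """Return the parts of a Druid query that show what was pushed down."""
    query = json.loads(query_json)
    return {
        "query_type": query["queryType"],
        "dimensions": [d["outputName"] for d in query.get("dimensions", [])],
        "aggregations": [a["name"] for a in query.get("aggregations", [])],
    }


def aggregation_pushed(query_json):
    """True when the Druid query itself contains at least one aggregation."""
    return bool(summarize_pushdown(query_json)["aggregations"])


if __name__ == "__main__":
    for label, text in (("MV plan", MV_QUERY), ("plain table plan", TABLE_QUERY)):
        print(label, summarize_pushdown(text), aggregation_pushed(text))
```

For the MV plan the summary has no dimensions or aggregations, which matches the Group By operators that remain on the Hive side in the first plan.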



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)