You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "slim bouguerra (JIRA)" <ji...@apache.org> on 2019/06/28 18:44:00 UTC

[jira] [Commented] (HIVE-21934) Materialized view on top of Druid not pushing every thing

    [ https://issues.apache.org/jira/browse/HIVE-21934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875140#comment-16875140 ] 

slim bouguerra commented on HIVE-21934:
---------------------------------------

view definition

{code}


CREATE MATERIALIZED VIEW `ssb_mv_druid_100`
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
 "druid.segment.granularity" = "MONTH",
 "druid.query.granularity" = "DAY")
AS
SELECT
 `__time` as `__time` ,
 cast(c_city as string) c_city,
 cast(c_nation as string) c_nation,
 cast(c_region as string) c_region,
 c_mktsegment as c_mktsegment,
 cast(d_weeknuminyear as string) d_weeknuminyear,
 cast(d_year as string) d_year,
 cast(d_yearmonth as string) d_yearmonth,
 cast(d_yearmonthnum as string) d_yearmonthnum,
 cast(p_brand1 as string) p_brand1,
 cast(p_category as string) p_category,
 cast(p_mfgr as string) p_mfgr,
 p_type,
 s_name,
 cast(s_city as string) s_city,
 cast(s_nation as string) s_nation,
 cast(s_region as string) s_region,
 cast(`lo_ordpriority` as string) lo_ordpriority, 
 cast(`lo_shippriority` as string) lo_shippriority, 
 `d_sellingseason`
 `lo_shipmode`, 
 lo_revenue,
 lo_supplycost ,
 lo_discount ,
 `lo_quantity`, 
 `lo_extendedprice`, 
 `lo_ordtotalprice`, 
 lo_extendedprice * lo_discount discounted_price,
 lo_revenue - lo_supplycost net_revenue
FROM
 customer_n1, dates_n1, lineorder_n1, ssb_part_n0, supplier_n0
where
 lo_orderdate = d_datekey
 and lo_partkey = p_partkey
 and lo_suppkey = s_suppkey
 and lo_custkey = c_custkey;

{code}

> Materialized view on top of Druid not pushing every thing
> ---------------------------------------------------------
>
>                 Key: HIVE-21934
>                 URL: https://issues.apache.org/jira/browse/HIVE-21934
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: slim bouguerra
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>
> The title is not very informative, but examples hopefully are.
> this is the plan with the view
> {code}
> explain SELECT MONTH(`dates_n1`.`__time`) AS `mn___time_ok`,
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`dates_n1`.`__time`) AS `yr___time_ok`
> FROM `mv_ssb_100_scale`.`lineorder_n0` `lineorder_n0`
> JOIN `mv_ssb_100_scale`.`dates_n1` `dates_n1` ON (`lineorder_n0`.`lo_orderdate` = `dates_n1`.`d_datekey`)
> JOIN `mv_ssb_100_scale`.`customer_n1` `customer_n1` ON (`lineorder_n0`.`lo_custkey` = `customer_n1`.`c_custkey`)
> JOIN `mv_ssb_100_scale`.`supplier_n0` `supplier_n0` ON (`lineorder_n0`.`lo_suppkey` = `supplier_n0`.`s_suppkey`)
> JOIN `mv_ssb_100_scale`.`ssb_part_n0` `ssb_part_n0` ON (`lineorder_n0`.`lo_partkey` = `ssb_part_n0`.`p_partkey`)
> GROUP BY MONTH(`dates_n1`.`__time`),
> CAST((MONTH(`dates_n1`.`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`dates_n1`.`__time`)
> INFO : Starting task [Stage-3:EXPLAIN] in serial mode
> INFO : Completed executing command(queryId=sbouguerra_20190627113101_1493ee87-0288-4e30-b53c-0ee729ce3977); Time taken: 0.005 seconds
> INFO : OK
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Vertex dependency in root stage |
> | Reducer 2 <- Map 1 (SIMPLE_EDGE) |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Stage-1 |
> | Reducer 2 vectorized, llap |
> | File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"] |
> | Group By Operator [GBY_11] (rows=300018951 width=38) |
> | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1, KEY._col2 |
> | <-Map 1 [SIMPLE_EDGE] vectorized, llap |
> | SHUFFLE [RS_10] |
> | PartitionCols:_col0, _col1, _col2 |
> | Group By Operator [GBY_9] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2","_col3"],aggregations:["sum(1)"],keys:_col0, _col1, _col2 |
> | Select Operator [SEL_8] (rows=600037902 width=38) |
> | Output:["_col0","_col1","_col2"] |
> | TableScan [TS_0] (rows=600037902 width=38) |
> | mv_ssb_100_scale@ssb_mv_druid_100,ssb_mv_druid_100,Tbl:COMPLETE,Col:NONE,Output:["vc"],properties:\{"druid.fieldNames":"vc","druid.fieldTypes":"timestamp","druid.query.json":"{\"queryType\":\"scan\",\"dataSource\":\"mv_ssb_100_scale.ssb_mv_druid_100\",\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"virtualColumns\":[{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"\\\"__time\\\"\",\"outputType\":\"LONG\"}],\"columns\":[\"vc\"],\"resultFormat\":\"compactedList\"}","druid.query.type":"scan"} |
> | |
> +----------------------------------------------------+
>  
> {code}
> if i use a simple druid table without MV 
> {code}
> explain SELECT MONTH(`__time`) AS `mn___time_ok`,
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT) AS `qr___time_ok`,
> SUM(1) AS `sum_number_of_records_ok`,
> YEAR(`__time`) AS `yr___time_ok`
> FROM `druid_ssb.ssb_druid_100`
> GROUP BY MONTH(`__time`),
> CAST((MONTH(`__time`) - 1) / 3 + 1 AS BIGINT),
> YEAR(`__time`);
> {code}
> {code}
> +----------------------------------------------------+
> | Explain |
> +----------------------------------------------------+
> | Plan optimized by CBO. |
> | |
> | Stage-0 |
> | Fetch Operator |
> | limit:-1 |
> | Select Operator [SEL_1] |
> | Output:["_col0","_col1","_col2","_col3"] |
> | TableScan [TS_0] |
> | Output:["extract_month","vc","$f3","extract_year"],properties:\{"druid.fieldNames":"extract_month,vc,extract_year,$f3","druid.fieldTypes":"int,bigint,int,bigint","druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_ssb.ssb_druid_100\",\"granularity\":\"all\",\"dimensions\":[{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_month\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"M\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}},\{\"type\":\"default\",\"dimension\":\"vc\",\"outputName\":\"vc\",\"outputType\":\"LONG\"},\{\"type\":\"extraction\",\"dimension\":\"__time\",\"outputName\":\"extract_year\",\"extractionFn\":{\"type\":\"timeFormat\",\"format\":\"yyyy\",\"timeZone\":\"America/New_York\",\"locale\":\"en-US\"}}],\"virtualColumns\":[\{\"type\":\"expression\",\"name\":\"vc\",\"expression\":\"CAST(((CAST((timestamp_extract(\\\"__time\\\",'MONTH','America/New_York') - 1), 'DOUBLE') / CAST(3, 'DOUBLE')) + CAST(1, 'DOUBLE')), 'LONG')\",\"outputType\":\"LONG\"}],\"limitSpec\":\{\"type\":\"default\"},\"aggregations\":[\{\"type\":\"longSum\",\"name\":\"$f3\",\"expression\":\"1\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"} |
> | |
> +----------------------------------------------------+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)