Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/01/22 14:40:00 UTC

[jira] [Updated] (HUDI-258) Hive Query engine not supporting join queries between RT and RO tables

     [ https://issues.apache.org/jira/browse/HUDI-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-258:
-------------------------------------
    Labels: bug-bash-0.6.0 help-requested user-support-issues  (was: bug-bash-0.6.0 help-requested)

> Hive Query engine not supporting join queries between RT and RO tables
> ----------------------------------------------------------------------
>
>                 Key: HUDI-258
>                 URL: https://issues.apache.org/jira/browse/HUDI-258
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>            Reporter: Balaji Varadarajan
>            Assignee: Nishith Agarwal
>            Priority: Major
>              Labels: bug-bash-0.6.0, help-requested, user-support-issues
>
> Description : [https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]
>  
> Root Cause: Hive tracks getSplits() calls by dataset base path and does not take the InputFormatClass into account, so getSplits() is called only once per base path. The RO and RT tables share the same dataset base path but differ in their InputFormatClass. As a result, a Hive join between them returns incorrect results: in the output below, both sides of the join report the same ts value.
>  
> =============
> The demo produces inconsistent results at Step 6(a).
> Queried separately, each table behaves as expected:
>  
> {{ select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG';
>  select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';}}
> Both queries return the results shown in the demo.
> However, the join queries below do not:
>  
> {{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b  on a.key=b.key where a.ts != b.ts
> ...
> +--------+-------+-------+--+
> | a.key  | a.ts  | b.ts  |
> +--------+-------+-------+--+
> +--------+-------+-------+--+}}
>  
> {{0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
> 2019-07-18 09:13:20 Starting to launch local task to process map join;  maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> 2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
> 2019-07-18 09:13:21 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable (317 bytes)
> 2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
> +---------------------+----------------------+----------------------+--+
> |        a.key        |         a.ts         |         b.ts         |
> +---------------------+----------------------+----------------------+--+
> | GOOG_2018-08-31 10  | 2018-08-31 10:29:00  | 2018-08-31 10:29:00  |
> +---------------------+----------------------+----------------------+--+
> 1 row selected (7.207 seconds)
> 0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Execution log at: /tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
> 2019-07-18 09:13:51 Starting to launch local task to process map join;  maximum memory = 477626368
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> 2019-07-18 09:13:53 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
> 2019-07-18 09:13:53 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable (317 bytes)
> 2019-07-18 09:13:53 End of local task; Time Taken: 2.36 sec.
> +---------------------+----------------------+----------------------+--+
> |        a.key        |         a.ts         |         b.ts         |
> +---------------------+----------------------+----------------------+--+
> | GOOG_2018-08-31 10  | 2018-08-31 10:59:00  | 2018-08-31 10:59:00  |
> +---------------------+----------------------+----------------------+--+}}
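
The root cause quoted above can be illustrated with a small, self-contained sketch. This is not Hive's or Hudi's actual code: the class `SplitCacheSketch`, its methods, the path, and the names `RoInputFormat` and `RtInputFormat` are hypothetical placeholders. It only shows how a lookup keyed by base path alone collapses two input formats into one, while a key that also includes the input format class keeps them apart. Which format ends up being used in real Hive is not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hive or Hudi.
public class SplitCacheSketch {

    // Stand-in for split computation: the label records which format produced the splits.
    static String computeSplits(String basePath, String inputFormatClass) {
        return inputFormatClass + "@" + basePath;
    }

    // Keyed by base path only: the second format reuses the first format's splits.
    static String[] resolveByPathOnly(String basePath, String... formats) {
        Map<String, String> cache = new HashMap<>();
        String[] out = new String[formats.length];
        for (int i = 0; i < formats.length; i++) {
            final String fmt = formats[i];
            out[i] = cache.computeIfAbsent(basePath, p -> computeSplits(p, fmt));
        }
        return out;
    }

    // Keyed by base path and input format: each format gets its own splits.
    static String[] resolveByPathAndFormat(String basePath, String... formats) {
        Map<String, String> cache = new HashMap<>();
        String[] out = new String[formats.length];
        for (int i = 0; i < formats.length; i++) {
            final String fmt = formats[i];
            String key = basePath + "|" + fmt;
            out[i] = cache.computeIfAbsent(key, k -> computeSplits(basePath, fmt));
        }
        return out;
    }

    public static void main(String[] args) {
        String path = "/hypothetical/base/path"; // assumed shared base path
        String ro = "RoInputFormat";             // placeholder for the RO table's format
        String rt = "RtInputFormat";             // placeholder for the RT table's format

        String[] collapsed = resolveByPathOnly(path, ro, rt);
        String[] separate = resolveByPathAndFormat(path, ro, rt);

        System.out.println("path-only key:    " + collapsed[0] + " / " + collapsed[1]);
        System.out.println("path+format key:  " + separate[0] + " / " + separate[1]);
    }
}
```

With the path-only key, both table aliases resolve to the same splits, which is consistent with the join output above where a.ts and b.ts are always equal. Including the input format class in the key is one possible direction for a fix, stated here as an assumption rather than the change actually made for this issue.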



--
This message was sent by Atlassian Jira
(v8.3.4#803005)