Posted to commits@hudi.apache.org by "BALAJI VARADARAJAN (Jira)" <ji...@apache.org> on 2019/09/17 16:25:00 UTC
[jira] [Created] (HUDI-258) Hive Query engine not supporting join
queries between RT and RO tables
BALAJI VARADARAJAN created HUDI-258:
---------------------------------------
Summary: Hive Query engine not supporting join queries between RT and RO tables
Key: HUDI-258
URL: https://issues.apache.org/jira/browse/HUDI-258
Project: Apache Hudi (incubating)
Issue Type: Bug
Components: Hive Integration
Reporter: BALAJI VARADARAJAN
Description : [https://github.com/apache/incubator-hudi/issues/789#issuecomment-512740619]
Root Cause: Hive tracks getSplits() calls by dataset base path and does not take the InputFormatClass into account, so getSplits() is called only once for a given base path. The RO and RT tables share the same dataset base path but differ in their InputFormatClass. As a result, a Hive join query between them returns incorrect results.
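The root cause above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hive or Hudi code: the functions get_splits and plan_splits, the format names, and the path /data/stock_ticks_mor are all hypothetical. It shows how tracking split computation by base path alone makes two tables that share a base path end up with the same splits, while also including the input format class in the key keeps them apart.

```python
# Toy model only: none of these names are Hive or Hudi APIs.

def get_splits(base_path, input_format):
    """Pretend split computation whose result depends on the input format."""
    return [f"{input_format}:{base_path}/part-0"]


def plan_splits(tables, key_fn):
    """Compute splits per table, reusing the result for any key already seen."""
    seen = {}
    plan = {}
    for name, base_path, input_format in tables:
        key = key_fn(base_path, input_format)
        if key not in seen:
            # get_splits runs only the first time a key is encountered.
            seen[key] = get_splits(base_path, input_format)
        plan[name] = seen[key]
    return plan


# Both tables share one (hypothetical) base path but use different input formats.
TABLES = [
    ("stock_ticks_mor", "/data/stock_ticks_mor", "ReadOptimizedFormat"),
    ("stock_ticks_mor_rt", "/data/stock_ticks_mor", "RealtimeFormat"),
]

# Keyed by base path only: the second table reuses the first table's splits.
by_path = plan_splits(TABLES, lambda path, fmt: path)

# Keyed by base path and input format: each table gets its own splits.
by_path_and_format = plan_splits(TABLES, lambda path, fmt: (path, fmt))

print(by_path["stock_ticks_mor_rt"])
print(by_path_and_format["stock_ticks_mor_rt"])
```

In this model, keying by path alone gives both table aliases identical splits, which is consistent with the join output below where a.ts and b.ts always match. Whether the actual fix belongs in Hive or in Hudi's input formats is not something this sketch addresses.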
=============
The demo (Step 6(a)) produces inconsistent results. The following two queries:
{{ select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor_rt where symbol = 'GOOG';
select `_hoodie_commit_time`, symbol, ts, volume, open, close from stock_ticks_mor where symbol = 'GOOG';}}
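return the results shown in the demo when run separately.
However, a join between the two tables returns no rows where the timestamps differ, and both sides of the join always report the same ts: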
{{select a.key,a.ts, b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key=b.key where a.ts != b.ts
...
+--------+-------+-------+--+
| a.key | a.ts | b.ts |
+--------+-------+-------+--+
+--------+-------+-------+--+}}
{{0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor_rt a join stock_ticks_mor b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/root/root_20190718091316_ec40e8f2-be17-4450-bb75-8db9f4390041.log
2019-07-18 09:13:20 Starting to launch local task to process map join; maximum memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2019-07-18 09:13:21 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable
2019-07-18 09:13:21 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-16_658_8306103829282410332-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile50--.hashtable (317 bytes)
2019-07-18 09:13:21 End of local task; Time Taken: 1.688 sec.
+---------------------+----------------------+----------------------+--+
| a.key | a.ts | b.ts |
+---------------------+----------------------+----------------------+--+
| GOOG_2018-08-31 10 | 2018-08-31 10:29:00 | 2018-08-31 10:29:00 |
+---------------------+----------------------+----------------------+--+
1 row selected (7.207 seconds)
0: jdbc:hive2://hiveserver:10000> select a.key,a.ts,b.ts from stock_ticks_mor a join stock_ticks_mor_rt b on a.key = b.key where a.key= 'GOOG_2018-08-31 10';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.8.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Execution log at: /tmp/root/root_20190718091348_72a5fc30-fc04-41c1-b2e3-5f943e4d5c08.log
2019-07-18 09:13:51 Starting to launch local task to process map join; maximum memory = 477626368
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2019-07-18 09:13:53 Dump the side-table for tag: 0 with group count: 1 into file: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable
2019-07-18 09:13:53 Uploaded 1 File to: file:/tmp/root/60ae1624-3514-4ddd-9bc1-5d2349d922d6/hive_2019-07-18_09-13-48_027_3613368446029280476-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile60--.hashtable (317 bytes)
2019-07-18 09:13:53 End of local task; Time Taken: 2.36 sec.
+---------------------+----------------------+----------------------+--+
| a.key | a.ts | b.ts |
+---------------------+----------------------+----------------------+--+
| GOOG_2018-08-31 10 | 2018-08-31 10:59:00 | 2018-08-31 10:59:00 |
+---------------------+----------------------+----------------------+--+}}
--
This message was sent by Atlassian Jira
(v8.3.2#803003)