Posted to issues@hive.apache.org by "Denys Kuzmenko (JIRA)" <ji...@apache.org> on 2019/03/08 09:50:00 UTC
[jira] [Comment Edited] (HIVE-21397) BloomFilter for hive Managed
[ACID] table does not work as expected
[ https://issues.apache.org/jira/browse/HIVE-21397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787725#comment-16787725 ]
Denys Kuzmenko edited comment on HIVE-21397 at 3/8/19 9:49 AM:
---------------------------------------------------------------
For ACID tables, the user columns are embedded within the "row" struct, so when bloomFilterColumns is initialized in org.apache.orc.impl.WriterImpl, OrcUtils.includeColumns() omits them.
The fieldNames used in findColumn() should contain the flattened column names instead of the single "row" struct.
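
To illustrate the lookup problem described above, here is a minimal, self-contained sketch. It does not use the ORC API: the Field class and both lookup methods are hypothetical stand-ins, and the ACID wrapper field names (operation, originalTransaction, bucket, rowId, currentTransaction, row) are assumed from the usual Hive ACID event layout. It only shows why matching bloom filter column names against the top-level field names finds nothing, while matching against the fields nested under "row" finds them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AcidBloomColumns {

    /** Hypothetical stand-in for a schema node: a name plus child fields (none for primitives). */
    static final class Field {
        final String name;
        final List<Field> children;

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Assumed ACID file schema: wrapper fields plus the user columns nested under "row". */
    static Field acidSchema() {
        Field row = new Field("row",
                new Field("msisdn"), new Field("imsi"),
                new Field("imei"), new Field("cell_id"));
        return new Field("root",
                new Field("operation"), new Field("originalTransaction"),
                new Field("bucket"), new Field("rowId"),
                new Field("currentTransaction"), row);
    }

    /** Matches requested names against top-level field names only. */
    static List<String> topLevelMatches(Field root, List<String> wanted) {
        List<String> found = new ArrayList<>();
        for (Field f : root.children) {
            if (wanted.contains(f.name)) {
                found.add(f.name);
            }
        }
        return found;
    }

    /** Matches requested names after replacing "row" with its nested fields. */
    static List<String> flattenedMatches(Field root, List<String> wanted) {
        List<String> found = new ArrayList<>();
        for (Field f : root.children) {
            if (f.name.equals("row")) {
                for (Field inner : f.children) {
                    if (wanted.contains(inner.name)) {
                        found.add(inner.name);
                    }
                }
            } else if (wanted.contains(f.name)) {
                found.add(f.name);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Same value as 'orc.bloom.filter.columns' in the issue description.
        List<String> wanted = Arrays.asList("msisdn", "cell_id", "imsi");
        System.out.println(topLevelMatches(acidSchema(), wanted));  // prints []
        System.out.println(flattenedMatches(acidSchema(), wanted)); // prints [msisdn, imsi, cell_id]
    }
}
```

In this sketch the top-level lookup sees only the wrapper fields and the "row" struct, so none of the requested columns match, which is consistent with the missing BLOOM_FILTER streams in the delta file.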
was (Author: dkuzmenko):
In case of ACID, when columns are embedded within the "row" struct, OrcUtils.includeColumns() omits them.
fieldNames used in findColumn() should contain flattened column names, instead of single "row" struct.
> BloomFilter for hive Managed [ACID] table does not work as expected
> -------------------------------------------------------------------
>
> Key: HIVE-21397
> URL: https://issues.apache.org/jira/browse/HIVE-21397
> Project: Hive
> Issue Type: Bug
> Components: Hive, HiveServer2, Transactions
> Affects Versions: 3.1.1
> Reporter: vaibhav
> Assignee: Denys Kuzmenko
> Priority: Blocker
>
> Steps to reproduce this issue:
> -----------------------------------------
> 1. Create a Hive managed table as below:
> -----------------------------------------
> {code:java}
> CREATE TABLE `bloomTest`(
> `msisdn` string,
> `imsi` varchar(20),
> `imei` bigint,
> `cell_id` bigint)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
> 'hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest'
> TBLPROPERTIES (
> 'bucketing_version'='2',
> 'orc.bloom.filter.columns'='msisdn,cell_id,imsi',
> 'orc.bloom.filter.fpp'='0.02',
> 'transactional'='true',
> 'transactional_properties'='default',
> 'transient_lastDdlTime'='1551206683') {code}
> -----------------------------------------
> 2. Insert a few rows.
> -----------------------------------------
> -----------------------------------------
> 3. Check whether bloom filters are active: [bloom filters are not shown for Hive managed tables]
> -----------------------------------------
> {code:java}
> [hive@c1162-node2 root]$ hive --orcfiledump hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest/delta_0000001_0000001_0000 | grep -i bloom
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Processing data file hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest/delta_0000001_0000001_0000/bucket_00000 [length: 791]
> Structure for hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/managed/hive/bloomTest/delta_0000001_0000001_0000/bucket_00000 {code}
> -----------------------------------------
> On the other hand, it works for Hive external tables:
> -----------------------------------------
> {code:java}
> CREATE external TABLE `ext_bloomTest`(
> `msisdn` string,
> `imsi` varchar(20),
> `imei` bigint,
> `cell_id` bigint)
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> TBLPROPERTIES (
> 'bucketing_version'='2',
> 'orc.bloom.filter.columns'='msisdn,cell_id,imsi',
> 'orc.bloom.filter.fpp'='0.02') {code}
> -----------------------------------------
> {code:java}
> [hive@c1162-node2 root]$ hive --orcfiledump hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/external/hive/ext_bloomTest/000000_0 | grep -i bloom
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Processing data file hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/external/hive/ext_bloomTest/000000_0 [length: 755]
> Structure for hdfs://c1162-node2.squadron-labs.com:8020/warehouse/tablespace/external/hive/ext_bloomTest/000000_0
> Stream: column 1 section BLOOM_FILTER_UTF8 start: 41 length 110
> Stream: column 2 section BLOOM_FILTER_UTF8 start: 178 length 114
> Stream: column 4 section BLOOM_FILTER_UTF8 start: 340 length 109 {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)