Posted to user@hbase.apache.org by Peng Luo <pe...@oneplus.com.INVALID> on 2020/01/15 13:06:06 UTC

[HBase] Hive query on HBase data returns duplicate rows

Hi all,
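       The HBase table contains only one row; looking up the row key returns a single record.
       After creating a Hive external table mapped to this HBase table, a query returns 2 identical rows.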


1. HBase table definition:
hbase(main):003:0> describe 'test:table_name1'
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER',
MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1370 seconds

2. HBase contains only one record:
hbase(main):002:0> get ' test:table_name1','7772809'
COLUMN                                 CELL
 cf:id                                 timestamp=1579067194137, value=777280

3. Hive table definition:
drop table `test.hive_table_name1`;
CREATE EXTERNAL TABLE `test.hive_table_name1`(`id` string )
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key')
       TBLPROPERTIES ('hbase.table.name'=' test:table_name1')

4. Hive query result:
[inline image image001.png, not included in this plain text archive]


Where should I start looking in order to track down the cause of this problem?


Re: [HBase] Hive query on HBase data returns duplicate rows

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
I cannot see the picture of the 'Hive query result', so I am not sure what
the real problem is. But in section 2, a 'get' does not prove that there is
only one row; try using 'scan'. Have you also tried sending the question to
the Hive community? They may know this better.

Also, please try to ask questions in English where possible. This is an
international community, and many people cannot read Chinese.

Thanks.
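
To illustrate the point about 'get' versus 'scan': the sketch below is a toy in-memory model in Python, not the HBase client API. The ToyTable class and the second row ('7772810') are hypothetical and invented for illustration only; nothing in the thread says what the real table contains. It only shows why a successful 'get' on one row key says nothing about how many rows the table holds.

```python
# Toy in-memory model of a table: row key -> {column: value}.
# This is NOT the HBase client API; class name and data are illustrative only.
from typing import Dict, Iterator, Optional, Tuple

Row = Dict[str, str]


class ToyTable:
    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Store one cell under the given row key.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str) -> Optional[Row]:
        # Look up exactly one row key, similar in spirit to the shell's 'get'.
        return self._rows.get(row_key)

    def scan(self) -> Iterator[Tuple[str, Row]]:
        # Walk every row in sorted key order, similar in spirit to 'scan'.
        for key in sorted(self._rows):
            yield key, self._rows[key]


table = ToyTable()
table.put("7772809", "cf:id", "777280")
# Hypothetical second row, invented for this sketch only.
table.put("7772810", "cf:id", "777281")

one_row = table.get("7772809")   # sees only the requested key
all_rows = list(table.scan())    # sees every row in the table

print(one_row)
print(len(all_rows))
```

On the real cluster, the corresponding check would be to run 'scan' (or 'count') on the table in the HBase shell and compare the number of rows with what Hive returns.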


Re: [HBase] Hive query on HBase data returns duplicate rows

Posted by Josh Elser <el...@apache.org>.
Hi Peng,

While we recognize that the Apache communities are global communities
where people speak all languages, the ASF requests that communication be
done in English [1].

Could you translate your original message for us, please?

[1] https://www.apache.org/foundation/policies/conduct#diversity-statement
