Posted to user@hbase.apache.org by Peng Luo <pe...@oneplus.com.INVALID> on 2020/01/15 13:06:06 UTC
[HBase] Hive query on HBase returns duplicate data
Hi all,
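The HBase table contains only one row; querying by row key returns a single record.
After creating a Hive external table mapped to this HBase table, a query returns two identical rows.
1. HBase table definition: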
hbase(main):003:0> describe 'test:table_name1'
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER',
MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1370 seconds
2. HBase contains only one record
hbase(main):002:0> get ' test:table_name1','7772809'
COLUMN CELL
cf:id timestamp=1579067194137, value=777280
3. Hive table creation statement
drop table `test.hive_table_name1`;
CREATE EXTERNAL TABLE `test.hive_table_name1`(`id` string )
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key')
TBLPROPERTIES ('hbase.table.name'=' test:table_name1')
4. Hive query result
[image: image001.png, not available in the plain-text archive]
Where should I start looking to track down the cause of this problem?
Re: [HBase] Hive query on HBase returns duplicate data
Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
I cannot see the 'Hive query result' picture, so I am not sure what the
real problem is. But in section 2, a 'get' does not show that there is only
one row; try using 'scan'. Have you also tried sending the question to the
Hive community? They may know this better.
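To illustrate the point about 'get' versus 'scan', here is a minimal sketch for the HBase shell. It assumes the table name 'test:table_name1' from the 'describe' output in the original post; the commands are standard HBase shell commands, but what they return depends on the actual table contents.

```
# A get fetches the cells of a single row key only;
# it says nothing about other rows in the table.
get 'test:table_name1', '7772809'

# A scan lists every row, so any additional or near-identical
# row keys would show up here.
scan 'test:table_name1'

# count reports the total number of rows in the table.
count 'test:table_name1'
```

If 'scan' or 'count' reports more than one row, the extra row in Hive most likely comes from the HBase data itself rather than from the Hive mapping.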
Thanks.
One more suggestion: please try to ask questions in English. This is an
international community, and many people here cannot read Chinese.
Re: [HBase] Hive query on HBase returns duplicate data
Posted by Josh Elser <el...@apache.org>.
Hi Peng,
While we recognize that the Apache communities are global communities
where people speak all languages, the ASF requests that communication is
done in English[1].
Could you translate your original message for us, please?
[1] https://www.apache.org/foundation/policies/conduct#diversity-statement