Posted to dev@hbase.apache.org by "ClownfishYang (Jira)" <ji...@apache.org> on 2020/12/02 09:41:00 UTC

[jira] [Created] (HBASE-25350) The scan command gives a row, but get does not have this row.

ClownfishYang created HBASE-25350:
-------------------------------------

             Summary: The scan command gives a row, but get does not have this row.
                 Key: HBASE-25350
                 URL: https://issues.apache.org/jira/browse/HBASE-25350
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.2.11
            Reporter: ClownfishYang


HBase version: 1.1.2

 

We use HBase as a real-time database and query it through Hive external tables, but we found that queries against one table return inconsistent results.

 
{code:java}
// query without a filter: returns ids such as 12045075, 12045076, ...
SELECT id FROM t1 LIMIT 10
// query filtered on one of those ids: returns no rows
SELECT id FROM t1 WHERE id = '12045075' LIMIT 10
// table definition
CREATE EXTERNAL TABLE `t1`(
`__key` string COMMENT '', 
`id` string COMMENT '主键ID')
COMMENT ''
ROW FORMAT SERDE 
'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ( 
'hbase.columns.mapping'=':key,f:id', 
'serialization.format'='1')
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
'hbase.table.name'='t1', 
'numFiles'='0', 
'numRows'='0', 
'rawDataSize'='0', 
'totalSize'='0', 
'transient_lastDdlTime'='1606804842'){code}
While investigating, I tried padding the value with spaces and matching with LIKE, but neither revealed the cause. I began to suspect that the problem was in HBase.
{code:java}
// scan row
scan 't1', {COLUMNS => 'f:id', LIMIT => 10}
// result
ROW                                   COLUMN+CELL                                                                                                 
 12044083                             column=f:id, timestamp=1606293182000, value=12044083                                                        
 12044084                             column=f:id, timestamp=1606293183000, value=12044084                                                        
 12044085                             column=f:id, timestamp=1606293185000, value=12044085                                                        
 12044086                             column=f:id, timestamp=1606293190000, value=12044086                                                        
 12044087                             column=f:id, timestamp=1606293192000, value=12044087                                                        
 12044088                             column=f:id, timestamp=1606293197000, value=12044088                                                        
 12044089                             column=f:id, timestamp=1606293198000, value=12044089                                                        
 12044090                             column=f:id, timestamp=1606293204000, value=12044090                                                        
 12044091                             column=f:id, timestamp=1606293207000, value=12044091                                                        
 12044092                             column=f:id, timestamp=1606293208000, value=12044092   

// get the first row returned by the scan: no result
get 't1', "12044083" , {COLUMNS => 'f:id'}{code}
Only lookups by row key or by id are affected; queries on other columns behave normally. I now suspect that the data contains invisible or escape characters, but how can I confirm that?

The worst part is that I have also read the data through the Java API, and neither the row key nor the id in the returned data shows any invisible or escape characters.
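
One way to check for hidden characters is to compare raw bytes and lengths instead of decoded strings. The following is only a minimal sketch, not something taken from the cluster in question: `RowKeyInspector`, `hex` and `describe` are hypothetical names, and the trailing space in the second key is an invented example of a character that a printed value would hide. In a real client the byte array would come from something like `Result.getRow()`, and HBase's `Bytes.toStringBinary` gives a comparable escaped rendering.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; it does not use the HBase client.
final class RowKeyInspector {
    private RowKeyInspector() {}

    /** Returns the bytes as space-separated, two-digit uppercase hex values. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Combines the byte count and the hex dump so an extra byte stands out. */
    static String describe(byte[] bytes) {
        return "len=" + bytes.length + " hex=" + hex(bytes);
    }

    public static void main(String[] args) {
        // The key as typed in the get command.
        byte[] expected = "12044083".getBytes(StandardCharsets.UTF_8);
        // Assumed example of a stored key with a trailing space.
        byte[] stored = "12044083 ".getBytes(StandardCharsets.UTF_8);

        System.out.println("expected: " + describe(expected));
        System.out.println("stored:   " + describe(stored));
    }
}
```

If the length or the hex dump of a key returned by a scan differs from that of the key passed to get, the extra or different bytes would explain why the exact-match lookup finds nothing.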

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)