Posted to dev@hbase.apache.org by "Izaak Rubin (JIRA)" <ji...@apache.org> on 2008/07/10 19:40:31 UTC
[jira] Created: (HBASE-737) Scanner: every cell in a row has the same timestamp
Scanner: every cell in a row has the same timestamp
---------------------------------------------------
Key: HBASE-737
URL: https://issues.apache.org/jira/browse/HBASE-737
Project: Hadoop HBase
Issue Type: Bug
Components: client
Reporter: Izaak Rubin
Priority: Minor
A row can have multiple cells, and each cell can have a different timestamp. The get command in the shell demonstrates that cells are being stored with different timestamps:
{code}
hbase(main):008:0> get 'table1', 'row2'
COLUMN CELL
fam1:letters timestamp=1215707612949, value=def
fam1:numbers timestamp=1215707629064, value=123
fam2:letters timestamp=1215711498969, value=abc
3 row(s) in 0.0100 seconds
{code}
However, using the scanners to retrieve these cells shows that they all have the same timestamp:
{code}
hbase(main):009:0> scan 'table1'
ROW COLUMN+CELL
row2 column=fam1:letters, timestamp=1215711498969, value=def
row2 column=fam1:numbers, timestamp=1215711498969, value=123
row2 column=fam2:letters, timestamp=1215711498969, value=abc
3 row(s) in 0.0600 seconds
{code}
The scanners are losing timestamp information somewhere along the line.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613035#action_12613035 ]
Jim Kellerman commented on HBASE-737:
-------------------------------------
Crud. I knew I should change the InternalScanner interface, but at the time it seemed like it wasn't needed.
Ok. I've got this one.
[jira] Issue Comment Edited: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Izaak Rubin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612598#action_12612598 ]
irubin edited comment on HBASE-737 at 7/10/08 11:08 AM:
-------------------------------------------------------------
I'm pretty sure it's a Java issue. I added a debug line to HTable to print out what the scanner returns on a call to next(). Here's the result:
{code}
08/07/10 11:03:22 DEBUG client.HTable$ClientScanner: IZAAK: Scanner output on next is: row=row2, cells={(column=fam1:letters, timestamp=1215711498969, value=[B@1f86b7), (column=fam1:numbers, timestamp=1215711498969, value=[B@c57009), (column=fam2:letters, timestamp=1215711498969, value=[B@3e6b10)}
row2 column=fam1:letters, timestamp=1215711498969, value=def
row2 column=fam1:numbers, timestamp=1215711498969, value=123
row2 column=fam2:letters, timestamp=1215711498969, value=abc
{code}
(Scroll to the right: every cell carries the same timestamp.)
[jira] Updated: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HBASE-737:
--------------------------------
Affects Version/s: 0.2.0
[jira] Updated: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman updated HBASE-737:
--------------------------------
Fix Version/s: 0.2.0
Priority: Blocker (was: Minor)
[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Izaak Rubin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612598#action_12612598 ]
Izaak Rubin commented on HBASE-737:
-----------------------------------
I'm pretty sure it's a Java issue. I added a debug line to HTable to print out what the scanner returns on a call to next(). Here's the result:
{code}
08/07/10 11:03:22 DEBUG client.HTable$ClientScanner: IZAAK: Scanner output on next is: row=row2, cells={(column=fam1:letters, timestamp=1215711498969, value=[B@1f86b7), (column=fam1:numbers, timestamp=1215711498969, value=[B@c57009), (column=fam2:letters, timestamp=1215711498969, value=[B@3e6b10)}
row2 column=fam1:letters, timestamp=1215711498969, value=def
row2 column=fam1:numbers, timestamp=1215711498969, value=123
row2 column=fam2:letters, timestamp=1215711498969, value=abc
{code}
[jira] Assigned: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman reassigned HBASE-737:
-----------------------------------
Assignee: Jim Kellerman
[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Izaak Rubin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613019#action_12613019 ]
Izaak Rubin commented on HBASE-737:
-----------------------------------
I've done some investigating into the timestamp discrepancies. In HRegionServer.next(long), HStoreScanner.next(HStoreKey, Map<byte[],byte[]>) is called once per row to retrieve Cell data for that row. The HStoreKey contains the name of the row and a *single* timestamp for that row. When HRegionServer.next() constructs the actual Cell objects for a row, it uses the same single timestamp from the HStoreKey. This is why the scanners return the same timestamp for every Cell in a row.
It looks like, in order to fix the problem, the HStoreScanner will have to store more cell information. Does the HStoreKey even need to store a timestamp if timestamps aren't unique to a row?
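To make the analysis above concrete, here is a minimal, self-contained sketch of the behaviour being described. It does not use the real HBase classes: ScannerTimestampSketch, its nested Cell class, and the three methods are hypothetical stand-ins invented for illustration. It also assumes the row-level timestamp is the newest cell timestamp in the row, which matches the scan output in this issue but is not confirmed anywhere in the thread. The second scan method shows one possible direction (carrying a timestamp per cell), not necessarily what the committed fix does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ScannerTimestampSketch {

    /** Hypothetical stand-in for a cell: a value plus the timestamp it was written with. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** The row from the issue as 'get' reports it: each column has its own timestamp. */
    static Map<String, Cell> storedRow() {
        Map<String, Cell> row = new LinkedHashMap<>();
        row.put("fam1:letters", new Cell("def", 1215707612949L));
        row.put("fam1:numbers", new Cell("123", 1215707629064L));
        row.put("fam2:letters", new Cell("abc", 1215711498969L));
        return row;
    }

    /**
     * Models the reported behaviour: the scan yields one row-level timestamp
     * plus bare values, and every Cell is rebuilt with that single timestamp.
     * Assumption: the row-level timestamp is the newest one in the row.
     */
    static Map<String, Cell> scanWithRowTimestamp(Map<String, Cell> stored) {
        long rowTimestamp = Long.MIN_VALUE;
        for (Cell c : stored.values()) {
            rowTimestamp = Math.max(rowTimestamp, c.timestamp);
        }
        Map<String, Cell> result = new LinkedHashMap<>();
        for (Map.Entry<String, Cell> e : stored.entrySet()) {
            // The per-cell timestamp is dropped here.
            result.put(e.getKey(), new Cell(e.getValue().value, rowTimestamp));
        }
        return result;
    }

    /** One possible direction: pass each cell's own timestamp through the scan. */
    static Map<String, Cell> scanWithCellTimestamps(Map<String, Cell> stored) {
        Map<String, Cell> result = new LinkedHashMap<>();
        for (Map.Entry<String, Cell> e : stored.entrySet()) {
            result.put(e.getKey(), new Cell(e.getValue().value, e.getValue().timestamp));
        }
        return result;
    }

    static void print(String label, Map<String, Cell> row) {
        System.out.println(label);
        for (Map.Entry<String, Cell> e : row.entrySet()) {
            System.out.println("  column=" + e.getKey()
                    + ", timestamp=" + e.getValue().timestamp
                    + ", value=" + e.getValue().value);
        }
    }

    public static void main(String[] args) {
        Map<String, Cell> stored = storedRow();
        print("row-level timestamp (reported behaviour):", scanWithRowTimestamp(stored));
        print("per-cell timestamps (possible fix):", scanWithCellTimestamps(stored));
    }
}
```

Running main prints the row twice: first with the single shared timestamp seen in the scan output, then with the three distinct timestamps that get reports.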
[jira] Resolved: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jim Kellerman resolved HBASE-737.
---------------------------------
Resolution: Fixed
Committed. Resolving.
[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp
Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612592#action_12612592 ]
Jim Kellerman commented on HBASE-737:
-------------------------------------
This might be a shell issue. Try writing a program to do the scan with Java.