Posted to dev@hbase.apache.org by "Izaak Rubin (JIRA)" <ji...@apache.org> on 2008/07/10 19:40:31 UTC

[jira] Created: (HBASE-737) Scanner: every cell in a row has the same timestamp

Scanner: every cell in a row has the same timestamp
---------------------------------------------------

                 Key: HBASE-737
                 URL: https://issues.apache.org/jira/browse/HBASE-737
             Project: Hadoop HBase
          Issue Type: Bug
          Components: client
            Reporter: Izaak Rubin
            Priority: Minor


A row can have multiple cells, and each cell can have a different timestamp.  The get command in the shell demonstrates that cells are being stored with different timestamps:

{code}
hbase(main):008:0> get 'table1', 'row2'  
COLUMN                       CELL 
 fam1:letters                timestamp=1215707612949, value=def 
 fam1:numbers                timestamp=1215707629064, value=123 
 fam2:letters                timestamp=1215711498969, value=abc 
3 row(s) in 0.0100 seconds
{code}

However, using the scanners to retrieve these cells shows that they all have the same timestamp:

{code}
hbase(main):009:0> scan 'table1'  
ROW                          COLUMN+CELL
 row2                        column=fam1:letters, timestamp=1215711498969, value=def 
 row2                        column=fam1:numbers, timestamp=1215711498969, value=123 
 row2                        column=fam2:letters, timestamp=1215711498969, value=abc 
3 row(s) in 0.0600 seconds
{code}

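Every cell in the scan output carries timestamp 1215711498969, which is the timestamp of fam2:letters, the most recently written cell. The scanners are losing per-cell timestamp information somewhere along the line.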
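The discrepancy can be phrased as a check a regression test might make: for the same row, the per-column timestamps returned by get should equal those returned by scan. This is a minimal, hypothetical sketch. `TimestampCheck` and `findTimestampMismatches` are illustrative names, not HBase API, and the maps are filled in by hand from the shell output above rather than read from a live table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical regression check for HBASE-737 (not HBase API): for one row,
// compare the per-column timestamps seen by get with those seen by scan.
public class TimestampCheck {

    // Returns, in sorted column order, every column whose scan timestamp is
    // missing or differs from the timestamp reported by get.
    static List<String> findTimestampMismatches(Map<String, Long> fromGet,
                                                Map<String, Long> fromScan) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(fromGet).entrySet()) {
            Long scanned = fromScan.get(e.getKey());
            if (scanned == null || !scanned.equals(e.getValue())) {
                mismatches.add(e.getKey());
            }
        }
        return mismatches;
    }

    // Timestamps copied by hand from the `get 'table1', 'row2'` output.
    static Map<String, Long> getResult() {
        Map<String, Long> m = new TreeMap<>();
        m.put("fam1:letters", 1215707612949L);
        m.put("fam1:numbers", 1215707629064L);
        m.put("fam2:letters", 1215711498969L);
        return m;
    }

    // Timestamps copied by hand from the `scan 'table1'` output.
    static Map<String, Long> scanResult() {
        Map<String, Long> m = new TreeMap<>();
        m.put("fam1:letters", 1215711498969L);
        m.put("fam1:numbers", 1215711498969L);
        m.put("fam2:letters", 1215711498969L);
        return m;
    }

    public static void main(String[] args) {
        // Prints [fam1:letters, fam1:numbers]
        System.out.println(findTimestampMismatches(getResult(), scanResult()));
    }
}
```

Only fam2:letters matches, because its real timestamp happens to be the one the scanner stamps on every cell. Against a fixed server, the same comparison should return an empty list.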

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613035#action_12613035 ] 

Jim Kellerman commented on HBASE-737:
-------------------------------------

Crud. I knew I should change the InternalScanner interface, but at the time it seemed like it wasn't needed.

Ok. I've got this one.


[jira] Issue Comment Edited: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Izaak Rubin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612598#action_12612598 ] 

irubin edited comment on HBASE-737 at 7/10/08 11:08 AM:
-------------------------------------------------------------

I'm pretty sure it's a Java issue, not a shell issue.  I added a debug line to HTable to print out what the scanner returns on a call to next().  Here's the result:

{code}
08/07/10 11:03:22 DEBUG client.HTable$ClientScanner: IZAAK: Scanner output on next is: row=row2, cells={(column=fam1:letters, timestamp=1215711498969, value=[B@1f86b7), (column=fam1:numbers, timestamp=1215711498969, value=[B@c57009), (column=fam2:letters, timestamp=1215711498969, value=[B@3e6b10)}
 row2                        column=fam1:letters, timestamp=1215711498969, value=def 
 row2                        column=fam1:numbers, timestamp=1215711498969, value=123 
 row2                        column=fam2:letters, timestamp=1215711498969, value=abc 
{code}

(Scroll right: all three cells have the same timestamp.)


[jira] Updated: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-737:
--------------------------------

    Affects Version/s: 0.2.0


[jira] Updated: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-737:
--------------------------------

    Fix Version/s: 0.2.0
         Priority: Blocker  (was: Minor)


[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Izaak Rubin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612598#action_12612598 ] 

Izaak Rubin commented on HBASE-737:
-----------------------------------

I'm pretty sure it's a Java issue, not a shell issue.  I added a debug line to HTable to print out what the scanner returns on a call to next().  Here's the result:

{code}
08/07/10 11:03:22 DEBUG client.HTable$ClientScanner: IZAAK: Scanner output on next is: row=row2, cells={(column=fam1:letters, timestamp=1215711498969, value=[B@1f86b7), (column=fam1:numbers, timestamp=1215711498969, value=[B@c57009), (column=fam2:letters, timestamp=1215711498969, value=[B@3e6b10)}
 row2                        column=fam1:letters, timestamp=1215711498969, value=def 
 row2                        column=fam1:numbers, timestamp=1215711498969, value=123 
 row2                        column=fam2:letters, timestamp=1215711498969, value=abc 
{code}


[jira] Assigned: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HBASE-737:
-----------------------------------

    Assignee: Jim Kellerman


[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Izaak Rubin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613019#action_12613019 ] 

Izaak Rubin commented on HBASE-737:
-----------------------------------

I've done some investigating into the timestamp discrepancies.  In HRegionServer.next(long), HStoreScanner.next(HStoreKey, Map<byte[],byte[]>) is called once per row to retrieve Cell data for that row.  The HStoreKey contains the name of the row and a *single* timestamp for that row.  When HRegionServer.next() constructs the actual Cell objects for a row, it uses the same single timestamp from the HStoreKey.  This is why the scanners return the same timestamp for every Cell in a row.  

It looks like, in order to fix the problem, the HStoreScanner will have to store more cell information.  Does the HStoreKey even need to store a timestamp if timestamps aren't unique to a row?
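A small self-contained model of the mechanism described above may make it concrete. It is a sketch, not HBase source: `RowAssembly`, `StoredCell` and `ResultCell` are hypothetical stand-ins, and the single per-row timestamp is modelled as the newest cell's timestamp only because that matches the shell output in the description. The buggy path pairs a column-to-value map with one row-level timestamp; the fixed path keeps a (value, timestamp) pair per column, which is one plausible shape for the InternalScanner change Jim Kellerman mentions, not a description of the committed patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of HBASE-737, not HBase source: turn one row's stored
// cells into client-facing (value, timestamp) pairs, first buggy, then fixed.
public class RowAssembly {

    // A cell as stored: column name, value, and its own write timestamp.
    static final class StoredCell {
        final String column;
        final String value;
        final long timestamp;

        StoredCell(String column, String value, long timestamp) {
            this.column = column;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // What the client receives for one column.
    static final class ResultCell {
        final String value;
        final long timestamp;

        ResultCell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return "timestamp=" + timestamp + ", value=" + value;
        }
    }

    // Buggy shape: values go into a column -> value map and the row keeps a
    // single key timestamp (modelled here as the newest), which is then
    // stamped onto every ResultCell.
    static Map<String, ResultCell> buggy(List<StoredCell> row) {
        Map<String, String> values = new TreeMap<>();
        long keyTimestamp = Long.MIN_VALUE;
        for (StoredCell c : row) {
            values.put(c.column, c.value);
            keyTimestamp = Math.max(keyTimestamp, c.timestamp);
        }
        Map<String, ResultCell> out = new TreeMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            out.put(e.getKey(), new ResultCell(e.getValue(), keyTimestamp));
        }
        return out;
    }

    // Fixed shape: the results map holds a ResultCell per column, so each
    // column keeps the timestamp it was written with.
    static Map<String, ResultCell> fixed(List<StoredCell> row) {
        Map<String, ResultCell> out = new TreeMap<>();
        for (StoredCell c : row) {
            out.put(c.column, new ResultCell(c.value, c.timestamp));
        }
        return out;
    }

    // row2 from the issue description, with the timestamps shown by get.
    static List<StoredCell> row2() {
        return Arrays.asList(
            new StoredCell("fam1:letters", "def", 1215707612949L),
            new StoredCell("fam1:numbers", "123", 1215707629064L),
            new StoredCell("fam2:letters", "abc", 1215711498969L));
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + buggy(row2()));
        System.out.println("fixed: " + fixed(row2()));
    }
}
```

Running `main` shows 1215711498969 on every column for the buggy path and the three distinct timestamps from the get output for the fixed path.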


[jira] Resolved: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-737.
---------------------------------

    Resolution: Fixed

Committed. Resolving.


[jira] Commented: (HBASE-737) Scanner: every cell in a row has the same timestamp

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612592#action_12612592 ] 

Jim Kellerman commented on HBASE-737:
-------------------------------------

This might be a shell issue. Try writing a program to do the scan with Java.
