Posted to dev@hbase.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2008/06/30 20:15:45 UTC

[jira] Commented: (HBASE-33) Add a HTable get/obtainScanner method that retrieves all versions of a particular column and row between two timestamps

    [ https://issues.apache.org/jira/browse/HBASE-33?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609318#action_12609318 ] 

Jim Kellerman commented on HBASE-33:
------------------------------------

Implementing a scanner that returns multiple Cells for a single column is going to force an API change. Google's scanner API is somewhat different from HBase's:

{code}
Scanner scanner(T);
ScanStream* stream;
stream = scanner.FetchColumnFamily("anchor");
stream->SetReturnAllVersions();
scanner.Lookup("com.cnn.www");
for (; !stream->Done(); stream->Next()) {
  printf("%s %s %lld %s\n",
    scanner.RowName(),
    stream->ColumnName(),
    stream->MicroTimestamp(),
    stream->Value());
}
{code}

In HBase, we currently cannot retrieve values for multiple timestamps for the same column:

{code}
HTable t = new HTable(conf, "tableName");
Scanner s = t.getScanner(columns, startRow, timestamp, filter);
try {
  RowResult r = null;
  while ((r = s.next()) != null) {
    System.out.print(Bytes.toString(r.getRow()));
    for (Map.Entry<byte[], Cell> column: r.entrySet()) {
      System.out.print(" " + Bytes.toString(column.getKey()));
      Cell c = column.getValue();
      System.out.println(" " + c.getTimestamp() + " " + Bytes.toString(c.getValue()));
    }
  }
} finally {
  s.close();
}
{code}

The problem is how to return multiple Cells per column without seriously breaking the client API. Proposed solution:
- make Cell implement Iterable
- getValue() returns the "current" value
- getTimestamp() returns the "current" timestamp
- hasNext() returns true if there are more values
- next() advances to the "next" value/timestamp

Initially, the "current" value and timestamp point to the first value/timestamp pair, so existing callers that only call getValue() and getTimestamp() keep working, which preserves the current API.
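
A minimal sketch of the cursor behaviour described in the list above, to make the proposal concrete. Everything here is an assumption for illustration: VersionedCell is a hypothetical stand-in and not HBase's actual Cell class, versions are assumed to arrive newest-first, and hasNext()/next() are written as plain cursor methods on the cell itself (next() advances and returns nothing) rather than following the java.util.Iterator contract. How this would map onto Iterable is left open.

```java
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;

// Hypothetical stand-in for HBase's Cell; illustrates the proposal only.
final class VersionedCell {
  private final long[] timestamps; // assumed to be ordered newest-first
  private final byte[][] values;   // values[i] belongs to timestamps[i]
  private int pos = 0;             // index of the "current" version

  VersionedCell(long[] timestamps, byte[][] values) {
    if (timestamps.length == 0 || timestamps.length != values.length) {
      throw new IllegalArgumentException("need at least one version and one value per timestamp");
    }
    this.timestamps = timestamps.clone();
    this.values = values.clone();
  }

  // Value of the "current" version; initially the first one.
  byte[] getValue() {
    return values[pos];
  }

  // Timestamp of the "current" version; initially the first one.
  long getTimestamp() {
    return timestamps[pos];
  }

  // True if there is a version after the current one.
  boolean hasNext() {
    return pos + 1 < timestamps.length;
  }

  // Advances the cursor to the next value/timestamp pair.
  void next() {
    if (!hasNext()) {
      throw new NoSuchElementException("no more versions");
    }
    pos++;
  }

  private static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    VersionedCell c = new VersionedCell(
        new long[] {30L, 20L, 10L},
        new byte[][] {bytes("v3"), bytes("v2"), bytes("v1")});
    // A caller written against the old API would stop after the first line;
    // a caller aware of versions keeps walking the cursor.
    while (true) {
      System.out.println(c.getTimestamp() + " "
          + new String(c.getValue(), StandardCharsets.UTF_8));
      if (!c.hasNext()) {
        break;
      }
      c.next();
    }
  }
}
```

With this shape, the existing scanner loop that calls getTimestamp() and getValue() once per column compiles unchanged and sees the first version only.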

Comments?


> Add a HTable get/obtainScanner method that retrieves all versions of a particular column and row between two timestamps
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-33
>                 URL: https://issues.apache.org/jira/browse/HBASE-33
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>            Reporter: Peter Dolan
>            Assignee: Jim Kellerman
>            Priority: Blocker
>             Fix For: 0.2.0
>
>
> The use case:
> * A weblog application for which rows are user ids and posts are stored in a single column, with post date specified by the cell's timestamp.  The application would then need to be able to display all posts for the last week or month.
> * A feedfetcher for which rows are URLs and feed posts are stored in a single column with the post publish date or fetch time stored in the cell's timestamp.  The application would then need to be able to display all posts for the last week or month.
> Proposed API:
> // Get all versions of the specified row and column whose timestamps are in [minTimestamp, maxTimestamp]
> SortedMap<Long, byte[]> getTimestamps(Text row, Text column, long minTimestamp, long maxTimestamp);
> // Get all versions of the specified row and column whose timestamps are >= minTimestamp
> SortedMap<Long, byte[]> getTimestamps(Text row, Text column, long minTimestamp);
> I'd be happy to take this on myself, as I need it for the above use cases before migrating my application over to HBase.
