Posted to user@hbase.apache.org by jt...@ina.fr on 2009/03/04 17:41:58 UTC

Re : Re: Re : Re: Re : Re: Table design question

Sorry, the problem was caused by a bug in my code.

So it works: I can identify the right row.

In fact, my row keys are encoded as reverseHostUrl@date1-date2,
like these:

www.google.com@200801-200802
www.google.com@200902-200904
www.google.com@201001-201002

To identify the right row for the request www.google.com@200901, I have to read two rows to find the closest date interval:

getClosestRowBefore(www.google.com@200901) => www.google.com@200801-200802
getScanner(www.google.com@200801-200802*).next() => www.google.com@200902-200904

I use this method and it works, but it is really slow, even when I make the same requests several times!
All my columns are block cached.
Do these two methods benefit from block caching?
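The two-read lookup described above (getClosestRowBefore, then one scanner step) can be sketched in plain Java without a cluster. This is a minimal sketch under stated assumptions: a `java.util.TreeMap` stands in for the table's lexicographically sorted row keys, `floorKey` plays the role of `getClosestRowBefore`, and `higherKey` plays the role of the first row the scanner returns after it. The names `ClosestIntervalLookup` and `findClosest` are hypothetical, and "closest" is assumed to mean the smallest distance in months between the requested month and an interval.

```java
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap models the table's sorted row keys.
// This is not the HBase client API.
class ClosestIntervalLookup {

    // Converts "yyyyMM" into a running month count so distances can be compared.
    static int monthIndex(String yyyymm) {
        int year = Integer.parseInt(yyyymm.substring(0, 4));
        int month = Integer.parseInt(yyyymm.substring(4, 6));
        return year * 12 + (month - 1);
    }

    // Months between the requested month and an interval "start-end"; 0 if inside it.
    static int distance(String interval, int requested) {
        String[] bounds = interval.split("-");
        int start = monthIndex(bounds[0]);
        int end = monthIndex(bounds[1]);
        if (requested < start) {
            return start - requested;
        }
        if (requested > end) {
            return requested - end;
        }
        return 0;
    }

    // Returns the row key whose interval is closest to the requested month, or null
    // if no row exists for this URL. On a tie, the earlier interval wins.
    static String findClosest(TreeMap<String, String> rows, String url, String yyyymm) {
        String prefix = url + "@";
        String probe = prefix + yyyymm;
        int requested = monthIndex(yyyymm);
        // Read 1: closest row at or before the probe (the getClosestRowBefore step).
        // Read 2: first row after the probe (the scanner next() step).
        String[] candidates = { rows.floorKey(probe), rows.higherKey(probe) };
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            // Skip missing neighbours and rows that belong to a different URL.
            if (candidate == null || !candidate.startsWith(prefix)) {
                continue;
            }
            int d = distance(candidate.substring(prefix.length()), requested);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("www.google.com@200801-200802", "row 1");
        rows.put("www.google.com@200902-200904", "row 2");
        rows.put("www.google.com@201001-201002", "row 3");
        System.out.println(findClosest(rows, "www.google.com", "200901"));
    }
}
```

With the three keys from the message, a request for 200901 yields www.google.com@200902-200904 (1 month away) rather than www.google.com@200801-200802 (11 months away). Reading both neighbours also covers a request such as 200801, whose probe key sorts just before the interval that contains it.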

Thank you for your time!

Jérôme 


----- Original Message -----
From: stack <st...@duboce.net>
Date: Friday, February 27, 2009 7:10 pm
Subject: Re: Re : Re: Re : Re: Table design question

> getClosestRowBefore should work.  What are you supplying for row?  Does the
> column you ask for exist?
> 
> What happens if you open a scanner at the (non-existent) row 
> 'www.google.com@'?
> 
> St.Ack
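Stack's suggestion relies on scanner semantics: a scanner opened at a row that does not exist starts at the first existing row that sorts at or after it, and 'www.google.com@' is a prefix of, and so sorts before, every 'www.google.com@<date>' key. The minimal sketch below models this with `TreeMap.tailMap` in place of an HBase scanner. The names `PrefixScan` and `rowsForUrl` are hypothetical, and stopping as soon as the prefix no longer matches is an assumption about how a client would end such a scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap models the table's sorted row keys (not the HBase API).
// tailMap(startRow, true) behaves like a scanner opened at startRow: it begins at
// the first existing key >= startRow, even when startRow itself is not a key.
class PrefixScan {

    // Returns every row key of the form url@..., in sorted (chronological) order.
    static List<String> rowsForUrl(TreeMap<String, String> rows, String url) {
        String startRow = url + "@"; // non-existent row that sorts before every url@date key
        List<String> result = new ArrayList<>();
        for (String key : rows.tailMap(startRow, true).keySet()) {
            if (!key.startsWith(startRow)) {
                break; // first key past this URL's rows: stop scanning
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("www.apple.com@200901", "a");
        rows.put("www.google.com@200801", "b");
        rows.put("www.google.com@200901", "c");
        rows.put("www.google.com@201001", "d");
        rows.put("www.yahoo.com@200901", "e");
        System.out.println(rowsForUrl(rows, "www.google.com"));
    }
}
```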
> 
> On Fri, Feb 27, 2009 at 8:02 AM, <jt...@ina.fr> wrote:
> 
> > Hi,
> >
> > Following the discussion with Stack, I have modified the way I insert
> > data in hbase.
> >
> > Now I insert data into an HTable using url@date as the row key, like this:
> >
> > Case 3:
> > BatchUpdate update = new BatchUpdate("www.google.com@20090218");
> > update.put("content:", Bytes.toBytes(
> >     "1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f"));
> > update.put("type:", Bytes.toBytes("text/html"));
> > table.commit(update);
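Case 3 depends on HBase ordering row keys lexicographically, byte by byte, so the date part has to be fixed-width for key order to match date order: an unpadded "2009218" (18 February) would sort after "20091101" (1 November). The sketch below is a hypothetical helper (`RowKeys.rowKey` is not part of HBase) that builds url@yyyyMMdd keys with `DateTimeFormatter.BASIC_ISO_DATE`, which yields eight digits for four-digit years. Comparing with `String.compareTo` matches byte order here only because the keys are assumed to be ASCII.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for Case 3 row keys; not part of the HBase API.
class RowKeys {

    // Builds "url@yyyyMMdd". BASIC_ISO_DATE always yields eight digits for four-digit
    // years, so for ASCII keys lexicographic order equals chronological order per URL.
    static String rowKey(String url, LocalDate date) {
        return url + "@" + date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        String feb = rowKey("www.google.com", LocalDate.of(2009, 2, 18));
        String nov = rowKey("www.google.com", LocalDate.of(2009, 11, 1));
        System.out.println(feb);                    // www.google.com@20090218
        System.out.println(feb.compareTo(nov) < 0); // true: February sorts first
        // Without zero padding, the February date would sort after the November one.
        System.out.println("2009218".compareTo("20091101") > 0); // true
    }
}
```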
> >
> > I want to access these rows with inexact keys. If I have inserted these
> > rows:
> >
> > www.google.com@200801
> > www.google.com@200901
> > www.google.com@201001
> >
> > and make this request:
> >
> > www.google.com@200902, I would like to find the row with the specified url
> > at the closest date to 200902 (www.google.com@200901 in my case).
> >
> > So I thought I could use the method
> > HTable.getClosestRowBefore(byte[] row, byte[] column) to identify a row
> > whose key is less than the requested one, and then scan to identify the
> > right row precisely.
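The "closest row at or before" lookup described above can be modelled with `TreeMap.floorKey`, which returns the greatest key less than or equal to the probe. This is a hypothetical sketch, not the HBase client; `ClosestRowBefore` and `closestRowBefore` are invented for illustration. It also shows one pitfall worth guarding against under this key design: when no row for the requested URL is old enough, the floor key belongs to whichever URL sorts just before it, so the result is checked against the url@ prefix.

```java
import java.util.TreeMap;

// Hypothetical sketch: a TreeMap models the table's sorted url@date row keys
// (not the HBase API).
class ClosestRowBefore {

    // Greatest row key <= url@date that still belongs to url; null if none exists.
    static String closestRowBefore(TreeMap<String, String> rows, String url, String date) {
        String prefix = url + "@";
        String candidate = rows.floorKey(prefix + date);
        // floorKey may land on a different URL that happens to sort earlier.
        if (candidate == null || !candidate.startsWith(prefix)) {
            return null;
        }
        return candidate;
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("www.apple.com@200905", "a");
        rows.put("www.google.com@200801", "b");
        rows.put("www.google.com@200901", "c");
        rows.put("www.google.com@201001", "d");
        System.out.println(closestRowBefore(rows, "www.google.com", "200902")); // www.google.com@200901
        System.out.println(closestRowBefore(rows, "www.google.com", "200712")); // null
    }
}
```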
> >
> >
> > In fact, this method always returns the row with the null key if I request
> > a row that doesn't exactly match an inserted one.
> >
> > Is there really a way to make this kind of request in hbase?
> >
> > Jérôme Thièvre
> >
> >
> >
> >
> >
> > ----- Original Message -----
> > From: stack <st...@duboce.net>
> > Date: Wednesday, February 18, 2009 10:48 pm
> > Subject: Re: Re : Re: Table design question
> >
> > > On Wed, Feb 18, 2009 at 10:29 AM, <jt...@ina.fr> wrote:
> > >
> > > > >
> > > > > Currently we can only return records at an explicit date or older,
> > > > > not newer.
> > > > >
> > > > >
> > > > > > Each record is made of 10 columns, and each insert is of the type:
> > > > > >
> > > > > > insertRecord(url, date, record);
> > > > > >
> > > > > > There are several possible designs for my record table:
> > > > > >
> > > > > > 1. RowKey=url, and all columns are labelled with the same date.
> > > > > >
> > > > > > 2. RowKey=url, and we use the timestamp and version support of
> > > > > > hbase; column names are column family names (no label).
> > > > > >
> > > > > > 3. RowKey=url+date, and column names are column family names (no
> > > > > > label).
> > > > >
> > > > > Examples please (I've only had one cup of coffee so far this
> > > > > morning).
> > > > >
> > > >
> > > >
> > > > Suppose the column families are {'content:', 'type:'}.
> > > > I want to insert a new record with url www.google.com at date
> > > > 20090218:
> > > >
> > > > Case 1:
> > > > BatchUpdate update = new BatchUpdate("www.google.com");
> > > > update.put("content:20090218", Bytes.toBytes(
> > > >     "1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f"));
> > > > update.put("type:20090218", Bytes.toBytes("text/html"));
> > > > table.commit(update);
> > > >
> > > > Case 2: implies using hbase versioning
> > > > BatchUpdate update = new BatchUpdate("www.google.com",
> > > >     toTimestamp(20090218));
> > > > update.put("content:", Bytes.toBytes(
> > > >     "1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f"));
> > > > update.put("type:", Bytes.toBytes("text/html"));
> > > > table.commit(update);
> > >
> > >
> > >
> > > I like this schema best.
> > >
> > > But both cases 1 and 2 will have issues in current hbase if there are
> > > thousands of versions (to be fixed in 0.20.0).  Just a heads up.
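Case 2 stores each record as a new version of the same cells and, as stack notes earlier in the thread, a read at a given date can only return the version at that date or older, not newer. A minimal sketch of that read rule, assuming one cell's versions are modelled as a `TreeMap` from timestamp to value and read with `floorEntry`: `toTimestamp` here is a hypothetical implementation of the helper used in Case 2 (midnight UTC of a yyyyMMdd date, in epoch milliseconds), and `VersionedCell` is not an HBase class.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one cell's versions (e.g. row "www.google.com", column "content:").
class VersionedCell {

    // Hypothetical version of the thread's toTimestamp(): yyyyMMdd -> epoch millis, UTC midnight.
    static long toTimestamp(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // Versions keyed by timestamp, kept sorted.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Newest version at or before the timestamp; null if every version is newer.
    String getAtOrBefore(long timestamp) {
        Map.Entry<Long, String> entry = versions.floorEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedCell content = new VersionedCell();
        content.put(toTimestamp("20080101"), "hash-2008");
        content.put(toTimestamp("20090218"), "hash-2009");
        System.out.println(content.getAtOrBefore(toTimestamp("20090301"))); // hash-2009
        System.out.println(content.getAtOrBefore(toTimestamp("20081231"))); // hash-2008
        System.out.println(content.getAtOrBefore(toTimestamp("20070101"))); // null
    }
}
```

The last call shows the limitation stack describes: with versioning alone, a request dated before the oldest stored version finds nothing, because newer versions are never returned.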
> > >
> > >
> > > >
> > > > Case 3:
> > > > BatchUpdate update = new BatchUpdate("www.google.com@20090218");
> > > > update.put("content:", Bytes.toBytes(
> > > >     "1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f"));
> > > > update.put("type:", Bytes.toBytes("text/html"));
> > > > table.commit(update);
> > > >
> > >
> > >
> > > This will work fine in current hbase, even with thousands of versions.
> > >
> > > > Is it possible (or will it be possible) to load column names without
> > > > loading cell content? Same question for the timestamp?
> > > >
> > >
> > > Cell has to have something in it.
> > >
> > > Or do you mean querying hbase for the list of columns in a row without
> > > returning the data?  If the latter is your question, no, there is no
> > > way to get the listing without getting the payload too.
> > >
> > > St.Ack
> > >
> >
>