Posted to user@hbase.apache.org by Peter Haidinyak <ph...@local.com> on 2011/02/02 22:27:22 UTC

Keyword schema

Hi all,
	I was just tasked to take the keywords used for a search and put them in HBase so we can slice and dice them. They are interested in standard stuff like highest frequency word, word pairs, etc.
	I know I'm not the first to do this, so does anyone have a recommendation on how to set up a schema for this sort of task?

Thanks

-Pete

RE: doing a scan that will return random columns in a table's family

Posted by Peter Haidinyak <ph...@local.com>.
Thanks 



RE: doing a scan that will return random columns in a table's family

Posted by Jonathan Gray <jg...@fb.com>.
Result is just the client-side class which wraps whatever the server returns.  The ability to do this query is not really about whether Result has the methods to get at this data, but rather whether Scan supports this type of query (it does).

Scan.addFamily(family) will make it so that every column in the specified family is returned.

The Result that gets returned can be used in many different ways, including those described by David.  However, I generally recommend using Result.raw() and just operating directly on the KeyValues that are returned.  There is a non-zero cost to generating the maps via the convenience methods in Result.  Some of the more recent javadoc is a bit more explicit about that.

JG
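
[Editor's sketch] The point above about working on the flat cells instead of the map-returning convenience methods can be illustrated without the HBase client. This is a minimal sketch, assuming a hypothetical CellSketch class that stands in for the KeyValues returned by Result.raw(); it is not the HBase API, and the names CellSketch, FlatRowSketch and qualifiersIn are made up for illustration. In the real client, the scan would be set up with Scan.addFamily(family) as described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one cell of a scanned row. In the thread this
// role is played by the KeyValues from Result.raw(); this is NOT that class.
final class CellSketch {
    final byte[] family;
    final byte[] qualifier;
    final long timestamp;
    final byte[] value;

    CellSketch(String family, String qualifier, long timestamp, String value) {
        this.family = family.getBytes(StandardCharsets.UTF_8);
        this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
        this.value = value.getBytes(StandardCharsets.UTF_8);
    }
}

final class FlatRowSketch {
    // Walk the flat cell array once and collect every qualifier found in one
    // family. No intermediate maps are built, which is the saving described
    // in the message above.
    static List<String> qualifiersIn(CellSketch[] cells, String family) {
        byte[] wanted = family.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        for (CellSketch c : cells) {
            if (Arrays.equals(c.family, wanted)) {
                out.add(new String(c.qualifier, StandardCharsets.UTF_8));
            }
        }
        return out;
    }
}
```

The caller never needs to know the generated column names in advance; it simply reads whatever qualifiers come back for the family.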



RE: doing a scan that will return random columns in a table's family

Posted by Peter Haidinyak <ph...@local.com>.
Thanks



RE: doing a scan that will return random columns in a table's family

Posted by "Buttler, David" <bu...@llnl.gov>.
By default that is what you get.  You do have to navigate through the results:

Result.getFamilyMap():

/**
   * Map of qualifiers to values.
   * <p>
   * Returns a Map of the form: <code>Map&lt;qualifier,value></code>
   * @param family column family to get
   * @return map of qualifiers to values
   */
  public NavigableMap<byte[], byte[]> getFamilyMap(byte [] family)


Or, if you don't even know the family name, use the Result.getMap() function:

/**
   * Map of families to all versions of its qualifiers and values.
   * <p>
   * Returns a three level Map of the form:
   * <code>Map<family,Map&lt;qualifier,Map&lt;timestamp,value>>></code>
   * <p>
   * Note: All other map returning methods make use of this map internally.
   * @return map from families to qualifiers to versions
   */
  public NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> getMap()
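
[Editor's sketch] To make "navigate through the results" concrete, here is a minimal sketch that builds a map with the same shape as the getMap() return type quoted above and then reduces one family to a qualifier-to-value map, the shape getFamilyMap() returns. It uses only java.util, not the HBase client. The class NestedMapSketch, its helper names, the unsigned byte comparator and the newest-first timestamp ordering are all assumptions of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class NestedMapSketch {
    // byte[] has no natural ordering, so sorted maps keyed by byte[] need an
    // explicit comparator. This one is lexicographic over unsigned bytes.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Add one version of one column to a family -> qualifier -> timestamp map.
    // Timestamps are kept newest first (a choice made for this sketch).
    static void put(
            NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> row,
            String family, String qualifier, long timestamp, String value) {
        row.computeIfAbsent(utf8(family),
                        k -> new TreeMap<byte[], NavigableMap<Long, byte[]>>(BYTES))
           .computeIfAbsent(utf8(qualifier),
                        k -> new TreeMap<Long, byte[]>(Comparator.<Long>reverseOrder()))
           .put(timestamp, utf8(value));
    }

    // Reduce one family to qualifier -> newest value. Returns an empty map
    // when the family is not present in the row.
    static NavigableMap<byte[], byte[]> latestPerQualifier(
            NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> row,
            String family) {
        NavigableMap<byte[], byte[]> out = new TreeMap<>(BYTES);
        NavigableMap<byte[], NavigableMap<Long, byte[]>> columns = row.get(utf8(family));
        if (columns == null) return out;
        for (Map.Entry<byte[], NavigableMap<Long, byte[]>> e : columns.entrySet()) {
            // firstEntry() is the newest version because of the reversed ordering.
            out.put(e.getKey(), e.getValue().firstEntry().getValue());
        }
        return out;
    }
}
```

Iterating the entry set of the inner map yields every qualifier in the family, including generated names the caller did not know in advance.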



doing a scan that will return random columns in a table's family

Posted by Pete Haidinyak <ja...@cox.net>.
Hi,
	If I have a table:family where I add new columns with computer-generated
column names (I won't know what they are, so I can't add them to a scan), is it
possible to do a scan that returns every column in a row?

Thanks

-Pete


Re: Keyword schema

Posted by Ted Dunning <td...@maprtech.com>.
A small map-reduce program could do the updates to HBase or, if your
incremental data is relatively small, you can do the updates one by one.
This can work fine, but it doesn't really solve the top-100 term problem.
For that, it may be nice to have an occasional MR program that over-produces
a list of top items.  Then each update can go against your top item list,
which will be approximately the right list until the next MR update.
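
[Editor's sketch] One way to read the suggestion above, as a minimal in-memory sketch: a periodic batch step over-produces a candidate list of top terms, incremental updates refresh counts only for terms already on that list, and top-N queries read the candidate list alone. The class TopTermsSketch and its method names are hypothetical, and plain HashMaps stand in for the HBase counter table and for the MR job's output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopTermsSketch {
    // Full counts for every term (stand-in for the counter table).
    private final Map<String, Long> counts = new HashMap<>();
    // Over-produced candidate list, rebuilt only by the batch step.
    private final Map<String, Long> candidates = new HashMap<>();

    // Incremental path: always bump the full count, but touch the candidate
    // list only when the term is already a candidate.
    void increment(String term, long delta) {
        long updated = counts.merge(term, delta, Long::sum);
        if (candidates.containsKey(term)) {
            candidates.put(term, updated);
        }
    }

    // Batch path (the occasional MR job in the message above): keep the
    // `keep` highest-count terms, where `keep` is larger than any N queried.
    void rebuildCandidates(int keep) {
        candidates.clear();
        for (Map.Entry<String, Long> e : sortedByCountDesc(counts)) {
            if (candidates.size() == keep) break;
            candidates.put(e.getKey(), e.getValue());
        }
    }

    // Query path: answer top-n from the candidate list only.
    List<String> top(int n) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : sortedByCountDesc(candidates)) {
            if (out.size() == n) break;
            out.add(e.getKey());
        }
        return out;
    }

    // Highest count first; ties broken alphabetically so results are stable.
    private static List<Map.Entry<String, Long>> sortedByCountDesc(Map<String, Long> m) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(m.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }
}
```

The trade-off is the one described above: a term that surges between rebuilds stays invisible to top-N queries until the next batch run, which is why the candidate list is over-produced.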


Re: Keyword schema

Posted by Pete Haidinyak <ja...@cox.net>.
I will be updating the keywords and their frequency every X minutes, so I
don't believe M/R would work well, but I could be wrong. I've been using
this approach with other data and have been getting sub-second responses on
my queries with two overtaxed servers. I figured this problem has been solved
many times before and I was just looking for guidance.

Thanks

-Pete




Re: Keyword schema

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I don't think HBase is really needed here, unless you somehow need
random read/write to those search queries.

J-D
