Posted to common-user@hadoop.apache.org by Bob Schulze <b....@ecircle.com> on 2009/10/07 11:58:16 UTC

memcached or existing hbase ?

I need a cache that is read often by many nodes and written rarely by a
few nodes. It's not too big (200,000 to 2 million records, roughly 1 GB),
but it may be too big to fit on one node (so keeping local caches, or
ZooKeeper, is not an option).

HBase is already in place for other applications. Do I get any further
benefit (speed?) from using memcached (instead of HBase, not on top of
it), or would it just be one more piece of software to maintain?

I have read the memcached docs and wiki and am reasonably familiar with
HBase, but I would appreciate a good reason to use one or the other. I am
asking on the Hadoop list because I think M/R jobs also need this for
joins occasionally, and memcached is often recommended.

Thx for any tips,

	Bob
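[A quick back-of-the-envelope check on the figures above. The record
counts and the ~1 GB total are from Bob's description; the 512 MB
per-instance memory budget is an assumed figure for illustration only:]

```python
import math

# Figures from the post: 200,000 to 2,000,000 records, ~1 GB total.
records_low, records_high = 200_000, 2_000_000
total_bytes = 1 * 1024**3  # ~1 GB

# Implied average record size at each end of the range.
avg_high = total_bytes / records_low   # fewest records -> largest records
avg_low = total_bytes / records_high   # most records -> smallest records
print(f"avg record size: {avg_low:.0f} B to {avg_high/1024:.1f} KB")

# With an assumed 512 MB of cache memory per instance, two instances
# would hold the whole dataset.
per_instance = 512 * 1024**2
instances = math.ceil(total_bytes / per_instance)
print(f"instances needed at 512 MB each: {instances}")
```

So the records average roughly 0.5-5 KB each, and a handful of modest
cache instances would cover the whole dataset.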

Re: memcached or existing hbase ?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Bob,

I think it depends on what you want to do and what you need.  For example, vanilla memcached is not persistent - if your server goes down, you lose your data and have to load it into the cache again (or just populate it lazily).  With HBase replication that shouldn't happen.  On the other hand, memcached is older, has seen a LOT more action over the years, and I have instances whose uptime is measured in *years*.  Memcached is lean and mean.  From your description, it doesn't sound like you need anything more than 1+ memcached instances.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
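
[The lazy population Otis mentions is a read-through lookup: on a miss,
load from the backing store and cache the result. A minimal sketch - a
plain dict stands in for a memcached client so it is self-contained
(real clients expose a similar `get`/`set` interface), and
`load_record` is a hypothetical backing-store lookup:]

```python
# Read-through (lazy) cache population: on a miss, fetch from the
# backing store and cache the result for subsequent readers.
cache = {}  # stands in for a memcached client exposing get/set

def load_record(key):
    # Hypothetical backing-store lookup (RDBMS, HBase, flat file, ...).
    return f"record-for-{key}"

def get_or_load(key):
    value = cache.get(key)
    if value is None:
        # Miss: the key was never cached, was evicted, or the cache
        # restarted (memcached is not persistent).
        value = load_record(key)
        cache[key] = value
    return value

print(get_or_load("user:42"))  # miss -> loads and caches
print(get_or_load("user:42"))  # hit -> served from cache
```

After a cache restart the same code simply refills the cache on demand,
which is why non-persistence need not be fatal for this workload.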





Re: memcached or existing hbase ?

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
Hi Bob,

I also used memcached for metadata lookup in a Hadoop job. I started 15
memcached server instances on 7 nodes. I saw about 1 million hits per
memcached server (in my case), yet it didn't perform up to my
expectations. So I switched to Tokyo Cabinet (a BDB-like file-based
database) and it performed well.

I have written a white paper on Hadoop performance tuning. It includes a
case study in which I describe my complete scenario, approaches and
statistics. You can find the paper here:

http://www.impetus.com/impetusweb/whitepapers_main.jsp?download=HadoopPerformanceTuning.pdf
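
[Tokyo Cabinet itself needs native bindings, but the pattern - a local,
file-based key/value store that each task opens read-only for metadata
lookups, instead of a network round trip per record - can be sketched
with Python's standard-library `dbm` module. The file name and keys
here are made up for illustration:]

```python
import dbm

# Build the lookup file once (e.g. on the job submitter), then ship it
# to every node, for example via Hadoop's DistributedCache.
with dbm.open("metadata.db", "n") as db:
    db[b"product:1"] = b"widgets"
    db[b"product:2"] = b"gadgets"

# Each map task opens the file read-only and does purely local lookups.
with dbm.open("metadata.db", "r") as db:
    print(db[b"product:1"].decode())
```

The trade-off is the same one Chandraprakash hit: a local file avoids
network latency entirely, at the cost of distributing a copy of the
data to every node.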




-- 
Thanks & Regards,
Chandra Prakash Bhagtani,

Re: memcached or existing hbase ?

Posted by Paul Ingles <pa...@oobaloo.co.uk>.
Hi Bob,

I don't have much in the way of usage stats to go on. However, we went
on a similar journey with some document clustering we were doing a
while ago.

We wanted to do some simple key/value lookups during the process, and
started out using the existing RDBMSs that already held the data. That
didn't really cut it, so we decided to just throw memcached onto our
nodes and give that a go. It didn't perform as we expected (we already
use it for a bunch of our web apps, where it works really well, so that
was a surprise). It seemed a little difficult to predict when some
records would fall out of the cache, and the resulting spurious errors
made it difficult to depend on. We had a tight timeline, so we decided
to move on.

In the end we installed HBase and gave it a go. Despite a few teething
problems it's been pretty good since then, and the distribution and
(relative) reliability mean we've stuck with it. It just seems to work
pretty well for that kind of workload, although I would like to go back
some time and really figure out why memcached didn't work.

Dataset-wise, it was approximately 20m records, or a couple of
gigabytes worth of data.

HTH,
Paul
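
[The unpredictable "falling out of the cache" Paul describes is by
design: memcached has a fixed memory budget and evicts on an LRU
basis, so once the working set exceeds capacity, records silently
disappear and readers must treat a miss as normal. A toy LRU - pure
Python, nothing like memcached's real slab allocator, capacity kept
deliberately tiny - shows the effect:]

```python
from collections import OrderedDict

class TinyLRU:
    """Toy fixed-capacity LRU cache for illustration."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def set(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)          # mark as most recently used
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

    def get(self, key):
        if key not in self.data:
            return None  # callers must treat a miss as normal, not an error
        self.data.move_to_end(key)
        return self.data[key]

cache = TinyLRU(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)       # working set exceeds capacity: "a" is evicted
print(cache.get("a"))   # -> None: a "spurious" miss
print(cache.get("c"))   # -> 3
```

Code that errors out on a miss, rather than falling back to the backing
store, will see exactly the kind of unpredictable failures described
above once the dataset outgrows the cache.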
