Posted to dev@jackrabbit.apache.org by Marcel Reutegger <ma...@gmx.net> on 2009/02/19 12:48:20 UTC

Jackrabbit on Hadoop

Hi all,

I recently committed a PersistenceManager and a DataStore
implementation based on Hadoop. It's in sandbox/jackrabbit-hadoop. The
PersistenceManager uses Hadoop HBase, which is something similar to
Google's Bigtable. The data store implementation uses plain Hadoop
HDFS. The performance is surprisingly good. On my machine the
JCRAPITests take 40 seconds compared to 35 seconds with the default
Derby persistence manager.

feedback welcome.

regards
 marcel

Re: Jackrabbit on Hadoop

Posted by Marcel Reutegger <ma...@gmx.net>.

Hi,

Now it's even more fun. I just added a Journal implementation on
HBase. This makes it possible to run Jackrabbit cluster nodes on
Hadoop. Please note that for now a patch for HBase 0.19.0 is required
to get the journal to work.

regards
 marcel

Re: Jackrabbit on Hadoop

Posted by Marcel Reutegger <ma...@gmx.net>.

Hi,

On Fri, Feb 20, 2009 at 11:21, imadhusudhanan <ma...@gmail.com> wrote:
> Hi Marcel, that's really good to see. We have a DFS cluster set up. I
> think I'm OK with the DataStore implementation. Could I use it without
> the Hadoop implementation?

I'm not sure I understand. The data store implementation in
sandbox/jackrabbit-hadoop uses the Hadoop API, so you cannot use this
data store without Hadoop.

> The persistence manager that you wrote is only for
> maintaining files in a DB, correct?

No, a PersistenceManager in Jackrabbit takes care of storing
ItemStates. Those are the elements that contain the data of a JCR
Item.

regards
 marcel

Re: Jackrabbit on Hadoop

Posted by imadhusudhanan <ma...@gmail.com>.

Hi Marcel, that's really good to see. We have a DFS cluster set up. I think I'm OK with the DataStore implementation. Could I use it without the Hadoop implementation? The persistence manager that you wrote is only for maintaining files in a DB, correct?

 On Thu, 19 Feb 2009 Marcel Reutegger <ma...@gmx.net> wrote ---- 

 > Hi,
 > 
 > On Thu, Feb 19, 2009 at 15:07, Alessandro Bologna
 > <al...@gmail.com> wrote:
 > > sounds really great. I am assuming you tested it on a single machine.
 > 
 > correct. HDFS and HBase processes were all running on a single machine.
 > 
 > > Any plan to leverage HDFS in a JR cluster?
 > 
 > well, there's not really a plan. I wrote it just for fun to see how
 > difficult/easy it would be.
 > 
 > > Would it even be feasible?
 > 
 > I think so, but I'm not sure how well isolation levels are implemented
 > in HBase. I'll have to test that. I guess it would also make sense to
 > have a journal implementation on HBase to also move that part to
 > Hadoop.
 > 
 > regards
 >  marcel

Re: Jackrabbit on Hadoop

Posted by Marcel Reutegger <ma...@gmx.net>.

Hi,

On Thu, Feb 19, 2009 at 15:07, Alessandro Bologna
<al...@gmail.com> wrote:
> sounds really great. I am assuming you tested it on a single machine.

correct. HDFS and HBase processes were all running on a single machine.

> Any plan to leverage HDFS in a JR cluster?

well, there's not really a plan. I wrote it just for fun to see how
difficult/easy it would be.

> Would it even be feasible?

I think so, but I'm not sure how well isolation levels are implemented
in HBase. I'll have to test that. I guess it would also make sense to
have a journal implementation on HBase to also move that part to
Hadoop.

regards
 marcel

Re: Jackrabbit on Hadoop

Posted by Alessandro Bologna <al...@gmail.com>.

Hi Marcel

sounds really great. I am assuming you tested it on a single machine. Any
plan to leverage HDFS in a JR cluster? Would it even be feasible?
Alessandro

On Thu, Feb 19, 2009 at 6:48 AM, Marcel Reutegger
<ma...@gmx.net> wrote:

> Hi all,
>
> I recently committed a PersistenceManager and a DataStore
> implementation based on Hadoop. It's in sandbox/jackrabbit-hadoop. The
> PersistenceManager uses Hadoop HBase, which is something similar to
> Google's Bigtable. The data store implementation uses plain Hadoop
> HDFS. The performance is surprisingly good. On my machine the
> JCRAPITests take 40 seconds compared to 35 seconds with the default
> Derby persistence manager.
>
> feedback welcome.
>
> regards
>  marcel
>