Posted to users@jackrabbit.apache.org by imadhusudhanan <im...@zohocorp.com> on 2009/02/10 14:44:21 UTC

Fwd: Re: Moving to DFS System ..

Dear All,

I use the Apache Hadoop project as DFS. Has anyone dealt with a similar JR to DFS conversion? Please explain.

Regards,
MadhuSudhanan I.
www.zoho.com
"If you wanna walk quick Walk Alone, if you wanna walk far Walk Together ..."



============ Forwarded Mail ============
From : Jukka Zitting <ju...@gmail.com>
To : dev@jackrabbit.apache.org
Date : Tue, 10 Feb 2009 10:24:43 +0100
Subject : Re: Moving to DFS System ..
============ Forwarded Mail ============

 > Hi, 
 >  
 > On Tue, Feb 10, 2009 at 6:45 AM, imadhusudhanan 
 > <im...@zohocorp.com> wrote: 
 > > I have used Jackrabbit 1.4 and was successful running it in my local 
 > > environment with the repository provided by JR itself. Now that we have our 
 > > own DFS system, I would like to change the existing configuration to our DFS 
 > > instead of using the JR repository. May I know how to do this? Please help. 
 >  
 > Could you be a bit more specific? What "DFS" are you talking about? 
 >  
 > Also, the users@ list is a better place for questions about Jackrabbit usage. 
 >  
 > BR, 
 >  
 > Jukka Zitting  

Re: Re: Moving to DFS System ..

Posted by imadhusudhanan <im...@zohocorp.com>.
Hi Alex,

 > There are two options that might work for you, but both involve some 
 > coding effort: one is to use Jackrabbit's WebDAV server "library" to 
 > build your own server-side implementation of WebDAV that connects to a 
 > Hadoop FileSystem. 

    I understand this to mean that I have to implement the classes under the org.apache.jackrabbit.webdav.xxx packages so that they work against the Hadoop FileSystem. Correct?

 > The other option would be to implement the full JCR 
 > API via Jackrabbit SPI [2], which is a simpler API than the full JCR 
 > API, and build this on top of a Hadoop FileSystem - but this is a 
 > rather huge effort. 

    Why do you call this a huge effort? After all, I can see only a few APIs to implement in the org.apache.jackrabbit.spi package compared to the former. Do you mean that building on the existing WebDAV server library with the Hadoop FileSystem will be easier than the SPI approach?

    Also, as far as scalability is concerned, which one should I prefer?

Regards,
MadhuSudhanan I.
www.zoho.com
"If you wanna walk quick Walk Alone, if you wanna walk far Walk Together ..."



---- On Mon, 16 Feb 2009 Alexander Klimetschek <ak...@day.com> wrote ---- 

 > Hi, 
 >  
 > (I only answer questions publicly, so this goes back to the Jackrabbit list) 
 >  
 > On Mon, Feb 16, 2009 at 3:39 PM, imadhusudhanan 
 > <im...@zohocorp.com> wrote: 
 > >     Currently we use the Hadoop API to access files from our Distributed File 
 > > System. I would like to enable WebDAV on the same DFS using JR. 
 > > May I know how I can make that possible? 
 >  
 > Jackrabbit is not a generic WebDAV to file system mapper as you might 
 > think. Since it is a JCR repository, it must allow for all the 
 > fine-grained JCR features (nodes and residual properties, versioning, 
 > node types, locking etc.) that cannot be mapped onto simple OS file 
 > systems or simple filesystem abstractions (such as the one Hadoop provides, 
 > for example). Theoretically that might be possible, but it's not an option 
 > for a performant implementation. Therefore Jackrabbit has its own 
 > persistence abstraction (mainly around the PersistenceManager 
 > interface [1]), which is driven by the internal architecture to 
 > support the full JCR API. 
 >  
 > [1] http://jackrabbit.apache.org/api/1.5/org/apache/jackrabbit/core/persistence/PersistenceManager.html 
 >  
 >  
 > >     Also, Hadoop DFS has its own FileSystem. I guess that an entry in 
 > > the repository.xml <FileSystem> tag will change the file system to whatever I 
 > > specify, say org.apache.hadoop.fs.LocalFileSystem etc. 
 >  
 > No, you cannot use it. FileSystem is just a common name for 
 > persistence abstractions, but in this case, Hadoop's FileSystem (base 
 > class org.apache.hadoop.fs.FileSystem) and Jackrabbit's FileSystem 
 > (interface org.apache.jackrabbit.core.fs.FileSystem) are two 
 > completely different things. 
 >  
 > Also, Jackrabbit's FileSystem is somewhat deprecated and today not 
 > used for actual persistence - that's handled by PersistenceManagers 
 > which are at a low level, where they no longer "know" about the 
 > hierarchy, but solely work with uuids and node bundles. 
 >  
 > This means writing a PersistenceManager that works with a Hadoop 
 > FileSystem is probably very difficult or even impossible. Not sure how 
 > Marcel's implementation works, but it seems to use a different Hadoop 
 > API (not the FileSystem). 
 >  
 > There are two options that might work for you, but both involve some 
 > coding effort: one is to use Jackrabbit's WebDAV server "library" to 
 > build your own server-side implementation of WebDAV that connects to a 
 > Hadoop FileSystem. The other option would be to implement the full JCR 
 > API via Jackrabbit SPI [2], which is a simpler API than the full JCR 
 > API, and build this on top of a Hadoop FileSystem - but this is a 
 > rather huge effort. 
 >  
 > [2] http://jackrabbit.apache.org/jackrabbit-spi.html 
 >  
 > Have a look at the following links if you are interested in more 
 > information about Jackrabbit's architecture: 
 >  
 > http://jackrabbit.apache.org/jackrabbit-architecture.html 
 > http://jackrabbit.apache.org/how-jackrabbit-works.html 
 > http://jackrabbit.apache.org/jackrabbit-configuration.html 
 >  
 > Regards, 
 > Alex 
 >  
 > --  
 > Alexander Klimetschek 
 > alexander.klimetschek@day.com  

Re: Re: Moving to DFS System ..

Posted by imadhusudhanan <im...@zohocorp.com>.
Hi Alex,

    Here is another approach I would like to propose. I will not disturb the current repository configuration; since I know the XML format of each WebdavRequest, when a user issues a PUT request I will override AbstractWebdavServlet's doPut() in my own subclass, write custom code there to store the file in the DFS, and send a success response back to the user. That is what a basic WebDAV server does, I hope. A rough sketch is below.
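
Something along these lines (a minimal sketch only - it uses a plain HttpServlet instead of AbstractWebdavServlet so the example stays self-contained, and the namenode address and servlet path mapping are placeholders from our setup):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: accept a PUT request and store its body in the DFS.
    public class DfsPutServlet extends HttpServlet {

        protected void doPut(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000"); // placeholder address
            FileSystem fs = FileSystem.get(conf);

            // map the request path onto a DFS path
            Path target = new Path(req.getPathInfo());

            InputStream in = req.getInputStream();
            OutputStream out = fs.create(target);
            try {
                byte[] buffer = new byte[8192];
                int len;
                while ((len = in.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
            } finally {
                out.close();
            }
            // 201 Created is what a WebDAV client expects for a new resource
            resp.setStatus(HttpServletResponse.SC_CREATED);
        }
    }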

Regards,
MadhuSudhanan I.
www.zoho.com
"If you wanna walk quick Walk Alone, if you wanna walk far Walk Together ..."



---- On Mon, 16 Feb 2009 Alexander Klimetschek <ak...@day.com> wrote ---- 

 > Hi, 
 >  
 > (I only answer questions publicly, so this goes back to the Jackrabbit list) 
 >  
 > On Mon, Feb 16, 2009 at 3:39 PM, imadhusudhanan 
 > <im...@zohocorp.com> wrote: 
 > >     Currently we use the Hadoop API to access files from our Distributed File 
 > > System. I would like to enable WebDAV on the same DFS using JR. 
 > > May I know how I can make that possible? 
 >  
 > Jackrabbit is not a generic WebDAV to file system mapper as you might 
 > think. Since it is a JCR repository, it must allow for all the 
 > fine-grained JCR features (nodes and residual properties, versioning, 
 > node types, locking etc.) that cannot be mapped onto simple OS file 
 > systems or simple filesystem abstractions (such as the one Hadoop provides, 
 > for example). Theoretically that might be possible, but it's not an option 
 > for a performant implementation. Therefore Jackrabbit has its own 
 > persistence abstraction (mainly around the PersistenceManager 
 > interface [1]), which is driven by the internal architecture to 
 > support the full JCR API. 
 >  
 > [1] http://jackrabbit.apache.org/api/1.5/org/apache/jackrabbit/core/persistence/PersistenceManager.html 
 >  
 >  
 > >     Also, Hadoop DFS has its own FileSystem. I guess that an entry in 
 > > the repository.xml <FileSystem> tag will change the file system to whatever I 
 > > specify, say org.apache.hadoop.fs.LocalFileSystem etc. 
 >  
 > No, you cannot use it. FileSystem is just a common name for 
 > persistence abstractions, but in this case, Hadoop's FileSystem (base 
 > class org.apache.hadoop.fs.FileSystem) and Jackrabbit's FileSystem 
 > (interface org.apache.jackrabbit.core.fs.FileSystem) are two 
 > completely different things. 
 >  
 > Also, Jackrabbit's FileSystem is somewhat deprecated and today not 
 > used for actual persistence - that's handled by PersistenceManagers 
 > which are at a low level, where they no longer "know" about the 
 > hierarchy, but solely work with uuids and node bundles. 
 >  
 > This means writing a PersistenceManager that works with a Hadoop 
 > FileSystem is probably very difficult or even impossible. Not sure how 
 > Marcel's implementation works, but it seems to use a different Hadoop 
 > API (not the FileSystem). 
 >  
 > There are two options that might work for you, but both involve some 
 > coding effort: one is to use Jackrabbit's WebDAV server "library" to 
 > build your own server-side implementation of WebDAV that connects to a 
 > Hadoop FileSystem. The other option would be to implement the full JCR 
 > API via Jackrabbit SPI [2], which is a simpler API than the full JCR 
 > API, and build this on top of a Hadoop FileSystem - but this is a 
 > rather huge effort. 
 >  
 > [2] http://jackrabbit.apache.org/jackrabbit-spi.html 
 >  
 > Have a look at the following links if you are interested in more 
 > information about Jackrabbit's architecture: 
 >  
 > http://jackrabbit.apache.org/jackrabbit-architecture.html 
 > http://jackrabbit.apache.org/how-jackrabbit-works.html 
 > http://jackrabbit.apache.org/jackrabbit-configuration.html 
 >  
 > Regards, 
 > Alex 
 >  
 > --  
 > Alexander Klimetschek 
 > alexander.klimetschek@day.com  

Re: Re: Moving to DFS System ..

Posted by Alexander Klimetschek <ak...@day.com>.
Hi,

(I only answer questions publicly, so this goes back to the Jackrabbit list)

On Mon, Feb 16, 2009 at 3:39 PM, imadhusudhanan
<im...@zohocorp.com> wrote:
>     Currently we use the Hadoop API to access files from our Distributed File
> System. I would like to enable WebDAV on the same DFS using JR.
> May I know how I can make that possible?

Jackrabbit is not a generic WebDAV to file system mapper as you might
think. Since it is a JCR repository, it must allow for all the
fine-grained JCR features (nodes and residual properties, versioning,
node types, locking etc.) that cannot be mapped onto simple OS file
systems or simple filesystem abstractions (such as the one Hadoop provides,
for example). Theoretically that might be possible, but it's not an option
for a performant implementation. Therefore Jackrabbit has its own
persistence abstraction (mainly around the PersistenceManager
interface [1]), which is driven by the internal architecture to
support the full JCR API.

[1] http://jackrabbit.apache.org/api/1.5/org/apache/jackrabbit/core/persistence/PersistenceManager.html
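
To give an idea of the scope, here is an abbreviated, abstract sketch of what a custom persistence manager has to provide in the 1.x line (see the Javadoc above for the exact and complete interface; this is a sketch, not a drop-in implementation):

    import org.apache.jackrabbit.core.NodeId;
    import org.apache.jackrabbit.core.persistence.PMContext;
    import org.apache.jackrabbit.core.persistence.PersistenceManager;
    import org.apache.jackrabbit.core.state.ChangeLog;
    import org.apache.jackrabbit.core.state.ItemStateException;
    import org.apache.jackrabbit.core.state.NodeState;

    // Abbreviated sketch of a persistence manager over an external store.
    // Declared abstract because the real interface has more methods
    // (property states, node references, exists() checks, createNew(), ...).
    public abstract class ExternalStorePersistenceManager implements PersistenceManager {

        public void init(PMContext context) throws Exception {
            // open the connection to the backing store
        }

        public void close() throws Exception {
            // release the connection
        }

        public NodeState load(NodeId id) throws ItemStateException {
            // look up the raw record by uuid - no path or hierarchy is available here
            throw new ItemStateException("not implemented in this sketch");
        }

        public void store(ChangeLog changeLog) throws ItemStateException {
            // persist all added/modified/deleted states of a save() atomically
            throw new ItemStateException("not implemented in this sketch");
        }
    }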


>     Also, Hadoop DFS has its own FileSystem. I guess that an entry in
> the repository.xml <FileSystem> tag will change the file system to whatever I
> specify, say org.apache.hadoop.fs.LocalFileSystem etc.

No, you cannot use it. FileSystem is just a common name for
persistence abstractions, but in this case, Hadoop's FileSystem (base
class org.apache.hadoop.fs.FileSystem) and Jackrabbit's FileSystem
(interface org.apache.jackrabbit.core.fs.FileSystem) are two
completely different things.
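
For reference, the <FileSystem> entry in repository.xml expects a class implementing Jackrabbit's interface; the default configuration uses the local one:

    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository"/>
    </FileSystem>

Plugging org.apache.hadoop.fs.LocalFileSystem into that entry would simply fail, because it does not implement org.apache.jackrabbit.core.fs.FileSystem.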

Also, Jackrabbit's FileSystem is somewhat deprecated and today not
used for actual persistence - that's handled by PersistenceManagers
which are at a low level, where they no longer "know" about the
hierarchy, but solely work with uuids and node bundles.

This means writing a PersistenceManager that works with a Hadoop
FileSystem is probably very difficult or even impossible. Not sure how
Marcel's implementation works, but it seems to use a different Hadoop
API (not the FileSystem).

There are two options that might work for you, but both involve some
coding effort: one is to use Jackrabbit's WebDAV server "library" to
build your own server-side implementation of WebDAV that connects to a
Hadoop FileSystem. The other option would be to implement the full JCR
API via Jackrabbit SPI [2], which is a simpler API than the full JCR
API, and build this on top of a Hadoop FileSystem - but this is a
rather huge effort.

[2] http://jackrabbit.apache.org/jackrabbit-spi.html

Have a look at the following links if you are interested in more
information about Jackrabbit's architecture:

http://jackrabbit.apache.org/jackrabbit-architecture.html
http://jackrabbit.apache.org/how-jackrabbit-works.html
http://jackrabbit.apache.org/jackrabbit-configuration.html

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Re: Moving to DFS System ..

Posted by Alexander Klimetschek <ak...@day.com>.
On Tue, Feb 10, 2009 at 2:44 PM, imadhusudhanan
<im...@zohocorp.com> wrote:
> I use the Apache Hadoop project as DFS. Has anyone dealt with a similar JR to DFS conversion? Please explain.

Still, what do you mean by DFS? Distributed File System? How do you
"use" it (i.e. Apache Hadoop) in your client applications, and what is the
interface you use? Direct filesystem access, WebDAV, the Hadoop API, etc.?

Jackrabbit obviously mainly provides the JCR API as its interface, but it
also provides a stable WebDAV filesystem-like mapping (only
nt:file/nt:folder in the repository) that can be mounted as a file
system. The backend part of Jackrabbit (persistence managers,
datastore) is optimized for performance and pure JCR usage; it is an
integral part of Jackrabbit's internal architecture. If you want to
connect existing datasources via JCR, the Jackrabbit SPI interface is
intended to make development of such connectors/adaptors simpler.
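
To illustrate the mapping: a file that arrives over WebDAV ends up in the repository as plain JCR nodes, essentially like this (standard JCR 1.0 usage; assumes an already-authenticated session):

    import java.io.FileInputStream;
    import java.util.Calendar;
    import javax.jcr.Node;
    import javax.jcr.Session;

    // Sketch: the nt:folder / nt:file / nt:resource structure in JCR code.
    public class NtFileExample {
        public static void storeFile(Session session) throws Exception {
            Node folder = session.getRootNode().addNode("docs", "nt:folder");
            Node file = folder.addNode("report.txt", "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");
            content.setProperty("jcr:mimeType", "text/plain");
            content.setProperty("jcr:encoding", "UTF-8");
            content.setProperty("jcr:data", new FileInputStream("report.txt"));
            content.setProperty("jcr:lastModified", Calendar.getInstance());
            session.save();
        }
    }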

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Re: Moving to DFS System ..

Posted by imadhusudhanan <im...@zohocorp.com>.
Hi Marcel,

    We used hadoop-core0.13.jar for our DFS system. I operate through a DFSClient and a few property files to connect to the DFS server. Now my client's requirement is this: he wants to store all his documents on his own DFS server even when he uses WebDAV. So now I want to connect the WebDAV client directly to the DFS server.

    One way that seemed possible to me is to override the methods in AbstractWebdavServlet: when I encounter a PUT request, I could write a few lines to connect to the DFS server and send a suitable DavResponse on success, as sketched below. Can we deal with the problem more professionally? Please help.
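
Roughly like this on the DFS side (a sketch against the FileSystem API; the namenode address is a placeholder from our property files). Beyond PUT, the servlet would also need existence checks for HEAD/PROPFIND and collection creation for MKCOL:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: the DFS operations behind the basic WebDAV verbs.
    public class DfsOps {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000"); // placeholder
            FileSystem fs = FileSystem.get(conf);

            Path collection = new Path("/webdav/projects");
            if (!fs.exists(collection)) {
                fs.mkdirs(collection); // MKCOL maps naturally onto mkdirs
            }
            // a PUT handler would call fs.create(new Path(collection, "file.bin"))
            // and copy the request body into the returned stream
        }
    }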

Regards,
MadhuSudhanan I.
"If you wanna walk quick Walk Alone, if you wanna walk far Walk Together ..."



---- On Wed, 11 Feb 2009 Marcel Reutegger <ma...@gmx.net> wrote ---- 

 > Hi, 
 >  
 > I recently played around with Hadoop HBase and wrote a persistence 
 > manager on top of it. It worked pretty well, though I'm not sure if 
 > that's what you have in mind. 
 >  
 > If there's interest I can commit it to the jackrabbit sandbox. 
 >  
 > regards 
 >  marcel 
 >  
 > On Tue, Feb 10, 2009 at 2:44 PM, imadhusudhanan 
 > <im...@zohocorp.com> wrote: 
 > > Dear All, 
 > > 
 > > I use the Apache Hadoop project as DFS. Has anyone dealt with a similar JR to DFS conversion? Please explain. 
 > > 
 > > Regards, 
 > > MadhuSudhanan I. 
 > > www.zoho.com 
 > > "If you wanna walk quick Walk Alone, if you wanna walk far Walk Together ..." 
 > > 
 > > 
 > > 
 > > ============ Forwarded Mail ============ 
 > > From : Jukka Zitting <ju...@gmail.com> 
 > > To : dev@jackrabbit.apache.org 
 > > Date :Tue, 10 Feb 2009 10:24:43 +0100 
 > > Subject : Re: Moving to DFS System .. 
 > > ============ Forwarded Mail ============ 
 > > 
 > >  > Hi, 
 > >  > 
 > >  > On Tue, Feb 10, 2009 at 6:45 AM, imadhusudhanan 
 > >  > <im...@zohocorp.com> wrote: 
 > >  > > I have used Jackrabbit 1.4 and was successful running it in my local 
 > >  > > environment with the repository provided by JR itself. Now that we have our 
 > >  > > own DFS system, I would like to change the existing configuration to our DFS 
 > >  > > instead of using the JR repository. May I know how to do this? Please help. 
 > >  > 
 > >  > Could you be a bit more specific? What "DFS" are you talking about? 
 > >  > 
 > >  > Also, the users@ list is a better place for questions about Jackrabbit usage. 
 > >  > 
 > >  > BR, 
 > >  > 
 > >  > Jukka Zitting  

Re: Re: Moving to DFS System ..

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

I recently played around with Hadoop HBase and wrote a persistence
manager on top of it. It worked pretty well, though I'm not sure if
that's what you have in mind.

If there's interest I can commit it to the jackrabbit sandbox.

regards
 marcel

On Tue, Feb 10, 2009 at 2:44 PM, imadhusudhanan
<im...@zohocorp.com> wrote:
> Dear All,
>
> I use the Apache Hadoop project as DFS. Has anyone dealt with a similar JR to DFS conversion? Please explain.
>
> Regards,
> MadhuSudhanan I.
> www.zoho.com
> "If you wanna walk quick Walk Alone, if you wanna walk far Walk Together ..."
>
>
>
> ============ Forwarded Mail ============
> From : Jukka Zitting <ju...@gmail.com>
> To : dev@jackrabbit.apache.org
> Date :Tue, 10 Feb 2009 10:24:43 +0100
> Subject : Re: Moving to DFS System ..
> ============ Forwarded Mail ============
>
>  > Hi,
>  >
>  > On Tue, Feb 10, 2009 at 6:45 AM, imadhusudhanan
>  > <im...@zohocorp.com> wrote:
>  > > I have used Jackrabbit 1.4 and was successful running it in my local
>  > > environment with the repository provided by JR itself. Now that we have our
>  > > own DFS system, I would like to change the existing configuration to our DFS
>  > > instead of using the JR repository. May I know how to do this? Please help.
>  >
>  > Could you be a bit more specific? What "DFS" are you talking about?
>  >
>  > Also, the users@ list is a better place for questions about Jackrabbit usage.
>  >
>  > BR,
>  >
>  > Jukka Zitting