Posted to dev@nutch.apache.org by Egor Chernodarov <eg...@zarinsk.dem.ru> on 2005/08/30 13:38:17 UTC

NDFS question

Hello!

I want to test NDFS on my Nutch installation, but I have a problem.
I started from the wiki, which has a quick demo for NDFS:
http://wiki.apache.org/nutch/NutchDistributedFileSystem

On "$ nutch ndfs -put local_file /test/testfile" (or "./nutch admin db
-create", etc.) I always get the exception "Could not obtain new output block":
=======================================================================
050830 061956 Waiting to find target node
Exception in thread "main" java.io.IOException: Could not obtain new
output block for file /test/testfile
        at org.apache.nutch.ndfs.NDFSClient$NameNodeCaller.getNewOutputBlock(NDFSClient.java:921)
        at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.nextBlockOutputStream(NDFSClient.java:616)
        at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.<init>(NDFSClient.java:597)
        at org.apache.nutch.ndfs.NDFSClient.create(NDFSClient.java:85)
        at org.apache.nutch.fs.NDFSFileSystem.create(NDFSFileSystem.java:76)
        at org.apache.nutch.fs.NDFSFileSystem.create(NDFSFileSystem.java:71)
        at org.apache.nutch.io.SequenceFile$Writer.<init>(SequenceFile.java:80)
        at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:94)
        at org.apache.nutch.db.WebDBWriter.<init>(WebDBWriter.java:1507)
        at org.apache.nutch.db.WebDBWriter.createWebDB(WebDBWriter.java:1438)
        at org.apache.nutch.tools.WebDBAdminTool.main(WebDBAdminTool.java:172)
=======================================================================

On the namenode I see something like this:
=======================================================================
050830 061445 Pending transfer from server.domain.com:7000 to 3 destinations
050830 061447 Renewed lease [Lease.  Holder: NDFSClient_-1094164187, heldlocks: 1, pendingcreates: 1]
050830 061448 Pending transfer from server.domain.com:7000 to 3 destinations
050830 061451 Pending transfer from server.domain.com:7000 to 3 destinations
050830 061454 Pending transfer from server.domain.com:7000 to 3 destinations
050830 061455 Renewed lease [Lease.  Holder: NDFSClient_-1094164187, heldlocks: 1, pendingcreates: 1]
050830 061457 Pending transfer from server.domain.com:7000 to 3 destinations
050830 061500 Pending transfer from server.domain.com:7000 to 3 destinations
050830 061503 Pending transfer from server.domain.com:7000 to 3 destinations
050830 061503 Renewed lease [Lease.  Holder: NDFSClient_-1094164187, heldlocks: 1, pendingcreates: 1]
=======================================================================

But if I run the datanode and namenode on the same server, everything works!

On "$ nutch ndfs -report" I see the list of my datanodes, but these
datanodes are identified by their external hostnames. I think the
namenode tries to connect to the datanodes by these NON-LOCAL hostnames.
That is impossible, because the firewall does not allow incoming
connections from external network interfaces to this port (7000).

Is that right? Could the error be caused by this?
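As far as I can tell, the datanode derives its advertised name from
InetAddress.getLocalHost(). A quick standalone check of what that resolves
to on a given box (illustrative class, not Nutch code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Prints the name and address the JVM resolves for this machine.
// On a multi-homed server this may be the external interface, which
// would then be the name the datanode registers with the namenode.
public class LocalHostCheck {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName());
        System.out.println(local.getHostAddress());
    }
}
```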

So, can you tell me, please, how I can configure the namenode to use
local interfaces for data transfer? I can't reconfigure the firewall...

Red Hat ES 3.0, nutch-2005-08-25 (>nutch-0.7).
$ java -version
java version "1.4.2-01"
Java(TM) 2 Runtime Environment, Standard Edition (build Blackdown-1.4.2-01)
Java HotSpot(TM) 64-Bit Server VM (build Blackdown-1.4.2-01, mixed mode)


Thanks for your time!


-- 
Best regards,
 Chernodarov Egor
                                


Re[2]: NDFS question

Posted by Egor Chernodarov <eg...@zarinsk.dem.ru>.
Hello, Doug!

I tried the "mapred" branch, but I still get errors like this:
$./nutch ndfs -put ./test.txt /test.txt
=====================
050831 055936 Client connection to 192.168.0.170:9000: starting
050831 060245 Waiting to find target node
=====================
On the namenode I see:
050831 055936 Server connection on port 9000 from 192.168.0.170: starting

At the same time, "$ ./nutch ndfs -report" works fine:
=====================
Total effective bytes: 0 (0.0 k)
Effective replication multiplier: Infinity
-------------------------------------------------
Datanodes available: 1

Name: server.domain.com:7000
Total raw bytes: 75487932416 (70.30 Gb)
Used raw bytes: 7289752863 (6.78 Gb)
% used: 9.65%
Last contact with namenode: Wed Aug 31 06:08:32 CDT 2005
=====================

What else can I try? I'm really interested in NDFS...

Thanks for any help.


Tuesday, August 30, 2005, 10:51:10 PM, you wrote:

Doug Cutting> It sounds like you're using a nightly
Doug Cutting> build of trunk.  The NDFS code in 
Doug Cutting> trunk is old.  The NDFS code is currently
Doug Cutting> maintained in a branch named 
Doug Cutting> "mapred".  Please check out the mapred branch and retry.

Doug Cutting> svn co
Doug Cutting> https://svn.apache.org/repos/asf/lucene/nutch/branches/mapred/

Doug Cutting> Doug




-- 
Best regards,               
 Chernodarov Egor


Re[2]: NDFS question

Posted by Egor Chernodarov <eg...@zarinsk.dem.ru>.
Hello, Doug!

 I have fixed my problem. As I supposed, the problem was with the
 network interfaces: the datanode was taking the internet (external)
 address instead of the local one. I believe it can be configured in
 the virtual machine, but I couldn't find where.

 I think many people have several IPs per server but need to use a
 specific IP for NDFS. My solution for this situation is below.
 Changes to NDFS/DataNode.java:
--------------------------------------------------------------------
   public DataNode(NutchConf conf, String datadir) throws IOException {
        // Let a "<hostname>.realip" property override the name this
        // datanode advertises; fall back to the default local hostname.
        this(conf.get(InetAddress.getLocalHost().getHostName() + ".realip",
                      InetAddress.getLocalHost().getHostName()),
             new File(datadir),
             createSocketAddr(conf.get("fs.default.name", "local")));
--------------------------------------------------------------------

And now we can define the hostname or IP for NDFS in nutch-site.xml like this:
<property>
  <name>your.hostname.here.realip</name>
  <value>192.168.0.24</value>
</property>
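The lookup semantics can be sketched in isolation; plain java.util.Properties
stands in here for NutchConf, which I haven't reproduced:

```java
import java.util.Properties;

// Minimal sketch (not Nutch code) of the lookup the patch performs:
// conf.get(host + ".realip", host) returns the override if the property
// is set in nutch-site.xml, and the plain hostname otherwise.
public class RealIpLookup {
    static String advertisedName(Properties conf, String hostname) {
        return conf.getProperty(hostname + ".realip", hostname);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("your.hostname.here.realip", "192.168.0.24");

        // Host with an override advertises the configured IP...
        System.out.println(advertisedName(conf, "your.hostname.here"));
        // ...while any other host falls back to its own name.
        System.out.println(advertisedName(conf, "other.host"));
    }
}
```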


 P.S. I used the nutch-mapred release.

Tuesday, August 30, 2005, 10:51:10 PM, you wrote:

Doug Cutting> It sounds like you're using a nightly
Doug Cutting> build of trunk.  The NDFS code in 
Doug Cutting> trunk is old.  The NDFS code is currently
Doug Cutting> maintained in a branch named 
Doug Cutting> "mapred".  Please check out the mapred branch and retry.

Doug Cutting> svn co
Doug Cutting> https://svn.apache.org/repos/asf/lucene/nutch/branches/mapred/

Doug Cutting> Doug




-- 
Best regards,               
 Chernodarov Egor


Re: NDFS question

Posted by Doug Cutting <cu...@nutch.org>.
It sounds like you're using a nightly build of trunk.  The NDFS code in 
trunk is old.  The NDFS code is currently maintained in a branch named 
"mapred".  Please check out the mapred branch and retry.

svn co https://svn.apache.org/repos/asf/lucene/nutch/branches/mapred/

Doug


Re: Another NDFS question

Posted by Doug Cutting <cu...@nutch.org>.
Ian C. Blenke wrote:
>> The only somewhat complicated thing would be directory listings.  
>> These would be handled with a simple REST interface, where some simple 
>> XML is returned.  Ideally a stylesheet could be specified so that one 
>> can use the directory listing url to view the filesystem from a browser.
> 
>  From a bash scripting standpoint, this would be complicated to access 
> without a userspace command to wrap it.

Good point.  WebDAV has cadaver for shell access, so maybe WebDAV 
is the way to go.

> A simple WebDAV interface seems like the closest thing to a standard 
> that you are attempting to approximate with the RESTful interface. The 
> added benefit would be support from DavFS2, Finder, Microsoft 
> Webfolders, etc.
> 
> Perhaps something that plugs into Jakarta Slide? A NDFS backend to Slide 
> would potentially benefit a distributed CMS as well (without  a 
> versioning history, as that appears to be beyond the scope of NDFS).
> 
> I would be interested in implementing something like this if there is 
> indeed interest.

That would be great!

NDFS is designed to reliably and efficiently support very large data 
collections.  It is not designed to be a full-featured replacement for 
desktop filesystems, but rather is a lean-and-mean storage system for 
distributed computations.  Its primary users are developers and system 
administrators.  Such folks don't require fancy graphical user 
interfaces, but they are a nice bonus.  Programmatic access from 
non-Java is also a goal.  Easy publishing from, e.g., web authoring 
tools is not a goal.

WebDAV looks to me to meet these needs without too much baggage.  It may 
encourage non-target audiences to use NDFS, but we can deal with that as 
a documentation issue.  For example, sophisticated versioning, security 
and permission systems are outside the scope of NDFS.

Doug

Re: Another NDFS question

Posted by "Ian C. Blenke" <ic...@nks.net>.
Doug Cutting wrote:

> Ian C. Blenke wrote:
>
>> When NDFS is exposed to userspace for scripts to use, admin types 
>> will embrace it for managing the cluster.
>
> Our intent is to add some servlets which run on each datanode 
> providing access to the filesystem for non-Java programs.
>
> Most operations would be quite simple, e.g.:
>
> - to write a file, post its content to a url like:
>   http://datanode:XXXX/write?name=my.file
>
> - to read a file, get file content from urls like:
>   http://datanode:XXXX/read?name=my.file
>   http://datanode:XXXX/read?name=my.file&start=2048&length=1024
>
> - to remove a file:
>   http://datanode:XXX/remove?name=my.file
> Similarly for rename, copy, etc.

Not very RESTful, but simple.

> The only somewhat complicated thing would be directory listings.  
> These would be handled with a simple REST interface, where some simple 
> XML is returned.  Ideally a stylesheet could be specified so that one 
> can use the directory listing url to view the filesystem from a browser.

From a bash scripting standpoint, this would be complicated to access 
without a userspace command to wrap it.

A RESTish interface works well for perl/python/ruby, though I think they 
would much rather have a native object wrapper (SWIG something together).

> These servlets could easily be implemented in terms of the 
> NutchFileSystem API, and deployed with Jetty.  To my knowledge, no one 
> is currently working on this.  A volunteer would be welcome.

If portability is a key goal, FUSE or FiST probably aren't the ideal (no 
Windows or OS/X ports, for example).

A simple WebDAV interface seems like the closest thing to a standard 
that you are attempting to approximate with the RESTful interface. The 
added benefit would be support from DavFS2, Finder, Microsoft 
Webfolders, etc.

Perhaps something that plugs into Jakarta Slide? An NDFS backend to Slide 
would potentially benefit a distributed CMS as well (without a 
versioning history, as that appears to be beyond the scope of NDFS).

I would be interested in implementing something like this if there is 
indeed interest.

- Ian C. Blenke <ic...@nks.net> <ia...@blenke.com> http://ian.blenke.com/



Re: [Nutch-dev] Re: Another NDFS question

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
What you've just described, Doug, is WebDAV!   There is an  
implementation of it built into Tomcat, but a more full-featured  
version is Slide - http://jakarta.apache.org/slide/ .

There is also JSR (#170) for a content repository, being implemented  
open-source as Jackrabbit: http://incubator.apache.org/projects/ 
jackrabbit.html

Apache's mod_dav is also well worth mentioning, as it is extensible  
and surely quite fast.

I'm not sure how well any of these I've mentioned jibe with the  
goals of NDFS.  I have done a fair bit of homework on WebDAV in the  
past, once even implementing a prototype server before Slide was viable.

     Erik



On Aug 30, 2005, at 12:08 PM, Doug Cutting wrote:

> Ian C. Blenke wrote:
>
>> When NDFS is exposed to userspace for scripts to use, admin types  
>> will embrace it for managing the cluster.
>>
>
> Our intent is to add some servlets which run on each datanode  
> providing access to the filesystem for non-Java programs.
>
> Most operations would be quite simple, e.g.:
>
> - to write a file, post its content to a url like:
>   http://datanode:XXXX/write?name=my.file
>
> - to read a file, get file content from urls like:
>   http://datanode:XXXX/read?name=my.file
>   http://datanode:XXXX/read?name=my.file&start=2048&length=1024
>
> - to remove a file:
>   http://datanode:XXX/remove?name=my.file
>
> Similarly for rename, copy, etc.
>
> The only somewhat complicated thing would be directory listings.   
> These would be handled with a simple REST interface, where some  
> simple XML is returned.  Ideally a stylesheet could be specified so  
> that one can use the directory listing url to view the filesystem  
> from a browser.
>
> These servlets could easily be implemented in terms of the  
> NutchFileSystem API, and deployed with Jetty.  To my knowledge, no  
> one is currently working on this.  A volunteer would be welcome.
>
> Doug
>
>


Re: Another NDFS question

Posted by Doug Cutting <cu...@nutch.org>.
Ian C. Blenke wrote:
> When NDFS is exposed to userspace for scripts to use, admin types will 
> embrace it for managing the cluster.

Our intent is to add some servlets which run on each datanode providing 
access to the filesystem for non-Java programs.

Most operations would be quite simple, e.g.:

- to write a file, post its content to a url like:
   http://datanode:XXXX/write?name=my.file

- to read a file, get file content from urls like:
   http://datanode:XXXX/read?name=my.file
   http://datanode:XXXX/read?name=my.file&start=2048&length=1024

- to remove a file:
   http://datanode:XXX/remove?name=my.file

Similarly for rename, copy, etc.
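A client-side sketch of these URLs (the servlet endpoints are only a
proposal at this point, and the port 8080 below merely stands in for the
elided XXXX):

```java
// Sketch of a client-side URL builder for the proposed datanode
// servlets. The /read endpoint and its parameters follow the examples
// above; this is illustrative, not a real API. A real client would
// also URL-encode the file name.
public class NdfsUrls {
    static String read(String datanode, int port, String name) {
        return "http://" + datanode + ":" + port + "/read?name=" + name;
    }

    static String read(String datanode, int port, String name,
                       long start, long length) {
        return read(datanode, port, name)
            + "&start=" + start + "&length=" + length;
    }

    public static void main(String[] args) {
        System.out.println(read("datanode", 8080, "my.file"));
        System.out.println(read("datanode", 8080, "my.file", 2048, 1024));
    }
}
```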

The only somewhat complicated thing would be directory listings.  These 
would be handled with a simple REST interface, where some simple XML is 
returned.  Ideally a stylesheet could be specified so that one can use 
the directory listing url to view the filesystem from a browser.

These servlets could easily be implemented in terms of the 
NutchFileSystem API, and deployed with Jetty.  To my knowledge, no one 
is currently working on this.  A volunteer would be welcome.
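To make the shape concrete, here is a toy model of those operations
backed by an in-memory map rather than the NutchFileSystem API
(illustrative only; no Jetty or servlet plumbing):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Nutch code) of the proposed datanode servlets: each
// method is what a real servlet's doGet/doPost would delegate to,
// backed here by an in-memory map instead of the NutchFileSystem API.
public class FileOps {
    private final Map<String, byte[]> fs = new HashMap<String, byte[]>();

    void write(String name, byte[] content) { fs.put(name, content); }

    // Mirrors /read?name=...&start=...&length=...
    byte[] read(String name, int start, int length) {
        byte[] data = fs.get(name);
        byte[] out = new byte[length];
        System.arraycopy(data, start, out, 0, length);
        return out;
    }

    boolean remove(String name) { return fs.remove(name) != null; }

    public static void main(String[] args) {
        FileOps ops = new FileOps();
        ops.write("my.file", "hello world".getBytes());
        System.out.println(new String(ops.read("my.file", 6, 5))); // world
        System.out.println(ops.remove("my.file"));                 // true
    }
}
```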

Doug

Another NDFS question

Posted by "Ian C. Blenke" <ic...@nks.net>.
Egor Chernodarov wrote:

>Hello!
>
>I want to test NDFS on my nutch installation, but I have some problem.
>I have started from wiki, where is quick demo for NDFS:
>http://wiki.apache.org/nutch/NutchDistributedFileSystem
>  
>
Would there be any interest in a FUSE (well FUSE-J) or FiST system level 
filesystem presentation?

I've written CornFS to solve an internal cluster storage problem, but 
NDFS looks like it would address the distributed archival problem with 
an eye toward retrieval. 
(http://ian.blenke.com/blog/projects/cornfs/cornfs.html)

As Lucene/Apache are more multi-platform, something akin to a WebDAV 
backend might be more appropriate.

When NDFS is exposed to userspace for scripts to use, admin types will 
embrace it for managing the cluster.

It might not be a focus now, but it seems to be low-hanging fruit 
that would only help the project.

 - Ian C. Blenke <ic...@nks.net> <ia...@blenke.com> http://ian.blenke.com