Posted to user@nutch.apache.org by Gal Nitzan <gn...@usa.net> on 2005/12/27 22:20:09 UTC

Trouble setting NDFS on multiple machines

Hi,

For some reason I am having trouble setting up NDFS on multiple machines; I
keep getting an exception.

My settings follow the guidelines, i.e. Doug's cheat sheet, on all three
machines:

  <name>fs.default.name</name>
  <value>nutchmst1.XXXXXX.com:9000</value>

All machines seem to be connecting to the namenode:

051227 223242 10 Opened server at 50010
051227 223242 11 Starting DataNode in: /nutch/ndfs/data/data
051227 223242 11 using BLOCKREPORT_INTERVAL of 3500482msec
051227 223242 12 Client connection to x.x.22.185:9000: starting

051227 230013 Server connection on port 9000 from x.x.22.186: starting
051227 230013 Got brand-new heartbeat from nutchnd1:50010
051227 230013 Block report from nutchnd1:50010: 0 blocks.
051227 230013 Server connection on port 9000 from x.x.22.183: starting
051227 230013 Got brand-new heartbeat from nutchws1:50010
051227 230013 Block report from nutchws1:50010: 0 blocks.

The problem:

[nutchuser@nutchmst1 trunk]$ bin/nutch ndfs -copyFromLocal urls.txt
051227 230324 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-default.xml
051227 230324 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-site.xml
051227 230324 No FS indicated, using default:nutchmst1.XXX.com:9000
051227 230324 Client connection to x.x.22.185:9000: starting
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.nutch.fs.NDFSShell.main(NDFSShell.java:234)
[nutchuser@nutchmst1 trunk]$ bin/nutch ndfs -copyFromLocal urls.txt urls/urls.txt
051227 230422 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-default.xml
051227 230423 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-site.xml
051227 230423 No FS indicated, using default:nutchmst1.xxx.com:9000
051227 230423 Client connection to x.x.22.185:9000: starting
Exception in thread "main" java.lang.NullPointerException
        at java.net.Socket.<init>(Socket.java:357)
        at java.net.Socket.<init>(Socket.java:207)
        at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.nextBlockOutputStream(NDFSClient.java:573)
        at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.<init>(NDFSClient.java:521)
        at org.apache.nutch.ndfs.NDFSClient.create(NDFSClient.java:83)
        at org.apache.nutch.fs.NDFSFileSystem.createRaw(NDFSFileSystem.java:71)
        at org.apache.nutch.fs.NFSDataOutputStream$Summer.<init>(NFSDataOutputStream.java:41)
        at org.apache.nutch.fs.NFSDataOutputStream.<init>(NFSDataOutputStream.java:129)
        at org.apache.nutch.fs.NutchFileSystem.create(NutchFileSystem.java:187)
        at org.apache.nutch.fs.NutchFileSystem.create(NutchFileSystem.java:174)
        at org.apache.nutch.fs.NDFSFileSystem.doFromLocalFile(NDFSFileSystem.java:178)
        at org.apache.nutch.fs.NDFSFileSystem.copyFromLocalFile(NDFSFileSystem.java:153)
        at org.apache.nutch.fs.NDFSShell.copyFromLocal(NDFSShell.java:46)
        at org.apache.nutch.fs.NDFSShell.main(NDFSShell.java:234)



I know I am missing something but I can't figure out what.



Re: How can I set a search server over NDFS - Revised

Posted by Stefan Groschupf <sg...@media-style.com>.
Maybe this helps:
http://wiki.media-style.com/display/nutchDocu/setup+multiple+search+sever
However, this uses local file systems.
From my point of view it makes no sense to combine a search server with
NDFS, since you need to stream data to the search server from an
external datanode, only to stream data on to the Tomcat server afterwards.
In general, indexes on NDFS are as slow as my typing.
Anyway, check that your search server is set up to use NDFS and that the
path is an *absolute* path in NDFS.
If you run the search server and Tomcat on one box (which makes no sense
except for testing), also verify that your search-servers.txt is located
in NDFS, since I guess Tomcat and the search server share their
configuration files.

HTH
Stefan


On 28.12.2005 at 01:32, Gal Nitzan wrote:

> Do I need to run server if I want to use the search to use NDFS?
>
> Any way, in the nutch-site.xml which reside under tomcat the  
> serch.dir =
> crawl and the name of the ndfs root is the same.
>
> However, I still get 0 results though I know for sure there are
> documents in the index.
>
>
>
> On Wed, 2005-12-28 at 01:42 +0200, Gal Nitzan wrote:
>> Hi,
>>
>> I have tried all available samples but was unsuccessful.
>>
>>
>> I am using the following command to start the server:
>>
>> bin/nutch-daemon.sh start server 9003 crawl
>>
>> I have setup a directory /hosts with the file search-servers.txt  
>> which
>> contains localhost 9003
>>
>> but the tomcat client does not connect to my search server at all.
>>
>> Any idea what am i doing wrong?
>>
>> Gal
>>
>>
>>
>
>
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net



How can I set a search server over NDFS - Revised

Posted by Gal Nitzan <gn...@usa.net>.
Do I need to run the search server if I want search to use NDFS?

Anyway, in the nutch-site.xml that resides under Tomcat, searcher.dir =
crawl, and the name of the NDFS root is the same.

However, I still get 0 results, though I know for sure there are
documents in the index.



On Wed, 2005-12-28 at 01:42 +0200, Gal Nitzan wrote:
> Hi,
> 
> I have tried all available samples but was unsuccessful.
> 
> 
> I am using the following command to start the server:
> 
> bin/nutch-daemon.sh start server 9003 crawl
> 
> I have setup a directory /hosts with the file search-servers.txt which
> contains localhost 9003
> 
> but the tomcat client does not connect to my search server at all.
> 
> Any idea what am i doing wrong?
> 
> Gal
> 
> 
> 



Re: How can I set a search server over NDFS

Posted by Stefan Groschupf <sg...@media-style.com>.
Try the real DNS name of the box or 127.0.0.1 instead of localhost.
Any exception?
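
The suggestion above can be sketched in Python. This is only an
illustration, assuming search-servers.txt holds one whitespace-separated
"host port" pair per line, as in the example from this thread. The
function names are hypothetical and not part of Nutch.

```python
def parse_search_servers(text):
    """Parse 'host port' lines into (host, port) tuples, skipping blank lines.

    Assumption: one whitespace-separated pair per line, e.g. 'localhost 9003'.
    """
    servers = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore empty lines
        if len(fields) != 2 or not fields[1].isdigit():
            raise ValueError(f"line {lineno}: expected 'host port', got {line!r}")
        servers.append((fields[0], int(fields[1])))
    return servers


def replace_localhost(servers, address="127.0.0.1"):
    """Swap the name 'localhost' for an explicit address or DNS name."""
    return [(address if host == "localhost" else host, port)
            for host, port in servers]
```

For the file from this thread, parse_search_servers("localhost 9003")
gives one entry, and replace_localhost() rewrites it to use 127.0.0.1.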

Stefan
On 28.12.2005 at 00:42, Gal Nitzan wrote:

> Hi,
>
> I have tried all available samples but was unsuccessful.
>
>
> I am using the following command to start the server:
>
> bin/nutch-daemon.sh start server 9003 crawl
>
> I have setup a directory /hosts with the file search-servers.txt which
> contains localhost 9003
>
> but the tomcat client does not connect to my search server at all.
>
> Any idea what am i doing wrong?
>
> Gal
>
>
>




How can I set a search server over NDFS

Posted by Gal Nitzan <gn...@usa.net>.
Hi,

I have tried all available samples but was unsuccessful.


I am using the following command to start the server:

bin/nutch-daemon.sh start server 9003 crawl

I have set up a directory /hosts with the file search-servers.txt, which
contains localhost 9003

but the Tomcat client does not connect to my search server at all.

Any idea what I am doing wrong?

Gal



Re: Trouble setting NDFS on multiple machines

Posted by Gal Nitzan <gn...@usa.net>.
I was able to find the problem. It seems that dev 0.8 will not run on
Java 1.5, at least not on my cluster.

Gal





Re: Setting Search over NDFS

Posted by Byron Miller <by...@yahoo.com>.
I would recommend that you search the list for some
great discussions on NDFS. Doug has a nice writeup of
his vision of using a MapReduce job to push the
indexes to your query servers, so they're updated as
the webdb is and managed that way.

NDFS just wasn't designed for the I/O of a query. You
want to have queries either cached in memory or on
fast disk drives; otherwise the latency of reading over a
network would slow things down terribly, especially if
you have concurrent queries.

-byron

--- Gal Nitzan <gn...@usa.net> wrote:

> So just use the ndfs command to download the
> relevant files from NDFS 
> and put them on the search server and from there to
> follow the sample on 
> your documentation project?
> 
> Thanks for all the help.
> 
> P.S. Do you have a clear view for the solution to
> the "slowness in 
> search over NDFS"? if so I would be interested in
> giving a big hand on 
> that. Since it is a crucial part for my company...
> 
> Regards,
> 
> Gal
> 
> Stefan Groschupf wrote:
> > There will be a solution soon, if I found some
> more time, until this 
> > for smaller installation you need a shell script
> that download the 
> > index and segment to the box that runs the search
> server.
> > You also can move the index from ndfs to local
> instead of copy it.
> > check: "bin/nutch ndfs" for documentation.
> >
> > On 28.12.2005 at 14:34, Gal Nitzan wrote:
> >
> >> Whoa, that was fast...
> >>
> >> So all in all you would need two sets of the same
> data?
> >>
> >> Did I understand there is an effort to improve
> the "poor performance" 
> >> issue?
> >>
> >> And if we are at it, would you care to explain
> how to download the 
> >> index to local and what happens if the data is
> growing over the 
> >> boundaries of one machine? do you just add HD to
> the machine?
> >>
> >> Thanks,
> >>
> >> Gal
> >>
> >> Stefan Groschupf wrote:
> >>> Download index to a local file system.
> >>>
> >>> On 28.12.2005 at 14:25, Gal Nitzan wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> If using search over NDFS is too slow than what
> is the alternative 
> >>>> when all your data is in NDFS?
> >>>>
> >>>> Thanks, Gal
> >>>>
> >>>>
> >>>
> >>>
> >>> .
> >>>
> >>
> >>
> >>
> >
> >
> > .
> >
> 
> 
> 


Re: Setting Search over NDFS

Posted by Gal Nitzan <gn...@usa.net>.
So I just use the ndfs command to download the relevant files from NDFS,
put them on the search server, and from there follow the sample in
your documentation project?

Thanks for all the help.

P.S. Do you have a clear view of the solution to the "slowness in
search over NDFS"? If so, I would be interested in lending a hand with
that, since it is a crucial part for my company...

Regards,

Gal

Stefan Groschupf wrote:
> There will be a solution soon, if I found some more time, until this 
> for smaller installation you need a shell script that download the 
> index and segment to the box that runs the search server.
> You also can move the index from ndfs to local instead of copy it.
> check: "bin/nutch ndfs" for documentation.
>
> On 28.12.2005 at 14:34, Gal Nitzan wrote:
>
>> Whoa, that was fast...
>>
>> So all in all you would need two sets of the same data?
>>
>> Did I understand there is an effort to improve the "poor performance" 
>> issue?
>>
>> And if we are at it, would you care to explain how to download the 
>> index to local and what happens if the data is growing over the 
>> boundaries of one machine? do you just add HD to the machine?
>>
>> Thanks,
>>
>> Gal
>>
>> Stefan Groschupf wrote:
>>> Download index to a local file system.
>>>
>>> On 28.12.2005 at 14:25, Gal Nitzan wrote:
>>>
>>>> Hi,
>>>>
>>>> If using search over NDFS is too slow than what is the alternative 
>>>> when all your data is in NDFS?
>>>>
>>>> Thanks, Gal
>>>>
>>>>
>>>
>>>
>>> .
>>>
>>
>>
>>
>
>
> .
>



Re: Setting Search over NDFS

Posted by Stefan Groschupf <sg...@media-style.com>.
There will be a solution soon, if I find some more time. Until then,
for smaller installations you need a shell script that downloads the
index and segments to the box that runs the search server.
You can also move the index from NDFS to local instead of copying it.
Check "bin/nutch ndfs" for documentation.
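
A download step like the one described above could be sketched as
follows. This is a rough illustration, not a tested deployment script.
The -copyToLocal and -moveToLocal flags, the "index" and "segments"
directory names, and both helper functions are assumptions; verify the
flags against the usage text printed by "bin/nutch ndfs" before relying
on them.

```python
import subprocess


def build_download_commands(ndfs_root, local_root,
                            names=("index", "segments"),
                            nutch="bin/nutch", flag="-copyToLocal"):
    """Build one 'bin/nutch ndfs <flag> <src> <dst>' command per directory.

    The flag and the directory names are assumptions (see the note above).
    Pass flag="-moveToLocal" to move instead of copy, if your build has it.
    """
    src_root = ndfs_root.rstrip("/")
    dst_root = local_root.rstrip("/")
    return [[nutch, "ndfs", flag, f"{src_root}/{name}", f"{dst_root}/{name}"]
            for name in names]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, build_download_commands("crawl", "/data/crawl") yields two
commands, one for crawl/index and one for crawl/segments, which
run_commands() would then execute from the Nutch install directory.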

On 28.12.2005 at 14:34, Gal Nitzan wrote:

> Whoa, that was fast...
>
> So all in all you would need two sets of the same data?
>
> Did I understand there is an effort to improve the "poor  
> performance" issue?
>
> And if we are at it, would you care to explain how to download the  
> index to local and what happens if the data is growing over the  
> boundaries of one machine? do you just add HD to the machine?
>
> Thanks,
>
> Gal
>
> Stefan Groschupf wrote:
>> Download index to a local file system.
>>
>> On 28.12.2005 at 14:25, Gal Nitzan wrote:
>>
>>> Hi,
>>>
>>> If using search over NDFS is too slow than what is the  
>>> alternative when all your data is in NDFS?
>>>
>>> Thanks, Gal
>>>
>>>
>>
>>
>> .
>>
>
>
>


Re: Setting Search over NDFS

Posted by Gal Nitzan <gn...@usa.net>.
Whoa, that was fast...

So all in all you would need two sets of the same data?

Did I understand there is an effort to improve the "poor performance" issue?

And while we are at it, would you care to explain how to download the index
to local disk, and what happens if the data grows beyond the capacity of
one machine? Do you just add hard disks to the machine?

Thanks,

Gal

Stefan Groschupf wrote:
> Download index to a local file system.
>
> On 28.12.2005 at 14:25, Gal Nitzan wrote:
>
>> Hi,
>>
>> If using search over NDFS is too slow than what is the alternative 
>> when all your data is in NDFS?
>>
>> Thanks, Gal
>>
>>
>
>
> .
>



Re: Setting Search over NDFS

Posted by Stefan Groschupf <sg...@media-style.com>.
Download the index to a local file system.

On 28.12.2005 at 14:25, Gal Nitzan wrote:

> Hi,
>
> If using search over NDFS is too slow than what is the alternative  
> when all your data is in NDFS?
>
> Thanks, Gal
>
>


Setting Search over NDFS

Posted by Gal Nitzan <gn...@usa.net>.
Hi,

If search over NDFS is too slow, then what is the alternative when
all your data is in NDFS?

Thanks, Gal


Re: Clustering Index job

Posted by Stefan Groschupf <sg...@media-style.com>.
Yes, you need to use MapReduce on several boxes.
Anyway, 100 million files will also work on a powerful box.
There are some configuration values in nutch-default.xml that can
improve indexing speed.


On 28.12.2005 at 09:56, R.Mayoran wrote:

> Hi,
>
> I need to index about 100million files.
>
> Is it possible to cluster this job?
>
> Are there any sugestions to increase the speed of indexing?
>
> Thank you in advance.
>
> Mayu.
>
>


Re: Clustering Index job

Posted by Byron Miller <by...@yahoo.com>.
Check the list for my earlier discussions. There are
tweaks you can do to enhance the performance if you
have available memory resources.

How large are the segments that you are indexing?
What file system do you use? What OS/JVM are you
building your index on?

-byron

--- "R.Mayoran" <ma...@team-lab.com> wrote:

> Hi,
> 
> I need to index about 100million files.
> 
> Is it possible to cluster this job?
> 
> Are there any sugestions to increase the speed of
> indexing?
> 
> Thank you in advance.
> 
> Mayu.
> 
> 


Clustering Index job

Posted by "R.Mayoran" <ma...@team-lab.com>.
Hi,

I need to index about 100 million files.

Is it possible to cluster this job?

Are there any suggestions to increase the speed of indexing?

Thank you in advance.

Mayu.


Re: Trouble setting NDFS on multiple machines

Posted by Nutch Newbie <nu...@gmail.com>.
I had a very similar problem with JDK 1.5. Also, when I worked with
only one datanode, the problem didn't occur.

Thanks

On 12/28/05, Stefan Groschupf <sg...@media-style.com> wrote:
> Interesting!
> That is not a feature that is a bug, may you can open a minor bug
> report.
> Thanks.
> Stefan
> On 28.12.2005 at 01:35, Gal Nitzan wrote:
>
> > Thanks for the prompt reply. However it seems that the problem was
> > working with JDK 1.5
> >
> > When changed to 1.4.2 All seems to be working.
> >
> > Thanks.
> >
> > Gal.
> > On Wed, 2005-12-28 at 01:24 +0100, Stefan Groschupf wrote:
> >> The exception means that one client is unable to connect to one
> >> *datanode*.
> >> Check that the box that had this exception can open a connection to
> >> all other datanodes with the correct port.
> >> try
> >> telnet machineNameAsUsedInNameNode DATANODE_PORT
> >>
> >> Is it able to connect?
> >>
> >> Stefan
> >>
> >>
> >> ---------------------------------------------------------------
> >> company:        http://www.media-style.com
> >> forum:        http://www.text-mining.org
> >> blog:            http://www.find23.net
> >>
> >>
> >
> >
> >
>
> ---------------------------------------------------------------
> company:        http://www.media-style.com
> forum:        http://www.text-mining.org
> blog:            http://www.find23.net
>
>
>
>

Re: Trouble setting NDFS on multiple machines

Posted by Stefan Groschupf <sg...@media-style.com>.
Interesting!
That is not a feature, that is a bug; maybe you can open a minor bug
report.
Thanks.
Stefan
On 28.12.2005 at 01:35, Gal Nitzan wrote:

> Thanks for the prompt reply. However it seems that the problem was
> working with JDK 1.5
>
> When changed to 1.4.2 All seems to be working.
>
> Thanks.
>
> Gal.
> On Wed, 2005-12-28 at 01:24 +0100, Stefan Groschupf wrote:
>> The exception means that one client is unable to connect to one
>> *datanode*.
>> Check that the box that had this exception can open a connection to
>> all other datanodes with the correct port.
>> try
>> telnet machineNameAsUsedInNameNode DATANODE_PORT
>>
>> Is it able to connect?
>>
>> Stefan
>>
>>
>> ---------------------------------------------------------------
>> company:        http://www.media-style.com
>> forum:        http://www.text-mining.org
>> blog:            http://www.find23.net
>>
>>
>
>
>




Re: Trouble setting NDFS on multiple machines

Posted by Gal Nitzan <gn...@usa.net>.
Thanks for the prompt reply. However, it seems that the problem was
running on JDK 1.5.

After changing to 1.4.2, everything seems to be working.
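
To confirm which JVM each node actually runs, the "java -version" banner
can be parsed. The sketch below is a hypothetical helper, not part of
Nutch; it assumes the classic banner form 'java version "1.4.2_10"',
which the JVM prints to stderr. The version strings in the comments are
illustrative examples, not output from this cluster.

```python
import re


def parse_java_version(banner):
    """Extract (major, minor) from a banner such as 'java version "1.5.0_06"'."""
    match = re.search(r'version "(\d+)\.(\d+)', banner)
    if match is None:
        raise ValueError(f"unrecognised java -version output: {banner!r}")
    return int(match.group(1)), int(match.group(2))


def is_reported_working(banner):
    """True for the 1.4.x line, the only one reported working in this thread."""
    return parse_java_version(banner) == (1, 4)
```

The banner itself can be captured with
subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
on each machine in the cluster.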

Thanks.

Gal.
On Wed, 2005-12-28 at 01:24 +0100, Stefan Groschupf wrote:
> The exception means that one client is unable to connect to one  
> *datanode*.
> Check that the box that had this exception can open a connection to  
> all other datanodes with the correct port.
> try
> telnet machineNameAsUsedInNameNode DATANODE_PORT
> 
> Is it able to connect?
> 
> Stefan
> 
> Am 27.12.2005 um 22:20 schrieb Gal Nitzan:
> 
> > Hi,
> >
> > For some reason I am having trouble setting NDFS on multiple  
> > machines I
> > keep on getting an exception.
> >
> > My settings follows the guide lines i.e Doug's cheat sheet on all  
> > three
> > machines:
> >
> >   <name>fs.default.name</name>
> >   <value>nutchmst1.XXXXXX.com:9000</value>
> >
> > all machines seems to be connecting to the namenode:
> >
> > 051227 223242 10 Opened server at 50010
> > 051227 223242 11 Starting DataNode in: /nutch/ndfs/data/data
> > 051227 223242 11 using BLOCKREPORT_INTERVAL of 3500482msec
> > 051227 223242 12 Client connection to x.x.22.185:9000: starting
> >
> > 051227 230013 Server connection on port 9000 from x.x.22.186: starting
> > 051227 230013 Got brand-new heartbeat from nutchnd1:50010
> > 051227 230013 Block report from nutchnd1:50010: 0 blocks.
> > 051227 230013 Server connection on port 9000 from x.x.22.183: starting
> > 051227 230013 Got brand-new heartbeat from nutchws1:50010
> > 051227 230013 Block report from nutchws1:50010: 0 blocks.
> >
> > The problem:::::::::::
> >
> > nutchuser@nutchmst1 trunk]$ bin/nutch ndfs -copyFromLocal urls.txt
> > 051227 230324 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-default.xml
> > 051227 230324 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-site.xml
> > 051227 230324 No FS indicated, using default:nutchmst1.XXX.com:9000
> > 051227 230324 Client connection to x.x.22.185:9000: starting
> > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
> >         at org.apache.nutch.fs.NDFSShell.main(NDFSShell.java:234)
> > [nutchuser@nutchmst1 trunk]$ bin/nutch ndfs -copyFromLocal urls.txt urls/urls.txt
> > 051227 230422 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-default.xml
> > 051227 230423 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-site.xml
> > 051227 230423 No FS indicated, using default:nutchmst1.xxx.com:9000
> > 051227 230423 Client connection to x.x.22.185:9000: starting
> > Exception in thread "main" java.lang.NullPointerException
> >         at java.net.Socket.<init>(Socket.java:357)
> >         at java.net.Socket.<init>(Socket.java:207)
> >         at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.nextBlockOutputStream(NDFSClient.java:573)
> >         at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.<init>(NDFSClient.java:521)
> >         at org.apache.nutch.ndfs.NDFSClient.create(NDFSClient.java:83)
> >         at org.apache.nutch.fs.NDFSFileSystem.createRaw(NDFSFileSystem.java:71)
> >         at org.apache.nutch.fs.NFSDataOutputStream$Summer.<init>(NFSDataOutputStream.java:41)
> >         at org.apache.nutch.fs.NFSDataOutputStream.<init>(NFSDataOutputStream.java:129)
> >         at org.apache.nutch.fs.NutchFileSystem.create(NutchFileSystem.java:187)
> >         at org.apache.nutch.fs.NutchFileSystem.create(NutchFileSystem.java:174)
> >         at org.apache.nutch.fs.NDFSFileSystem.doFromLocalFile(NDFSFileSystem.java:178)
> >         at org.apache.nutch.fs.NDFSFileSystem.copyFromLocalFile(NDFSFileSystem.java:153)
> >         at org.apache.nutch.fs.NDFSShell.copyFromLocal(NDFSShell.java:46)
> >         at org.apache.nutch.fs.NDFSShell.main(NDFSShell.java:234)
> >
> >
> >
> > I know I am missing something but I can't figure out what.
> >
> >
> >
> 
> ---------------------------------------------------------------
> company:        http://www.media-style.com
> forum:        http://www.text-mining.org
> blog:            http://www.find23.net
> 
> 



Re: Trouble setting NDFS on multiple machines

Posted by Stefan Groschupf <sg...@media-style.com>.
The exception means that a client is unable to connect to a
*datanode*.
Check that the box that threw this exception can open a connection to
every other datanode on the correct port.
Try:
telnet machineNameAsUsedInNameNode DATANODE_PORT

Is it able to connect?
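
If telnet is not installed on the box, roughly the same check can be done from Java. This is only a sketch, not part of Nutch: the class name DataNodeProbe and the method canConnect are made up for illustration, and the default host names (nutchnd1, nutchws1) and port 50010 are simply taken from the namenode log quoted below, so adjust them to whatever your namenode reports.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Nutch: tries to open a plain TCP
// connection to each datanode, which is what the telnet test does.
public class DataNodeProbe {

    /** Returns true if a TCP connection to host:port opens within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: datanode names and port as they appear in the namenode log.
        String[] hosts = args.length > 0 ? args : new String[] {"nutchnd1", "nutchws1"};
        int port = 50010;
        for (String host : hosts) {
            String state = canConnect(host, port, 3000) ? "reachable" : "NOT reachable";
            System.out.println(host + ":" + port + " -> " + state);
        }
    }
}
```

Run it on the machine that threw the exception and use the host names exactly as the namenode prints them; a name that resolves on the namenode but not on the client would show up here as "NOT reachable".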

Stefan

On 27.12.2005 at 22:20, Gal Nitzan wrote:

> Hi,
>
> For some reason I am having trouble setting up NDFS on multiple
> machines; I keep getting an exception.
>
> My settings follow the guidelines, i.e. Doug's cheat sheet, on all
> three machines:
>
>   <name>fs.default.name</name>
>   <value>nutchmst1.XXXXXX.com:9000</value>
>
> All machines seem to be connecting to the namenode:
>
> 051227 223242 10 Opened server at 50010
> 051227 223242 11 Starting DataNode in: /nutch/ndfs/data/data
> 051227 223242 11 using BLOCKREPORT_INTERVAL of 3500482msec
> 051227 223242 12 Client connection to x.x.22.185:9000: starting
>
> 051227 230013 Server connection on port 9000 from x.x.22.186: starting
> 051227 230013 Got brand-new heartbeat from nutchnd1:50010
> 051227 230013 Block report from nutchnd1:50010: 0 blocks.
> 051227 230013 Server connection on port 9000 from x.x.22.183: starting
> 051227 230013 Got brand-new heartbeat from nutchws1:50010
> 051227 230013 Block report from nutchws1:50010: 0 blocks.
>
> The problem:::::::::::
>
> [nutchuser@nutchmst1 trunk]$ bin/nutch ndfs -copyFromLocal urls.txt
> 051227 230324 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-default.xml
> 051227 230324 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-site.xml
> 051227 230324 No FS indicated, using default:nutchmst1.XXX.com:9000
> 051227 230324 Client connection to x.x.22.185:9000: starting
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
>         at org.apache.nutch.fs.NDFSShell.main(NDFSShell.java:234)
> [nutchuser@nutchmst1 trunk]$ bin/nutch ndfs -copyFromLocal urls.txt urls/urls.txt
> 051227 230422 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-default.xml
> 051227 230423 parsing file:/home/nutchuser/nutch/trunk/conf/nutch-site.xml
> 051227 230423 No FS indicated, using default:nutchmst1.xxx.com:9000
> 051227 230423 Client connection to x.x.22.185:9000: starting
> Exception in thread "main" java.lang.NullPointerException
>         at java.net.Socket.<init>(Socket.java:357)
>         at java.net.Socket.<init>(Socket.java:207)
>         at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.nextBlockOutputStream(NDFSClient.java:573)
>         at org.apache.nutch.ndfs.NDFSClient$NDFSOutputStream.<init>(NDFSClient.java:521)
>         at org.apache.nutch.ndfs.NDFSClient.create(NDFSClient.java:83)
>         at org.apache.nutch.fs.NDFSFileSystem.createRaw(NDFSFileSystem.java:71)
>         at org.apache.nutch.fs.NFSDataOutputStream$Summer.<init>(NFSDataOutputStream.java:41)
>         at org.apache.nutch.fs.NFSDataOutputStream.<init>(NFSDataOutputStream.java:129)
>         at org.apache.nutch.fs.NutchFileSystem.create(NutchFileSystem.java:187)
>         at org.apache.nutch.fs.NutchFileSystem.create(NutchFileSystem.java:174)
>         at org.apache.nutch.fs.NDFSFileSystem.doFromLocalFile(NDFSFileSystem.java:178)
>         at org.apache.nutch.fs.NDFSFileSystem.copyFromLocalFile(NDFSFileSystem.java:153)
>         at org.apache.nutch.fs.NDFSShell.copyFromLocal(NDFSShell.java:46)
>         at org.apache.nutch.fs.NDFSShell.main(NDFSShell.java:234)
>
>
>
> I know I am missing something but I can't figure out what.
>
>
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net