Posted to user@nutch.apache.org by Jay Pound <we...@poundwebhosting.com> on 2005/08/07 05:34:44 UTC

ndfs problem needs fix

I'm copying data into NDFS right now. I had the server crash (bad memory
timings, oops); it was running two datanodes and the namenode. It recovered
from a flat-out crash perfectly (blue-screen kernel error, system beeping,
Windows 2003 64-bit is not kind). I started the datanodes first, then the
namenode, and it replicated data and continued writing the file. Perfect.

But here is something people are going to run into big-time. One of my
machines runs Nutch from a network share; the network share went down, and
here is what happens: the network stops sending data and the client waits,
printing "could not complete file, retrying". Once the share came back it
caught up on everything it had to send and receive while it was disconnected.
The bad news is that had I not re-connected to the server's share drive, the
system would have hung waiting until I did, without copying the rest of the
300 GB of data I'm copying into it at 5-22 MB/s.

I'll let you know in the morning if it completes and whether it recovers
fully. Data is transferring, but it still prints "could not complete file,
retrying" every 1/2-1/3 second.

Good news: data rates are up to 10 MB/s on some machines, 4 MB/s minimum.

Just in case someone has a single machine connected to a NAS, iSCSI, NFS, or
even a flaky Samba share and experiences a network outage.
-Jay



Re: luke??

Posted by Jay Pound <we...@poundwebhosting.com>.
I got it to work now. It wasn't selecting the directory I had chosen, so I
typed it in and it works fine.
BTW very cool tool
-J
----- Original Message ----- 
From: "Fredrik Andersson" <fi...@gmail.com>
To: <nu...@lucene.apache.org>
Sent: Sunday, August 07, 2005 6:16 PM
Subject: Re: luke??


> That's odd, Luke is working great both on Debian and Windows here.
> Have you validated the index, i.e. no funny errors when running
> 'bin/nutch index'? Does the index directory contain all the necessary
> files (.fdt, .fdx, .frq, etc.; it depends on which fields you've chosen to
> index)? Try using the Searcher class to make a "manual" search to see
> if your index is flawed in some way; that's a good way to start.
>
> Fredrik
>
> On 8/7/05, Jay Pound <we...@poundwebhosting.com> wrote:
> > I tell Luke to look at my index directory for one segment, and it tells me
> > it's not a Lucene index; I point directly to l:/segments/2005xxx/index/.
> > Does it work properly on Windows? Very cool tool anyway; check it out for
> > those who haven't. I found it on Andrzej's website, http://www.getopt.org,
> > towards the bottom.
> > -J
> >
> >
> >
>
>



Re: luke??

Posted by Fredrik Andersson <fi...@gmail.com>.
That's odd, Luke is working great both on Debian and Windows here.
Have you validated the index, i.e. no funny errors when running
'bin/nutch index'? Does the index directory contain all the necessary
files (.fdt, .fdx, .frq, etc.; it depends on which fields you've chosen to
index)? Try using the Searcher class to make a "manual" search to see
if your index is flawed in some way; that's a good way to start.
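
If it helps, a bare-bones version of that manual check could look like the
sketch below, against the Lucene API of that era (just an illustration: the
class name is mine, and the index path and the "url"/"apache" term are
placeholders, adjust them to whatever you actually indexed):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class IndexSanityCheck {
      public static void main(String[] args) throws Exception {
        // Point this at the index directory you are trying to open in Luke.
        IndexSearcher searcher = new IndexSearcher("l:/segments/2005xxx/index");

        // Any term you expect to exist; field and value here are just guesses.
        Query query = new TermQuery(new Term("url", "apache"));

        Hits hits = searcher.search(query);
        System.out.println("hits: " + hits.length());
        for (int i = 0; i < Math.min(10, hits.length()); i++) {
          System.out.println(hits.doc(i));   // dump the first few stored docs
        }
        searcher.close();
      }
    }

If that also fails with a "not a Lucene index" style exception, the problem is
the index itself rather than Luke.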

Fredrik

On 8/7/05, Jay Pound <we...@poundwebhosting.com> wrote:
> I tell Luke to look at my index directory for one segment, and it tells me
> it's not a Lucene index; I point directly to l:/segments/2005xxx/index/.
> Does it work properly on Windows? Very cool tool anyway; check it out for
> those who haven't. I found it on Andrzej's website, http://www.getopt.org,
> towards the bottom.
> -J
> 
> 
>

Re: luke??

Posted by Jay Pound <we...@poundwebhosting.com>.
thanks
-J
----- Original Message ----- 
From: "EM" <em...@cpuedge.com>
To: <nu...@lucene.apache.org>
Sent: Sunday, August 07, 2005 4:29 PM
Subject: RE: luke??


> I've just downloaded and tried it, and it works for me. Try entering the
> directory without the 'index' part.
>
> -----Original Message-----
> From: Jay Pound [mailto:webmaster@poundwebhosting.com]
> Sent: Sunday, August 07, 2005 4:20 PM
> To: nutch-user@lucene.apache.org; nutch-dev@lucene.apache.org
> Subject: luke??
>
> I tell Luke to look at my index directory for one segment, and it tells me
> it's not a Lucene index; I point directly to l:/segments/2005xxx/index/.
> Does it work properly on Windows? Very cool tool anyway; check it out for
> those who haven't. I found it on Andrzej's website, http://www.getopt.org,
> towards the bottom.
> -J
>
>
>
>
>



RE: luke??

Posted by EM <em...@cpuedge.com>.
I've just downloaded and tried it, and it works for me. Try entering the
directory without the 'index' part.

-----Original Message-----
From: Jay Pound [mailto:webmaster@poundwebhosting.com] 
Sent: Sunday, August 07, 2005 4:20 PM
To: nutch-user@lucene.apache.org; nutch-dev@lucene.apache.org
Subject: luke??

I tell Luke to look at my index directory for one segment, and it tells me
it's not a Lucene index; I point directly to l:/segments/2005xxx/index/.
Does it work properly on Windows? Very cool tool anyway; check it out for
those who haven't. I found it on Andrzej's website, http://www.getopt.org,
towards the bottom.
-J





luke??

Posted by Jay Pound <we...@poundwebhosting.com>.
I tell Luke to look at my index directory for one segment, and it tells me
it's not a Lucene index; I point directly to l:/segments/2005xxx/index/.
Does it work properly on Windows? Very cool tool anyway; check it out for
those who haven't. I found it on Andrzej's website, http://www.getopt.org,
towards the bottom.
-J



Re: ndfs problem needs fix

Posted by Jay Pound <we...@poundwebhosting.com>.
#2 from your response:
> I'm not yet sure how disk
> failures appear to a JVM.  Things are currently written so that if an
> exception is thrown during disk i/o then the datanode should take itself
> offline, initiating replication of its data.  We'll see if that's
> sufficient.
The data does get replicated; the problem is that the client stalls, still
trying to write the data to that node and waiting indefinitely for a response.
The same thing happens when the device runs out of disk space: the client
hangs trying to write the last block to a disk it cannot write to because
there is less than 32 MB available. I think there should be a limit of
something like 100 MB of space left free on the disk, so the machine doesn't
become unusable with so little free space.
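
Something like the sketch below is what I mean (it's just an illustration, not
actual NDFS code; the class name and the 100 MB threshold are mine):

    import java.io.File;
    import java.io.IOException;

    public class DataNodeSpaceCheck {
        // Refuse new blocks once free space gets near ~100 MB, so the block
        // can go to another datanode instead of the client hanging here.
        private static final long MIN_FREE_BYTES = 100L * 1024 * 1024;
        private static final long BLOCK_SIZE = 32L * 1024 * 1024; // one NDFS chunk

        public static void checkSpaceFor(File dataDir) throws IOException {
            long free = dataDir.getUsableSpace();
            if (free < MIN_FREE_BYTES + BLOCK_SIZE) {
                throw new IOException("only " + free + " bytes free on "
                    + dataDir + ", refusing new block");
            }
        }
    }

That way a nearly full datanode fails fast instead of stalling the whole copy.
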
Thanks,
-J
----- Original Message ----- 
From: "Doug Cutting" <cu...@nutch.org>
To: <nu...@lucene.apache.org>
Sent: Monday, August 08, 2005 4:13 PM
Subject: Re: ndfs problem needs fix


> Jay Pound wrote:
> > 1.) We need to split chunks of data into sub-folders so we don't run the
> > filesystem into its physical limits on the number of files in a single
> > directory, the way squid splits its data into directories.
>
> I agree.  I am currently using reiser with NDFS so this is not a
> priority, but long-term it should be fixed.  Please file a bug report,
> and, ideally, contribute a patch.
>
> > 2.) When a datanode is set to store data on an NFS share / Samba share [...]
>
> That is not a recommended configuration.
>
> A datanode should reasonably handle disk failures.  Developing and
> debugging this may take time, however.  I'm not yet sure how disk
> failures appear to a JVM.  Things are currently written so that if an
> exception is thrown during disk i/o then the datanode should take itself
> offline, initiating replication of its data.  We'll see if that's
> sufficient.
>
> > 3.) We need to set a limit on how much of the filesystem can be used by NDFS,
> > or a maximum number of 32 MB chunks to be stored. When a single machine runs
> > out of space, the same thing happens as in #2: NDFS hangs waiting to write
> > data to that particular datanode instead of transmitting data to the other
> > datanodes.
>
> The max storage per datanode was configurable, but we found that to be
> difficult, as it required separate configuration per datanode if
> datanodes have different devices.  So now all space on the device is
> assumed to be available to NDFS.  Probably making this optionally
> configurable would be better.  Please file a bug report, and, ideally,
> contribute a patch.
>
> Doug
>
>



Re: ndfs problem needs fix

Posted by Doug Cutting <cu...@nutch.org>.
Jay Pound wrote:
> 1.) We need to split chunks of data into sub-folders so we don't run the
> filesystem into its physical limits on the number of files in a single
> directory, the way squid splits its data into directories.

I agree.  I am currently using reiser with NDFS so this is not a 
priority, but long-term it should be fixed.  Please file a bug report, 
and, ideally, contribute a patch.

> 2.) When a datanode is set to store data on an NFS share / Samba share [...]

That is not a recommended configuration.

A datanode should reasonably handle disk failures.  Developing and 
debugging this may take time, however.  I'm not yet sure how disk 
failures appear to a JVM.  Things are currently written so that if an 
exception is thrown during disk i/o then the datanode should take itself 
offline, initiating replication of its data.  We'll see if that's 
sufficient.
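
In outline, the intended control flow is something like the sketch below (this
is only an illustration, not the actual datanode code; the class and field
names are made up):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class BlockWriteSketch {
        private volatile boolean offline = false;

        public void writeBlock(File blockFile, byte[] data) {
            FileOutputStream out = null;
            try {
                out = new FileOutputStream(blockFile);
                out.write(data);
            } catch (IOException e) {
                // A local disk failure must not leave clients waiting: take
                // the node out of service so the namenode re-replicates its
                // blocks elsewhere.
                offline = true;
                System.err.println("Taking datanode offline after disk error: " + e);
            } finally {
                if (out != null) {
                    try { out.close(); } catch (IOException ignored) { }
                }
            }
        }

        public boolean isOffline() { return offline; }
    }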

> 3.) We need to set a limit on how much of the filesystem can be used by NDFS,
> or a maximum number of 32 MB chunks to be stored. When a single machine runs
> out of space, the same thing happens as in #2: NDFS hangs waiting to write
> data to that particular datanode instead of transmitting data to the other
> datanodes.

The max storage per datanode was configurable, but we found that to be 
difficult, as it required separate configuration per datanode if 
datanodes have different devices.  So now all space on the device is 
assumed to be available to NDFS.  Probably making this optionally 
configurable would be better.  Please file a bug report, and, ideally, 
contribute a patch.
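
If someone does file that, the optional cap could be as simple as the sketch
below (the property name "ndfs.datanode.max.bytes" is hypothetical, not an
existing setting; with no value configured, all space on the device stays
available, as it is now):

    public class DataNodeQuota {
        /** Maximum bytes this datanode may use; unlimited unless configured. */
        public static long maxBytes() {
            String v = System.getProperty("ndfs.datanode.max.bytes");
            return (v == null) ? Long.MAX_VALUE : Long.parseLong(v);
        }

        /** True if storing one more block would push the node over its cap. */
        public static boolean wouldExceed(long bytesAlreadyStored, long blockSize) {
            return bytesAlreadyStored + blockSize > maxBytes();
        }
    }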

Doug


Re: ndfs problem needs fix

Posted by Jay Pound <we...@poundwebhosting.com>.
OK, here it is, short and sweet. I've found some problems that need fixing
with NDFS:

1.) We need to split chunks of data into sub-folders so we don't run the
filesystem into its physical limits on the number of files in a single
directory, the way squid splits its data into directories (see the sketch
after this list).

2.) When a datanode is set to store data on an NFS share / Samba share (via
conf) and the connection is severed, the whole NDFS filesystem hangs until
data can be written to that one drive; when the drive mapping is re-connected
it goes really fast for a few seconds to catch up (50 MB/s for about 15
seconds). This will also be a problem when a hard drive fails in a system: the
datanode will still function, but the drive will not be able to send or
receive data because it is dead, and NDFS will hang.

3.) We need to set a limit on how much of the filesystem can be used by NDFS,
or a maximum number of 32 MB chunks to be stored. When a single machine runs
out of space, the same thing happens as in #2: NDFS hangs waiting to write
data to that particular datanode instead of transmitting data to the other
datanodes.
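
Here is roughly what I mean by squid-style splitting (just an illustration;
the two-level layout and the names are made up, not how NDFS lays out blocks
today):

    import java.io.File;

    public class BlockPathSketch {
        // Spread block files over a fixed two-level tree (16 x 256 buckets)
        // so no single directory ever has to hold millions of entries.
        public static File pathForBlock(File dataDir, long blockId) {
            long id = Math.abs(blockId);
            int level1 = (int) (id % 16);
            int level2 = (int) ((id / 16) % 256);
            File dir = new File(dataDir, level1 + File.separator + level2);
            dir.mkdirs();                  // create the bucket on first use
            return new File(dir, "blk_" + blockId);
        }
    }

So a block with id 12345 would land in something like data/9/3/blk_12345
instead of sitting in one flat directory with every other chunk.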

Also, I've found it's much more stable now; I haven't had any crashes when the
conditions are ideal for the way NDFS works now!

sorry about the big e-mails, my brain goes much faster than my fingers!!!
-J


----- Original Message ----- 
From: "Andrzej Bialecki" <ab...@getopt.org>
To: <we...@poundwebhosting.com>
Sent: Sunday, August 07, 2005 3:00 PM
Subject: Re: ndfs problem needs fix


> Jay Pound wrote:
>
> [.....................................]
>
> Jay,
>
> This is nothing personal, but I tend to skip your messages, because they
> are so badly formatted that it just hurts my eyes, and I don't have the
> time to parse paragraphs, which occupy half a page... Please try to be
> more concise and divide your messages into shorter paragraphs.
>
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


