Posted to user@nutch.apache.org by brian4 <bq...@gmail.com> on 2013/07/18 19:06:01 UTC

Nutch 2.2.1 Freezing / Deadlocked During Generator Job

On one machine, nutch just suddenly started freezing during the generator
job.  The same files and scripts that worked fine previously now result in
freezing. I checked with the system administrator and no changes were made
to the machine.

I can also run the same crawl (using all of the same programs and files)
from another machine and it runs fine.  Although it is one machine for now,
I am worried that it might randomly happen on other machines at some point
as well, so I can't rely on it for regular crawling. 
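
In case it matters, dumps like these can be taken with the stock JDK tools,
roughly as below (a sketch; the class-name match and the output file name are
just illustrative):

bash-3.2$ jps -l | grep org.apache.nutch.crawl.GeneratorJob   # find the PID of the stuck job
bash-3.2$ jstack <pid> > generator_dump_1.log                 # repeat every ~30s for a few dumps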

I attached one thread dump; the others are the same.  From the JVM thread
dumps:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.1-b03 mixed mode):

"Attach Listener" daemon prio=10 tid=0x000000000adb2800 nid=0x7a07 waiting
on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1-EventThread" daemon prio=10 tid=0x000000000a68a000
nid=0x793b waiting on condition [0x00000000427aa000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000bce90078> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)

...

"main" prio=10 tid=0x000000000a14b800 nid=0x7913 waiting on condition
[0x0000000040971000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at
org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1387)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:583)
	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:50)
	at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)
	at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:223)
	at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:279)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:287)


Looking at the dumps, it may be related to a deadlock caused by a
ZooKeeper/HBase issue described at the following link, but maybe it can be
avoided in the Nutch generator itself.

https://issues.apache.org/jira/browse/HBASE-2966


However, even if that is the cause, we would have to wait for HBase to be
fixed, then for Gora to be updated to use the fixed HBase, and then for
Nutch to be updated to use the updated Gora, so I am hoping someone has an
idea of a workaround I could use now.

Otherwise I am thinking of trying to switch to another data store.  Which
data store is most reliable and does not have such deadlock issues?  It
seems like maybe a lot of people use Cassandra, but I had the impression
there were more issues getting it to work correctly than with HBase.


Version info:

HBase: 0.90.6
Nutch: 2.2.1 (also happened with 2.1)
JDK: jdk1.7.0_05 (also happened with 1.6)

Machine info:

bash-3.2$ uname -a
Linux sc-d01-bh 2.6.18-308.4.1.el5 #1 SMP Tue Apr 17 17:08:00 EDT 2012
x86_64 x86_64 x86_64 GNU/Linux

bash-3.2$ cat /proc/version
Linux version 2.6.18-308.4.1.el5 (mockbuild@builder10.centos.org) (gcc
version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Tue Apr 17 17:08:00 EDT
2012






Re: Nutch 2.2.1 Freezing / Deadlocked During Generator Job

Posted by brian4 <bq...@gmail.com>.
It turns out it was an issue with the machine itself.  I got in touch with
the system administrator again after noticing some other weird behavior, and
he thought there might have been an issue with an NFS mount, which he has
now cleared up.  After that, Nutch was working fine again.
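
That would fit the dumps: the hang is down in java.io.UnixFileSystem.getSpace,
called from Hadoop's DF / LocalDirAllocator, which is consistent with a stale
NFS mount under hadoop.tmp.dir / mapred.local.dir.  A quick check, as a sketch
only (the path below is just the Hadoop default and stands in for wherever
your config actually points):

bash-3.2$ stat -f -c 'fs type: %T' /tmp/hadoop-$USER   # "nfs" means the dir lives on an NFS mount
bash-3.2$ df -h /tmp/hadoop-$USER                      # will hang here too if the mount is stale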




Re: Nutch 2.2.1 Freezing / Deadlocked During Generator Job

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Brian,
Gora >=0.3 deprecates the gora-sql 0.1.1-incubating artifact.
This means Nutch 2.2.1 and MySQL/HSQLDB are incompatible.
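
A quick way to confirm which store a given checkout is actually wired to is to
look at conf/gora.properties (a sketch, assuming the stock layout; on an
HBase-backed install the uncommented line is normally the HBaseStore one):

bash-3.2$ grep -v '^#' conf/gora.properties
gora.datastore.default=org.apache.gora.hbase.store.HBaseStore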
Lewis


On Wed, Jul 24, 2013 at 12:42 PM, brian4 <bq...@gmail.com> wrote:

> It definitely has nothing to do with HBase - I switched to use MySQL and I
> am
> still having the exact same problem - freezing in the exact same spot.
>
> The new thread dump is similar to this one:
>
> http://lucene.472066.n3.nabble.com/Nutch-frozen-but-not-exiting-td604954.html
>
> He reported a similar freezing issue and unfortunately there was no
> resolution found (at least not listed in that thread).
>
> Maybe it is something to do with a Java memory setting, so I may start
> playing around with those next.
>
> Here is my full thread dump now:
>
> bash-3.2$ jstack -F 29017 >> generator_dump_mysql_F.log
> Attaching to process ID 29017, please wait...
> Debugger attached successfully.
> Server compiler detected.
> JVM version is 23.1-b03
>
> Deadlock Detection:
>
> No deadlocks found.
>
> Thread 29110: (state = BLOCKED)
>
>
> Thread 29058: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14,
> line=186 (Interpreted frame)
>  -
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
> @bci=42, line=2043 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run()
> @bci=55, line=1345 (Interpreted frame)
>
>
> Thread 29047: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
>  - org.apache.hadoop.mapred.Task$TaskReporter.run() @bci=45, line=658
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=722 (Interpreted frame)
>
>
> Thread 29046: (state = IN_NATIVE)
>  - java.io.UnixFileSystem.getSpace(java.io.File, int) @bci=0 (Interpreted
> frame)
>  - java.io.File.getUsableSpace() @bci=34, line=1758 (Interpreted frame)
>  - org.apache.hadoop.fs.DF.getAvailable() @bci=4, line=79 (Interpreted
> frame)
>  -
>
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(java.lang.String,
> long, org.apache.hadoop.conf.Configuration, boolean) @bci=239, line=367
> (Interpreted frame)
>  -
>
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(java.lang.String,
> long, org.apache.hadoop.conf.Configuration, boolean) @bci=18, line=146
> (Interpreted frame)
>  -
>
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(java.lang.String,
> long, org.apache.hadoop.conf.Configuration) @bci=6, line=127 (Interpreted
> frame)
>  - org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(int, long)
> @bci=33, line=121 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill() @bci=75,
> line=1397 (Interpreted frame)
>  - org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush() @bci=102,
> line=1303 (Interpreted frame)
>  -
>
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(org.apache.hadoop.mapreduce.TaskAttemptContext)
> @bci=4, line=698 (Interpreted frame)
>  -
>
> org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf,
> org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex,
> org.apache.hadoop.mapred.TaskUmbilicalProtocol,
> org.apache.hadoop.mapred.Task$TaskReporter) @bci=324, line=767 (Interpreted
> frame)
>  - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf,
> org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=100, line=364
> (Interpreted frame)
>  - org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run()
> @bci=221, line=223 (Interpreted frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=471
> (Interpreted frame)
>  - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334
> (Interpreted frame)
>  - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Interpreted
> frame)
>  -
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
> @bci=46, line=1110 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=603
> (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=722 (Interpreted frame)
>
>
> Thread 29045: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
>  - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long)
> @bci=20, line=226 (Interpreted frame)
>  -
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long)
> @bci=68, line=2082 (Interpreted frame)
>  - java.util.concurrent.ThreadPoolExecutor.awaitTermination(long,
> java.util.concurrent.TimeUnit) @bci=68, line=1433 (Interpreted frame)
>  - org.apache.hadoop.mapred.LocalJobRunner$Job.run() @bci=202, line=341
> (Interpreted frame)
>
>
> Thread 29040: (state = BLOCKED)
>
>
> Thread 29039: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
>  - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=135 (Interpreted
> frame)
>  - java.lang.ref.ReferenceQueue.remove() @bci=2, line=151 (Interpreted
> frame)
>  - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=177
> (Interpreted frame)
>
>
> Thread 29038: (state = BLOCKED)
>  - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
>  - java.lang.Object.wait() @bci=2, line=503 (Interpreted frame)
>  - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=133
> (Interpreted frame)
>
>
> Thread 29036: (state = BLOCKED)
>  - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
>  -
>
> org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(org.apache.hadoop.mapred.JobConf,
> org.apache.hadoop.mapred.RunningJob) @bci=80, line=1387 (Interpreted frame)
>  - org.apache.hadoop.mapreduce.Job.waitForCompletion(boolean) @bci=30,
> line=583 (Interpreted frame)
>  - org.apache.nutch.util.NutchJob.waitForCompletion(boolean) @bci=2,
> line=50
> (Interpreted frame)
>  - org.apache.nutch.crawl.GeneratorJob.run(java.util.Map) @bci=361,
> line=199
> (Interpreted frame)
>  - org.apache.nutch.crawl.GeneratorJob.generate(long, long, boolean,
> boolean) @bci=224, line=223 (Interpreted frame)
>  - org.apache.nutch.crawl.GeneratorJob.run(java.lang.String[]) @bci=386,
> line=279 (Interpreted frame)
>  -
> org.apache.hadoop.util.ToolRunner.run(org.apache.hadoop.conf.Configuration,
> org.apache.hadoop.util.Tool, java.lang.String[]) @bci=38, line=65
> (Interpreted frame)
>  - org.apache.nutch.crawl.GeneratorJob.main(java.lang.String[]) @bci=11,
> line=287 (Interpreted frame)
>
>
>
>
>
>



-- 
*Lewis*

Re: Nutch 2.2.1 Freezing / Deadlocked During Generator Job

Posted by brian4 <bq...@gmail.com>.
It definitely has nothing to do with HBase - I switched to use MySQL and I am
still having the exact same problem - freezing in the exact same spot.

The new thread dump is similar to this one:
http://lucene.472066.n3.nabble.com/Nutch-frozen-but-not-exiting-td604954.html

He reported a similar freezing issue and unfortunately there was no
resolution found (at least not listed in that thread).

Maybe it is something to do with a Java memory setting, so I may start
playing around with those next.
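
The knobs I plan to try first (values below are purely illustrative):
NUTCH_HEAPSIZE sets the heap, in MB, for the local bin/nutch JVM, and on a
real cluster mapred.child.java.opts controls the heap of the spawned
map/reduce task JVMs.

bash-3.2$ export NUTCH_HEAPSIZE=4000     # MB, read by the bin/nutch launcher script
bash-3.2$ bin/nutch generate -topN 50000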

Here is my full thread dump now:

bash-3.2$ jstack -F 29017 >> generator_dump_mysql_F.log
Attaching to process ID 29017, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 23.1-b03

Deadlock Detection:

No deadlocks found.

Thread 29110: (state = BLOCKED)


Thread 29058: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14,
line=186 (Interpreted frame)
 -
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
@bci=42, line=2043 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run()
@bci=55, line=1345 (Interpreted frame)


Thread 29047: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.mapred.Task$TaskReporter.run() @bci=45, line=658
(Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=722 (Interpreted frame)


Thread 29046: (state = IN_NATIVE)
 - java.io.UnixFileSystem.getSpace(java.io.File, int) @bci=0 (Interpreted
frame)
 - java.io.File.getUsableSpace() @bci=34, line=1758 (Interpreted frame)
 - org.apache.hadoop.fs.DF.getAvailable() @bci=4, line=79 (Interpreted
frame)
 -
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(java.lang.String,
long, org.apache.hadoop.conf.Configuration, boolean) @bci=239, line=367
(Interpreted frame)
 -
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(java.lang.String,
long, org.apache.hadoop.conf.Configuration, boolean) @bci=18, line=146
(Interpreted frame)
 -
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(java.lang.String,
long, org.apache.hadoop.conf.Configuration) @bci=6, line=127 (Interpreted
frame)
 - org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(int, long)
@bci=33, line=121 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill() @bci=75,
line=1397 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush() @bci=102,
line=1303 (Interpreted frame)
 -
org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(org.apache.hadoop.mapreduce.TaskAttemptContext)
@bci=4, line=698 (Interpreted frame)
 -
org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf,
org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex,
org.apache.hadoop.mapred.TaskUmbilicalProtocol,
org.apache.hadoop.mapred.Task$TaskReporter) @bci=324, line=767 (Interpreted
frame)
 - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf,
org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=100, line=364
(Interpreted frame)
 - org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run()
@bci=221, line=223 (Interpreted frame)
 - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=471
(Interpreted frame)
 - java.util.concurrent.FutureTask$Sync.innerRun() @bci=29, line=334
(Interpreted frame)
 - java.util.concurrent.FutureTask.run() @bci=4, line=166 (Interpreted
frame)
 -
java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
@bci=46, line=1110 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=603
(Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=722 (Interpreted frame)


Thread 29045: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long)
@bci=20, line=226 (Interpreted frame)
 -
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long)
@bci=68, line=2082 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.awaitTermination(long,
java.util.concurrent.TimeUnit) @bci=68, line=1433 (Interpreted frame)
 - org.apache.hadoop.mapred.LocalJobRunner$Job.run() @bci=202, line=341
(Interpreted frame)


Thread 29040: (state = BLOCKED)


Thread 29039: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=135 (Interpreted
frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=151 (Interpreted
frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=177
(Interpreted frame)


Thread 29038: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=503 (Interpreted frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=133
(Interpreted frame)


Thread 29036: (state = BLOCKED)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 -
org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(org.apache.hadoop.mapred.JobConf,
org.apache.hadoop.mapred.RunningJob) @bci=80, line=1387 (Interpreted frame)
 - org.apache.hadoop.mapreduce.Job.waitForCompletion(boolean) @bci=30,
line=583 (Interpreted frame)
 - org.apache.nutch.util.NutchJob.waitForCompletion(boolean) @bci=2, line=50
(Interpreted frame)
 - org.apache.nutch.crawl.GeneratorJob.run(java.util.Map) @bci=361, line=199
(Interpreted frame)
 - org.apache.nutch.crawl.GeneratorJob.generate(long, long, boolean,
boolean) @bci=224, line=223 (Interpreted frame)
 - org.apache.nutch.crawl.GeneratorJob.run(java.lang.String[]) @bci=386,
line=279 (Interpreted frame)
 -
org.apache.hadoop.util.ToolRunner.run(org.apache.hadoop.conf.Configuration,
org.apache.hadoop.util.Tool, java.lang.String[]) @bci=38, line=65
(Interpreted frame)
 - org.apache.nutch.crawl.GeneratorJob.main(java.lang.String[]) @bci=11,
line=287 (Interpreted frame)







Re: Nutch 2.2.1 Freezing / Deadlocked During Generator Job

Posted by brian4 <bq...@gmail.com>.
The ZooKeeper library is included with the HBase package, so it's not an
issue of the ZooKeeper version not matching what HBase expects: I did not
install it separately, and the same set-up worked fine on other machines.  I
found that HBase does work with the latest ZooKeeper in the same branch
(3.3.x), but this did not resolve the issue on that machine.
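
For the record, the bundled and running versions can be checked like this
(a sketch; $HBASE_HOME and the default standalone client port 2181 are
assumptions about the local install):

bash-3.2$ ls $HBASE_HOME/lib/zookeeper*.jar   # ZK jar actually bundled with the HBase package
bash-3.2$ echo ruok | nc localhost 2181       # a healthy ZK answers "imok"
bash-3.2$ echo stat | nc localhost 2181       # reports the version of the ZK actually running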

I even tried removing all HBase folders (data and application directories)
and starting from a fresh extract of HBase; still the same problem.

I think this is due to a bug in HBase, maybe the same one mentioned above;
if so, it wasn't fixed by the fixes to ZooKeeper.






Re: Nutch 2.2.1 Freezing / Deadlocked During Generator Job

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Brian,

On Friday, July 19, 2013, brian4 <bq...@gmail.com> wrote:
> No, not continuous or large-scale.  Crawls are just run each day.  The
> machine that has the freezing issue was the one I was planning to use to do
> the daily crawls.

I think this is most certainly a local config bottleneck. Nutch is capable
of generating many, many URLs.

> Think I should try an older version of hbase?

You said you're using 0.90.6... I think you're OK here; it's just a case of
finding what's wrong with the ZK setup, as this looks like where it's at.

> For using a newer version there is a chain of dependencies, I tried seeing
> if I could just use the latest zookeeper with ...

Please make sure the ZK version matches what is expected by your HBase
distro. You can check this by navigating to the relevant source tag in the
HBase SVN area.

hth
have a gd weekend

-- 
*Lewis*

Re: Nutch 2.2.1 Freezing / Deadlocked During Generator Job

Posted by brian4 <bq...@gmail.com>.
>Are these continuous crawls? 

No, not continuous or large-scale.  Crawls are just run each day.  The
machine that has the freezing issue was the one I was planning to use to do
the daily crawls.

>What values do you have set for generate.max.count?

generate.max.count is set to the default, -1.
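
Both it and the companion generate.count.mode property, which decides whether
the cap is counted per host or per domain, live in conf/nutch-default.xml and
can be overridden in conf/nutch-site.xml.  A quick way to see the effective
values, as a sketch:

bash-3.2$ grep -A 1 -r 'generate.max.count\|generate.count.mode' conf/*.xml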

> I would check
>your hbase/zk installation before you think about ditching everything and
>jumping ship.

Think I should try an older version of hbase?

For using a newer version there is a chain of dependencies: I tried seeing
if I could just use the latest ZooKeeper with the older HBase, but it didn't
work (the API changed, I guess).  Similarly, a newer HBase wouldn't work
with Gora.  Not to mention it's not really clear whether the issue is
resolved in a newer HBase - that issue is still marked as unresolved, which
is odd, because I thought the newer HBase uses the newer ZooKeeper in which
the issue is supposed to be resolved.







Re: Nutch 2.2.1 Freezing / Deadlocked During Generator Job

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Brian,
On Thursday, July 18, 2013, brian4 <bq...@gmail.com> wrote:
> On one machine, nutch just suddenly started freezing during the generator
> job.
Are these continuous crawls? What values do you have set for
generate.max.count? I ask because calls must be made to the backend to
determine a limit for URLs to generate into batches... I suppose if you're
running with a -1 value for this figure the call could be expensive as well.
>
> I can also run the same crawl (using all of the same programs and files)
> from another machine and it runs fine.  Although it is one machine for now,
> I am worried that it might randomly happen on other machines at some point
> as well, so I can't rely on it for regular crawling.

Mmm. So maybe you are not doing continuous large scale crawls as I thought
above?

> Looking at the dumps, it may be related to a deadlock caused by a
> ZooKeeper/HBase issue described at the following link, but maybe it can be
> avoided in the Nutch generator itself.
>
> https://issues.apache.org/jira/browse/HBASE-2966

Yep

>
>
> However, even if that is the cause, we would have to wait for HBase to be
> fixed, then for Gora to be updated to use the fixed HBase, and then for
> Nutch to be updated to use the updated Gora, so I am hoping someone has an
> idea of a workaround I could use now.

I've not heard of anyone coming here with a similar problem! I am confused
by this one.

>
> Otherwise I am thinking of trying to switch to another data store.  Which
> data store is most reliable and does not have such deadlock issues?

If this is a problem with a zookeeper server then it may not be linked to
Gora. There is not one line of zookeeper code within Gora. I would check
your hbase/zk installation before you think about ditching everything and
jumping ship.

> It seems like maybe a lot of people use Cassandra, but I had the impression
> there were more issues getting it to work correctly than with HBase.

Everyone to their own, I suppose. There are a number of *stable* backends
which can be used. If getting things working easily is your primary
criterion then I wouldn't say there is much between the available options.
hth

-- 
*Lewis*