Posted to dev@manifoldcf.apache.org by Farzad Valad <ho...@farzad.net> on 2011/07/06 21:27:31 UTC

Frozen Again

So this time I went through the thread dump and don't see any socket 
waits.  Any thoughts why it is stuck this time?

Thanks,
Farzad.

Re: Frozen Again

Posted by Farzad Valad <ho...@farzad.net>.
I don't see an attachment in the email.

On 7/7/2011 1:20 PM, Karl Wright wrote:
> Attached please find an instrumented
> framework\pull-agent\src\main\java\org\apache\manifoldcf\crawler\system\ResetManager.java
> class.  Please rebuild with this class, cause the hang, and capture
> standard out so I can see it.
>
> Thanks!
> Karl


Re: Frozen Again

Posted by Karl Wright <da...@gmail.com>.
Attached please find an instrumented
framework\pull-agent\src\main\java\org\apache\manifoldcf\crawler\system\ResetManager.java
class.  Please rebuild with this class, cause the hang, and capture
standard out so I can see it.

Thanks!
Karl


On Thu, Jul 7, 2011 at 2:12 PM, Karl Wright <da...@gmail.com> wrote:
> Thanks.  I may be able to send you an instrumented ResetManager class later
> today, if you are in a position to rebuild MCF and try this again.
>
> Karl

Re: Frozen Again

Posted by Karl Wright <da...@gmail.com>.
Thanks.  I may be able to send you an instrumented ResetManager class later
today, if you are in a position to rebuild MCF and try this again.

Karl

On Thu, Jul 7, 2011 at 2:06 PM, Farzad Valad <ho...@farzad.net> wrote:
> I'm attaching the current thread dump file that goes with the log file.  It
> is easy to recreate: just cause an insert failure due to a size mismatch
> between the column and the value, where the value can't fit. More than happy
> to test and help out.

Re: Frozen Again

Posted by Farzad Valad <ho...@farzad.net>.
I'm attaching the current thread dump file that goes with the log file.
It is easy to recreate: just cause an insert failure due to a size mismatch
between the column and the value, where the value can't fit. More than happy
to test and help out.

On 7/6/2011 2:44 PM, Farzad Valad wrote:
> You are right, it was a db error.  In this case I tried to insert a
> value larger than the column size and the insert failed.  I'll grab
> the log next time too, but unfortunately I deleted it and am now running
> another test with a larger column.  As soon as it finishes or errors, I'll
> reproduce this one again and send you the stack trace.


Re: Frozen Again

Posted by Farzad Valad <ho...@farzad.net>.
You are right, it was a db error.  In this case I tried to insert a value
larger than the column size and the insert failed.  I'll grab the log
next time too, but unfortunately I deleted it and am now running another
test with a larger column.  As soon as it finishes or errors, I'll
reproduce this one again and send you the stack trace.

On 7/6/2011 2:36 PM, Karl Wright wrote:
> I have seen this before.  The critical traceback, which you see for
> ALL the worker threads, is:
>
> "Worker thread '36'" daemon prio=6 tid=0x00000000077ed000 nid=0xa98 in Object.wait() [0x000000000b1af000]
>     java.lang.Thread.State: WAITING (on object monitor)
>          at java.lang.Object.wait(Native Method)
>          at java.lang.Object.wait(Object.java:485)
>          at org.apache.manifoldcf.crawler.system.ResetManager.waitForReset(ResetManager.java:107)
>          - locked <0x00000000e0005528> (a org.apache.manifoldcf.crawler.system.WorkerResetManager)
>          at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:110)
>
>
> ManifoldCF has code in it for dealing with database errors that
> requires all worker threads to be brought into the same state.  This
> code has never worked properly, and I've never been able to figure out
> why.  But the underlying problem is that you've had a database error
> of some kind which requires a reset.  This is usually a connection
> error.
>
> Can you look at manifoldcf.log and send the last stack trace in it?
> It could be too short a connection lifetime in either the manifoldcf
> configuration or in the postgresql configuration.
>
> Karl


Re: Frozen Again

Posted by Karl Wright <da...@gmail.com>.
I have seen this before.  The critical traceback, which you see for
ALL the worker threads, is:

"Worker thread '36'" daemon prio=6 tid=0x00000000077ed000 nid=0xa98 in Object.wait() [0x000000000b1af000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.manifoldcf.crawler.system.ResetManager.waitForReset(ResetManager.java:107)
        - locked <0x00000000e0005528> (a org.apache.manifoldcf.crawler.system.WorkerResetManager)
        at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:110)
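Checking by hand that every worker thread is parked in the same frame gets tedious when there are dozens of workers. As a hypothetical aid, here is a small Java sketch that uses the standard Thread.getAllStackTraces() call to list threads whose names start with a given prefix but whose stacks do not contain a given class and method. The class name ResetWaitScanner is made up for illustration. It only sees threads in the JVM it runs in, so it would have to be called from inside the agents process (for example from instrumented code); it cannot read a saved dump file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper (not part of ManifoldCF): lists live threads
// whose names start with a prefix but whose stacks do NOT contain a given
// class/method frame, i.e. the threads that are "somewhere else".
final class ResetWaitScanner {
    static List<String> threadsNotIn(String namePrefix, String className, String methodName) {
        List<String> outliers = new ArrayList<>();
        // Snapshot of every live thread and its current stack, similar to a jstack dump.
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().startsWith(namePrefix)) {
                continue; // not one of the threads we care about
            }
            boolean inTarget = false;
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    inTarget = true;
                    break;
                }
            }
            if (!inTarget) {
                // Record the name plus current state to help spot stragglers.
                outliers.add(thread.getName() + " [" + thread.getState() + "]");
            }
        }
        return outliers;
    }
}
```

For example, threadsNotIn("Worker thread", "org.apache.manifoldcf.crawler.system.ResetManager", "waitForReset") would return an empty list when every worker is in the state shown above, and would otherwise name the threads that never reached the reset wait. Note that the frame's class is ResetManager, where waitForReset is declared, even though the locked object is a WorkerResetManager.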


ManifoldCF has code in it for dealing with database errors that
requires all worker threads to be brought into the same state.  This
code has never worked properly, and I've never been able to figure out
why.  But the underlying problem is that you've had a database error
of some kind which requires a reset.  This is usually a connection
error.
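A minimal sketch of the coordination described above may help show why a single missing participant freezes everything. This is a hypothetical reset barrier in Java, assuming a fixed participant count; the class ResetBarrierSketch, its methods, and the timeout are illustrative and are not ManifoldCF's actual ResetManager. It uses a generation counter so that threads released by one reset are not confused with the next, and a timed wait (the real trace shows an untimed Object.wait()) only so that the demo terminates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reset barrier, NOT ManifoldCF's ResetManager. Once a reset is
// requested, every participant must enter waitForReset(); the last one to
// arrive performs the reset and wakes the others.
final class ResetBarrierSketch {
    private final int participants; // threads expected to check in
    private boolean resetRequested = false;
    private int waiting = 0;         // threads currently inside waitForReset()
    private long generation = 0;     // bumped each time a reset completes

    ResetBarrierSketch(int participants) {
        this.participants = participants;
    }

    // Called by whichever thread hits an error that needs a reset.
    synchronized void noteEvent() {
        resetRequested = true;
    }

    // Returns true once the reset is done (or none was pending), false if the
    // timeout expires first. An untimed wait() would block forever instead.
    synchronized boolean waitForReset(long timeoutMillis) throws InterruptedException {
        if (!resetRequested) {
            return true;
        }
        long myGeneration = generation;
        waiting++;
        if (waiting == participants) {
            // Last arrival: stand-in for the real reset work, then release everyone.
            resetRequested = false;
            waiting = 0;
            generation++;
            notifyAll();
            return true;
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (generation == myGeneration) { // loop also guards against spurious wakeups
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                waiting--; // withdraw so the arrival count stays accurate
                return false;
            }
            wait(remaining);
        }
        return true;
    }

    // Requests a reset, starts `arriving` worker threads against a barrier that
    // expects `participants`, and returns how many were released by a completed reset.
    static int runDemo(int participants, int arriving, long timeoutMillis) {
        ResetBarrierSketch manager = new ResetBarrierSketch(participants);
        manager.noteEvent(); // simulate one worker hitting a database error
        AtomicInteger released = new AtomicInteger();
        Thread[] workers = new Thread[arriving];
        for (int i = 0; i < arriving; i++) {
            workers[i] = new Thread(() -> {
                try {
                    if (manager.waitForReset(timeoutMillis)) {
                        released.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "Worker thread '" + i + "'");
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining workers", e);
        }
        return released.get();
    }
}
```

runDemo(4, 4, 2000) returns 4, because the fourth arrival completes the reset and wakes the other three. runDemo(4, 3, 200) returns 0: the fourth participant never arrives, so the barrier never trips and each waiter gives up at its timeout. With an untimed wait(), those three threads would sit in Object.wait() inside waitForReset() indefinitely, the same shape as the dump above. The sketch does not say what goes wrong in ManifoldCF itself.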

Can you look at manifoldcf.log and send the last stack trace in it?
It could be too short a connection lifetime in either the manifoldcf
configuration or in the postgresql configuration.

Karl


On Wed, Jul 6, 2011 at 3:27 PM, Farzad Valad <ho...@farzad.net> wrote:
> So this time I went through the thread dump and don't see any socket waits.
>  Any thoughts why it is stuck this time?
>
> Thanks,
> Farzad.
>