Posted to user@manifoldcf.apache.org by Ahmet Arslan <io...@yahoo.com> on 2013/06/24 13:59:15 UTC

hanging crawler

Hello All,

I have an MCF 1.2 setup (with PostgreSQL 9.2) where I crawl some newspaper sites using Web connectors.

I use the following settings for the jobs:

Maximum hop count for link type 'link': 1
Maximum hop count for link type 'redirect': Unlimited
Hop count mode: No deletes, forever

Start method: Start at beginning of schedule window
Schedule type: Scan every document once
Maximum run time: 90 minutes

I scheduled the jobs to run every two hours. However, after a while the crawl hangs. I found these exceptions in the log:

What could be wrong? Any suggestions?

Thanks,
Ahmet

ERROR 2013-06-24 10:39:34,999 (Worker thread '1') - Worker thread aborting and restarting due to database connection reset: Database exception: SQLException doing query (25P02): ERROR: current transaction is aborted, commands ignored until end of transaction block
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (25P02): ERROR: current transaction is aborted, commands ignored until end of transaction block
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:717)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:745)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1430)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:822)
at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4148)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:2017)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.flush(WorkerThread.java:1948)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:562)
Caused by: org.postgresql.util.PSQLException: ERROR: current transaction is aborted, commands ignored until end of transaction block
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:862)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:677)
ERROR 2013-06-24 10:39:33,473 (Worker thread '1') - Explain failed with error Database exception: SQLException doing query (40001): ERROR: could not serialize access due to read/write dependencies among transactions
  Detail: Reason code: Canceled on identification as a pivot, during conflict out checking.
  Hint: The transaction might succeed if retried.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (40001): ERROR: could not serialize access due to read/write dependencies among transactions
  Detail: Reason code: Canceled on identification as a pivot, during conflict out checking.
  Hint: The transaction might succeed if retried.
at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:717)
at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:745)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.explainQuery(DBInterfacePostgreSQL.java:1233)
at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1449)
at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:822)
at org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4148)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:2017)
at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.flush(WorkerThread.java:1948)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:562)
Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
  Detail: Reason code: Canceled on identification as a pivot, during conflict out checking.
  Hint: The transaction might succeed if retried.
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:862)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:677)

Re: hanging crawler

Posted by Karl Wright <da...@gmail.com>.
Hi Ahmet,

Sorry, Googlemail has a bug and keeps sending my mail before I am ready.

First, the following error indicates that a transaction should be retried:

org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception: SQLException doing query (40001): ERROR: could not serialize access due to read/write dependencies among transactions

The code to retry is already there, as is the code in the
DBInterfacePostgreSQL.java class to catch the exception. But in this case
the error happens while ManifoldCF is trying to print out the EXPLAIN for a
long-running query, and I don't think we've ever seen an EXPLAIN take such
a long time before.
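For context: PostgreSQL raises SQLSTATE 40001 under SERIALIZABLE isolation when it cannot guarantee a serial ordering of concurrent transactions, and the usual remedy is to roll back and re-run the whole transaction. The following is a minimal Java sketch of such a retry loop, assuming each call of the supplied work performs one complete transaction. It is not ManifoldCF's actual retry code; the class name `SerializationRetry`, the method `runWithRetry`, and the linear backoff are hypothetical choices for illustration.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch (not ManifoldCF code): re-run a unit of database work
 * when PostgreSQL reports a serialization failure (SQLSTATE 40001).
 */
public final class SerializationRetry {

    /** SQLSTATE PostgreSQL uses for "could not serialize access ...". */
    static final String SERIALIZATION_FAILURE = "40001";

    private SerializationRetry() {}

    /**
     * Calls {@code work} up to {@code maxAttempts} times. Each call is assumed
     * to run a complete transaction (begin, statements, commit/rollback) so a
     * retry starts from a clean state. A SQLException with any other SQLSTATE,
     * or any non-SQL exception, propagates immediately.
     */
    public static <T> T runWithRetry(Callable<T> work, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (SQLException e) {
                boolean retryable = SERIALIZATION_FAILURE.equals(e.getSQLState());
                if (!retryable || attempt >= maxAttempts) {
                    throw e;
                }
                // Linear backoff gives the conflicting transactions time to finish.
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated work that hits a serialization failure twice, then succeeds.
        int[] calls = {0};
        String result = runWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SQLException("could not serialize access", SERIALIZATION_FAILURE);
            }
            return "committed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `committed after 3 attempts`, because the simulated work fails twice before succeeding. Retrying the whole unit of work, rather than only the failed statement, matters because PostgreSQL marks the entire transaction as aborted after any error inside it.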

The second error occurs because the transaction has been aborted by
PostgreSQL but ManifoldCF isn't yet aware of it. When ManifoldCF sees a
database error it does not recognize, it tries to reset all connections.
This logic may not always work properly; I have seen it hang before.
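The two SQLSTATEs in this thread call for different recovery steps. 25P02 (`in_failed_sql_transaction`) means an earlier statement in the same transaction already failed (in this log, most likely the 40001 raised by the EXPLAIN a second earlier), and PostgreSQL rejects every further command until the client issues ROLLBACK; the connection itself remains usable after that. A sketch of mapping SQLSTATEs to recovery actions might look like the following. The class `PgErrorClassifier` and its `Recovery` values are hypothetical and do not describe how ManifoldCF actually classifies errors.

```java
import java.sql.SQLException;

/**
 * Hypothetical helper (not part of ManifoldCF): maps the SQLSTATE of a
 * PostgreSQL error to the kind of recovery a caller might attempt.
 */
public final class PgErrorClassifier {

    public enum Recovery {
        /** 40001 / 40P01: roll back, then re-run the whole transaction. */
        RETRY_TRANSACTION,
        /** 25P02: an earlier statement failed; issue ROLLBACK before anything else. */
        ROLLBACK_FIRST,
        /** Class 08: the connection itself is broken; discard it and reconnect. */
        RESET_CONNECTION,
        /** Anything else: report the error to the caller. */
        FAIL
    }

    private PgErrorClassifier() {}

    public static Recovery classify(SQLException e) {
        String state = e.getSQLState();
        if (state == null) {
            return Recovery.FAIL;
        }
        // serialization_failure and deadlock_detected are both safe to retry.
        if (state.equals("40001") || state.equals("40P01")) {
            return Recovery.RETRY_TRANSACTION;
        }
        // in_failed_sql_transaction: the session only accepts ROLLBACK now.
        if (state.equals("25P02")) {
            return Recovery.ROLLBACK_FIRST;
        }
        // SQLSTATE class 08 = connection exceptions.
        if (state.startsWith("08")) {
            return Recovery.RESET_CONNECTION;
        }
        return Recovery.FAIL;
    }

    public static void main(String[] args) {
        System.out.println(classify(new SQLException("could not serialize access", "40001")));
        System.out.println(classify(new SQLException("current transaction is aborted", "25P02")));
        System.out.println(classify(new SQLException("connection failure", "08006")));
        System.out.println(classify(new SQLException("syntax error", "42601")));
    }
}
```

Running `main` prints `RETRY_TRANSACTION`, `ROLLBACK_FIRST`, `RESET_CONNECTION` and `FAIL`, one per line. Separating "roll back this transaction" from "reset the connection" keeps a recoverable 25P02 from triggering the heavier connection-reset path.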

So I think what happened is: (a) you had a really long-running
"addDocuments()" transaction, (b) it ran so long that ManifoldCF tried to
print an EXPLAIN for it, and (c) that EXPLAIN failed. Then the reset logic
hung ManifoldCF.

So there are two bugs here:
- The reset logic sometimes hangs ManifoldCF
- EXPLAIN may require a retry

Can you create tickets for both of these?

Thanks,

Karl



On Mon, Jun 24, 2013 at 8:05 AM, Karl Wright <da...@gmail.com> wrote:

> Hi Ahmet,
>
> Several things are happening here.
>
> First, the following error indicates that a transaction should be retried:
>
> What is happening is that the database connections are being pooled, and
> they are
