You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@jackrabbit.apache.org by aasoj j <aa...@gmail.com> on 2009/12/10 02:29:17 UTC

Failure in updating LOCAL_REVISIONS causes infinite sync loop

Hi,

We are facing a problem with synchronization in a cluster scenario.
Please see if you can identify and help.

Deployment: In our scenario, network latency between one of the JR
(node1) nodes and the DB is high. A huge number of content updates
were made through a different JR node (NODE2) with low network latency
to the DB. Cluster nodes are configured to poll the DB journal every
10 seconds.

Problem: We observed that node1 is not able to synchronize completely
and keeps going in a loop (see revision 220819 coming multiple times
in the logs below). While the cluster sync loop was going on node1
(one with slow DB access), we fetched some content and found a few
instances of missing updates. The issue was not seen after restarting
the application.

2009-12-07 23:30:10,975| INFO| ClusterNode|812|ClusterNode-node1
Processing revision: 220819
2009-12-07 23:30:11,058| INFO| ClusterNode|812|ClusterNode-node1
Processing revision: 220820
...
2009-12-07 23:43:27,856| INFO| ClusterNode|812|ClusterNode-node1
Processing revision: 233131
2009-12-07 23:43:27,938| INFO| ClusterNode|812|ClusterNode-node1
Processing revision: 233132
2009-12-07 23:43:28,023| WARN| DatabaseJournal|1189|ClusterNode-node1
Failed to update local revision.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure

Last packet sent to the server was 1 ms ago.
        at sun.reflect.GeneratedConstructorAccessor58.newInstance(Unknown
Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
        at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:995)
        at org.apache.jackrabbit.core.journal.DatabaseJournal$DatabaseRevision.set(DatabaseJournal.java:1186)
        at org.apache.jackrabbit.core.cluster.ClusterNode.setRevision(ClusterNode.java:827)
        at org.apache.jackrabbit.core.journal.AbstractJournal.doSync(AbstractJournal.java:231)
        at org.apache.jackrabbit.core.journal.AbstractJournal.sync(AbstractJournal.java:188)
        at org.apache.jackrabbit.core.cluster.ClusterNode.sync(ClusterNode.java:329)
        at org.apache.jackrabbit.core.cluster.ClusterNode.run(ClusterNode.java:295)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
        at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
        at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2428)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
        ... 14 more
2009-12-07 23:43:28,023| INFO| AbstractJournal|233|ClusterNode-node1
Synchronized to revision: 233132
2009-12-07 23:44:00,425| INFO| DocNumberCache|162|RMI TCP
Connection(192640)-66.196.81.46 size=1024/1024, #accesses=2281,
#hits=2281, #misses=0, cacheRatio=100%
2009-12-07 23:44:01,921| INFO| ClusterNode|812|ClusterNode-node1
Processing revision: 220819
2009-12-07 23:44:02,007| INFO| ClusterNode|812|ClusterNode-node1
Processing revision: 220820
2009-12-07 23:44:02,008| INFO| ClusterNode|812|ClusterNode- Processing
revision: 220821


The looping is due to failure in updating LOCAL_REVISIONS table. This
could be because DB resets the idle connection.

1) What is a recommended solution to this problem?
2) Is there a way we can configure max number of revisions after which
LOCAL_REVISION table is updated than than updating it after taking all
the revisions?
3) Is it recommended to set DB autoreconnect to true to avoid this situation?

Thanks in advance
Regards
aasoj

Re: Failure in updating LOCAL_REVISIONS causes infinite sync loop

Posted by aasoj j <aa...@gmail.com>.
Thanks Martijin. I will try reducing the polling interval and try again.

The problem of failure to update LOCAL_REVISIONS and taking up 12000
revisions again is recurring. The same sequence is observed many
times. I had one more query. When the cycle restarts do the search
indexes and cache get inconsistent? Also, is it recommended to set DB
autoreconnect to true to avoid this situation?

regards
aasoj

On Thu, Dec 10, 2009 at 5:18 PM, Martijn Hendriks <mh...@gmail.com> wrote:
> Hi,
>
> Is this a recurring problem, or did the update of the LOCAL_REVISIONS
> table fail only once?
>
> You could try to set the polling interval lower, e.g., to 1 second.
> This probably decreases the number of revisions that are processed
> each sync loop. Note that the logs tell you that node1 must pull in
> more than 12000 revisions. That would explain the missing updates.
>
> Best regards,
> Martijn
>
> On Thu, Dec 10, 2009 at 2:29 AM, aasoj j <aa...@gmail.com> wrote:
>> Hi,
>>
>> We are facing a problem with synchronization in a cluster scenario.
>> Please see if you can identify and help.
>>
>> Deployment: In our scenario, network latency between one of the JR
>> (node1) nodes and the DB is high. A huge number of content updates
>> were made through a different JR node (NODE2) with low network latency
>> to the DB. Cluster nodes are configured to poll the DB journal every
>> 10 seconds.
>>
>> Problem: We observed that node1 is not able to synchronize completely
>> and keeps going in a loop (see revision 220819 coming multiple times
>> in the logs below). While the cluster sync loop was going on node1
>> (one with slow DB access), we fetched some content and found a few
>> instances of missing updates. The issue was not seen after restarting
>> the application.
>>
>> 2009-12-07 23:30:10,975| INFO| ClusterNode|812|ClusterNode-node1
>> Processing revision: 220819
>> 2009-12-07 23:30:11,058| INFO| ClusterNode|812|ClusterNode-node1
>> Processing revision: 220820
>> ...
>> 2009-12-07 23:43:27,856| INFO| ClusterNode|812|ClusterNode-node1
>> Processing revision: 233131
>> 2009-12-07 23:43:27,938| INFO| ClusterNode|812|ClusterNode-node1
>> Processing revision: 233132
>> 2009-12-07 23:43:28,023| WARN| DatabaseJournal|1189|ClusterNode-node1
>> Failed to update local revision.
>> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
>> Communications link failure
>>
>> Last packet sent to the server was 1 ms ago.
>>        at sun.reflect.GeneratedConstructorAccessor58.newInstance(Unknown
>> Source)
>>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>        at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
>>        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
>>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
>>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
>>        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
>>        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
>>        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
>>        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
>>        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
>>        at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:995)
>>        at org.apache.jackrabbit.core.journal.DatabaseJournal$DatabaseRevision.set(DatabaseJournal.java:1186)
>>        at org.apache.jackrabbit.core.cluster.ClusterNode.setRevision(ClusterNode.java:827)
>>        at org.apache.jackrabbit.core.journal.AbstractJournal.doSync(AbstractJournal.java:231)
>>        at org.apache.jackrabbit.core.journal.AbstractJournal.sync(AbstractJournal.java:188)
>>        at org.apache.jackrabbit.core.cluster.ClusterNode.sync(ClusterNode.java:329)
>>        at org.apache.jackrabbit.core.cluster.ClusterNode.run(ClusterNode.java:295)
>>        at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.net.SocketException: Connection reset
>>        at java.net.SocketInputStream.read(SocketInputStream.java:168)
>>        at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
>>        at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
>>        at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
>>        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2428)
>>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
>>        ... 14 more
>> 2009-12-07 23:43:28,023| INFO| AbstractJournal|233|ClusterNode-node1
>> Synchronized to revision: 233132
>> 2009-12-07 23:44:00,425| INFO| DocNumberCache|162|RMI TCP
>> Connection(192640)-66.196.81.46 size=1024/1024, #accesses=2281,
>> #hits=2281, #misses=0, cacheRatio=100%
>> 2009-12-07 23:44:01,921| INFO| ClusterNode|812|ClusterNode-node1
>> Processing revision: 220819
>> 2009-12-07 23:44:02,007| INFO| ClusterNode|812|ClusterNode-node1
>> Processing revision: 220820
>> 2009-12-07 23:44:02,008| INFO| ClusterNode|812|ClusterNode- Processing
>> revision: 220821
>>
>>
>> The looping is due to failure in updating LOCAL_REVISIONS table. This
>> could be because DB resets the idle connection.
>>
>> 1) What is a recommended solution to this problem?
>> 2) Is there a way we can configure max number of revisions after which
>> LOCAL_REVISION table is updated than than updating it after taking all
>> the revisions?
>> 3) Is it recommended to set DB autoreconnect to true to avoid this situation?
>>
>> Thanks in advance
>> Regards
>> aasoj
>>
>

Re: Failure in updating LOCAL_REVISIONS causes infinite sync loop

Posted by Martijn Hendriks <mh...@gmail.com>.
Hi,

Is this a recurring problem, or did the update of the LOCAL_REVISIONS
table fail only once?

You could try to set the polling interval lower, e.g., to 1 second.
This probably decreases the number of revisions that are processed
each sync loop. Note that the logs tell you that node1 must pull in
more than 12000 revisions. That would explain the missing updates.

Best regards,
Martijn

On Thu, Dec 10, 2009 at 2:29 AM, aasoj j <aa...@gmail.com> wrote:
> Hi,
>
> We are facing a problem with synchronization in a cluster scenario.
> Please see if you can identify and help.
>
> Deployment: In our scenario, network latency between one of the JR
> (node1) nodes and the DB is high. A huge number of content updates
> were made through a different JR node (NODE2) with low network latency
> to the DB. Cluster nodes are configured to poll the DB journal every
> 10 seconds.
>
> Problem: We observed that node1 is not able to synchronize completely
> and keeps going in a loop (see revision 220819 coming multiple times
> in the logs below). While the cluster sync loop was going on node1
> (one with slow DB access), we fetched some content and found a few
> instances of missing updates. The issue was not seen after restarting
> the application.
>
> 2009-12-07 23:30:10,975| INFO| ClusterNode|812|ClusterNode-node1
> Processing revision: 220819
> 2009-12-07 23:30:11,058| INFO| ClusterNode|812|ClusterNode-node1
> Processing revision: 220820
> ...
> 2009-12-07 23:43:27,856| INFO| ClusterNode|812|ClusterNode-node1
> Processing revision: 233131
> 2009-12-07 23:43:27,938| INFO| ClusterNode|812|ClusterNode-node1
> Processing revision: 233132
> 2009-12-07 23:43:28,023| WARN| DatabaseJournal|1189|ClusterNode-node1
> Failed to update local revision.
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
> Communications link failure
>
> Last packet sent to the server was 1 ms ago.
>        at sun.reflect.GeneratedConstructorAccessor58.newInstance(Unknown
> Source)
>        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>        at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
>        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1074)
>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2985)
>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2871)
>        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3414)
>        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1936)
>        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2060)
>        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542)
>        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734)
>        at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:995)
>        at org.apache.jackrabbit.core.journal.DatabaseJournal$DatabaseRevision.set(DatabaseJournal.java:1186)
>        at org.apache.jackrabbit.core.cluster.ClusterNode.setRevision(ClusterNode.java:827)
>        at org.apache.jackrabbit.core.journal.AbstractJournal.doSync(AbstractJournal.java:231)
>        at org.apache.jackrabbit.core.journal.AbstractJournal.sync(AbstractJournal.java:188)
>        at org.apache.jackrabbit.core.cluster.ClusterNode.sync(ClusterNode.java:329)
>        at org.apache.jackrabbit.core.cluster.ClusterNode.run(ClusterNode.java:295)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.SocketException: Connection reset
>        at java.net.SocketInputStream.read(SocketInputStream.java:168)
>        at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:113)
>        at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:160)
>        at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:188)
>        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2428)
>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2882)
>        ... 14 more
> 2009-12-07 23:43:28,023| INFO| AbstractJournal|233|ClusterNode-node1
> Synchronized to revision: 233132
> 2009-12-07 23:44:00,425| INFO| DocNumberCache|162|RMI TCP
> Connection(192640)-66.196.81.46 size=1024/1024, #accesses=2281,
> #hits=2281, #misses=0, cacheRatio=100%
> 2009-12-07 23:44:01,921| INFO| ClusterNode|812|ClusterNode-node1
> Processing revision: 220819
> 2009-12-07 23:44:02,007| INFO| ClusterNode|812|ClusterNode-node1
> Processing revision: 220820
> 2009-12-07 23:44:02,008| INFO| ClusterNode|812|ClusterNode- Processing
> revision: 220821
>
>
> The looping is due to failure in updating LOCAL_REVISIONS table. This
> could be because DB resets the idle connection.
>
> 1) What is a recommended solution to this problem?
> 2) Is there a way we can configure max number of revisions after which
> LOCAL_REVISION table is updated than than updating it after taking all
> the revisions?
> 3) Is it recommended to set DB autoreconnect to true to avoid this situation?
>
> Thanks in advance
> Regards
> aasoj
>