Posted to users@tomcat.apache.org by Dennis <de...@muhlesteins.com> on 2005/07/28 18:30:32 UTC

Re: 5.5.10 cluster exception

Well, I'll move this discussion over to the user thread then..
See below.

Remy Maucherat wrote:

> Dennis wrote:
>
>> I'm working with the cvs tagged 5.5.10 version of tomcat to check out
>> some clustering fixes.
>>
>> When I bring a 2nd server into the pool, I get this exception repeated
>> every time an mcast packet is received:
>>
>> ==CUT==
>> java.lang.ArrayIndexOutOfBoundsException
>>         at java.lang.System.arraycopy(Ljava.lang.Object;ILjava.lang.Object;II)V(Unknown Source)
>>         at org.apache.catalina.cluster.mcast.McastMember.getMember(McastMember.java:181)
>>         at org.apache.catalina.cluster.mcast.McastServiceImpl.receive(McastServiceImpl.java:209)
>>         at org.apache.catalina.cluster.mcast.McastServiceImpl$ReceiverThread.run(McastServiceImpl.java:253)
>> ==END CUT==
>>
>> Here are the relevant lines of code from McastMember.java:
>>
>> ==CUT==
>> byte[] domaind = new byte[dlen];
>> System.arraycopy(data, nlen + 24, domaind, 0, domaind.length);
>> ==END CUT==
>>
>> I added some debugging to figure out the length of the data.  The
>> exception occurs because data.length is 23.  Obviously 23+24 is going to
>> throw an ArrayIndexOBE.
>>
>> My question is.. is there a configuration thing that is causing data to
>> be sent to be less than the desired length?  It appears the data is not
>> coming in in the format expected.
>>
>> Thoughts?
>
After more digging and logging, I've fixed the exception.  Clustering
works for some people and not for others.  The problem is that when
receiving a DatagramPacket the byte buffer is set at a constant size.  I
noticed there is a single receivePacket in McastServiceImpl, and that
packet gets reused.  Here is some output from my logging, with annotations:

# I added the following line when a node sends a packet.  Notice that
the data length is 50.  This is because the name (built from the ip
address) has 23 characters and there are 27 other bytes.
name: tcp://192.168.1.27:4002 domain: dev addr:(len) 4 data len: 50
# The following is the output when I receive a packet.  The other server
in the cluster had a name length of 22, hence a data length of 49 instead
of 50.
Received a packet of length: 49 with offset: 0
Data.length: 49
Alive: 2649783
Port: 4002
Address: 192.168.1.5
Name Length: 22
Name: tcp://192.168.1.5:4002  # notice the packet from the other server
has 22 for length and is received successfully.
dlen: 3
# notice that this packet was received successfully (it's from the other
server); no exception occurs.
Received a packet of length: 49 with offset: 0  #here is the problem.
Data.length: 49
Alive: 7794
Port: 4002
Address: 192.168.1.27
Name Length: 23
Name: tcp://192.168.1.27:4002 # this packet has a buffer of size 49 but
it has length 23 for name and was sent with a 50-byte buffer
dlen: 3
# exception thrown and packet not received:
Jul 28, 2005 10:07:46 AM
org.apache.catalina.cluster.mcast.McastServiceImpl$ReceiverThread run
== cut exception ==
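
The arithmetic behind the failing packet can be checked in isolation.  The
sketch below assumes the layout implied by the numbers in this thread: a
24-byte fixed part, then the name, then the domain (24 + 23 + 3 = 50).
McastLengthDemo, HEADER_LEN, requiredLength and copyDomain are names made
up for illustration; only the arraycopy call is taken from the
McastMember.java lines quoted above.

```java
// Hypothetical demo class; not part of Tomcat.
class McastLengthDemo {
    // Assumption taken from the quoted code: the domain is copied
    // starting at offset nlen + 24.
    static final int HEADER_LEN = 24;

    // Number of bytes a member packet needs for the given name and domain lengths.
    static int requiredLength(int nlen, int dlen) {
        return HEADER_LEN + nlen + dlen;
    }

    // Mirrors the arraycopy call quoted from McastMember.java.
    static byte[] copyDomain(byte[] data, int nlen, int dlen) {
        byte[] domaind = new byte[dlen];
        System.arraycopy(data, nlen + HEADER_LEN, domaind, 0, domaind.length);
        return domaind;
    }

    public static void main(String[] args) {
        System.out.println(requiredLength(22, 3)); // name of length 22
        System.out.println(requiredLength(23, 3)); // name of length 23
        try {
            // A 23-character name read out of a 49-byte buffer.
            copyDomain(new byte[49], 23, 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("buffer is one byte short: " + e);
        }
    }
}
```

With a 50-byte buffer the same call returns the 3-byte domain, which
matches the sender-side log line.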

So I got rid of the exception by allocating a new DatagramPacket
(receivePacket) in the receive thread (McastServiceImpl.java approx line
206).  That way, the buffer had the correct number of bytes and no
exception was thrown.
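
A minimal sketch of that change, assuming a receive loop of roughly this
shape.  It is not the actual McastServiceImpl code: ReceiveSketch,
MAX_PACKET_SIZE, receiveOnce and payload are hypothetical names.  It
allocates a fresh DatagramPacket for every receive and passes on only the
bytes the packet reports as received.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;

// Hypothetical sketch; not the Tomcat implementation.
class ReceiveSketch {
    // Assumed upper bound for a membership packet.
    static final int MAX_PACKET_SIZE = 2048;

    // Copy exactly the bytes the packet reports, honouring offset and length.
    static byte[] payload(DatagramPacket packet) {
        int from = packet.getOffset();
        return Arrays.copyOfRange(packet.getData(), from, from + packet.getLength());
    }

    // Receive one datagram into a freshly allocated packet, so the length
    // left behind by an earlier, shorter datagram cannot carry over.
    // A MulticastSocket can be passed here since it extends DatagramSocket.
    static byte[] receiveOnce(DatagramSocket socket) throws IOException {
        DatagramPacket packet =
                new DatagramPacket(new byte[MAX_PACKET_SIZE], MAX_PACKET_SIZE);
        socket.receive(packet);
        return payload(packet);
    }
}
```

If the packet object is kept and reused instead, calling
packet.setLength(buffer.length) before each receive() is the other usual
way to keep an earlier, shorter datagram from limiting the next one.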

The reason it worked for you is probably that both of your servers had
IP addresses of the same length.  So the bug occurs when people have a
cluster with IP addresses that are not the same length.

I'll file a bug, and possibly a patch if I come to a confident enough
understanding of the architecture.

-Dennis

>
> Your post is OT on this mailing list, but I thought I would play a
> little with the clustering.
>
> This works for me, with this config:
>         <Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
>                  managerClassName="org.apache.catalina.cluster.session.DeltaManager"
>                  expireSessionsOnShutdown="false"
>                  useDirtyFlag="true"
>                  notifyListenersOnReplication="true">
>
>             <Membership
>                 className="org.apache.catalina.cluster.mcast.McastService"
>                 mcastAddr="228.0.0.4"
>                 mcastPort="45564"
>                 mcastFrequency="500"
>                 mcastClusterDomain="dev"
>                 mcastDropTime="3000"/>
>
>             <Receiver
>                 className="org.apache.catalina.cluster.tcp.ReplicationListener"
>                 tcpListenAddress="auto"
>                 tcpListenPort="4002"
>                 tcpSelectorTimeout="100"
>                 tcpThreadCount="6"/>
>
>             <Sender
>                 className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
>                 replicationMode="pooled"
>                 ackTimeout="15000"/>
>
>             <Valve
>                 className="org.apache.catalina.cluster.tcp.ReplicationValve"
>                 filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>
>
>             <ClusterListener
>                 className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
>
>         </Cluster>
>
> The domain feature for membership is new in this release (which caused
> the data packets sent to change).
>
> I recommend trying tomcat-user, or filing a bug if you can give
> working instructions on how to reproduce the problem.
>
> Rémy
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: tomcat-dev-help@jakarta.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org

Re: 5.5.10 cluster exception

Posted by Dennis <de...@muhlesteins.com>.

Rainer Jung wrote:

>Hi Dennis,
>
>I tried with
>
>15/conf/server.xml:                tcpListenAddress="192.168.0.180"
>25/conf/server.xml:                tcpListenAddress="192.168.0.30"
>
>and otherwise the same cluster config which you posted on tomcat-dev. No
>problem with membership and multicast, no exception.
>
>So there must be something more than just differently sized IP address
>strings.
>  
>
Well, I forgot to mention something important.  I'm using the BEA JRockit
JVM, and I'm betting they've done some kind of internal optimization at the
java.net class level.  I'll try again with the Sun JVM and see if that
solves the issue.

I had added some print statements to see what the byte[] contained.  It
always reported a length of 49 bytes, but there were 50 bytes' worth of
data sent (and received).

>Regards,
>
>Rainer

Re: 5.5.10 cluster exception

Posted by Rainer Jung <to...@kippdata.de>.

Hi Dennis,

I tried with

15/conf/server.xml:                tcpListenAddress="192.168.0.180"
25/conf/server.xml:                tcpListenAddress="192.168.0.30"

and otherwise the same cluster config which you posted on tomcat-dev. No
problem with membership and multicast, no exception.

So there must be something more than just differently sized IP address
strings.

Regards,

Rainer
