You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@ignite.apache.org by "Roger Fischer (CW)" <rf...@Brocade.com> on 2017/07/25 22:46:06 UTC

Missing object when loading from Cassandra with multiple queries

Hello,

I am using Ignite with Cassandra, loading data from Cassandra on-demand using multiple query statements.  But only a (seemingly random) subset of the rows/objects are loaded into ignite.

When I load using a single query, all rows/objects are loaded correctly.

In another environment, the same data, config and code loads correctly with multiple queries. The main difference is that this environment uses older, slower computers.

When I repeat the same load request multiple times, each repetition adds a few more rows/objects, until eventually all the rows/objects are loaded into the cache.

The environment where it works (all matching rows are loaded into the cache) is a cluster of 3 old desktop machines with 2 cores each. The same 3 nodes also run Cassandra.

The environment where only a part of the rows are loaded is a cluster of 3 modern serves (VMs) with 8 cores. The same 3 nodes also run Cassandra.

The only theory I have at this time is that with more cores, more queries / inserts are executed in parallel and something goes wrong with that higher level of parallelism.

I am calling loadCache( null, String[]);. The string array has 7 queries, one for each partition in Cassandra.

I have verified the queries; they return the correct rows when executed in cqlsh.

There are no error, warning or info logs during the load, neither from the client nor from the 3 servers.

I turned on additional logging in the environment that has the problem. Because I am loading 15K rows, there are thousands of logs, and an analysis is difficult. However, the following logs happen to stand out:


[19:07:45,946][FINE][pool-12-thread-4][GridQueryProcessor] Store [space=FcPortStatsCache, key=com.abc.poc.icpoc.model.FcPortStatsKey [idHash=1690890517, hash=-725839099, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=3, dateTime=Fri Jul 07 15:09:00 UTC 2017, portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c], val=com.abc.poc.icpoc.model.FcPortStats [idHash=117595137, hash=-1000695178, portId=007cb4ec-4dfd-47e5-8ae7-4781da08ac7c, dateTime=Fri Jul 07 15:09:00 UTC 2017, portType=1, switchId=339c5bcd-b503-4d54-948e-7ae6c40f31c2, rxUtil=78.479034, txUtil=91.488396, higherUtil=91.488396, lowerUtil=78.479034, rxRate=18411.0, txRate=15424.0, higherRate=18411.0, lowerRate=15424.0, c3Discards=0.0, crcErrors=0.0]]

There are 15,414 of these logs. Only 5627 objects were loaded into the cache. 15,000 rows match the queries.


[19:07:45,944][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> Remove will not be done for key (entry got replaced or removed): com.abc.poc.icpoc.model.FcPortStatsKey [idHash=326667456, hash=627195288, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri Jul 07 15:09:00 UTC 2017, portId=00159ca0-a6c9-47e5-a1f6-f3fe12941ba1]

There are exactly the same number of these (15,414).


[19:07:45,945][FINE][pool-12-thread-6][GridDhtAtomicCache] <FcPortStatsCache> Ignoring entry for partition that does not belong [key=com.abc.poc.icpoc.model.FcPortStatsKey [idHash=235959795, hash=1960141096, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=5, dateTime=Fri Jul 07 15:09:00 UTC 2017, portId=001648ce-5410-472b-ac32-5b8056b30674], val=FcPortStats: 001648ce-5410-472b-ac32-5b8056b30674, Fri Jul 07 15:09:00 UTC 2017, 1; 53498d64-e53f-4136-a0db-7be9e200cf84 ..., err=class org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtInvalidPartitionException [part=509, msg=Creating partition which does not belong to local node (often may be caused by inconsistent 'key.hashCode()' implementation) [part=509, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], this.topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0]]]]

There are 5685 of these logs, way less than the number of missing objects.
The message suggest that the hash code for the key may not be good enough. But would that not also apply in the other (slow, but good) environment, or when loading a single query?


ver. 2.0.0#20170430-sha1:d4eef3c6


Any suggestions?

Thanks...

Roger


RE: Missing object when loading from Cassandra with multiple queries

Posted by "Roger Fischer (CW)" <rf...@Brocade.com>.
Hello,

it turns out that the problem (missing objects) was caused by cluster configuration issues. Both the Ignite and Cassandra cluster had one node (not the same VM) configured with a different time zone. Since date-time was part of the primary key and query parameters, there were inconsistencies in how the date-time was converted. Because of the combinations of where the queries where sent and the order in which the results arrived, the outcome appeared to be random.

Caveat: Make sure the clusters are consistent, particularly when date-times are involved.

Roger


-----Original Message-----
From: vkulichenko [mailto:valentin.kulichenko@gmail.com] 
Sent: Friday, July 28, 2017 11:46 AM
To: user@ignite.apache.org
Subject: RE: Missing object when loading from Cassandra with multiple queries

Roger,

Yeah, that's weird and hard to tell why this happens. Obviously there is some kind of race condition. In any case, I would do everything correctly first, and then check if issue persists or not.

-Val



--
View this message in context: https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dignite-2Dusers.70518.x6.nabble.com_Missing-2Dobject-2Dwhen-2Dloading-2Dfrom-2DCassandra-2Dwith-2Dmultiple-2Dqueries-2Dtp15644p15789.html&d=DwICAg&c=IL_XqQWOjubgfqINi2jTzg&r=1esZO0r0bYS90lcsaLA6N4AFxuNo6lzauhETGwdJQoQ&m=dl-eqK7P7J3D54eeGL5rjJ10bzaDUx2XL2vaGmM_uIM&s=i9qrKfjX2-aAJAu5beN73PXgRKyHFY6wZpJf5Wlf-zg&e=
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Missing object when loading from Cassandra with multiple queries

Posted by "Roger Fischer (CW)" <rf...@Brocade.com>.
Hi Val,

looking at the debug logs, I see that the query is executed on only one Ignite server. It seems that this happens when the where clause includes a partitioning column.
Query " select * from test_11.fc_port_stats; " loads all objects.
Query " select * from test_11.fc_port_stats where sid = 1d6dad9c-6e8d-4425-b3d0-a5e50ee0b845 allow filtering; " also loads all objects. SID has an index in Cassandra.
Query " select * from test_11.fc_port_stats where hour = '2017-07-07 15:00:00' and bucket = 0 and date_time >= '2017-07-07 15:05:00' and date_time < '2017-07-07 15:10:00'; " only returns about 1/3 of the objects it should. 

Server 2 log has:

[17:41:39,532][FINE][mgmt-#77%null%][GridCacheProcessor] Getting cache for name: FcPortStatsCache
[17:41:39,532][FINE][mgmt-#77%null%][CacheOsStoreManager] <FcPortStatsCache> Loading all values from store.
[17:41:39,536][FINE][mgmt-#77%null%][CassandraCacheStore] Cache loaded from db: FcPortStatsCache
[17:41:39,536][FINE][mgmt-#77%null%][CacheOsStoreManager] <FcPortStatsCache> Loaded all values from store.
[17:41:39,537][FINE][mgmt-#77%null%][GridJobWorker] Job execution has successfully finished [job=C2 [c=LoadCacheJobV2 [keepBinary=false]], res=null]

There are no logs between "Loading ..." and "Loaded ...".

The same for server 3:

[23:11:39,533][FINE][mgmt-#76%null%][GridCacheProcessor] Getting cache for name: FcPortStatsCache
[23:11:39,533][FINE][mgmt-#76%null%][CacheOsStoreManager] <FcPortStatsCache> Loading all values from store.
[23:11:39,536][FINE][mgmt-#76%null%][CassandraCacheStore] Cache loaded from db: FcPortStatsCache
[23:11:39,537][FINE][mgmt-#76%null%][CacheOsStoreManager] <FcPortStatsCache> Loaded all values from store.
[23:11:39,537][FINE][mgmt-#76%null%][GridJobWorker] Job execution has successfully finished [job=C2 [c=LoadCacheJobV2 [keepBinary=false]], res=null]

Only server 1 shows actual loading:

[17:41:39,539][FINE][mgmt-#78%null%][CacheOsStoreManager] <FcPortStatsCache> Loading all values from store.
...
[17:41:39,648][FINE][pool-13-thread-1][GridQueryProcessor] Store [space=FcPortStatsCache, key=com.brocade.poc.icpoc.model.FcPortStatsKey [idHash=1898688509, hash=-1866764483, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=0, dateTime=Fri Jul 07 15:19:00 UTC 2017, portId=002055f9-d8e6-4255-b3b4-b1b6cd6453f4], val=com.brocade.poc.icpoc.model.FcPortStats [idHash=1964258560, hash=-842901133, portId=002055f9-d8e6-4255-b3b4-b1b6cd6453f4, dateTime=Fri Jul 07 15:19:00 UTC 2017, portType=1, switchId=683277c5-da36-47cb-94e8-8e29a28bbc9c, rxUtil=79.01061, txUtil=70.75167, higherUtil=79.01061, lowerUtil=70.75167, rxRate=15127.0, txRate=19382.0, higherRate=19382.0, lowerRate=15127.0, c3Discards=0.0, crcErrors=0.0]]
...
[17:41:39,649][FINE][pool-13-thread-1][GridDhtAtomicCache] <FcPortStatsCache> Remove will not be done for key (entry got replaced or removed): com.brocade.poc.icpoc.model.FcPortStatsKey [idHash=1898688509, hash=-1866764483, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=0, dateTime=Fri Jul 07 15:19:00 UTC 2017, portId=002055f9-d8e6-4255-b3b4-b1b6cd6453f4]
...
[17:41:39,649][FINE][pool-13-thread-1][GridDhtAtomicCache] <FcPortStatsCache> Ignoring entry for partition that does not belong [key=com.brocade.poc.icpoc.model.FcPortStatsKey [idHash=806790598, hash=984017435, hour=Fri Jul 07 15:00:00 UTC 2017, bucket=0, dateTime=Fri Jul 07 15:19:00 UTC 2017, portId=003d1a6f-d1c4-414c-8fc8-d0b54e127003], val=FcPortStats: 003d1a6f-d1c4-414c-8fc8-d0b54e127003, Fri Jul 07 15:19:00 UTC 2017, 2; bbd80298-fb88-4a1b-99b0-5e9974ed5f9d ..., err=class org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtInvalidPartitionException [part=189, msg=Creating partition which does not belong to local node (often may be caused by inconsistent 'key.hashCode()' implementation) [part=189, topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], this.topVer=AffinityTopologyVersion [topVer=4, minorTopVer=1]]]]
...
[17:41:40,712][FINE][mgmt-#78%null%][CacheOsStoreManager] <FcPortStatsCache> Loaded all values from store.

At this time I don't know if this is related to the difference in the primary key between Cassandra and Ignite. I will try that next.

Roger



-----Original Message-----
From: vkulichenko [mailto:valentin.kulichenko@gmail.com] 
Sent: Friday, July 28, 2017 11:46 AM
To: user@ignite.apache.org
Subject: RE: Missing object when loading from Cassandra with multiple queries

Roger,

Yeah, that's weird and hard to tell why this happens. Obviously there is some kind of race condition. In any case, I would do everything correctly first, and then check if issue persists or not.

-Val



--
View this message in context: https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dignite-2Dusers.70518.x6.nabble.com_Missing-2Dobject-2Dwhen-2Dloading-2Dfrom-2DCassandra-2Dwith-2Dmultiple-2Dqueries-2Dtp15644p15789.html&d=DwICAg&c=IL_XqQWOjubgfqINi2jTzg&r=1esZO0r0bYS90lcsaLA6N4AFxuNo6lzauhETGwdJQoQ&m=dl-eqK7P7J3D54eeGL5rjJ10bzaDUx2XL2vaGmM_uIM&s=i9qrKfjX2-aAJAu5beN73PXgRKyHFY6wZpJf5Wlf-zg&e=
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Missing object when loading from Cassandra with multiple queries

Posted by vkulichenko <va...@gmail.com>.
Roger,

Yeah, that's weird and hard to tell why this happens. Obviously there is
some kind of race condition. In any case, I would do everything correctly
first, and then check if issue persists or not.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Missing-object-when-loading-from-Cassandra-with-multiple-queries-tp15644p15789.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Missing object when loading from Cassandra with multiple queries

Posted by "Roger Fischer (CW)" <rf...@Brocade.com>.
Thanks, Val. 

I got confused between the Cassandra and Ignite keys. The configuration (keyPersistence) references the POJO and the Cassandra partition and clustering key. Thus I inadvertently also added the partitioning key elements to the POJO.

However, this still does not explain my problem. It works on the slow (older HW) environment. And it works when using a single CQL statement in the loadCache() method.

Roger


-----Original Message-----
From: vkulichenko [mailto:valentin.kulichenko@gmail.com] 
Sent: Thursday, July 27, 2017 5:55 PM
To: user@ignite.apache.org
Subject: RE: Missing object when loading from Cassandra with multiple queries

Roger,

If they are not needed, they should be part of value object rather than key object. When you create a key object to do a cache lookup or other operation, you should provide exact fields for all values included in it.

-Val



--
View this message in context: https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dignite-2Dusers.70518.x6.nabble.com_Missing-2Dobject-2Dwhen-2Dloading-2Dfrom-2DCassandra-2Dwith-2Dmultiple-2Dqueries-2Dtp15644p15754.html&d=DwICAg&c=IL_XqQWOjubgfqINi2jTzg&r=1esZO0r0bYS90lcsaLA6N4AFxuNo6lzauhETGwdJQoQ&m=0J2t9uNi5M6GNstK5XecfTzyxd5DeUHcnoYjqRnJ6l0&s=nZ4YLwgKbjIsRf3u5l-GaO3paGZC9hTihFyjxcm__RY&e=
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Missing object when loading from Cassandra with multiple queries

Posted by vkulichenko <va...@gmail.com>.
Roger,

If they are not needed, they should be part of value object rather than key
object. When you create a key object to do a cache lookup or other
operation, you should provide exact fields for all values included in it.

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Missing-object-when-loading-from-Cassandra-with-multiple-queries-tp15644p15754.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Missing object when loading from Cassandra with multiple queries

Posted by "Roger Fischer (CW)" <rf...@Brocade.com>.
Thanks, Val.

The two fields are sufficient as the primary key. The "hour" field is the truncated dateTime, and "bucket" is a grouping of the PortIds.

Roger

-----Original Message-----
From: vkulichenko [mailto:valentin.kulichenko@gmail.com] 
Sent: Thursday, July 27, 2017 2:56 PM
To: user@ignite.apache.org
Subject: RE: Missing object when loading from Cassandra with multiple queries

Roger,

Since Ignite stores data in binary format, ыщ it basically ignores your hashCode and equals implementations. However, I see that in these methods you use only two fields out of four which makes me think that you can create two keys with same portId and dateTime, but different hour and bucket, and expect them to be equal. This definitely can cause issue like you described, you should identify key by all fields included in it.

Can this be the case?

-Val



--
View this message in context: https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dignite-2Dusers.70518.x6.nabble.com_Missing-2Dobject-2Dwhen-2Dloading-2Dfrom-2DCassandra-2Dwith-2Dmultiple-2Dqueries-2Dtp15644p15748.html&d=DwIFaQ&c=IL_XqQWOjubgfqINi2jTzg&r=1esZO0r0bYS90lcsaLA6N4AFxuNo6lzauhETGwdJQoQ&m=S46CnMWTAP0BmHV0GtebdoVXMlXsKw6iO653LG_TS2g&s=68xajupHZjoD9vAWkccGM58rBcNAFuc48S-eZfDuYYE&e=
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Missing object when loading from Cassandra with multiple queries

Posted by vkulichenko <va...@gmail.com>.
Roger,

Since Ignite stores data in binary format, ыщ it basically ignores your
hashCode and equals implementations. However, I see that in these methods
you use only two fields out of four which makes me think that you can create
two keys with same portId and dateTime, but different hour and bucket, and
expect them to be equal. This definitely can cause issue like you described,
you should identify key by all fields included in it.

Can this be the case?

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Missing-object-when-loading-from-Cassandra-with-multiple-queries-tp15644p15748.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

RE: Missing object when loading from Cassandra with multiple queries

Posted by "Roger Fischer (CW)" <rf...@Brocade.com>.
Hi Val,

the class is listed below. Two more updates:

* The "Ignoring entry for partition ..." is also present on the slow cluster, where the data is loaded correctly.

* The clocks are significantly out of sync on the fast cluster. One is off by more than one minute. I am having this rectified, but it is not my cluster, so it is taking some time.

Here is the key class.

import java.io.Serializable;
import java.util.UUID;
import java.util.Date;


public class FcPortStatsKey implements Serializable {

    private Date    hour;       // partition 1 -- timestamp truncated at the hour
    private Integer bucket;     // partition 2 -- switch_id hashed to buckets
    private Date    dateTime;   // cluster 1   -- date_time
    private UUID    portId;     // cluster 2   -- port_id

    // default constructor
    public FcPortStatsKey() {
        super();
    }
    
    public Date getHour() { return hour; }
    public Integer getBucket() { return bucket; }
    public Date getDateTime() { return dateTime; }
    public UUID getPortId() { return portId; }

    public void setHour( Date hour) { this.hour = hour; }
    public void setBucket( Integer bucket) { this.bucket = bucket; }
    public void setDateTime( Date dateTime) { this.dateTime = dateTime; }
    public void setPortId( UUID portId) { this.portId = portId; }

    @Override
    public String toString() {
        return String.format( "FcPortStatsKey: %s, %d, %s, %s", 
            hour.toString(), bucket, dateTime.toString(), portId.toString());
    }

    @Override
    public boolean equals( Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof FcPortStatsKey)) return false;
        FcPortStatsKey other = (FcPortStatsKey) obj;
        // partition keys are derived; need not be included
        if (dateTime != null && dateTime.equals( other.dateTime) &&
            portId != null && portId.equals( other.portId))
            return true;
        else
            return false;
    }
            
    @Override
    public int hashCode() {
        // partition keys are derived; need not be included
        int result = 91;
        result = 127 * result + portId.hashCode();
        result = 127 * result + dateTime.hashCode();
        return result;
    }

}

And here is the related configuration:

                <persistence keyspace="test_11" table="fc_port_stats">
                    <keyPersistence class="com.brocade.poc.icpoc.model.FcPortStatsKey" strategy="POJO">
                        <partitionKey>
                            <field name="hour"/>
                            <field name="bucket"/>
                        </partitionKey>
                        <clusterKey>
                            <field name="dateTime" column="date_time"/>
                            <field name="portId" column="port_id"/>
                        </clusterKey>
                    </keyPersistence>

Roger

-----Original Message-----
From: vkulichenko [mailto:valentin.kulichenko@gmail.com] 
Sent: Tuesday, July 25, 2017 5:16 PM
To: user@ignite.apache.org
Subject: Re: Missing object when loading from Cassandra with multiple queries

Hi Roger,

Can you show the FcPortStatsKey class?

-Val



--
View this message in context: https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dignite-2Dusers.70518.x6.nabble.com_Missing-2Dobject-2Dwhen-2Dloading-2Dfrom-2DCassandra-2Dwith-2Dmultiple-2Dqueries-2Dtp15644p15645.html&d=DwICAg&c=IL_XqQWOjubgfqINi2jTzg&r=1esZO0r0bYS90lcsaLA6N4AFxuNo6lzauhETGwdJQoQ&m=s1zSUsG_hFKhGsE9HpBCQWzuPCJa-iX4tmT8kmN7-d0&s=u8-EePeYS6JZ8u9cp9pMGJ8WGAoSvDZFBy93_jKNrIM&e= 
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Missing object when loading from Cassandra with multiple queries

Posted by vkulichenko <va...@gmail.com>.
Hi Roger,

Can you show the FcPortStatsKey class?

-Val



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Missing-object-when-loading-from-Cassandra-with-multiple-queries-tp15644p15645.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.