Posted to user@ignite.apache.org by "steve.hostettler" <st...@gmail.com> on 2016/12/20 16:03:14 UTC

LoadCache Performance decrease with the size of the cache

Hello,

I am trying to increase the performance of the cache loading. I am
witnessing a strange behavior: as the number of objects in the cache
increases, the number of objects loaded per second decreases. The database
server does not seem to be the problem. To get some numbers, I copied the
CacheAbstractJdbcStore implementation and added a couple of log statements to
understand what is going on.

In the method public Void call() throws Exception there is this block:

while (rs.next()) {
    K1 key = buildObject(em.cacheName, em.keyType(), em.keyKind(),
        em.keyColumns(), em.keyCols, colIdxs, rs);
    V1 val = buildObject(em.cacheName, em.valueType(), em.valueKind(),
        em.valueColumns(), null, colIdxs, rs);
    clo.apply(key, val);
}

Apparently the performance of the statement clo.apply(key, val) decreases
over time.

I first suspected a hashCode method that generates collisions, but I made
sure that I use a unique row id and that equals and hashCode are based on
it.
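
For reference, a key class whose equals and hashCode rely only on a unique
row id could look like the minimal sketch below. The class name RowKey and
its single long field are assumptions made for illustration; the actual key
class is not shown in this thread.

```java
public final class RowKey {
    /** Unique row id; the only field that takes part in equality (assumed layout). */
    private final long rowId;

    public RowKey(long rowId) {
        this.rowId = rowId;
    }

    public long getRowId() {
        return rowId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof RowKey))
            return false;
        return rowId == ((RowKey) o).rowId;
    }

    @Override
    public int hashCode() {
        // Consistent with equals: equal row ids always give equal hash codes.
        return Long.hashCode(rowId);
    }

    public static void main(String[] args) {
        System.out.println(new RowKey(42L).equals(new RowKey(42L))); // true
        System.out.println(new RowKey(42L).equals(new RowKey(43L))); // false
    }
}
```

With a key like this, two instances with the same row id are equal and hash
identically, so collisions caused by the key itself should be unlikely.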

Any advice that would help me to understand where the problem comes from?

many thanks in advance




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/LoadCache-Performance-decrease-with-the-size-of-the-cache-tp9645.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: LoadCache Performance decreases with the size of the cache

Posted by vkulichenko <va...@gmail.com>.
I would recommend using one of the non-empty constructors for QueryIndex;
they all set SORTED as the default. Frankly, I would remove the one without
parameters: it doesn't make much sense and is error-prone.

-Val




Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Hello Val,

You're right, I was too quick to jump to a conclusion. Actually, the problem
comes from my code, but it was not obvious to me.

I create the indexes with the following code:
---------------------------------
		Collection<QueryIndex> idxs = new ArrayList<>();

		QueryIndex idx1 = new QueryIndex();
		LinkedHashMap<String, Boolean> idxFlds1 = new LinkedHashMap<>();
		idxFlds1.put("lotTypeFk", true);
		idxFlds1.put("validOn", true);
		idxFlds1.put("ideCounterpartyRef", true);
		idx1.setFields(idxFlds1);
		idxs.add(idx1);
		
		QueryIndex idx2 = new QueryIndex("rowId");
		idxs.add(idx2);		

		qryEntity.setIndexes(idxs);
---------------------------------

Because I did not specify the type, I assumed the index type was SORTED.
Actually, when nothing is specified the type is null, which is what happens
for idx1 above.

The thing is that a null type is treated as FULLTEXT.

From GridQueryProcessor.java:

if (idx.getIndexType() == QueryIndexType.SORTED ||
    idx.getIndexType() == QueryIndexType.GEOSPATIAL) {
    ....
}
else {
    assert idx.getIndexType() == QueryIndexType.FULLTEXT;

    for (String field : idx.getFields().keySet()) {
        String alias = aliases.get(field);

        if (alias != null)
            field = alias;

        d.addFieldToTextIndex(field);
    }
}


I solved that problem simply by setting the type explicitly:
idx1.setIndexType(QueryIndexType.SORTED);
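
To illustrate why a null type would end up on the text-index path, here is a
small self-contained sketch. It does not use Ignite: the IndexType enum and
the branchFor method are stand-ins invented for this example, and it only
mirrors the shape of the if/else quoted above. The point it shows is that an
assert is skipped unless the JVM is started with -ea, so nothing stops a null
type from reaching the else branch.

```java
public class IndexTypeFallthrough {
    /** Stand-in for Ignite's QueryIndexType; an assumption for this sketch, not the real class. */
    public enum IndexType { SORTED, FULLTEXT, GEOSPATIAL }

    /**
     * Mirrors the shape of the quoted if/else. The quoted else branch is guarded
     * only by an assert, which the JVM skips unless it is started with -ea,
     * so the guard is deliberately left out here.
     */
    public static String branchFor(IndexType type) {
        if (type == IndexType.SORTED || type == IndexType.GEOSPATIAL)
            return "sorted-or-geospatial";

        return "text"; // FULLTEXT lands here, and so does null
    }

    public static void main(String[] args) {
        System.out.println(branchFor(IndexType.SORTED));   // sorted-or-geospatial
        System.out.println(branchFor(IndexType.FULLTEXT)); // text
        System.out.println(branchFor(null));               // text
    }
}
```

If that reading is right, any index created with the no-argument constructor
and no explicit type would have its fields added to the text index.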


I will do a full load using this initialization code. If the problem
persists, I'll put together a reproducer, although that might be a little
bit difficult.




Re: LoadCache Performance decreases with the size of the cache

Posted by vkulichenko <va...@gmail.com>.
The code snippet you provided is actually only for the case when String is
the whole value, not wrapped in some other object. But this is not your
case, right? I think we're missing something here; a reproducer would really
help :)

The rebuildIndexes() method is in the private API and I don't see any usages,
so I don't think it works properly. It looks like legacy code.

-Val




Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Hello,

While investigating, I understood why I see a lot of locks on Lucene
documents in Java Mission Control. As soon as there is a String in the
index, it is handled by Lucene, even if you do not want full-text search
(the string being an identifier).


--- From IgniteH2Indexing.java
            if (type().valueClass() == String.class) {
                try {
                    luceneIdx = new GridLuceneIndex(ctx, schema.offheap,
                        schema.spaceName, type);
                }
                catch (IgniteCheckedException e1) {
                    throw new IgniteException(e1);
                }
            }


Although I understand why it was done that way, I wonder whether it would be
better to let the user choose. Lucene brings a lot of features but also has
an impact on performance. In my case, I know that the index on the string
will never be searched by substring.

The Lucene index management seems to rely heavily on locks, and therefore
the more threads there are, the more contention there is on the index.

Is this a known behavior? Am I missing something?
Furthermore, there seems to be a "rebuildIndexes" method. I am not sure
what this method does, but if it effectively rebuilds the index then I could
take one measurement without indexes (t1) and one with indexes (t2). After
that I rebuild the indexes (t3). Hence, if t1 + t3 < t2, it means that
disabling the indexes during the load makes sense from a performance
perspective.
Am I correct?
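
The comparison can be written down as a small check, with t1 the load time
without indexes, t2 the load time with indexes, and t3 the time to rebuild
the indexes afterwards. The sketch below is plain JDK code with hypothetical
names. The t1 and t2 values are the 10M-record timings reported elsewhere in
this thread (10m2s and 40 minutes); the t3 value is a made-up placeholder,
since no rebuild time has been measured.

```java
import java.time.Duration;

public class DeferredIndexingCheck {
    /** True when loading without indexes and rebuilding afterwards is faster than loading with indexes. */
    public static boolean deferredIndexingWins(Duration t1, Duration t2, Duration t3) {
        return t1.plus(t3).compareTo(t2) < 0;
    }

    /** Largest rebuild time for which deferring the indexes still breaks even. */
    public static Duration breakEvenRebuildTime(Duration t1, Duration t2) {
        return t2.minus(t1);
    }

    public static void main(String[] args) {
        Duration t1 = Duration.ofMinutes(10).plusSeconds(2); // load without indexes
        Duration t2 = Duration.ofMinutes(40);                // load with both indexes
        Duration t3 = Duration.ofMinutes(15);                // hypothetical rebuild time, not measured

        System.out.println(deferredIndexingWins(t1, t2, t3)); // true
        System.out.println(breakEvenRebuildTime(t1, t2));     // PT29M58S
    }
}
```

With those two measured timings, deferring the indexes would pay off only if
a rebuild took less than about 30 minutes.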







Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Hi Val,

first of all, let me thank you for this great product and incredible mailing
list.

Anonymizing our code will take a bit longer but I can give the following
numbers:

load time, 10M records, without indexes: 10m2s
load time, 10M records, with a simple row id index: 12m
load time, 10M records, with a simple row id index plus a composite index
(3 columns: Long, Long, String): 40m

The VM's heap is 56 GB. At the end of the processing it consumes .. GB, with
a peak of 32 GB. The maximum GC pause is 2s, with a pause every 2 minutes, so
GC does not seem to be an issue.

As for the performance I took 3 snapshots of 5 minutes each.
After 2 min : 
<http://apache-ignite-users.70518.x6.nabble.com/file/n9725/Capture1.png> 


After 15 min
<http://apache-ignite-users.70518.x6.nabble.com/file/n9725/Capture2.png> 

After 35min
<http://apache-ignite-users.70518.x6.nabble.com/file/n9725/Capture3.png> 


I was also a bit surprised by the use of Lucene internally in H2.
<http://apache-ignite-users.70518.x6.nabble.com/file/n9725/Capture4.png> 


Hope you can make sense of all of this.

Thanks and happy holidays




Re: LoadCache Performance decreases with the size of the cache

Posted by vkulichenko <va...@gmail.com>.
Hi Steve,

Currently this is not possible. If what you're saying is true, then it
probably makes sense to add such an option, but I want to reproduce it
myself first. Can you at least provide timings for different dataset sizes
with and without indexes? A reproducer would be even better. Also, are you
sure you don't have memory issues when loading 20 million records? Did you
check heap consumption and GC logs?

-Val




Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Hello guys,

I finally understood what the "problem" is. With 20 million records,
indexing is very expensive. Is there a way, as in a database, to disable
indexing during the load and re-enable it afterwards?
I guess that rebalancing the B-trees (or whatever is used to implement the
indexes) is a costly operation to perform while the cache is loading.

As I understand it, indexes are declared in the cache config and enabled
immediately.





Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Hello Val,

Yes, I'll try to upload something to GitHub tomorrow. Thanks for the help.




Re: LoadCache Performance decreases with the size of the cache

Posted by vkulichenko <va...@gmail.com>.
Hi,

I tried to reproduce this behavior, but without success. Data loading time
increased linearly for me when I increased the number of entries. Is it
possible for you to provide a reproducer that I would be able to run and
investigate?

-Val




Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Sure, here is the code:

QueryEntity qryEntity = new QueryEntity();
		qryEntity.setKeyType("myapp.bpepoc.model.MyModelKey");
		qryEntity.setValueType("myapp.bpepoc.model.MyModel");

		LinkedHashMap<String, String> fields = new LinkedHashMap<>();
		fields.put("rowId", "java.lang.Integer");
		fields.put("validOn", "java.lang.Integer");
		fields.put("myFk", "java.lang.Integer");
		fields.put("rowType", "java.lang.Short");
		fields.put("businessRef", "java.lang.String");
		fields.put("status", "java.lang.Short");
		fields.put("modifDate", "java.sql.Date");
		fields.put("ideCreditGroupRef", "java.lang.String");
		fields.put("ideSegmentationRef", "java.lang.String");
		fields.put("ideInternalPartyRef", "java.lang.String");
		fields.put("ideInternalOne", "java.lang.String");

		qryEntity.setFields(fields);

		Collection<QueryIndex> idxs = new ArrayList<>();
		QueryIndex idx = new QueryIndex();
		idx.setName("FPK");
		LinkedHashMap<String, Boolean> idxFlds = new LinkedHashMap<>();
		idxFlds.put("rowId", true);
		idxFlds.put("myFk", true);
		idxFlds.put("validOn", true);
		idxFlds.put("businessRef", true);
		idx.setFields(idxFlds);
		idxs.add(idx);
		
		qryEntity.setIndexes(idxs);

		final List<QueryEntity> queryEntities = new ArrayList<>();
        queryEntities.add(qryEntity);
		
		final CacheConfiguration<K, V> ccfg = new CacheConfiguration<>(cacheName);
		....
		ccfg.setQueryEntities(queryEntities);





Re: LoadCache Performance decreases with the size of the cache

Posted by vkulichenko <va...@gmail.com>.
Steve,

Can you show your indexing configuration?

-Val




Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Hi Val, and thanks for the reply.

I narrowed it down a little bit. The problem comes from the indexing. I
tried with no indexing/query fields and then with half the fields queryable,
and it turns out that with indexing the loading performance decreases over
time.

First, I would like to know whether performance is expected to decrease with
the size of the cache.
Second, I've seen a lot of locks on the document manager of Lucene. Are
there some parameters to configure?

Steve





Re: LoadCache Performance decreases with the size of the cache

Posted by vkulichenko <va...@gmail.com>.
Hi Steve,

Did you do any profiling? I would recommend using VisualVM or JFR to see
what the hotspots are.

-Val




Re: LoadCache Performance decreases with the size of the cache

Posted by "steve.hostettler" <st...@gmail.com>.
Some more information on this: apparently, the more concurrent threads there
are, the worse it becomes. I therefore guess that some sort of lock (I
understand why it is there) is slowing down the load into the cache.

Here is my cache configuration:

		final CacheConfiguration<K, V> ccfg = new CacheConfiguration<>(cacheName);

		final CacheJdbcPojoStoreFactory<Object, Object> storeFactory =
			new CacheJdbcPojoStoreFactory<>();
		storeFactory.setDataSourceFactory(new JndiFactory<>(datasourceJndiName));
		storeFactory.setDialect(new OracleDialect());

		if (parallelLoadCacheMinThreshold != null) {
			storeFactory.setParallelLoadCacheMinimumThreshold(parallelLoadCacheMinThreshold);
		}
		if (maxPoolSize != null) {
			storeFactory.setMaximumPoolSize(maxPoolSize);
		}

		ccfg.setCacheStoreFactory(storeFactory);
		ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
		ccfg.setAffinityMapper(new AffinityKeyMapper() {
			private static final long serialVersionUID = 1L;

			@Override
			public void reset() {
			}

			@Override
			public Object affinityKey(Object key) {
				if (key instanceof CKey) {
					return ((CKey) key).getRowId();
				} else if (key instanceof BinaryObject) {
					BinaryObject binaryKey = (BinaryObject) key;
					Object affKey = binaryKey.field("rowId");
					return affKey == null ? "" : affKey;
				}
				throw new IllegalStateException("Affinity function null for " + key);
			}
		});
		ccfg.setAffinity(new FairAffinityFunction(false, nbPartitions) {
			private static final long serialVersionUID = 1L;

			/** {@inheritDoc} */
		    @Override public int partition(Object key) {
		        return key.hashCode() % getPartitions();
		    }

		});
		ccfg.setStartSize(5*1024*1024);
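
One detail in the partition override above may be worth checking. In Java the
% operator keeps the sign of its left operand, so key.hashCode() %
getPartitions() is negative whenever hashCode() is negative, and a negative
value is not a valid partition number. The sketch below uses only the JDK and
hypothetical names (PartitionIndex, naive, safe) to show the difference;
Math.floorMod requires Java 8. This may not matter if the key's hashCode is
never negative.

```java
public class PartitionIndex {
    /** Same expression as in the override above; keeps the sign of the hash. */
    public static int naive(int hash, int parts) {
        return hash % parts;
    }

    /** Always in [0, parts) for parts > 0. */
    public static int safe(int hash, int parts) {
        return Math.floorMod(hash, parts);
    }

    public static void main(String[] args) {
        int parts = 1024; // hypothetical partition count

        System.out.println(naive(-7, parts)); // -7
        System.out.println(safe(-7, parts));  // 1017
        System.out.println(safe(7, parts));   // 7
    }
}
```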


