You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Todd Long <lo...@gmail.com> on 2015/12/16 17:20:35 UTC

RE: DIH Caching w/ BerkleyBackedCache

James,

I apologize for the late response.


Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.

It appears that the Berkeley DB "cleaner" is always removing the oldest file
once there are three. In this case, I'll see two 1GB files and then as the
third file is being written (after ~200MB) the oldest 1GB file will fall off
(i.e. get deleted). I'm only utilizing ~13% disk space at the time. I'm
using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any
other configuration properties other than what I mentioned before. I simply
cannot figure out what is going on with the "cleaner" logic that would deem
that file "lowest utilized". Any other Berkeley DB/system configuration I
could consider that would affect this?

It's possible that this caching simply might not be suitable for our data
set where one document might contain a field with tens of thousands of
values... maybe this is the bottleneck with using this database as every add
copies in the prior data and then the "cleaner" removes the old stuff. Maybe
it's working like it should but just incredibly slow... I can get a full
index without caching in about two hours, however, when using this caching
it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd



--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4245777.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: DIH Caching w/ BerkleyBackedCache

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
Todd,

I have no idea if this will perform acceptable with so many multiple values.  I doubt the solr/patch code was really optimized for such a use case.  In my production environment, I have je-6.2.31.jar on the classpath.  I don't think I've tried it with other versions.

James Dyer
Ingram Content Group

-----Original Message-----
From: Todd Long [mailto:longtm@gmail.com] 
Sent: Wednesday, December 16, 2015 10:21 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH Caching w/ BerkleyBackedCache

James,

I apologize for the late response.


Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.

It appears that the Berkeley DB "cleaner" is always removing the oldest file
once there are three. In this case, I'll see two 1GB files and then as the
third file is being written (after ~200MB) the oldest 1GB file will fall off
(i.e. get deleted). I'm only utilizing ~13% disk space at the time. I'm
using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any
other configuration properties other than what I mentioned before. I simply
cannot figure out what is going on with the "cleaner" logic that would deem
that file "lowest utilized". Any other Berkeley DB/system configuration I
could consider that would affect this?

It's possible that this caching simply might not be suitable for our data
set where one document might contain a field with tens of thousands of
values... maybe this is the bottleneck with using this database as every add
copies in the prior data and then the "cleaner" removes the old stuff. Maybe
it's working like it should but just incredibly slow... I can get a full
index without caching in about two hours, however, when using this caching
it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd



--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4245777.html
Sent from the Solr - User mailing list archive at Nabble.com.