Posted to solr-user@lucene.apache.org by Funtick <fu...@efendi.ca> on 2009/08/18 05:23:41 UTC

Re: SOLR - extremely strange behavior! Documents disappeared...


But how do you explain that within an hour (after commit) I had about
500,000 new documents, while within 30 hours (after commit) I had only 783,714?

Same _random_enough_ documents... 

BTW, the SOLR Console was showing only a few hundred "deletesById", although I
don't use any deleteById explicitly; only "update" with "allowOverwrite" and
"uniqueId".




markrmiller wrote:
> 
> I'd say you have a lot of documents that have the same id.
> When you add a doc with the same id, first the old one is deleted, then
> the
> new one is added (atomically though).
> 
> The deleted docs are not removed from the index immediately though - the
> doc
> id is just marked as deleted.
> 
> Over time though, as segments are merged due to hitting triggers while
> adding new documents, deletes are removed (which deletes depends on which
> segments have been merged).
> 
> So if you add a ton of documents over time, many with the same ids, you
> would likely see this type of maxDoc, numDocs churn. maxDoc will include
> deleted docs while numDocs will not.
> 
> 
> -- 
> - Mark
> 
> http://www.lucidimagination.com
> 
> On Mon, Aug 17, 2009 at 11:09 PM, Funtick <fu...@efendi.ca> wrote:
> 
>>
>> After running an application which heavily uses MD5 HEX-representation as
>> <uniqueKey> for SOLR v.1.4-dev-trunk:
>>
>> 1. After 30 hours:
>> 101,000,000 documents added
>>
>> 2. Commit:
>> numDocs = 783,714
>> maxDoc = 3,975,393
>>
>> 3. Upload new docs to SOLR during 1 hour(!!!!!!!), then commit, then
>> optimize:
>> numDocs = 1,281,851
>> maxDoc = 1,281,851
>>
>> It looks _extremely_ strange that within an hour I have such a huge
>> increase
>> with same 'average' document set...
>>
>> I am suspecting something goes wrong with Lucene buffer flush / index
>> merge
>> OR SOLR - Unique ID handling...
>>
>> According to my own estimates, I should have about 10,000,000 new documents
>> now... I had 0.5 million within an hour, and 0.8 million within a day; same
>> 'random' documents.
>>
>> This morning index size was about 4Gb, then suddenly dropped below 0.5
>> Gb.
>> Why? I haven't issued any "commit"...
>>
>> I am using ramBufferSizeMB=8192
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/SOLR-%3CuniqueKey%3E---extremely-strange-behavior%21-Documents-disappeared...-tp25017728p25017728.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
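
To illustrate the overwrite behavior Mark describes above, here is a minimal Python sketch. It is a toy model, not how Lucene actually stores segments: the ToyIndex class, its method names, the unique_key helper, and the example.com URLs are all hypothetical. It only shows why maxDoc can run far ahead of numDocs when many adds reuse the same MD5-hex uniqueKey, and why the two match again once deleted entries are purged.

```python
import hashlib


def unique_key(url):
    """MD5 hex digest of a URL, used here as the uniqueKey (as in the original post)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


class ToyIndex:
    """Toy model of overwrite-by-uniqueKey; NOT Lucene's real data structures."""

    def __init__(self):
        self.slots = []  # one [key, is_live] entry per doc written and not yet purged
        self.live = {}   # uniqueKey -> position in self.slots of the live copy

    def add(self, key):
        old = self.live.get(key)
        if old is not None:
            # Same id seen before: the old copy is only marked deleted, not removed.
            self.slots[old][1] = False
        self.slots.append([key, True])
        self.live[key] = len(self.slots) - 1

    @property
    def num_docs(self):
        # Counts live documents only.
        return len(self.live)

    @property
    def max_doc(self):
        # Counts live documents plus deleted ones that have not been purged yet.
        return len(self.slots)

    def optimize(self):
        # Stand-in for a full merge: drop every entry marked as deleted.
        self.slots = [slot for slot in self.slots if slot[1]]
        self.live = {key: pos for pos, (key, _) in enumerate(self.slots)}


index = ToyIndex()
for i in range(1000):
    # 1000 adds, but only 100 distinct URLs, so 900 adds are overwrites.
    index.add(unique_key("http://example.com/page/%d" % (i % 100)))

print(index.num_docs, index.max_doc)  # 100 1000
index.optimize()
print(index.num_docs, index.max_doc)  # 100 100
```

In this model a large "documents added" count alongside a small numDocs simply means most adds reused an existing key. Whether that is what happened in the index discussed here is not established by the thread.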



Re: SOLR - extremely strange behavior! Documents disappeared...

Posted by Funtick <fu...@efendi.ca>.
One more hour, and I have another 0.5 million documents (after commit/optimize).

Something strange is happening with the SOLR buffer flush (perhaps when there
is only a single segment?)... an explicit commit prevents it...

30 hours, with index flush, commit: 783,714
+ 1 hour, commit, optimize: 1,281,851
+ 1 hour, commit, optimize: 1,786,552

Same random docs retrieved from web...
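
The gap between these checkpoints is easier to see as an hourly rate. The following is a small Python calculation over the three figures above, assuming the index started empty and that each figure is the numDocs value reported right after the commit. The variable and function names are only illustrative.

```python
# (hours in the interval, numDocs reported after the commit that closed it)
checkpoints = [(30, 783_714), (1, 1_281_851), (1, 1_786_552)]


def growth_per_hour(points):
    """Return net new documents per hour for each interval."""
    rates = []
    previous = 0  # assumption: the index was empty before the first interval
    for hours, num_docs in points:
        rates.append((num_docs - previous) / hours)
        previous = num_docs
    return rates


rates = growth_per_hour(checkpoints)
for (hours, num_docs), rate in zip(checkpoints, rates):
    print("%2d h -> numDocs %9d, about %8.0f new docs/hour" % (hours, num_docs, rate))
```

Under those assumptions the first 30 hours grew by roughly 26,000 documents per hour, while each of the following two hours grew by about 500,000, which is the discrepancy the message is pointing at.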



