Posted to solr-user@lucene.apache.org by Köhler Christian <C....@zfmk.de> on 2013/11/12 16:59:40 UTC

Missing Documents after db import

Hi!

I am seeing a mismatch between the number of documents reported as
indexed and the number of documents actually in the Solr index. I cannot
find any reason for this in the log files. How do I find out why some
documents are deleted from the index?

Setup:
Solr 4.4 using DIH, fetching 1000 rows from a MySQL db.

Doing a full import shows that all 1000 rows were processed, 0 deleted:

<lst name="statusMessages">
   <str name="Total Requests made to DataSource">1</str>
   <str name="Total Rows Fetched">1000</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2013-11-12 14:58:40</str>
   <str name="">Indexing completed. Added/Updated: 1000 documents.
     Deleted 0 documents.</str>
   <str name="Committed">2013-11-12 14:58:41</str>
   <str name="Total Documents Processed">1000</str>
   <str name="Time taken">0:0:0.764</str>
</lst>

However, the Solr admin panel (as well as my application) shows only 597
documents:

Num Docs: 597
Max Doc: 1000
Deleted Docs: 403
Version: 1550
Segment Count: 1

There might be some db rows containing data that conflicts with my
schema.xml. How do I identify those?

Regards
Chris

--
Zoologisches Forschungsmuseum Alexander Koenig
- Leibniz-Institut für Biodiversität der Tiere -
Adenauerallee 160, 53113 Bonn, Germany
www.zfmk.de

Foundation under public law; Director: Prof. J. Wolfgang Wägele
Registered office: Bonn

Re: Missing Documents after db import

Posted by Köhler Christian <C....@zfmk.de>.
Hi Gora,

Thanks for pointing me in the right direction. The problem was indeed
that some ids were not unique.
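
For anyone who wants to check their own source table for the same problem, here is a minimal sketch of a duplicate-id query. It runs against an in-memory SQLite database so that it is self-contained; the table name `specimens`, the column name `id`, and the helper `find_duplicate_ids` are hypothetical and not taken from the setup described in this thread. The same GROUP BY / HAVING pattern should work on MySQL.

```python
import sqlite3


def find_duplicate_ids(conn, table, id_column):
    """Return (id, count) pairs for every id that occurs more than once.

    The table and column names are interpolated into the SQL text, so they
    must come from trusted configuration, never from user input.
    """
    query = (
        f"SELECT {id_column}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {id_column} HAVING COUNT(*) > 1 ORDER BY {id_column}"
    )
    return conn.execute(query).fetchall()


# Hypothetical sample data: ids 1 and 2 are not unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO specimens VALUES (?, ?)",
    [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (2, "f")],
)

duplicates = find_duplicate_ids(conn, "specimens", "id")
print(duplicates)  # [(1, 2), (2, 3)]
```

Every extra occurrence of an id in this list corresponds to one document that is replaced during the import.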

Regards
Chris

On 12.11.2013 17:05, Gora Mohanty wrote:
> On 12 November 2013 21:29, Köhler Christian <C....@zfmk.de> wrote:
>> Hi!
>>
>> I am seeing a mismatch between the number of documents reported as
>> indexed and the number of documents actually in the Solr index. I cannot
>> find any reason for this in the log files. How do I find out why some
>> documents are deleted from the index?
> [...]
>
> First thing I would look at is the field that you are using
> as the unique ID in the Solr schema. My guess would
> be that some of your documents have the same unique
> ID, and are overwriting one another during the indexing
> process.
>
> Regards,
> Gora
>

Re: Missing Documents after db import

Posted by Gora Mohanty <go...@mimirtech.com>.
On 12 November 2013 21:29, Köhler Christian <C....@zfmk.de> wrote:
> Hi!
>
> I am seeing a mismatch between the number of documents reported as
> indexed and the number of documents actually in the Solr index. I cannot
> find any reason for this in the log files. How do I find out why some
> documents are deleted from the index?
[...]

First thing I would look at is the field that you are using
as the unique ID in the Solr schema. My guess would
be that some of your documents have the same unique
ID, and are overwriting one another during the indexing
process.
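
To illustrate the overwriting described above, here is a toy model in Python. It is only a sketch of the bookkeeping, not Solr's actual implementation; the class `ToyIndex`, the function `index_rows`, and the sample rows are made up for illustration. Each add consumes a slot, and an add whose unique ID already exists replaces the earlier document and counts it as deleted.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of an index keyed by a unique ID (not Solr's real code)."""

    live: dict = field(default_factory=dict)  # unique ID -> current document
    max_doc: int = 0  # slots used, including replaced documents
    deleted: int = 0  # older versions replaced by a later add

    def add(self, doc_id, doc):
        # A second add with the same ID replaces the earlier document.
        if doc_id in self.live:
            self.deleted += 1
        self.live[doc_id] = doc
        self.max_doc += 1

    @property
    def num_docs(self):
        return len(self.live)


def index_rows(rows, key="id"):
    """Add every row to a fresh ToyIndex, using row[key] as the unique ID."""
    index = ToyIndex()
    for row in rows:
        index.add(row[key], row)
    return index


# Hypothetical rows: five are processed, but only three ids are distinct.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "c"},
    {"id": 3, "name": "d"},
    {"id": 2, "name": "e"},
]
index = index_rows(rows)
print(index.num_docs, index.max_doc, index.deleted)  # 3 5 2
```

In this model the number of live documents plus the number of deleted ones equals the number of slots used, which is the same relationship as in the reported figures: 597 + 403 = 1000.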

Regards,
Gora