Posted to solr-user@lucene.apache.org by Jerome Renard <je...@gmail.com> on 2011/06/07 18:02:54 UTC

Data not always returned

Hi all,

I have a problem with my index. Even though I always index the same
data over and over again, whenever I run a couple of searches (always
the same ones, since they are issued by a unit test suite) I do not get
the same results. Sometimes I get 3 successes and 2 failures, and
sometimes it is the other way around; it is unpredictable.

Here is what I am trying to do:

I created a new Solr core with its own solrconfig.xml and schema.xml.
This core stores a list of towns which I plan to use with an
auto-suggestion system, using ngrams (no Suggester).
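
For readers unfamiliar with ngram-based suggestion, here is a minimal Python sketch of the idea. The schema is only attached to this message, not shown, so the gram sizes (2 to 3), the lowercasing, and the assumption that grams are produced at index time only are all illustrative choices, not the actual analyzer configuration. The function names are hypothetical.

```python
def ngrams(text, min_size=2, max_size=3):
    """Return every substring of `text` whose length is in [min_size, max_size]."""
    text = text.lower()
    grams = []
    for size in range(min_size, max_size + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


def matches(query, name, min_size=2, max_size=3):
    """True if the lowercased query equals one of the grams indexed for `name`.

    Assumes grams are generated at index time only and the query is
    looked up as a single term.
    """
    return query.lower() in set(ngrams(name, min_size, max_size))


if __name__ == "__main__":
    print(ngrams("Paris"))
    print(matches("par", "Paris"))
```

Under these assumptions a query longer than the maximum gram size can never match, which is one reason to compare the analysis of a working and a failing query before suspecting the data.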

The indexing process is always the same:
1. the import script deletes all documents in the core:
<delete><query>*:*</query></delete> and <commit/>
2. the import script fetches data from postgres, 100 rows at a time
3. the import script adds these 100 documents and sends a <commit/>
4. once all the rows (around 40 000) have been imported, the script
sends an <optimize/> request
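
The indexing steps described above can be sketched in Python as a function that builds the sequence of XML update messages. This is only an illustration, not the actual import script: the "id" field name is an assumption (only "name" is mentioned in this message), the function names are hypothetical, and posting each message to the core's update handler is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def batches(rows, size=100):
    """Yield successive lists of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def add_message(rows):
    """Build a Solr XML <add> message from a list of {field: value} dicts."""
    docs = []
    for row in rows:
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            for name, value in row.items()
        )
        docs.append("<doc>%s</doc>" % fields)
    return "<add>%s</add>" % "".join(docs)


def update_messages(rows, batch_size=100):
    """Return, in order, the XML bodies for one full reindex."""
    # Wipe the core, then commit the deletion.
    messages = ["<delete><query>*:*</query></delete>", "<commit/>"]
    # Add the rows batch by batch, committing after each batch.
    for batch in batches(rows, batch_size):
        messages.append(add_message(batch))
        messages.append("<commit/>")
    # Optimize once everything has been imported.
    messages.append("<optimize/>")
    return messages


if __name__ == "__main__":
    rows = [{"id": 1, "name": "Paris"}, {"id": 2, "name": "Lyon"}]
    for message in update_messages(rows):
        print(message)
```

Building the messages separately from sending them makes it easy to count how many <doc> elements were produced and compare that number with what the index reports afterwards.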

Here is what happens:
I run the indexer once and search for 'foo': I get the results I expect,
but if I search for 'bar' I get nothing.
I reindex once again and search for 'foo': I get nothing, but if I
search for 'bar' I get results.
The search is made on the "name" field, which is a pretty common
TextField with ngrams.

I also tried to physically remove the index (rm -rf path/to/index) and
reindex everything, and still not all searches work: sometimes the 'foo'
search works, sometimes the 'bar' one.

I tried a lot of different things but now I am running out of ideas.
This is why I am asking for help.

Some useful information:
Solr version : 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll -
2011-03-26 18:00:07
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Java 1.5.0_24 on Mac OS X
solrconfig.xml and schema.xml are attached

Thanks in advance for your help.

Re: Data not always returned

Posted by Jerome Renard <je...@gmail.com>.
Hi Erick

On Tue, Jun 7, 2011 at 11:42 PM, Erick Erickson <er...@gmail.com> wrote:
> Well, this is odd. Several questions
>
> 1> what do your logs show? I'm wondering if somehow some data is getting
>     rejected. I have no idea why that would be, but if you're seeing indexing
>     exceptions that would explain it.
> 2> on the admin/stats page, are maxDocs and numDocs the same in the
>     success/failure case? And are they equal to 40,000?
> 3> what does &debugQuery=on show in the two cases? I'd expect it to be
>     identical, but...
> 4> admin/schema browser. Look at your three fields and see if things
>     like unique-terms are identical.
> 5> are the rows being returned before indexing in the same order? I'm
>     wondering if somehow you're getting documents overwritten by having
>     the same id (uniqueKey).
> 6> Have you poked around with Luke to see what, if anything, is dissimilar?
>
> These are shots in the dark, but my supposition is that somehow you're
> not indexing what you expect, the questions above might give us a clue
> where to look next.
>

You were right: I found a nasty problem between the indexer and postgres
which prevented some documents from being indexed. Once I fixed this
problem, everything worked fine.

Thanks a lot for your support.

Best Regards,

-- 
Jérôme

Re: Data not always returned

Posted by Erick Erickson <er...@gmail.com>.
Well, this is odd. Several questions:

1> what do your logs show? I'm wondering if somehow some data is getting
     rejected. I have no idea why that would be, but if you're seeing indexing
     exceptions that would explain it.
2> on the admin/stats page, are maxDoc and numDocs the same in the
     success/failure case? And are they equal to 40,000?
3> what does &debugQuery=on show in the two cases? I'd expect it to be
     identical, but...
4> admin/schema browser. Look at your three fields and see if things
     like unique-terms are identical.
5> are the rows being returned before indexing in the same order? I'm
     wondering if somehow you're getting documents overwritten by having
     the same id (uniqueKey).
6> Have you poked around with Luke to see what, if anything, is dissimilar?

These are shots in the dark, but my supposition is that somehow you're
not indexing what you expect. The questions above might give us a clue
where to look next.
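
Questions 2 and 5 can be checked on the source rows themselves. Here is a minimal Python sketch, assuming the rows fetched from postgres are available as dicts with an "id" key that maps to the uniqueKey (the real field name is not given in this thread) and that numDocs is read by hand from the admin/stats page. The function names are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key="id"):
    """Return {key value: occurrences} for values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


def expected_num_docs(rows, key="id"):
    """Documents the index should hold after a full import: one per distinct key."""
    return len({row[key] for row in rows})


def check_index(rows, num_docs, key="id"):
    """Compare the source rows with the numDocs value reported by Solr."""
    problems = []
    dups = duplicate_keys(rows, key)
    if dups:
        problems.append(
            "%d key(s) repeated; later rows overwrite earlier ones" % len(dups)
        )
    expected = expected_num_docs(rows, key)
    if num_docs != expected:
        problems.append("numDocs is %d, expected %d" % (num_docs, expected))
    return problems


if __name__ == "__main__":
    rows = [
        {"id": 1, "name": "Paris"},
        {"id": 2, "name": "Lyon"},
        {"id": 1, "name": "Paris 01"},
    ]
    print(check_index(rows, num_docs=2))
```

An empty list means the row count and the index agree; anything else points at the import side rather than at the query side.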

Best
Erick
