Posted to dev@lucene.apache.org by ka...@nokia.com on 2010/09/14 10:37:48 UTC

Now, a lost data problem with trunk too

Hi folks,

It looks like the handle leak may be real - Simon Willnauer has been looking at it and could not find an explanation for the behavior I have been seeing.  But before we got too far on that problem, I encountered what appears to be an even more serious problem.  Specifically, I'm losing field data out of some records.

The index I'm building is fairly large - some 25M records when complete.  What I'm seeing is that the main searchable field ("value") is not finding all the records it should.  I was able to locate one such record just now:

curl "http://localhost:8983/solr/nose/standard?fl=*,score&q=id:\"POI|DEU:205:20187477:1014564|brandenburger+tor\""
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">95</int><lst name="params"><str name="q">id:"POI|DEU:205:20187477:1014564|brandenburger tor"</str><str name="fl">*,score</str></lst></lst><result name="response" numFound="1" start="0" maxScore="17.335964"><doc><float name="score">17.335964</float><str name="entityid">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="reference">brandenburger tor, potsdam, deutschland</str><str name="type">poi</str> ... </doc></result>
</response>

.. but it is completely missing the supposedly required "value" field:

   <!-- The value field.  This contains the actual string that will be matched.-->
   <field name="value" type="string_idx"  required="true" stored="false"/>

The code that does the indexing is straightforward, and *some* of the records of this class are indeed searchable via the "value" field, but others aren't.  I know the "value" field is non-empty, because it is used to construct the "id" field, which is correct above.

Simon is also looking into this one, but if anyone else has advice for figuring out what's going wrong, please let me know.  FWIW, this is a trunk build from Monday morning.

Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


RE: Now, a lost data problem with trunk too

Posted by ka...@nokia.com.
Yes. Of course.  My oversight.

So I did the obvious thing and searched on the value field directly, and the record is found:

<str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="language">ger</str><str name="latitude">52.39935</str><str name="longitude">13.04793</str><str name="reference">brandenburger tor, potsdam, deutschland</str>


So, something about the way I am searching for it is not right.  Looking elsewhere.
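
A direct search on the unstored "value" field, as described above, might look roughly like the following sketch. The base URL and the fl parameter are the ones from the original post; the search term and the build_url helper are assumptions for illustration, since the actual query was not included in this message.

```shell
# Hypothetical sketch: the base URL comes from the original post;
# the search term used below is an assumed example, not the query Karl ran.
base='http://localhost:8983/solr/nose/standard'

# Build a Solr query URL for a phrase query against a single field.
# Spaces become "+" and the surrounding quotes become %22. Nothing else is
# encoded, so this is only suitable for simple terms like the one below.
build_url() {
  field=$1
  term=$(printf '%s' "$2" | sed 's/ /+/g')
  printf '%s?fl=*,score&q=%s:%%22%s%%22\n' "$base" "$field" "$term"
}

url=$(build_url value 'brandenburger tor')
echo "$url"

# To run it against a live Solr instance:
#   curl "$url"
```

Because the field is declared with stored="false", a matching document comes back without a "value" entry even when the search on that field succeeds.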

Karl


________________________________________
From: ext Simon Willnauer [simon.willnauer@googlemail.com]
Sent: Tuesday, September 14, 2010 4:52 AM
To: dev@lucene.apache.org
Subject: Re: Now, a lost data problem with trunk too

On Tue, Sep 14, 2010 at 10:37 AM,  <ka...@nokia.com> wrote:
> Hi folks,
>
> It looks like the handle leak may be real - Simon Willnauer has been looking at it and could not find an explanation for the behavior I have been seeing.  But before we got too far on that problem, I encountered what appears to be an even more serious problem.  Specifically, I'm losing field data out of some records.
>
> The index I'm building is fairly large - some 25M records when complete.  What I'm seeing is that the main searchable field ("value") is not finding all the records it should.  I was able to locate one such record just now:
>
> curl "http://localhost:8983/solr/nose/standard?fl=*,score&q=id:\"POI|DEU:205:20187477:1014564|brandenburger+tor\""
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">95</int><lst name="params"><str name="q">id:"POI|DEU:205:20187477:1014564|brandenburger tor"</str><str name="fl">*,score</str></lst></lst><result name="response" numFound="1" start="0" maxScore="17.335964"><doc><float name="score">17.335964</float><str name="entityid">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="reference">brandenburger tor, potsdam, deutschland</str><str name="type">poi</str> ... </doc></result>
> </response>
>
> .. but it is completely missing the supposedly required "value" field:
>
>   <!-- The value field.  This contains the actual string that will be matched.-->
>   <field name="value" type="string_idx"  required="true" stored="false"/>
that does not show up since it is not stored - maybe that's the reason :)

simon
>
> The code that does the indexing is straightforward, and *some* of the records of this class are indeed searchable via the "value" field, but others aren't.  I know the "value" field is non-empty, because it is used to construct the "id" field, which is correct above.
>
> Simon is also looking into this one, but if anyone else has advice for figuring out what's going wrong, please let me know.  FWIW, this is a trunk build from Monday morning.
>
> Karl
>


Re: Now, a lost data problem with trunk too

Posted by Simon Willnauer <si...@googlemail.com>.
On Tue, Sep 14, 2010 at 10:37 AM,  <ka...@nokia.com> wrote:
> Hi folks,
>
> It looks like the handle leak may be real - Simon Willnauer has been looking at it and could not find an explanation for the behavior I have been seeing.  But before we got too far on that problem, I encountered what appears to be an even more serious problem.  Specifically, I'm losing field data out of some records.
>
> The index I'm building is fairly large - some 25M records when complete.  What I'm seeing is that the main searchable field ("value") is not finding all the records it should.  I was able to locate one such record just now:
>
> curl "http://localhost:8983/solr/nose/standard?fl=*,score&q=id:\"POI|DEU:205:20187477:1014564|brandenburger+tor\""
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">0</int><int name="QTime">95</int><lst name="params"><str name="q">id:"POI|DEU:205:20187477:1014564|brandenburger tor"</str><str name="fl">*,score</str></lst></lst><result name="response" numFound="1" start="0" maxScore="17.335964"><doc><float name="score">17.335964</float><str name="entityid">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="id">POI|DEU:205:20187477:1014564|brandenburger tor</str><str name="reference">brandenburger tor, potsdam, deutschland</str><str name="type">poi</str> ... </doc></result>
> </response>
>
> .. but it is completely missing the supposedly required "value" field:
>
>   <!-- The value field.  This contains the actual string that will be matched.-->
>   <field name="value" type="string_idx"  required="true" stored="false"/>
that does not show up since it is not stored - maybe that's the reason :)

simon
>
> The code that does the indexing is straightforward, and *some* of the records of this class are indeed searchable via the "value" field, but others aren't.  I know the "value" field is non-empty, because it is used to construct the "id" field, which is correct above.
>
> Simon is also looking into this one, but if anyone else has advice for figuring out what's going wrong, please let me know.  FWIW, this is a trunk build from Monday morning.
>
> Karl
>