Posted to solr-user@lucene.apache.org by Brian Lucas <bl...@gmail.com> on 2006/09/19 19:29:12 UTC

strange highlighting behavior

I’m experiencing some unusual behavior when I perform a search with highlighting enabled.

I’ve set up “id” as “sint” and indexed properly, but performing a search gives the following result:

<doc>
  <float name="score">3.0647626</float>
  <int name="group_id">2</int>
  <int name="id">369845</int>
  <int name="language_id">1</int>
  <arr name="search_keywords">
    <str>Microsoft Reorganizes</str>
  </arr>
  <str name="title">Microsoft Reorganizes</str>
</doc>

<doc>
  <float name="score">3.0647626</float>
  <int name="group_id">2</int>
  <int name="id">369850</int>
  <int name="language_id">1</int>
  <arr name="search_keywords">
    <str>Microsoft Moment</str>
  </arr>
  <str name="title">Microsoft Moment</str>
</doc>

…

<lst name="highlighting">
  <lst name="€Zҵ">
    <arr name="title">
      <str><em>Microsoft</em> Reorganizes</str>
    </arr>
  </lst>
  <lst name="€ZҺ">
    <arr name="title">
      <str><em>Microsoft</em> Moment</str>
    </arr>
  </lst>
  <lst name="€#31;৳">
    <arr name="title">
      <str>NASCAR with <em>Microsoft</em></str>
    </arr>
  </lst>
</lst>

 

The unusual characters in lst name=“…” are what I can’t figure out, as it DEFINITELY is not the id. I’ve tried indexing id as “integer”, “sint”, and “string”, all with the same result.

Using Solr-9-18 and Tomcat 5.5.17.

Is there any way to see where it’s getting these strange names from? My understanding is that those should be the numeric IDs given above.

Brian


Re: strange highlighting behavior

Posted by Yonik Seeley <yo...@apache.org>.
On 9/19/06, Brian Lucas <bl...@gmail.com> wrote:
> Converting to 'integer' and deleting/reindexing fixed it. Can 'sint' be
> used for the id with highlighting, or does one need to use integer or string
> for that?

It should be usable (but I personally haven't tested that).
If it's not, it's a bug and will be fixed :-)

>  Just trying to figure out if it's a bug with sint, or possibly
> due to the fact I could have changed sint to integer without deleting the
> data.

The latter would be my guess.

-Yonik

RE: strange highlighting behavior

Posted by Brian Lucas <bl...@gmail.com>.
Yonik, thanks for the tip. 

Converting to 'integer' and deleting/reindexing fixed it.  Can 'sint' be
used for the id with highlighting, or does one need to use integer or string
for that?  Just trying to figure out if it's a bug with sint, or possibly
due to the fact I could have changed sint to integer without deleting the
data.
 
-B

-----Original Message-----
From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
Sent: Tuesday, September 19, 2006 11:55 AM
To: solr-user@lucene.apache.org
Subject: Re: strange highlighting behavior

On 9/19/06, Yonik Seeley <yo...@apache.org> wrote:
> The fix would be to use
> FieldType.indexedToReadable() to convert the indexed form back to a
> readable form.

Oops, that should be storedToReadable since the id is obtained from
the stored fields, not from the index.

Hmmm, a quick look at the code suggests this is already being done:

         String printId = searcher.getSchema().printableUniqueKey(doc);
         fragments.add(printId == null ? null : printId, docSummaries);

What you are seeing may be due to indexing documents with one version
of the schema and viewing them with another.  Try deleting the
solr/data/index directory and then reindexing everything.

-Yonik


Re: strange highlighting behavior

Posted by Yonik Seeley <yo...@apache.org>.
On 9/19/06, Brian Lucas <bl...@gmail.com> wrote:
> The unusual characters on lst name="…" are what I can't figure out, as it DEFINITELY
> is not the id.  I've tried indexed id with "integer", "sint", and "string" all with the
> same result.

Yes, looks like you hit a bug where you are seeing the "indexed" form
of sint (which is more of a binary format that allows terms to be
ordered in numeric order).  The fix would be to use
FieldType.indexedToReadable() to convert the indexed form back to a
readable form.
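To make the "indexed form" concrete, here is a minimal Java sketch of a sortable-int encoding of the kind sint relies on: add Integer.MIN_VALUE to flip the sign bit, then pack the 32 bits into three chars (8 + 12 + 12 bits) so that plain string comparison follows numeric order. It is modeled on Solr's NumberUtils as we understand it, not copied from it, and the class and method names (SortableIntSketch, encode, decode) are hypothetical. Encoding 369845 and 369850 this way gives 'Z' followed by U+04B5 (ҵ) and U+04BA (Һ), which matches the tail of the "€Zҵ" and "€ZҺ" names above; the leading char is U+0080, which presumably shows up as "€" when viewed as Windows-1252.

```java
// Hypothetical sketch of an "sint"-style sortable int encoding, modeled on
// (not copied from) Solr's NumberUtils.
public final class SortableIntSketch {

    private SortableIntSketch() {}

    /** Encodes an int as 3 chars whose string order matches numeric order. */
    public static String encode(int val) {
        // Adding Integer.MIN_VALUE flips the sign bit, so that comparing the
        // resulting bits as unsigned puts negative numbers before positive ones.
        int flipped = val + Integer.MIN_VALUE;
        char[] out = new char[3];
        out[0] = (char) (flipped >>> 24);            // top 8 bits
        out[1] = (char) ((flipped >>> 12) & 0x0fff); // middle 12 bits
        out[2] = (char) (flipped & 0x0fff);          // low 12 bits
        return new String(out);
    }

    /** Inverse of encode(): the indexed-to-readable direction. */
    public static int decode(String indexed) {
        if (indexed.length() != 3) {
            throw new IllegalArgumentException("expected 3 chars, got " + indexed.length());
        }
        int flipped = (indexed.charAt(0) << 24)
                | (indexed.charAt(1) << 12)
                | indexed.charAt(2);
        // Undo the sign-bit flip (int arithmetic wraps around).
        return flipped - Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        for (int id : new int[] {369845, 369850}) {
            String indexed = encode(id);
            // Show code points, since these chars are not printable digits.
            StringBuilder codePoints = new StringBuilder();
            for (char c : indexed.toCharArray()) {
                codePoints.append(String.format("U+%04X ", (int) c));
            }
            System.out.println(id + " -> " + codePoints.toString().trim()
                    + " -> " + decode(indexed));
        }
    }
}
```

Running main prints each id, the code points of its encoded form (U+0080 U+005A U+04B5 for 369845), and the decoded value. One plausible reason for the 8/12/12 split is that it keeps every char well below the UTF-16 surrogate range, so the encoded form is always exactly three ordinary chars.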

It should have worked with "integer" or "string" since the indexed and
readable forms are identical... I suspect the old documents with an
sint ID still exist in your index and that is what you are seeing.
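The stale-document explanation can be modeled in a few lines. In this hypothetical Java sketch, IdType.toReadable stands in for the FieldType.indexedToReadable()/storedToReadable() conversions discussed in the thread; the interface and constant names are invented for illustration and are not Solr classes. An "integer"/"string"-style type returns its input unchanged, so a value written while id was sint comes back still encoded; only an sint-style decoder (assuming a three-char layout holding 8 + 12 + 12 bits of value + Integer.MIN_VALUE) recovers the number.

```java
import java.util.Arrays;

// Hypothetical model of reading ids under a changed schema; the names are
// illustrative and only echo the FieldType conversions mentioned in the thread.
public final class SchemaMismatchSketch {

    /** Converts an id's internal form to the form shown to clients. */
    public interface IdType {
        String toReadable(String internal);
    }

    /** "integer"/"string" style: internal and readable forms are identical. */
    public static final IdType PLAIN = internal -> internal;

    /** "sint" style (assumed layout): 3 chars holding 8+12+12 bits of value + Integer.MIN_VALUE. */
    public static final IdType SORTABLE = internal -> {
        int flipped = (internal.charAt(0) << 24)
                | (internal.charAt(1) << 12)
                | internal.charAt(2);
        return Integer.toString(flipped - Integer.MIN_VALUE);
    };

    public static void main(String[] args) {
        // A value written while id was sint, and one written after the switch to integer.
        String staleSintValue = new String(new char[] {0x80, 'Z', 0x4B5});
        String freshIntegerValue = "369850";

        // Schema now says "integer": the stale value passes through still encoded.
        for (String value : Arrays.asList(staleSintValue, freshIntegerValue)) {
            System.out.println("as integer: " + PLAIN.toReadable(value));
        }
        // Only the type the value was written with turns it back into 369845.
        System.out.println("as sint:    " + SORTABLE.toReadable(staleSintValue));
    }
}
```

This matches what the thread reports: once the index was deleted and everything reindexed under a single schema, the conversion and the data agreed again.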


-Yonik