Posted to solr-user@lucene.apache.org by Pascal Bleser <pa...@atosorigin.com> on 2009/12/23 11:35:04 UTC
More terms than documents, impossible to sort on tokenized fields
Using Solr 1.4 (release)
My complete schema is here (basically a somewhat stripped-down version of the
example schema.xml, plus a few additional fields): http://pastebin.be/22596
I've read past posts on this issue and believe I mostly understand what causes
it, but I cannot find that problem in my configuration, even after implementing
the workarounds: when I perform a search (*), I get a 500 error stating: "there
are more terms than documents in field "text", but it's impossible to sort on
tokenized fields"
The complete stack trace is here: http://pastebin.be/22597
(*) The search query is as follows:
/solr/select?sort=alphaNameSort+desc&q=java
* the dismax SearchHandler is configured as default
* the documents are PDFs that have been uploaded with curl into /update/extract
* there are 1398 documents in the index (as of numDocs)
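For context, a dismax handler is usually made the default in solrconfig.xml along the following lines. This is only a sketch: the actual solrconfig.xml was not posted, so the handler name and the qf value below are hypothetical. Whatever sits in the defaults section is applied to every request routed to that handler, including /solr/select?sort=alphaNameSort+desc&q=java, even though the URL itself never mentions it.
--->8----------------------------------------------------------------------
<!-- Hypothetical sketch: default="true" makes this the handler used when
     /select is called without a qt parameter -->
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- Parse the q parameter with the dismax query parser -->
    <str name="defType">dismax</str>
    <!-- Hypothetical query fields and boosts -->
    <str name="qf">text title^2.0</str>
  </lst>
</requestHandler>
--->8----------------------------------------------------------------------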
Now, the field "text" is multi-valued and tokenized:
--->8----------------------------------------------------------------------
<field name="text" type="text" indexed="true" stored="false"
multiValued="true"/>
<copyField source="title" dest="text"/>
<copyField source="subject" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="comments" dest="text"/>
<copyField source="content" dest="text"/>
--->8----------------------------------------------------------------------
(it's the "text" fieldType as in the example configuration)
But I do resort to the "alphaOnlySort" trick, in order to sort on a
non-multivalued, untokenized field:
--->8----------------------------------------------------------------------
<fieldType name="alphaOnlySort" class="solr.TextField"
sortMissingLast="true" omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.TrimFilterFactory" />
</analyzer>
</fieldType>
<!-- ... -->
<field name="alphaNameSort" type="alphaOnlySort" indexed="true"
stored="false"/>
<copyField source="title" dest="alphaNameSort"/>
--->8----------------------------------------------------------------------
I even explicitly specify that the sorting must be done on the field
"alphaNameSort" (using the "sort=alphaNameSort+desc" query parameter), and Solr
is still complaining about the field "text".
The same error happens if I specify other fields to sort on, such as id.
I'm seriously puzzled at this point. Am I hitting an obscure bug in 1.4? A
misleading error message? Or maybe a bug in my brain? :)
cheers
--
-o) Pascal Bleser <lo...@fosdem.org> http://www.fosdem.org
/\\ FOSDEM 2010 :: 6+7 February 2010 in Brussels
_\_v Free and Opensource Software Developers European Meeting
Re: More terms than documents, impossible to sort on tokenized fields
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Dec 23, 2009 at 4:05 PM, Pascal Bleser <pascal.bleser@atosorigin.com> wrote:
The stack trace indicates that you are using the ord(text) function query.
Check your solrconfig.xml to see whether ord(text) appears in the defaults
section of your search handler.
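To illustrate the suggestion, the hypothetical defaults section below shows where such a function query commonly hides: the dismax bf (boost function) parameter. In Solr 1.4, ord() relies on the same single-term-per-document field cache that sorting uses, so pointing it at the tokenized, multiValued "text" field raises the "more terms than documents" error regardless of which field the sort parameter names. The field name popularity is an assumption (a hypothetical single-valued, untokenized field); removing the bf entry altogether works just as well.
--->8----------------------------------------------------------------------
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Problematic: ord() over the tokenized, multiValued "text" field
         fails with "there are more terms than documents in field" -->
    <!-- <str name="bf">ord(text)^0.5</str> -->

    <!-- One possible fix: apply ord() only to a single-valued, untokenized
         field (popularity is hypothetical here), or drop bf entirely -->
    <str name="bf">ord(popularity)^0.5</str>
  </lst>
</requestHandler>
--->8----------------------------------------------------------------------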
--
Regards,
Shalin Shekhar Mangar.