Posted to solr-user@lucene.apache.org by Pascal Bleser <pa...@atosorigin.com> on 2009/12/23 11:35:04 UTC

More terms than documents, impossible to sort on tokenized fields

Using Solr 1.4 (release)
My complete schema is here, basically a somewhat stripped down version of the
example's schema.xml, and a few additional fields: http://pastebin.be/22596

I've read past posts on this issue and believe I mostly understand what causes
it, but I cannot find that problem in my configuration, even after implementing
the workarounds: when I perform a search (*), I get a 500 stating: "there
are more terms than documents in field "text", but it's impossible to sort on
tokenized fields"

The complete stack trace is here: http://pastebin.be/22597

(*) The search query is as follows:
/solr/select?sort=alphaNameSort+desc&q=java

* the dismax SearchHandler is configured as default
* the documents are PDFs that have been uploaded with curl into /update/extract
* there are 1398 documents in the index (according to numDocs)

Now, the field "text" is multi valued and tokenized:
--->8----------------------------------------------------------------------
 <field name="text" type="text" indexed="true" stored="false"
        multiValued="true"/>
 <copyField source="title" dest="text"/>
 <copyField source="subject" dest="text"/>
 <copyField source="description" dest="text"/>
 <copyField source="comments" dest="text"/>
 <copyField source="content" dest="text"/>
--->8----------------------------------------------------------------------
(it's the "text" fieldType as in the example configuration)

However, I do use the "alphaOnlySort" trick in order to sort on a
non-multivalued field:
--->8----------------------------------------------------------------------
 <fieldType name="alphaOnlySort" class="solr.TextField"
            sortMissingLast="true" omitNorms="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.TrimFilterFactory" />
   </analyzer>
 </fieldType>
 <!-- ... -->
 <field name="alphaNameSort" type="alphaOnlySort" indexed="true"
        stored="false"/>
 <copyField source="title" dest="alphaNameSort"/>
--->8----------------------------------------------------------------------

I even explicitly specify that the sorting must be done on the field
"alphaNameSort" (using the "sort=alphaNameSort+desc" query parameter), and Solr
is still complaining about the field "text".

The same error happens if I sort on other fields, such as id.

I'm seriously puzzled at this point. Am I hitting an obscure bug in 1.4? A
misleading error message? Or maybe a bug in my brain? :)

cheers
-- 
  -o) Pascal Bleser <lo...@fosdem.org>    http://www.fosdem.org
  /\\       FOSDEM 2010 :: 6+7 February 2010 in Brussels
 _\_v Free and Opensource Software Developers European Meeting


Re: More terms than documents, impossible to sort on tokenized fields

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Wed, Dec 23, 2009 at 4:05 PM, Pascal Bleser <pascal.bleser@atosorigin.com> wrote:

> Using Solr 1.4 (release)
> My complete schema is here, basically a somewhat stripped down version of
> the example's schema.xml, and a few additional fields:
> http://pastebin.be/22596
>
> I've read past posts on this issue and believe I mostly understand what
> causes it, but I cannot find that problem in my configuration, even after
> implementing the workarounds: when I perform a search (*), I get a 500
> stating: "there are more terms than documents in field "text", but it's
> impossible to sort on tokenized fields"
>
> The complete stack trace is here: http://pastebin.be/22597
>
> (*) The search query is as follows:
> /solr/select?sort=alphaNameSort+desc&q=java
>
The stack trace indicates that you are using the ord(text) function query.
Check whether ord(text) appears in the defaults section of your search
handler in solrconfig.xml.
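
As a rough illustration, a handler definition along these lines would likely
produce this error whatever the sort parameter is, because ord() needs the
same one-term-per-document field layout that sorting does. The handler name,
the qf value and the boost below are assumptions for the sketch, not taken
from the poster's solrconfig.xml:
--->8----------------------------------------------------------------------
 <requestHandler name="dismax" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="qf">text</str>
     <!-- hypothetical boost function: ord() applied to the tokenized,
          multiValued "text" field -->
     <str name="bf">ord(text)^0.5</str>
   </lst>
 </requestHandler>
--->8----------------------------------------------------------------------
If a bf entry like this is present, removing it or pointing the function at
a single-valued, untokenized field should avoid the error.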

-- 
Regards,
Shalin Shekhar Mangar.