Posted to java-user@lucene.apache.org by "Rob Staveley (Tom)" <rs...@seseit.com> on 2006/07/11 12:16:41 UTC

Compressed fields

What's a sensible guideline, based on the length of an un-indexed field, for
deciding whether to store it compressed? I have a 300-character document
synopsis, which I store. Would compressing it save anything?

Can you have an index with a stored un-indexed field which is sometimes
compressed and sometimes not?
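One rough way to answer the size question empirically is to deflate a few representative synopses and compare byte counts. The sketch below uses only java.util.zip; it assumes (not confirmed in this thread) that a compressed stored field costs roughly its deflated size, and the Deflater.BEST_COMPRESSION level and the sample text are our own choices. In Lucene 1.9/2.0 the per-field choice itself is made when constructing the Field, with Field.Store.COMPRESS instead of Field.Store.YES.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SynopsisCompression {

    /** Deflates the input (zlib format) at the given level, e.g. Deflater.BEST_COMPRESSION. */
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Inflates bytes produced by compress(); throws on corrupt or truncated data. */
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and not finished: the stream is truncated or needs a dictionary.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated deflate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical synopsis text; substitute real synopses from your own index.
        String synopsis = "This report describes quarterly indexing throughput, the hardware "
                + "used, and the tuning of merge factors, with notes on query latency "
                + "under load and recommendations for the next release of the service.";
        byte[] raw = synopsis.getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw, Deflater.BEST_COMPRESSION);
        System.out.printf("raw=%d bytes, deflated=%d bytes%n", raw.length, packed.length);
    }
}
```

For text this short, keep in mind that the zlib format adds a fixed 2-byte header and a 4-byte Adler-32 checksum, so the relative saving on a few hundred characters of ordinary prose is much smaller than on long, repetitive documents.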


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Missing fields used for a sort

Posted by Chris Hostetter <ho...@fucit.org>.
: > I can't thank you enough, Yonik :-)
: >
:
: send money <G>.....

Bah! ... there's lots of money in the world; they print more and more of
it every day.

Quality Patches ... now there's something I bet Yonik would *really*
appreciate!  :)



-Hoss


Re: Missing fields used for a sort

Posted by Erick Erickson <er...@gmail.com>.
On 7/11/06, Rob Staveley (Tom) <rs...@seseit.com> wrote:
>
> I can't thank you enough, Yonik :-)
>

send money <G>.....

Re: Missing fields used for a sort

Posted by Yonik Seeley <ys...@gmail.com>.
Oh, and here is how Solr uses it to construct the correct Lucene Sort objects:
http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/search/Sorting.java?view=markup


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


RE: Missing fields used for a sort

Posted by "Rob Staveley (Tom)" <rs...@seseit.com>.
I can't thank you enough, Yonik :-)

-----Original Message-----
From: Yonik Seeley [mailto:yseeley@gmail.com] 
Sent: 11 July 2006 18:05
To: java-user@lucene.apache.org
Subject: Re: Missing fields used for a sort


Re: Missing fields used for a sort

Posted by Yonik Seeley <ys...@gmail.com>.
On 7/11/06, Rob Staveley (Tom) <rs...@seseit.com> wrote:
> Thanks for the info, both of you. Of course, Lucene obeys Murphy's law: the
> missing ones appear first when you reverse sort, which, by Murphy's law, is
> exactly the sort you want to do.
>
> Does Solr have a custom build of Lucene in it, or can the functionality
> required to get the missing ones to the end of the list be configured
> somehow in Lucene?

Solr uses stock Lucene and builds on top of it.
The bits that put missing values at the end are here:
http://issues.apache.org/jira/browse/LUCENE-406

And in Solr here:
http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/search/MissingStringLastComparatorSource.java?revision=382610&view=markup


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
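The core idea of a "missing values last" comparator can be sketched in plain Java without Lucene. This is not the code from LUCENE-406 or from Solr's MissingStringLastComparatorSource; it assumes a hypothetical per-document ordinal array in which 0 means "field missing" (modelled loosely on how a string field cache can rank terms), and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class MissingLastOrdinalSort {

    /**
     * Comparator over document ids. ord[doc] is the rank of the doc's single
     * term for the field, with 0 reserved for "field missing" (an assumption).
     * The direction flag is applied only when both docs have a value, so
     * missing docs sort last in both ascending and descending order.
     */
    static Comparator<Integer> byOrdinalMissingLast(int[] ord, boolean reverse) {
        return (docA, docB) -> {
            int a = ord[docA];
            int b = ord[docB];
            if (a == b) return 0;   // equal ranks, or both missing
            if (a == 0) return 1;   // A has no value: place it after B
            if (b == 0) return -1;  // B has no value: place it after A
            int c = Integer.compare(a, b);
            return reverse ? -c : c;
        };
    }

    /** Returns all doc ids sorted with the comparator above (stable sort). */
    static Integer[] sortDocs(int[] ord, boolean reverse) {
        Integer[] docs = new Integer[ord.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        Arrays.sort(docs, byOrdinalMissingLast(ord, reverse));
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical ordinals: docs 1 and 3 lack the field.
        int[] ord = {2, 0, 1, 0, 3};
        System.out.println(Arrays.toString(sortDocs(ord, false))); // [2, 0, 4, 1, 3]
        System.out.println(Arrays.toString(sortDocs(ord, true)));  // [4, 0, 2, 1, 3]
    }
}
```

The design point is that "reverse" must not simply negate the whole comparison; negating everything would also move the missing documents to the front.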


RE: Missing fields used for a sort

Posted by "Rob Staveley (Tom)" <rs...@seseit.com>.
Thanks for the info, both of you. Of course, Lucene obeys Murphy's law: the
missing ones appear first when you reverse sort, which, by Murphy's law, is
exactly the sort you want to do.

Does Solr have a custom build of Lucene in it, or can the functionality
required to get the missing ones to the end of the list be configured
somehow in Lucene?

-----Original Message-----
From: Yonik Seeley [mailto:yseeley@gmail.com] 
Sent: 11 July 2006 15:37
To: java-user@lucene.apache.org
Subject: Re: Missing fields used for a sort


Re: Missing fields used for a sort

Posted by Yonik Seeley <ys...@gmail.com>.
On 7/11/06, Erick Erickson <er...@gmail.com> wrote:
> So I guess all the documents without a particular field all get defaulted
> for you. Which end of the list they get placed at I guess you'll find out
> <G>...

For Lucene, it depends on which direction you are sorting.

Solr gives control over this in its schema... here are some snippets
from the example schema.xml:

    <!-- The optional sortMissingLast and sortMissingFirst attributes are
         currently supported on types that are sorted internally as strings.
       - If sortMissingLast="true", then a sort on this field will cause documents
       without the field to come after documents with the field,
       regardless of the requested sort order (asc or desc).
       - If sortMissingFirst="true", then a sort on this field will cause documents
       without the field to come before documents with the field,
       regardless of the requested sort order.
       - If sortMissingLast="false" and sortMissingFirst="false" (the default),
       then default Lucene sorting will be used, which places docs without the field
       first in an ascending sort and last in a descending sort.
    -->

    <!-- Numeric field types that manipulate the value into
         a string value that isn't human readable in its internal form,
         but with a lexicographic ordering the same as the numeric ordering
         so that range queries work correctly. -->
    <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true"/>
    <fieldtype name="slong" class="solr.SortableLongField" sortMissingLast="true"/>
    <fieldtype name="sfloat" class="solr.SortableFloatField" sortMissingLast="true"/>
    <fieldtype name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true"/>


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
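The three behaviours described in that schema comment can be mirrored with java.util.Comparator's null handling, treating a missing field as null. This is a plain-Java illustration of the semantics only; the enum and method names are hypothetical and are neither Solr's nor Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MissingValuePolicies {

    /** Hypothetical names for the three behaviours in the schema comment. */
    enum Missing { LAST, FIRST, LUCENE_DEFAULT }

    static Comparator<String> comparator(Missing policy, boolean descending) {
        Comparator<String> natural = Comparator.naturalOrder();
        Comparator<String> direction = descending ? natural.reversed() : natural;
        switch (policy) {
            case LAST:
                // Like sortMissingLast="true": nulls last for asc and desc.
                return Comparator.nullsLast(direction);
            case FIRST:
                // Like sortMissingFirst="true": nulls first for asc and desc.
                return Comparator.nullsFirst(direction);
            default:
                // Default behaviour: a missing value acts as the smallest value,
                // so it lands first when ascending and last when descending.
                Comparator<String> asc = Comparator.nullsFirst(natural);
                return descending ? asc.reversed() : asc;
        }
    }

    /** Returns a sorted copy; null entries stand for documents without the field. */
    static List<String> sorted(List<String> values, Missing policy, boolean descending) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(comparator(policy, descending));
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("b", null, "a", "c");
        for (Missing p : Missing.values()) {
            System.out.println(p + " asc=" + sorted(values, p, false)
                    + " desc=" + sorted(values, p, true));
        }
    }
}
```

Note how the default case reverses the entire comparator, nulls included, which is exactly why missing documents switch ends when the sort direction changes.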


Re: Missing fields used for a sort

Posted by Erick Erickson <er...@gmail.com>.
A quote from Chris, from one of his posts just today:

"you can only sort on fields with 0 or 1 terms per doc"

So I guess all the documents without a particular field get a default value
for you. Which end of the list they end up at, I guess you'll find out
<G>...

Erick

Missing fields used for a sort

Posted by "Rob Staveley (Tom)" <rs...@seseit.com>.
If I want to sort on a field that doesn't exist in all documents in my
index, can I have a default value for documents which lack that field (e.g.
MAXINT or 0)?
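The "default value" idea can be pictured in plain Java by substituting a sentinel such as Integer.MAX_VALUE for documents that lack the field before comparing. The sketch below uses hypothetical names and no Lucene API. It also shows the catch raised elsewhere in this thread: a fixed sentinel puts missing documents last in an ascending sort but first in a descending one.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DefaultValueSort {

    /** Sort key for a doc: its value, or the sentinel when the field is missing. */
    static int keyOf(Integer value, int missingDefault) {
        return value != null ? value : missingDefault;
    }

    /** Returns doc ids ordered by key; values[doc] == null means the field is missing. */
    static Integer[] sortDocs(Integer[] values, int missingDefault, boolean descending) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        Comparator<Integer> byKey =
                Comparator.comparingInt(doc -> keyOf(values[doc], missingDefault));
        Arrays.sort(docs, descending ? byKey.reversed() : byKey);
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical field values; doc 1 lacks the field.
        Integer[] values = {20, null, 10, 30};
        System.out.println(Arrays.toString(sortDocs(values, Integer.MAX_VALUE, false))); // [2, 0, 3, 1]
        System.out.println(Arrays.toString(sortDocs(values, Integer.MAX_VALUE, true)));  // [1, 3, 0, 2]
    }
}
```

Using 0 instead of Integer.MAX_VALUE simply flips which direction puts the missing documents at the front.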
