Posted to user@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2013/12/02 17:17:01 UTC

Re: [lucy-user] Indexing Lucy::Plan::Int32Type

On Thu, Nov 28, 2013 at 2:19 AM, Thomas den Braber <th...@delos.nl> wrote:
> When I do a range query on a sortable Int32Type I get an error: "term is a
> Lucy::Object::CharBuf, and not comparable to a Lucy::Object::Integer32"
>
> I use the range in the same way as in the example:
> http://search.cpan.org/~creamyg/Lucy-0.3.3/lib/Lucy/Search/RangeQuery.pod

Ah.  You might be able to work around that by supplying values like so:

    my $range_query = Lucy::Search::RangeQuery->new(
        field      => 'product_number',
        lower_term => Lucy::Object::Integer32->new(value => 3),
    );

> Do you know if there is a speed difference between sorting on Int32Type
> fields and text fields with leading zeros?

Should be negligible.

(Gory details: We pre-sort everything at index-time and write out binary
integer ordinals.  Most comparisons happen between the ordinals and are very
fast.  Some text comparisons happen but these scale with the number of
segments in the index, not the number of documents matched by the query.)
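
To make the ordinal idea concrete, here is a minimal sketch in plain Perl. It only illustrates the general technique of sorting once and comparing integer ordinals afterwards; it is not Lucy's actual code or file format, and `build_ordinals` and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the distinct values once ("index-time") and
# assign each one an integer ordinal reflecting its sort position.
sub build_ordinals {
    my @values = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @values;
    my @sorted = sort @unique;    # plain string sort
    my %ordinal_for;
    @ordinal_for{@sorted} = ( 0 .. $#sorted );
    return \%ordinal_for;
}

# Zero-padded field values, one per document (made-up sample data).
my @doc_values  = ( '00021', '01111', '00014', '00021' );
my $ordinal_for = build_ordinals(@doc_values);

# "Search-time": order documents by comparing small integers only.
my @doc_ords   = map { $ordinal_for->{$_} } @doc_values;
my @by_ordinal = sort { $doc_ords[$a] <=> $doc_ords[$b] } 0 .. $#doc_values;

print join( ' ', map { $doc_values[$_] } @by_ordinal ), "\n";
```

In this sketch the string comparisons happen once, when the ordinals are built, and every later comparison is a cheap integer comparison.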

Marvin Humphrey

Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by "Nick D." <nd...@globaldataguard.com>.
Marvin Humphrey wrote
> On Thu, Dec 12, 2013 at 10:36 AM, Nick D. <nd...@globaldataguard.com> wrote:
>> Can I not use Int32Type when indexing integers that I want to do a range
>> query on later?
> 
> That's right.  Int32Type isn't public and isn't ready for prime time in
> Lucy 0.3.x.
> 
>> Can you give me an example of the leading zeros because I think I tried
>> that also, but I may be misunderstanding what you mean by leading zeros?
> 
> The idea is to define the field as an ordinary text type (probably
> StringType) and add leading zeroes at *index-time*.
> 
>     # If `$time_sec` is 14, then `$fields{time_sec}` will be `"00014"`.
>     $fields{time_sec} = sprintf("%0.5d", $time_sec);
>     $indexer->add_doc(\%fields);
> 
> Then your query will work at search-time:
> 
>> my $range_query = Lucy::Search::RangeQuery->new(
>>          field      => 'time_sec',
>>          lower_term => '00014',
>>      );
> 
> Marvin Humphrey


Thanks, this worked!




Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Tue, Jan 28, 2014 at 1:57 PM, Nick D. <nd...@globaldataguard.com> wrote:

> Another question regarding this range-query method: does the StringType have
> to be stored, or can I mark it unstored and still use a RangeQuery on it?

It can be unstored.  The data structures for full-document retrieval
and sorting are separate.

Marvin Humphrey

Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by "Nick D." <nd...@globaldataguard.com>.

Another question regarding this range-query method: does the StringType have
to be stored, or can I mark it unstored and still use a RangeQuery on it?

Is it valid to do a RangeQuery on a field defined like this:

my $unindexed_string_type = Lucy::Plan::StringType->new( indexed => 0,
sortable => 1, stored => 0 );

Or do I need this:

my $unindexed_string_type = Lucy::Plan::StringType->new( indexed => 0,
sortable => 1, stored => 1 );




Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Thu, Dec 12, 2013 at 10:36 AM, Nick D. <nd...@globaldataguard.com> wrote:
> Can I not use Int32Type when indexing integers that I want to do a range
> query on later?

That's right.  Int32Type isn't public and isn't ready for prime time in Lucy
0.3.x.

> Can you give me an example of the leading zeros because I think I tried that
> also, but I may be misunderstanding what you mean by leading zeros?

The idea is to define the field as an ordinary text type (probably StringType)
and add leading zeroes at *index-time*.

    # If `$time_sec` is 14, then `$fields{time_sec}` will be `"00014"`.
    $fields{time_sec} = sprintf("%0.5d", $time_sec);
    $indexer->add_doc(\%fields);

Then your query will work at search-time:

> my $range_query = Lucy::Search::RangeQuery->new(
>          field      => 'time_sec',
>          lower_term => '00014',
>      );
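
As a rough illustration of why the padding matters, the following plain-Perl sketch compares how unpadded and zero-padded values sort as strings. `pad_int` is a hypothetical helper, not part of Lucy, and the width of 5 is an assumption that fits the 0-86400 range discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: pad a non-negative integer to a fixed width so that
# string comparison agrees with numeric comparison.
sub pad_int {
    my ( $n, $width ) = @_;
    die "value must be a non-negative integer" unless $n =~ /\A[0-9]+\z/;
    my $padded = sprintf( "%0${width}d", $n );
    die "value $n does not fit in $width digits" if length($padded) > $width;
    return $padded;
}

my @seconds = ( 1111, 21, 86400, 14 );

# Perl's default sort compares as strings, so "1111" lands before "21".
my @unpadded_sorted = sort @seconds;

# With a fixed width, string order and numeric order coincide.
my @padded        = map { pad_int( $_, 5 ) } @seconds;
my @padded_sorted = sort @padded;

print "@unpadded_sorted\n";
print "@padded_sorted\n";
```

The same helper would need to be applied to any `lower_term` or `upper_term` passed to the RangeQuery, since the query terms are compared against the padded strings that were indexed.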

Marvin Humphrey

Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by "Nick D." <nd...@globaldataguard.com>.
I am having the same issue and am not sure how to correct it. Can I not use
Int32Type when indexing integers that I want to do a range query on later?

Can you give me an example of the leading zeros? I think I tried that as
well, but I may be misunderstanding what you mean by leading zeros. I tried
adding zeros like so:

my $range_query = Lucy::Search::RangeQuery->new(
         field      => 'time_sec',
         lower_term => '00014',
     );

I am storing the seconds in a day in an Int32Type, so its range will be
0-86400. If a value stored in an Int32Type cannot be used in a RangeQuery,
how should I store this value and have it sorted the correct way (e.g. 1111
is not smaller than 21 just because "1111" begins with "1" and "21" begins
with "2")?

Thanks in advance,

Nicholas Dwyer




Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by Thomas den Braber <th...@delos.nl>.
That is OK, I will use the leading zeros until the Integer support is ready.

Thanks for your help,

Thomas den Braber

-----Original Message-----
From: Marvin Humphrey <ma...@rectangular.com>
To: Thomas den Braber <th...@delos.nl>
Cc: user@lucy.apache.org
Date: Mon, 2 Dec 2013 10:39:30 -0800
Subject: Re: [lucy-user] Indexing Lucy::Plan::Int32Type

> On Mon, Dec 2, 2013 at 9:08 AM, Thomas den Braber <th...@delos.nl> wrote:
> 
> >> Ah.  You might be able to work around that by supplying values like so:
> >>
> >>     my $range_query = Lucy::Search::RangeQuery->new(
> >>         field      => 'product_number',
> >>         lower_term => Lucy::Object::Integer32->new(value => 3),
> >>     );
> >
> > I got an error when doing so:
> >
> > Invalid parameter: 'value'\n\tcfish_XSBind_allot_params at xs\\XSBind.c line
> > 507\n\tXS_Lucy_Object_Obj_new at lib\\\\Lucy.xs line 343
> >
> > I am using version 0.3.3
> 
> OK, it looks like that workaround is only feasible with the current master
> branch, not 0.3.x.  (Using `Clownfish::Integer32` instead of
> `Lucy::Object::Integer32`.)
> 
> That being the case, does the leading-zeroes technique work for you?  It's
> probably better anyway because it doesn't depend on non-public API features.
> 
> Marvin Humphrey



Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Dec 2, 2013 at 9:08 AM, Thomas den Braber <th...@delos.nl> wrote:

>> Ah.  You might be able to work around that by supplying values like so:
>>
>>     my $range_query = Lucy::Search::RangeQuery->new(
>>         field      => 'product_number',
>>         lower_term => Lucy::Object::Integer32->new(value => 3),
>>     );
>
> I got an error when doing so:
>
> Invalid parameter: 'value'\n\tcfish_XSBind_allot_params at xs\\XSBind.c line
> 507\n\tXS_Lucy_Object_Obj_new at lib\\\\Lucy.xs line 343
>
> I am using version 0.3.3

OK, it looks like that workaround is only feasible with the current master
branch, not 0.3.x.  (Using `Clownfish::Integer32` instead of
`Lucy::Object::Integer32`.)

That being the case, does the leading-zeroes technique work for you?  It's
probably better anyway because it doesn't depend on non-public API features.

Marvin Humphrey

Re: [lucy-user] Indexing Lucy::Plan::Int32Type

Posted by Thomas den Braber <th...@delos.nl>.

> Ah.  You might be able to work around that by supplying values like so:
> 
>     my $range_query = Lucy::Search::RangeQuery->new(
>         field      => 'product_number',
>         lower_term => Lucy::Object::Integer32->new(value => 3),
>     );

I got an error when doing so:

Invalid parameter: 'value'\n\tcfish_XSBind_allot_params at xs\\XSBind.c line
507\n\tXS_Lucy_Object_Obj_new at lib\\\\Lucy.xs line 343

I am using version 0.3.3



--
Thomas den Braber