Posted to user@lucy.apache.org by Nikola Tulechki <ni...@gmail.com> on 2012/08/01 12:15:14 UTC

Re: [lucy-user] Range queries in Lucy

Just a quick follow-up concerning the "fixed-width string" issue Peter was
talking about.

Note that if you work directly with Unix timestamps (as I was), make sure
all your dates are no earlier than 9 September 2001 (timestamp 1000000000),
so that every timestamp has the same number of digits. Otherwise use
Peter's solution (the YYYYMMDD format).
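A minimal plain-Perl illustration of that cut-off (no Lucy involved; it assumes, as Peter describes, that a string field compares its terms character by character): the last nine-digit timestamp, 999999999, is one second earlier than 1000000000, yet as a string it sorts after it.

```perl
use strict;
use warnings;

# Two Unix timestamps one second apart, straddling 2001-09-09 01:46:40 UTC.
my $before = 999_999_999;      # 9 digits
my $after  = 1_000_000_000;    # 10 digits

# Numeric comparison gives the chronological order...
my $numeric = $before <=> $after;        # -1: $before is earlier

# ...but string comparison (what a string-typed field effectively uses)
# goes character by character, and '9' sorts after '1'.
my $lexical = "$before" cmp "$after";    # 1: $before sorts later

printf "numeric: %d, lexical: %d\n", $numeric, $lexical;
```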

NT

On Wed, Jul 25, 2012 at 6:08 PM, Nikola Tulechki
<ni...@gmail.com> wrote:

> Thank you Peter,
> It is exactly what I was looking for
> Cheers
> NT
>
>
> On Wed, Jul 25, 2012 at 3:42 PM, Peter Karman <pe...@peknet.com> wrote:
>
>> On 7/25/12 6:33 AM, Nikola Tulechki wrote:
>>
>>> Hello,
>>> Is there a way to specify range queries, or to query numerical fields like
>>> dates or user age in Lucy using comparison operators such as <, <= and >,
>>> and integrate them into a normal query tree object?
>>> Thanks
>>>
>>
>>
>> Nikola,
>>
>> You can create a RangeQuery:
>>
>> https://metacpan.org/module/Lucy::Search::RangeQuery
>>
>> The Lucy QueryParser doesn't support native syntax for that though, so if
>> you want a query parser that does, you might want to look at:
>>
>> https://metacpan.org/module/Search::Query::Parser
>> https://metacpan.org/module/Search::Query::Dialect::Lucy
>>
>> where you can do things like:
>>
>>  use Search::Query;
>>  my $parser = Search::Query->parser( dialect => 'Lucy' );
>>  my $query  = $parser->parse( 'foo=(123..456)' );
>>  # then pass the Lucy version of the query to the searcher
>>  my $hits   = $lucy_searcher->hits( query => $query->as_lucy_query() );
>>
>> An important thing to note is that Lucy has only one public field storage
>> type, which is a string. So if you want to get coherent results from a
>> range query, make sure you are searching fixed-width strings. E.g., I
>> format all my dates as YYYYMMDD so that I can do range queries like:
>>
>>  my $all_hits_in_2012 = $parser->parse( 'mydate=(20120101..20121231)' );
>>
>> HTH,
>> pek
>>
>> --
>> Peter Karman  .  http://peknet.com/  .  peter@peknet.com
>>
>
>
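A rough sketch of Peter's YYYYMMDD approach in plain Perl, for illustration only: `epoch_to_yyyymmdd` and `in_range` are hypothetical helpers, and `in_range` merely imitates an inclusive range test on string terms; it is not how Lucy implements range matching. In Lucy itself, the two boundary strings would go into the `lower_term` and `upper_term` parameters of a Lucy::Search::RangeQuery.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format a Unix timestamp as a fixed-width
# YYYYMMDD string (UTC).
sub epoch_to_yyyymmdd {
    my ($epoch) = @_;
    return strftime('%Y%m%d', gmtime($epoch));
}

# Hypothetical stand-in for an inclusive range test on string terms.
# Because every term has the same width, string comparison (ge/le)
# agrees with chronological order.
sub in_range {
    my ($term, $lower, $upper) = @_;
    return ($term ge $lower && $term le $upper) ? 1 : 0;
}

my %docs = (
    a => epoch_to_yyyymmdd(999_999_999),      # 20010909
    b => epoch_to_yyyymmdd(1_325_376_000),    # 20120101 00:00:00 UTC
    c => epoch_to_yyyymmdd(1_356_998_399),    # 20121231 23:59:59 UTC
);

my @in_2012 = sort { $a cmp $b }
              grep { in_range($docs{$_}, '20120101', '20121231') } keys %docs;
print "@in_2012\n";
```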

Re: [lucy-user] retrieve the order of a hit in a sorted search

Posted by arjan <ar...@unitedknowledge.nl>.
Hi Marvin,

Completely clear, both the solution and why Lucy can't offer it as a core
feature.
And good that you mentioned it: regenerate the map whenever, and only when,
you open a new IndexSearcher.

Thanx,
Arjan.


On 08/06/2012 08:53 PM, Marvin Humphrey wrote:
> [...]


-- 
Kind regards,
Arjan Widlak

Visit our site at:
http://www.unitedknowledge.nl

United Knowledge, content and technology
Bilderdijkstraat 79N
1015 CT Amsterdam
T +31 (0)20 737 1851
F +31 (0)84 877 0399
bureau@unitedknowledge.nl
http://www.unitedknowledge.nl

M +31 (0)6 2427 1444
E arjan@unitedknowledge.nl


Re: [lucy-user] retrieve the order of a hit in a sorted search

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Aug 6, 2012 at 9:38 AM, arjan <ar...@unitedknowledge.nl> wrote:
> If there is no way to retrieve this from the
> $searcher->hits object, it could be done by doing two queries, one with
> MatchAllQuery and the actual query. I just tried this (below) and that
> works. However, it's not ideal.

What you need is a value-to-ordinal mapping for the entire index on the field
"epoch".

    my %val_to_ord_map;
    my $sort_spec = Lucy::Search::SortSpec->new(
        rules => [Lucy::Search::SortRule->new(field => 'epoch')],
    );
    my $ord = 0;
    my $all_hits = $searcher->hits(
        query      => Lucy::Search::MatchAllQuery->new,
        sort_spec  => $sort_spec,
        num_wanted => $searcher->doc_max,
    );
    while (my $hit = $all_hits->next) {
        $val_to_ord_map{$hit->{epoch}} = $ord++;
    }

    ...

    my $hits = $searcher->hits(
        query     => $query,
        sort_spec => $sort_spec,
    );
    while (my $hit = $hits->next) {
        my $ord = $val_to_ord_map{$hit->{epoch}};
        ...
    }

Such a mapping needs to be fully regenerated every time the index is changed,
because inserting a new value into the middle will cause many ordinals to
increase.

Lucy can't offer that as a core feature because regenerating full-index data
structures is at odds with fast incremental index updates.  However, if you
don't need near-real-time responsiveness (and you can afford the RAM), you can
generate the map yourself each time you open a new IndexSearcher.

HTH,

Marvin Humphrey
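Marvin's approach can be tried out without an index. The sketch below is a plain-Perl model, not Lucy code: the array of hashes is a hypothetical stand-in for the documents a MatchAllQuery sorted on 'epoch' would return, using the example values from the original question. It builds the value-to-ordinal map and then turns the ordinal into a 1-based position, both ascending and reversed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the index: the example documents from the
# original question (color => epoch).
my @docs = (
    { color => 'red',   epoch => 1 },
    { color => 'blue',  epoch => 2 },
    { color => 'green', epoch => 0 },
);

# Build the value-to-ordinal map over ALL documents, in sort order.
# String comparison mirrors a string field; it is safe here only
# because all values have the same width (one digit).
my %val_to_ord_map;
my $ord = 0;
for my $doc (sort { $a->{epoch} cmp $b->{epoch} } @docs) {
    $val_to_ord_map{ $doc->{epoch} } = $ord++;
}

# Look up the position of a "search result" (here: the blue document).
my ($hit) = grep { $_->{color} eq 'blue' } @docs;
my $position         = $val_to_ord_map{ $hit->{epoch} } + 1;    # 1-based
my $reverse_position = scalar(@docs) - $position + 1;           # reversed sort
print "blue is #$position ascending, #$reverse_position descending\n";
```

If several documents share the same epoch, the plain assignment keeps the ordinal of the last one; writing `//=` instead of `=` would keep the first. The reversed position assumes the values are unique.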

Re: [lucy-user] retrieve the order of a hit in a sorted search

Posted by arjan <ar...@unitedknowledge.nl>.
Hi,

To answer my own question: if there is no way to retrieve this from
the $searcher->hits object, it can be done with two queries, one
with MatchAllQuery and one with the actual query. I just tried this (below),
and it works. However, it's not ideal.


     use feature 'say';    # needed for say() below

     my $phrase_query    = $queryparser->parse("some phrase");
     my $match_all_query = Lucy::Search::MatchAllQuery->new;

     # Both searches use the same $sort_spec, so the matching docs appear
     # in the same relative order in $hits and in $all_hits.
     my $hits        = $self->env->current_searcher->hits(
         query       => $phrase_query,
         sort_spec   => $sort_spec,
         offset      => 0,
         num_wanted  => 1000000000,
     );
     my $all_hits    = $self->env->current_searcher->hits(
         query       => $match_all_query,
         sort_spec   => $sort_spec,
         offset      => 0,
         num_wanted  => 1000000000,
     );

     my $index = -1;    # 0-based position within $all_hits
     SMALL: while( my $hit = $hits->next ) {
         # Resume scanning $all_hits where the previous hit was found.
         while( my $all_hit = $all_hits->next ) {
             $index++;
             if ( $hit->{ message_id } eq $all_hit->{ message_id } ) {
                 say "I'm a hit and number $index in order of publication.";
                 next SMALL;
             }
         }
     }
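The nested loop above relies on both result sets being sorted with the same $sort_spec: the inner `$all_hits->next` iterator is never rewound, so it only moves forward past each matching document. A plain-Perl model of that merge scan, with hypothetical message IDs in place of real hits:

```perl
use strict;
use warnings;

# Hypothetical data: message IDs in the shared sort order, for all docs
# and for the subset matching the query. Both lists MUST be sorted the
# same way for this scan to find every hit.
my @all_ids = qw(m1 m2 m3 m4 m5 m6);
my @hit_ids = qw(m2 m5);

my @positions;
my $i = 0;    # cursor into @all_ids; never rewound
HIT: for my $hit_id (@hit_ids) {
    while ($i < @all_ids) {
        my $all_id = $all_ids[ $i++ ];
        if ($all_id eq $hit_id) {
            push @positions, $i;    # $i is now the 1-based position
            next HIT;
        }
    }
}
print "@positions\n";
```

Because the cursor only moves forward, each entry of the full list is visited at most once per search, but every search still has to retrieve the whole index, which is why a map built once per IndexSearcher can be cheaper when there are many searches.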

Kind regards,
Arjan.

On 08/06/2012 12:35 PM, arjan wrote:
> [...]




[lucy-user] retrieve the order of a hit in a sorted search

Posted by arjan <ar...@unitedknowledge.nl>.
Hi all,

Is it possible to retrieve the row number (position) of a hit
in a sorted search?

Suppose these documents are stored in Lucy in this order (with simple
example values for epoch):

color    epoch
red      1
blue     2
green    0

Suppose I search for blue and sort by epoch. I would get 1 result, and it
would be the 1st or 3rd item of all my sorted documents (depending on
whether I use reverse or not). Is there a way to find out whether the hit
is the 1st or the 3rd item?

And, by the way, I was very happy with Peter's post saying that range
queries need fixed-width values. I never ran into a problem, because I
never used epoch values before 9 September 2001 (1000000000). And I read
his email just before I was about to start doing so. Talk about
just-in-time information. ;)

I assume the same goes for SortRules, meaning that I have to store epoch
values, and other numerical values, in Lucy as fixed-width entries.
Right?

Kind regards,
Arjan Widlak.

Re: [lucy-user] Range queries in Lucy

Posted by Peter Karman <pe...@peknet.com>.
Nikola Tulechki wrote on 8/1/12 5:15 AM:
> Just a quick follow up concerning the "fixed width string" issue Peter was
> talking about. 
> 
> Note that if you work directly with unix timestamps (as I was), make sure all
> your dates are posterior to 09 sept 2001, (1000000000). Otherwise use Peter's
> solution (yyyymmdd - format)
> 

I would expect fixed-width values padded with leading zeros to work too. E.g.:

 my $epoch_fixed = sprintf("%012d", $epoch);
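A quick plain-Perl check of the leading-zeros idea, assuming string terms compare character by character: `pad` is a hypothetical helper wrapping the same `sprintf("%012d", ...)` call. Zero-padded non-negative timestamps sort the same way as strings and as numbers, which also answers the SortRule question for such values; negative timestamps (dates before 1970) do not, because the digits after the `-` sign compare in the wrong direction.

```perl
use strict;
use warnings;

# Hypothetical helper: zero-pad an integer to 12 characters, as above.
sub pad { return sprintf('%012d', $_[0]) }

# Non-negative values: string order of the padded values matches numeric order.
my @epochs    = (1_000_000_000, 999_999_999, 86_400);
my @by_string = sort { $a cmp $b } map { pad($_) } @epochs;
my @by_number = map { pad($_) } sort { $a <=> $b } @epochs;
print "same order\n" if "@by_string" eq "@by_number";

# Caveat: negative values break the scheme. -10 is smaller than -5,
# but its padded form sorts after it as a string.
my ($neg10, $neg5) = (pad(-10), pad(-5));    # '-00000000010', '-00000000005'
print "negatives misordered\n" if ($neg10 cmp $neg5) > 0;
```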


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com