Posted to user@lucy.apache.org by Nikola Tulechki <ni...@gmail.com> on 2012/08/01 12:15:14 UTC
Re: [lucy-user] Range queries in Lucy
Just a quick follow-up concerning the "fixed-width string" issue Peter was
talking about.
Note that if you work directly with Unix timestamps (as I was), make sure
all your dates are later than 9 Sept 2001 (timestamp 1000000000), so that
every value has the same 10 digits. Otherwise, use Peter's solution
(YYYYMMDD format).
NT
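Why the 2001 cutoff matters: 1000000000 (2001-09-09 01:46:40 UTC) is the first 10-digit Unix timestamp, and values compared as strings are ordered character by character, not by magnitude. A minimal plain-Perl sketch, assuming a hypothetical in_range() helper that imitates an inclusive range check on string values (it is not Lucy's implementation):

```perl
use strict;
use warnings;

# Hypothetical helper: an inclusive range test that compares values as
# strings, imitating a range over a string-typed field (not Lucy's code).
sub in_range {
    my ($value, $lower, $upper) = @_;
    return ($value ge $lower && $value le $upper) ? 1 : 0;
}

my $before = '999999999';     # 2001-09-09 01:46:39 UTC, 9 digits
my $after  = '1000000000';    # one second later, 10 digits

# Numerically $before is smaller; as a string it is larger ('9' gt '1').
print 'numeric: ', ($before <  $after ? 'before first' : 'after first'), "\n";
print 'string:  ', ($before lt $after ? 'before first' : 'after first'), "\n";

# 1999-01-01 (915148800) to 2012-01-01 (1325376000) should include $before,
# but the string comparison against the 10-digit upper bound rejects it.
print 'in range: ', in_range($before, '915148800', '1325376000'), "\n";
```

Padding every value (and both bounds) to the same width, or keeping all timestamps at 10 digits as suggested above, makes string order agree with numeric order again.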
On Wed, Jul 25, 2012 at 6:08 PM, Nikola Tulechki
<ni...@gmail.com> wrote:
> Thank you Peter,
> It is exactly what I was looking for
> Cheers
> NT
>
>
> On Wed, Jul 25, 2012 at 3:42 PM, Peter Karman <pe...@peknet.com> wrote:
>
>> On 7/25/12 6:33 AM, Nikola Tulechki wrote:
>>
>>> Hello,
>>> Is there a way to specify range queries, or to query numerical fields like
>>> dates or user age in Lucy using operators such as < and <=, and to
>>> integrate that into a normal query tree object?
>>> Thanks
>>>
>>
>>
>> Nikola,
>>
>> You can create a RangeQuery:
>>
>> https://metacpan.org/module/Lucy::Search::RangeQuery
>>
>> The Lucy QueryParser doesn't support native syntax for that though, so if
>> you want a query parser that does, you might want to look at:
>>
>> https://metacpan.org/module/Search::Query::Parser
>> https://metacpan.org/module/Search::Query::Dialect::Lucy
>>
>> where you can do things like:
>>
>> use Search::Query;
>> my $parser = Search::Query->parser( dialect => 'Lucy' );
>> my $query = $parser->parse( 'foo=(123..456)' );
>> # then pass to Lucy
>> my $hits = $lucy_searcher->hits( query => $query->as_lucy_query() );
>>
>> An important thing to note is that Lucy has only one public field storage
>> type, which is a string. So if you want to get coherent results from a
>> range query, make sure you are searching fixed-width strings. E.g., I
>> format all my dates as YYYYMMDD so that I can do range queries like:
>>
>> my $all_hits_in_2012 = $parser->parse( 'mydate=(20120101..20121231)' );
>>
>> HTH,
>> pek
>>
>> --
>> Peter Karman . http://peknet.com/ . peter@peknet.com
>>
>
>
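Peter's YYYYMMDD format can be generated from a Unix timestamp with the core POSIX module. A minimal sketch, assuming dates should be rendered in UTC; the helper name epoch_to_yyyymmdd() is hypothetical. It keeps only the day, so two documents from the same day get equal values:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: render a Unix timestamp as YYYYMMDD in UTC.
# Four-digit years always give eight characters, so string order
# matches date order. The time of day is discarded.
sub epoch_to_yyyymmdd {
    my ($epoch) = @_;
    return strftime('%Y%m%d', gmtime($epoch));
}

for my $epoch (0, 999999999, 1000000000, 1325376000) {
    printf "%10d -> %s\n", $epoch, epoch_to_yyyymmdd($epoch);
}
```

The resulting strings can then be used as bounds in a range such as 'mydate=(20120101..20121231)'.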
Re: [lucy-user] retrieve the order of a hit in a sorted search
Posted by arjan <ar...@unitedknowledge.nl>.
Hi Marvin,
Completely clear, both the solution and why Lucy can't offer it as a core
feature.
And thanks for pointing out that the map should be regenerated every time,
and only when, I open a new IndexSearcher.
Thanx,
Arjan.
--
Kind regards,
Arjan Widlak
Visit our site at:
http://www.unitedknowledge.nl
United Knowledge, content and technology
Bilderdijkstraat 79N
1015 CT Amsterdam
T +31 (0)20 737 1851
F +31 (0)84 877 0399
bureau@unitedknowledge.nl
M +31 (0)6 2427 1444
E arjan@unitedknowledge.nl
Re: [lucy-user] retrieve the order of a hit in a sorted search
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Mon, Aug 6, 2012 at 9:38 AM, arjan <ar...@unitedknowledge.nl> wrote:
> If there is no way to retrieve this from the
> $searcher->hits object, it could be done by doing two queries, one with
> MatchAllQuery and the actual query. I just tried this (below) and that
> works. However, it's not ideal.
What you need is a value-to-ordinal mapping for the entire index on the field
"epoch".
    use Lucy;

    my %val_to_ord_map;
    # Sort on the "epoch" field; reuse the same SortSpec for every query.
    my $sort_spec = Lucy::Search::SortSpec->new(
        rules => [ Lucy::Search::SortRule->new( field => 'epoch' ) ],
    );
    # Walk every document in sort order and record each value's position.
    my $ord = 0;
    my $all_hits = $searcher->hits(
        query      => Lucy::Search::MatchAllQuery->new,
        sort_spec  => $sort_spec,
        num_wanted => $searcher->doc_max,
    );
    while ( my $hit = $all_hits->next ) {
        $val_to_ord_map{ $hit->{epoch} } = $ord++;
    }
    ...
    # Later: look up the position of each hit from an ordinary query.
    my $hits = $searcher->hits(
        query     => $query,
        sort_spec => $sort_spec,
    );
    while ( my $hit = $hits->next ) {
        my $ord = $val_to_ord_map{ $hit->{epoch} };
        ...
    }
Such a mapping needs to be fully regenerated every time the index is changed,
because inserting a new value into the middle will cause many ordinals to
increase.
Lucy can't offer that as a core feature because regenerating full-index data
structures is at odds with fast incremental index updates. However, if you
don't need near-real-time responsiveness (and you can afford the RAM), you can
generate the map yourself each time you open a new IndexSearcher.
HTH,
Marvin Humphrey
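The same idea with the Lucy calls stripped out, so it can run on its own. A minimal plain-Perl sketch: the hypothetical build_ord_map() receives the field values that the MatchAllQuery pass would return, sorts them as strings (assuming fixed-width values, as discussed in the range-query thread), and numbers them. The data is the color/epoch table from this thread scaled by 10 (hypothetical) so that a new value can fall between two existing ones:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the MatchAllQuery pass: map each stored value
# to its 0-based position in ascending order. As in the loop above, a value
# shared by several documents keeps the position of its last occurrence.
sub build_ord_map {
    my @values = @_;
    my %val_to_ord_map;
    my $ord = 0;
    # String sort, assuming fixed-width values.
    for my $value (sort { $a cmp $b } @values) {
        $val_to_ord_map{$value} = $ord++;
    }
    return \%val_to_ord_map;
}

my %epoch_of = (red => '10', blue => '20', green => '00');
my $map = build_ord_map(values %epoch_of);
print "blue: ordinal ", $map->{ $epoch_of{blue} }, "\n";

# A new document whose value sorts into the middle shifts later ordinals,
# so the whole map has to be rebuilt after the index changes.
$epoch_of{yellow} = '15';
$map = build_ord_map(values %epoch_of);
print "blue after insert: ordinal ", $map->{ $epoch_of{blue} }, "\n";
```

Rebuilding is a full pass over every value, which is the cost that makes this unsuitable as a core feature for incrementally updated indexes.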
Re: [lucy-user] retrieve the order of a hit in a sorted search
Posted by arjan <ar...@unitedknowledge.nl>.
Hi,
To answer my own question: if there is no way to retrieve this from the
$searcher->hits object, it can be done with two queries, a MatchAllQuery
and the actual query. I just tried this (below) and it works. However,
it's not ideal.
    use feature 'say';

    my $phrase_query    = $queryparser->parse("some phrase");
    my $match_all_query = Lucy::Search::MatchAllQuery->new;
    # Both queries must use the same $sort_spec so their orders line up.
    my $hits = $self->env->current_searcher->hits(
        query      => $phrase_query,
        sort_spec  => $sort_spec,
        offset     => 0,
        num_wanted => 1000000000,
    );
    my $all_hits = $self->env->current_searcher->hits(
        query      => $match_all_query,
        sort_spec  => $sort_spec,
        offset     => 0,
        num_wanted => 1000000000,
    );
    # Walk both result lists in step; $index is the 0-based position of
    # the current document in the full sorted list.
    my $index = -1;
    my $found_indices = [ ];
    SMALL: while ( my $hit = $hits->next ) {
        while ( my $all_hit = $all_hits->next ) {
            $index++;
            if ( $hit->{ message_id } eq $all_hit->{ message_id } ) {
                say "I'm a hit and number $index in order of publication.";
                next SMALL;
            }
        }
    }
Kind regards,
Arjan.
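The two-pass scan works because, when both queries use the same SortSpec and ties are broken the same way (an assumption), the phrase-query hits are a subsequence of the MatchAllQuery hits, so a single forward walk over the full list finds every position. A plain-Perl sketch of that walk, assuming arrays of message_id values stand in for the two Hits objects; positions_in_full_order() is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical sketch: @$matches and @$all hold message_ids in the same
# sort order, @$all being every document. Returns 0-based positions of the
# matches in the full list, walking the full list only once.
sub positions_in_full_order {
    my ($matches, $all) = @_;
    my @positions;
    my $index = -1;
    MATCH: for my $id (@$matches) {
        while (++$index < @$all) {
            if ($all->[$index] eq $id) {
                push @positions, $index;
                next MATCH;
            }
        }
        last;    # full list exhausted: remaining matches were not found
    }
    return @positions;
}

my @all_ids   = qw(green red blue);    # sorted by epoch: 0, 1, 2
my @match_ids = qw(blue);              # the search for "blue"
my @pos = positions_in_full_order(\@match_ids, \@all_ids);
print "blue is item ", $pos[0] + 1, " in order of publication\n";
```

Unlike the value-to-ordinal map, this repeats the full walk for every search.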
[lucy-user] retrieve the order of a hit in a sorted search
Posted by arjan <ar...@unitedknowledge.nl>.
Hi all,
Is it possible to retrieve the row number (position) of a hit in a
sorted search?
Suppose these documents are stored in Lucy in this order (simple example
values for epoch):

    color   epoch
    red     1
    blue    2
    green   0

Suppose I search for blue and sort by epoch. I get 1 result, which is the
3rd of all my sorted documents in ascending order, or the 1st with
reverse => 1. Is there a way to find out whether the hit is the 1st or
3rd item?
By the way, I was very happy with Peter's post saying that range queries
need fixed-width values. I never had a problem, because I never used epoch
values before September 9th, 2001 (1000000000), and I read his email just
before I was about to. Talk about just-in-time information. ;)
I assume the same goes for SortRules, meaning that I have to store epoch
values, and other numerical values, in Lucy as fixed-width entries.
Right?
Kind regards,
Arjan Widlak.
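Whether a SortRule compares the stored strings lexically is Arjan's assumption here, not something confirmed in this thread. If it does, unequal-width epochs sort out of numeric order, as this plain-Perl sketch of a string sort shows (the sample values are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical sample epochs of different widths (9, 10 and 5 digits).
my @epochs = (999999999, 1000000000, 86400);

# Lexical (string) order, as a string-typed field would be compared.
my @lexical = sort { $a cmp $b } @epochs;

# The same values zero-padded to 10 digits sort in numeric order.
my @padded = sort { $a cmp $b } map { sprintf('%010d', $_) } @epochs;

print "lexical: @lexical\n";
print "padded:  @padded\n";
```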
Re: [lucy-user] Range queries in Lucy
Posted by Peter Karman <pe...@peknet.com>.
Nikola Tulechki wrote on 8/1/12 5:15 AM:
> Just a quick follow up concerning the "fixed width string" issue Peter was
> talking about.
>
> Note that if you work directly with unix timestamps (as I was), make sure all
> your dates are posterior to 09 sept 2001, (1000000000). Otherwise use Peter's
> solution (yyyymmdd - format)
>
I would expect fixed-width values with leading zeros to work too, e.g.:
my $epoch_fixed = sprintf("%012d", $epoch);
--
Peter Karman . http://peknet.com/ . peter@peknet.com
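Peter's sprintf() call can be wrapped in a small helper. A minimal sketch; fixed_epoch() is a hypothetical name, and it refuses negative (pre-1970) timestamps because sprintf('%012d', -1) yields '-00000000001', and negatives padded this way sort in reverse order among themselves when compared as strings:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping sprintf('%012d', ...). Negative timestamps
# are rejected because their padded strings do not sort numerically.
sub fixed_epoch {
    my ($epoch) = @_;
    die "cannot pad negative epoch $epoch\n" if $epoch < 0;
    return sprintf('%012d', $epoch);
}

my @sorted = sort { $a cmp $b } map { fixed_epoch($_) } (1000000000, 999999999, 0);
print "$_\n" for @sorted;
```

With 12 digits, both 9- and 10-digit timestamps (and 0) compare correctly as strings.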