Posted to solr-user@lucene.apache.org by Mikhail Khludnev <mk...@griddynamics.com> on 2014/12/02 20:59:51 UTC
indexing numbers in texts for range queries
Hello Searchers,
Does anyone remember any examples of indexing numbers inside plain text?
E.g., if I have the text "foo and 10 bars", I want to find it with a query
like foo [8 TO 20] bars.
Question no. 1 is whether to put the trie terms into a separate field, or
whether they can reside in the same text field. Note that enumerating
[0-9]* terms in a MultiTermQuery is not an option for me; I definitely need
the trie field magic!
Perhaps you can point me to a blog post or a book chapter; anything like
that would make me happy.
Thanks a lot!
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mk...@griddynamics.com>
Re: indexing numbers in texts for range queries
Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Mikhail,
Range queries are allowed inside phrases with ComplexPhraseQParser, but I think string (lexicographic) order is used rather than numeric order.
Also, LUCENE-5205 / SOLR-5410 is meant to supersede the complex phrase parser. It might have that functionality too.
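To illustrate the string-order caveat: when range bounds are compared as strings, "10" does not fall inside [8 TO 20], because "1" sorts before "8". The sketch below is plain Python string comparison, not ComplexPhraseQParser itself; the helper names and the zero-padding width of 4 are assumptions made for the example.

```python
def in_string_range(term, lo, hi):
    """Lexicographic range check, the way plain text terms are ordered."""
    return lo <= term <= hi


def pad(number, width=4):
    """Zero-pad a number so string order agrees with numeric order.

    The width is a hypothetical choice; it must fit the largest number.
    """
    return str(number).zfill(width)


# Unpadded terms: "10" sorts before "8", so the range check fails.
print(in_string_range("10", "8", "20"))            # False
# Padded terms: "0008" <= "0010" <= "0020" holds.
print(in_string_range(pad(10), pad(8), pad(20)))   # True
```

Zero-padding at index and query time is one common workaround if string order is all the parser offers, though it still enumerates every matching term.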
Ahmet
Re: indexing numbers in texts for range queries
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
On 12/02/2014 03:41 PM, Mikhail Khludnev wrote:
> Thanks for the suggestions. Do I remember correctly that you ignored the
> last Lucene Revolution?
I wouldn't say I ignored it, but it's true I wasn't there in DC: I'm
excited to catch up on the presentations as the videos become available,
though.
-Mike
Re: indexing numbers in texts for range queries
Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Michael,
On Tue, Dec 2, 2014 at 11:15 PM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:
> Mikhail - I can imagine a filter that strips out everything but numbers
> and then indexes those with a (separate) numeric (trie) field. But I don't
> believe you can do phrase or other proximity queries across multiple
> fields.
Technically it's not a big deal. I used FieldMaskingSpanQuery before.
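A rough sketch of why this can work: if the numeric tokens are written to a second field at the same token positions they had in the text field, a span query on the numeric field can be masked to look like it belongs to the text field and then combined with ordinary span terms. The Python below only models that position bookkeeping and does not call Lucene; the field names `text` and `text_num` and both helper functions are hypothetical.

```python
import re


def split_fields(text):
    """Return two position-aligned token lists: all words, and numbers only."""
    tokens = re.findall(r"\w+", text.lower())
    text_field = list(enumerate(tokens))
    num_field = [(pos, int(tok)) for pos, tok in text_field
                 if re.fullmatch(r"[0-9]+", tok)]
    return {"text": text_field, "text_num": num_field}


def near_in_order(fields, first, lo, hi, last, max_gap=2):
    """True if `first`, a number in [lo, hi], and `last` occur in that order,
    each at most `max_gap` positions after the previous one."""
    for p1, w1 in fields["text"]:
        if w1 != first:
            continue
        for p2, n in fields["text_num"]:
            if not (lo <= n <= hi and 0 < p2 - p1 <= max_gap):
                continue
            for p3, w3 in fields["text"]:
                if w3 == last and 0 < p3 - p2 <= max_gap:
                    return True
    return False


fields = split_fields("foo and 10 bars")
print(fields["text_num"])                             # [(2, 10)]
print(near_in_order(fields, "foo", 8, 20, "bars"))    # True
print(near_in_order(fields, "foo", 11, 20, "bars"))   # False
```

The point of the sketch is that proximity across two fields only makes sense when both fields share one position space.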
> As long as an or-query is good enough, I think this problem is not too
> hard? But if you need proximity it becomes more complicated. Once in the
> distant past we coded a numeric range query using a complicated set of
> wildcard queries that could handle large numbers efficiently - this search
> index (Verity) had no range capability, so we had to mock it up using
> text. The way this worked was something along these lines:
>
> 1) transform all the numbers into their binary encoding (e.g. 8 =
> 0b00001000)
> 2) write queries by encoding the range as a set of bitmasks represented by
> wildcard queries:
> [8 TO 20] becomes (0b00001??? 0b000100?? 0b00010100)
>
> I know you said you cannot use [0-9]* terms, but you will not see terrible
> term explosion with this. What's your concern there?
>
It's not terrible, but it is significant. I'd like to try the trie magic,
which reduces query-time processing.
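For readers unfamiliar with the "trie magic": a trie numeric field indexes each value at several precisions (the value with progressively more low bits dropped), so a range can be covered by a handful of coarse terms instead of one term per matching value. The Python below is a simplified model of that idea, not Lucene's actual term encoding; the function names, the 8-bit width, and the step of 2 bits (standing in for precisionStep) are assumptions chosen to keep the example small.

```python
def trie_terms(value, bits=8, step=2):
    """Index side: one (shift, prefix) term per precision level."""
    return [(shift, value >> shift) for shift in range(0, bits, step)]


def split_range(lo, hi, bits=8, step=2):
    """Query side: cover [lo, hi] with (shift, prefix) terms, coarse where possible."""
    terms, shift = [], 0
    mask = (1 << step) - 1
    while lo <= hi:
        if shift + step >= bits:
            # Coarsest level: nothing above it, so list the remaining prefixes.
            terms += [(shift, p) for p in range(lo, hi + 1)]
            break
        # Peel off low-end values that do not start a block of the next level.
        while lo <= hi and (lo & mask) != 0:
            terms.append((shift, lo))
            lo += 1
        # Peel off high-end values that do not end a block of the next level.
        while lo <= hi and (hi & mask) != mask:
            terms.append((shift, hi))
            hi -= 1
        if lo > hi:
            break
        lo, hi, shift = lo >> step, hi >> step, shift + step
    return terms


def matches(value, terms, bits=8, step=2):
    """A document matches if any of its indexed terms is one of the query terms."""
    return any(t in terms for t in trie_terms(value, bits, step))


terms = split_range(8, 20)
print(len(terms))  # 4 terms, versus 13 distinct values in the range
```

The trade-off is a larger index (several terms per value) in exchange for fewer terms to visit at query time.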
Thanks for the suggestions.
Do I remember correctly that you ignored the last Lucene Revolution?
>
> -Mike
>
>
>
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mk...@griddynamics.com>
Re: indexing numbers in texts for range queries
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Mikhail - I can imagine a filter that strips out everything but numbers
and then indexes those with a (separate) numeric (trie) field. But I
don't believe you can do phrase or other proximity queries across
multiple fields. As long as an or-query is good enough, I think this
problem is not too hard? But if you need proximity it becomes more
complicated. Once in the distant past we coded a numeric range query
using a complicated set of wildcard queries that could handle large
numbers efficiently - this search index (Verity) had no range
capability, so we had to mock it up using text. The way this worked was
something along these lines:
1) transform all the numbers into their binary encoding (e.g. 8 = 0b00001000)
2) write queries by encoding the range as a set of bitmasks represented
by wildcard queries:
[8 TO 20] becomes (0b00001??? 0b000100?? 0b00010100)
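The range-to-wildcard step can be sketched as follows. This is an illustration in Python, not the original Verity code; the helper names `range_to_wildcards` and `matches` and the 8-bit width are assumptions. Each pattern fixes a binary prefix and leaves the remaining low bits as `?`, so the values 8 through 15 collapse into the single pattern 0b00001???.

```python
def range_to_wildcards(lo, hi, bits=8):
    """Cover [lo, hi] with aligned power-of-two blocks, one pattern per block."""
    patterns = []
    while lo <= hi:
        # Largest power-of-two block that can start at lo (alignment) ...
        size = lo & -lo if lo else 1 << bits
        # ... shrunk until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        free = size.bit_length() - 1  # number of trailing wildcard bits
        fixed = format(lo >> free, "b").zfill(bits - free) if free < bits else ""
        patterns.append("0b" + fixed + "?" * free)
        lo += size
    return patterns


def matches(pattern, n, bits=8):
    """True if the binary term for n matches the pattern ('?' = any one bit)."""
    term = "0b" + format(n, "b").zfill(bits)
    return all(p in ("?", t) for p, t in zip(pattern, term))


patterns = range_to_wildcards(8, 20)
print(patterns)  # ['0b00001???', '0b000100??', '0b00010100']
print([n for n in range(256) if any(matches(p, n) for p in patterns)])
```

The number of patterns grows with the bit width rather than with the size of the range, which is why the term explosion stays modest.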
I know you said you cannot use [0-9]* terms, but you will not see
terrible term explosion with this. What's your concern there?
-Mike