Posted to java-user@lucene.apache.org by su ha <s_...@yahoo.com> on 2012/03/02 08:22:14 UTC

Range queries in successive positions

Hi,
I'm new to Lucene. I've indexed some documents with Lucene and need to sanitize them to ensure
that they do not have any social security numbers (3-digits 2-digits 4-digits).

(How) Can I write a query (with the QueryParser) that searches for this pattern?

e.g. I can do [000 TO 999] or [00 TO 99] or [0000 TO 9999], but this causes hits with any 2, 3 or 4 digit number.
With something like "[000 TO 999] [00 TO 99] [0000 TO 9999]", I get no hits at all.

Is this possible with the default QueryParser?
Or is there some other programmatic way to do it?
thanks,
Sandeep

Re: Range queries in successive positions

Posted by Ian Lea <ia...@gmail.com>.
Or take a look at the search.regex.RegexQuery contrib module.  You won't
be able to use that via QueryParser either.

It might make more sense to do the sanitizing before indexing rather than after.
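
A minimal sketch of that pre-indexing check, using only java.util.regex from the JDK rather than any Lucene API. The class name SsnSanitizer, the placeholder text, and the assumption that the three digit groups are separated by a hyphen or a single space are all illustrative and not from this thread:

```java
import java.util.regex.Pattern;

public class SsnSanitizer {
    // Assumed SSN shape: 3 digits, 2 digits, 4 digits, separated by a hyphen or a space.
    // The \b anchors keep the pattern from matching inside a longer run of digits.
    private static final Pattern SSN =
            Pattern.compile("\\b\\d{3}[- ]\\d{2}[- ]\\d{4}\\b");

    /** Returns true if the text contains something shaped like an SSN. */
    public static boolean containsSsn(String text) {
        return SSN.matcher(text).find();
    }

    /** Replaces every SSN-shaped substring with a fixed placeholder. */
    public static String redact(String text) {
        return SSN.matcher(text).replaceAll("[REDACTED]");
    }

    public static void main(String[] args) {
        String raw = "Applicant 123-45-6789 called from 555-1234.";
        System.out.println(containsSsn(raw)); // true
        System.out.println(redact(raw));      // Applicant [REDACTED] called from 555-1234.
    }
}
```

Running redact() on the raw text before handing it to the IndexWriter would keep the numbers out of the index entirely, so nothing has to be found and removed afterwards.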


--
Ian.


On Fri, Mar 2, 2012 at 7:26 AM, Trejkaz <tr...@trypticon.org> wrote:
> On Fri, Mar 2, 2012 at 6:22 PM, su ha <s_...@yahoo.com> wrote:
>> Hi,
>> I'm new to Lucene. I've indexed some documents with Lucene and need to sanitize them to ensure
>> that they do not have any social security numbers (3-digits 2-digits 4-digits).
>>
>> (How) Can I write a query (with the QueryParser) that searches for this pattern?
>>
>> e.g. I can do [000 TO 999] or [00 TO 99] or [0000 TO 9999], but this causes hits with any 2, 3 or 4 digit number.
>> With something like "[000 TO 999] [00 TO 99] [0000 TO 9999]", I get no hits at all.
>>
>> Is this possible with the default QueryParser?
>> Or is there some other programmatic way to do it?
>
> The programmatic way is to use SpanMultiTermQueryWrapper around each
> RangeQuery and then SpanNearQuery around the lot.
>
> The default QueryParser probably can't do it. I believe someone was
> enhancing it for wildcards but I'm not sure if range queries were
> included in all that.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


Re: Range queries in successive positions

Posted by Trejkaz <tr...@trypticon.org>.
On Fri, Mar 2, 2012 at 6:22 PM, su ha <s_...@yahoo.com> wrote:
> Hi,
> I'm new to Lucene. I've indexed some documents with Lucene and need to sanitize them to ensure
> that they do not have any social security numbers (3-digits 2-digits 4-digits).
>
> (How) Can I write a query (with the QueryParser) that searches for this pattern?
>
> e.g. I can do [000 TO 999] or [00 TO 99] or [0000 TO 9999], but this causes hits with any 2, 3 or 4 digit number.
> With something like "[000 TO 999] [00 TO 99] [0000 TO 9999]", I get no hits at all.
>
> Is this possible with the default QueryParser?
> Or is there some other programmatic way to do it?

The programmatic way is to use SpanMultiTermQueryWrapper around each
RangeQuery and then SpanNearQuery around the lot.
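
To make clear what that combination would match, here is a plain-Java model of it. This is not Lucene code: it only imitates three range clauses required at successive token positions, in order, with no gaps. It assumes the analyzer splits "123-45-6789" into the three tokens "123", "45" and "6789" (worth checking for your analyzer), and that term ranges compare term text lexicographically, which is my understanding of how term range queries behave. The class and method names are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SuccessiveRanges {
    /** Inclusive lexicographic range test on term text (string order, not numeric order). */
    public static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    /**
     * Returns the first position i at which tokens i, i+1, ... fall in the given
     * ranges in order with no gaps, or -1 if there is no such position.
     */
    public static int firstMatch(List<String> tokens, String[][] ranges) {
        for (int i = 0; i + ranges.length <= tokens.size(); i++) {
            boolean all = true;
            for (int j = 0; j < ranges.length && all; j++) {
                all = inRange(tokens.get(i + j), ranges[j][0], ranges[j][1]);
            }
            if (all) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The three ranges from the original question, as {lower, upper} pairs.
        String[][] ssn = { {"000", "999"}, {"00", "99"}, {"0000", "9999"} };

        // Three adjacent numeric tokens: match at position 1.
        System.out.println(firstMatch(Arrays.asList("ssn", "123", "45", "6789"), ssn)); // 1

        // Lexicographic ranges over-match: "5" sorts between "000" and "999".
        System.out.println(firstMatch(Arrays.asList("call", "5", "5", "5"), ssn)); // 1

        // A word between the groups breaks the adjacency: no match.
        System.out.println(firstMatch(Arrays.asList("123", "and", "45", "6789"), ssn)); // -1
    }
}
```

The second case is the one to watch: string-ordered ranges say nothing about the number of digits, so hits from the span approach would still need a stricter check (for example a regular expression on the stored text) before being treated as social security numbers.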

The default QueryParser probably can't do it. I believe someone was
enhancing it for wildcards but I'm not sure if range queries were
included in all that.

TX
