Posted to java-user@lucene.apache.org by Cam Bazz <ca...@gmail.com> on 2008/06/12 19:45:00 UTC
lucene wildcard query with stop character
Hello,
Imagine I have documents with the following keys:
A
A>B
A>B>C
A>B>D
A>B>C>D
Now imagine a query with the keyword analyzer and a wildcard: A>B>*
which will bring me A>B>C, A>B>D and A>B>C>D,
but I just want to get A>B>C and A>B>D.
So can I write a query like A>B>* that excludes any key with another >
character after A>B>?
Best Regards,
-C.B.
Re: lucene wildcard query with stop character
Posted by Chris Hostetter <ho...@fucit.org>.
: Hrm.. can we see a more specific example of the type of data you are trying to
: query against here?
As I understand the question, this is a fairly classic hierarchical
organization of documents. Documents Foo>Bar>Baz and Foo>Bar>Bax are both
children of document Foo>Bar ... Foo>Barber is their aunt (a sibling
document of Foo>Bar).
Searching for doc_path:Foo>Bar>* will find all of the descendants of document
Foo>Bar ... but you want to find just the direct children (not the
grandchildren).
The easiest way to tackle something like this is with multiple fields...
doc_path: Foo>Bar>Baz
parent: Foo>Bar
...then you can query for parent:"Foo>Bar" to find all of the direct
children, or doc_path:Foo>Bar>* to find all descendants.
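A minimal sketch of the two-field idea, in plain Python rather than against the Lucene API. The helper names (parent_of, build_fields) are hypothetical, and the list comprehension only stands in for an exact-match query on the parent field; it uses the keys from the original question.

```python
SEP = ">"

def parent_of(path, sep=SEP):
    """Return everything before the last separator, or None for a root key."""
    head, found, _tail = path.rpartition(sep)
    return head if found else None

def build_fields(path):
    """Field values for one document: the full path plus its parent."""
    return {"doc_path": path, "parent": parent_of(path)}

keys = ["A", "A>B", "A>B>C", "A>B>D", "A>B>C>D"]
docs = [build_fields(k) for k in keys]

# Exact match on the parent field stands in for the query parent:"A>B".
direct_children = [d["doc_path"] for d in docs if d["parent"] == "A>B"]
print(direct_children)  # ['A>B>C', 'A>B>D']
```

The parent value is computed once at index time, so finding direct children needs no wildcard at all.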
I typically use an "ancestors" field, where every ancestor in the "family
tree" is enumerated as a separate field value, so that I'm not dependent
on prefix queries to do "descendants" queries like that....
doc_path: Foo>Bar>Baz
parent: Foo>Bar
ancestor: Foo>Bar
ancestor: Foo
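The ancestor values listed above could be generated along these lines. This is only a sketch in plain Python: ancestors_of is a hypothetical helper, and the dictionary lookup stands in for an exact-match query on a multi-valued ancestor field.

```python
def ancestors_of(path, sep=">"):
    """List every proper ancestor of path, nearest first."""
    parts = path.split(sep)
    return [sep.join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]

print(ancestors_of("Foo>Bar>Baz"))  # ['Foo>Bar', 'Foo']

keys = ["A", "A>B", "A>B>C", "A>B>D", "A>B>C>D"]
index = {k: ancestors_of(k) for k in keys}

# Stand-in for the query ancestor:"A>B": every descendant, no prefix query.
descendants = [k for k, anc in index.items() if "A>B" in anc]
print(descendants)  # ['A>B>C', 'A>B>D', 'A>B>C>D']
```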
-Hoss
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: lucene wildcard query with stop character
Posted by Matthew Hall <mh...@informatics.jax.org>.
Hrm.. can we see a more specific example of the type of data you are
trying to query against here?
Matt
--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012
Re: lucene wildcard query with stop character
Posted by Cam Bazz <ca...@gmail.com>.
Well, the ? would work if every token had the same length.
However, instead of A>B>C I want tags whose length varies dynamically from 1 to
unlimited.
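The length restriction can be seen in a small sketch. Python's fnmatch is used here only as a stand-in for wildcard matching where * matches any run of characters and ? matches exactly one; it is not Lucene's matcher. The key A>B>Cat is a hypothetical longer tag added for illustration.

```python
from fnmatch import fnmatchcase

# Keys from the question, plus one hypothetical multi-character tag.
keys = ["A", "A>B", "A>B>C", "A>B>D", "A>B>C>D", "A>B>Cat"]

def matching(pattern):
    """Return the keys that match a * / ? wildcard pattern."""
    return [k for k in keys if fnmatchcase(k, pattern)]

# * also swallows further > separators, so grandchildren match too.
print(matching("A>B>*"))  # ['A>B>C', 'A>B>D', 'A>B>C>D', 'A>B>Cat']

# ? stops at one character: it drops the grandchild, but also the longer tag.
print(matching("A>B>?"))  # ['A>B>C', 'A>B>D']
```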
I guess I could just pad every token to a normalized length such as
00000000...000A, but I am hoping there is a better method.
If only we could tell Lucene to act like a regular expression and keep
inserting ?'s until a > is reached ...
Another way could be to apply the regular expression outside Lucene, but then
there is still the need to fetch the hits.
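For what it is worth, the outside-Lucene variant might look roughly like this. It is a sketch only: direct_child_pattern is a hypothetical helper, and the candidates list stands in for keys already fetched by the broad A>B>* query, so the cost of fetching those hits remains.

```python
import re

def direct_child_pattern(prefix, sep=">"):
    """Regex for: prefix, one separator, then one tag with no separator in it."""
    tag = "[^" + re.escape(sep) + "]+"
    return re.compile(re.escape(prefix + sep) + tag)

# Stand-in for the keys returned by the broad wildcard query A>B>*.
candidates = ["A>B>C", "A>B>D", "A>B>C>D"]

pattern = direct_child_pattern("A>B")
direct_children = [k for k in candidates if pattern.fullmatch(k)]
print(direct_children)  # ['A>B>C', 'A>B>D']
```

Because the tag part is [^>]+ rather than a fixed number of ?'s, tags of any length pass the filter.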
Best.
-C.B.
Re: lucene wildcard query with stop character
Posted by Matthew Hall <mh...@informatics.jax.org>.
I assume you want all of your queries to function in this way?
If so, you could just translate the * character into a ? at search time,
which should give you the functionality you are asking for.
Unless I'm missing something.
Matt