Posted to java-user@lucene.apache.org by Cam Bazz <ca...@gmail.com> on 2008/06/12 19:45:00 UTC

lucene wildcard query with stop character

Hello,

Imagine I have the following documents having keys

A
A>B
A>B>C
A>B>D
A>B>C>D

Now imagine a query with the keyword analyzer and a wildcard: A>B>*

which will bring me A>B>C, A>B>D and A>B>C>D

but I just want to get A>B>C and A>B>D

So can I make a query like A>B>* that does not match keys with another >
character after A>B>?

Best Regards,
-C.B.

Re: lucene wildcard query with stop character

Posted by Chris Hostetter <ho...@fucit.org>.
: Hrm.. can we see a more specific example of the type of data you are trying to
: query against here?

As i understand the question, this is a fairly classic hierarchical
organization of documents.  Documents Foo>Bar>Baz and Foo>Bar>Bax are both
children of Document Foo>Bar ... Foo>Barber is their aunt (a sibling
document of Foo>Bar).

searching for  doc_path:Foo>Bar>*  will find all of the descendants of
document Foo>Bar ... but you want to just find the direct children (not the
grandchildren)

the easiest way to tackle something like this is with multiple fields...

   doc_path: Foo>Bar>Baz
   parent:   Foo>Bar

..then you can query for parent:"Foo>Bar" to find all of the direct
children, or doc_path:Foo>Bar>* to find all descendants.

i typically use an "ancestors" field, where every ancestor in the "family
tree" is enumerated as a separate field value, so that i'm not dependent
on prefix queries to do "descendants" queries like that....

   doc_path: Foo>Bar>Baz
   parent:   Foo>Bar
   ancestor: Foo>Bar
   ancestor: Foo
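
A minimal sketch of how the parent and ancestor values listed above could be
derived from a doc_path at index time. The class and method names
(PathFields, parent, ancestors) are hypothetical and not part of Lucene; the
code only builds the strings, and adding them to a document as field values
is left out. With such fields, a query like ancestor:"Foo" would match every
descendant of Foo without needing a prefix query.

```java
import java.util.ArrayList;
import java.util.List;

/** Derives "parent" and "ancestor" values for a hierarchical key (hypothetical helper). */
final class PathFields {
    // Separator used in the example keys of this thread.
    private static final char SEP = '>';

    /** Returns the direct parent of a path, or null for a top-level key. */
    static String parent(String path) {
        int idx = path.lastIndexOf(SEP);
        return idx < 0 ? null : path.substring(0, idx);
    }

    /** Returns every ancestor, nearest first: Foo>Bar>Baz gives [Foo>Bar, Foo]. */
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        for (String p = parent(path); p != null; p = parent(p)) {
            result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        String docPath = "Foo>Bar>Baz";
        System.out.println("doc_path: " + docPath);
        System.out.println("parent:   " + parent(docPath));
        for (String a : ancestors(docPath)) {
            System.out.println("ancestor: " + a);
        }
    }
}
```

Running main prints the same four field lines as the listing above.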



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene wildcard query with stop character

Posted by Matthew Hall <mh...@informatics.jax.org>.
Hrm.. can we see a more specific example of the type of data you are 
trying to query against here?

Matt

Cam Bazz wrote:
> Well, the ? would work if every token had the same length.
> However, instead of A>B>C I want tags whose length varies dynamically from 1
> to unlimited.
>
> I guess I could just pad every token to a normalized length such as
> 00000000...000A but i am hoping there is a better method.
>
> If only we could tell Lucene to do it like a regular expression: insert ?'s
> until a > is reached ...
>
> Another way could be to do the regular expression outside Lucene, but then
> there is still the need to fetch the hits.
>
> Best.
> -C.B.

-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012





Re: lucene wildcard query with stop character

Posted by Cam Bazz <ca...@gmail.com>.
Well, the ? would work if every token had the same length.
However, instead of A>B>C I want tags whose length varies dynamically from 1
to unlimited.
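
To illustrate the point about ?, here is a small sketch that mimics the
wildcard semantics (* matches any run of characters, ? matches exactly one)
using java.util.regex. It is not Lucene's own matcher, and WildcardDemo,
toRegex and matches are hypothetical names made up for this example.

```java
import java.util.regex.Pattern;

/** Mimics '*' and '?' wildcard semantics with a regex (illustration only). */
final class WildcardDemo {
    /** Translates '*' to ".*" and '?' to "."; every other character is matched literally. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** True if the whole key matches the wildcard pattern. */
    static boolean matches(String wildcard, String key) {
        return toRegex(wildcard).matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("A>B>?", "A>B>C"));    // true: one-character tag
        System.out.println(matches("A>B>?", "A>B>C>D"));  // false: grandchild excluded
        System.out.println(matches("A>B>?", "A>B>Cat"));  // false: longer tag is missed
        System.out.println(matches("A>B>*", "A>B>C>D"));  // true: * also matches grandchildren
    }
}
```

So ? only separates children from grandchildren while every tag is a single
character, which is the limitation described above.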

I guess I could just pad every token to a normalized length such as
00000000...000A but i am hoping there is a better method.

If only we could tell Lucene to do it like a regular expression: insert ?'s
until a > is reached ...

Another way could be to do the regular expression outside Lucene, but then
there is still the need to fetch the hits.
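
As a rough sketch of the "regular expression outside Lucene" idea, the
following filters a list of keys (for example, the keys of the hits returned
by the A>B>* wildcard query) down to direct children only. DirectChildFilter
and directChildren are hypothetical names, and fetching the hits from the
index is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Post-filters keys so that only direct children of a parent key remain (illustration only). */
final class DirectChildFilter {
    /** Keeps keys of the form parentKey + ">" + one segment containing no further '>'. */
    static List<String> directChildren(String parentKey, List<String> keys) {
        // Quote the parent literally, then require exactly one more segment.
        Pattern p = Pattern.compile(Pattern.quote(parentKey + ">") + "[^>]+");
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (p.matcher(key).matches()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "A>B", "A>B>C", "A>B>D", "A>B>C>D");
        System.out.println(directChildren("A>B", keys)); // prints [A>B>C, A>B>D]
    }
}
```

The cost noted above remains: every descendant still has to be fetched
before the filter can discard the deeper ones.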

Best.
-C.B.



On Thu, Jun 12, 2008 at 8:47 PM, Matthew Hall <mh...@informatics.jax.org>
wrote:

> I assume you want all of your queries to function in this way?
>
> If so, you could just translate the * character into a ? at search time,
> which should give you the functionality you are asking for.
>
> Unless I'm missing something.
>
> Matt

Re: lucene wildcard query with stop character

Posted by Matthew Hall <mh...@informatics.jax.org>.
I assume you want all of your queries to function in this way?

If so, you could just translate the * character into a ? at search time, 
which should give you the functionality you are asking for.

Unless I'm missing something.

Matt

Cam Bazz wrote:
> Hello,
>
> Imagine I have the following documents having keys
>
> A
> A>B
> A>B>C
> A>B>D
> A>B>C>D
>
> Now imagine a query with the keyword analyzer and a wildcard: A>B>*
>
> which will bring me A>B>C, A>B>D and A>B>C>D
>
> but I just want to get A>B>C and A>B>D
>
> So can I make a query like A>B>* that does not match keys with another >
> character after A>B>?
>
> Best Regards,
> -C.B.
>
>   


