
Posted to solr-user@lucene.apache.org by alexw <aw...@crossview.com> on 2011/10/05 05:00:45 UTC

A simple query?

Hi all,

This may seem to be an easy one but I have been struggling to get it
working. To simplify things, let's say I have a field that can contain any
combination of the 26 alphabetic letters, space delimited:

<doc>
    <myfield>a b</myfield>
</doc>
<doc>
    <myfield>b c</myfield>
</doc>
<doc>
    <myfield>x y z</myfield>
</doc>

The search term is a list of user-specified letters, for example: a b y z

I would like only the following docs to be returned:

1. Any doc that contains exactly the 4 letters a b y z (sequence not
important)
2. Any doc that contains only a subset of the 4 letters a b y z (sequence
not important)

Note that a doc containing any letter other than a, b, y and z does not
qualify. So in this case, only the first doc should be returned.
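Put simply, the two rules above say that a doc matches when the set of letters in the doc is a subset of the search letters. The minimal Python sketch below (plain sets, no Solr involved) checks that reading against the three example docs. The helper name `qualifies` is hypothetical, and rejecting an empty field is an added assumption that the original post does not cover.

```python
# Model of the desired matching rule using plain Python sets (not Solr).
# A doc qualifies when every letter in it also appears in the search term.
docs = {
    1: "a b",
    2: "b c",
    3: "x y z",
}

def qualifies(doc_text: str, search_term: str) -> bool:
    """True if every space-delimited letter in doc_text is in search_term."""
    doc_letters = set(doc_text.split())
    allowed = set(search_term.split())
    # Assumption: an empty field should not match, even though the empty
    # set is technically a subset of any search term.
    return bool(doc_letters) and doc_letters <= allowed

matches = [doc_id for doc_id, text in docs.items() if qualifies(text, "a b y z")]
print(matches)  # [1]
```

Doc 2 fails because of `c` and doc 3 fails because of `x`, which agrees with "only the first doc should be returned."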

Can someone shed some light here and let me know how to get this working?
Specifically:

1. What should the field type be (text, text_ws, string...)?
2. What does the query look like?

Thanks in advance!







--
View this message in context: http://lucene.472066.n3.nabble.com/A-simple-query-tp3395465p3395465.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: A simple query?

Posted by Ahmet Arslan <io...@yahoo.com>.

Your use case is pretty unique. One solution might be to use MemoryIndex, which is designed for "prospective search". 

http://lucene.apache.org/java/2_4_0/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html

Your documents would become your stored "huge numbers of queries", and your user-entered search term (a b y z) would be tested against those stored queries.

In your example,

Your single document would be a b y z, and your stored queries would be:

q1 = a b
q2 = b c
q3 = x y z

If you use the AND operator, only q1/doc1 will be a hit.
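The inversion described above can be modeled in a few lines of pure Python. The user's letters get indexed as one in-memory "document", and each stored doc runs against it as an AND query. This sketch only imitates the idea behind Lucene's MemoryIndex and does not call that API. The names `build_single_doc_index` and `and_query_matches` are hypothetical.

```python
# Prospective-search model: the user's search term becomes the only indexed
# "document", and every stored doc is evaluated as an AND query against it.
stored_queries = {
    "q1": "a b",
    "q2": "b c",
    "q3": "x y z",
}

def build_single_doc_index(text: str) -> set[str]:
    """Index one document as the set of its whitespace-separated terms."""
    return set(text.split())

def and_query_matches(query: str, index: set[str]) -> bool:
    """An AND query hits only if every one of its terms is in the document."""
    return all(term in index for term in query.split())

index = build_single_doc_index("a b y z")
hits = [name for name, q in stored_queries.items() if and_query_matches(q, index)]
print(hits)  # ['q1']
```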

--- On Wed, 10/5/11, alexw <aw...@crossview.com> wrote:

> From: alexw <aw...@crossview.com>
> Subject: Re: A simple query?
> To: solr-user@lucene.apache.org
> Date: Wednesday, October 5, 2011, 3:15 PM
> Thanks, but unfortunately that will
> not solve the problem, since it will bring
> back both the first and second docs. Besides, the query
> terms are: a b y z,
> not just: a b

Re: A simple query?

Posted by alexw <aw...@crossview.com>.

Thanks, but unfortunately that will not solve the problem, since it will bring
back both the first and second docs. Besides, the query terms are: a b y z,
not just: a b


Re: A simple query?

Posted by "tamanjit.bindra@yahoo.co.in" <ta...@yahoo.co.in>.

Hi,
Set your default operator to OR, i.e.:

<solrQueryParser defaultOperator="OR"/> in schema.xml

Also keep your field type as text, since you want whitespace tokenization, i.e.:

<field name="myfield" type="text" indexed="true" stored="true"/>

Then try your query with parentheses, i.e.:

/select/?q=myfield:(a b)&version=2.2&start=0&rows=2&indent=on

This should hopefully solve your problem.
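alexw's objection, that this brings back both the first and second docs, becomes clear from a small model of default-OR semantics: any doc sharing at least one term with the query is a hit. The pure-Python sketch below ignores scoring and analysis. The helper `or_query_matches` is hypothetical.

```python
# Model of a default-OR query such as q=myfield:(a b): a doc hits if it
# shares at least one whitespace-separated term with the query.
docs = {1: "a b", 2: "b c", 3: "x y z"}

def or_query_matches(query: str, doc_text: str) -> bool:
    """True if the doc contains any of the query's terms."""
    return bool(set(query.split()) & set(doc_text.split()))

print([d for d, t in docs.items() if or_query_matches("a b", t)])      # [1, 2]
print([d for d, t in docs.items() if or_query_matches("a b y z", t)])  # [1, 2, 3]
```

Under OR semantics nothing penalizes the extra letters `c`, `x` in docs 2 and 3, so they still match.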


Re: A simple query?

Posted by alexw <aw...@crossview.com>.

Thanks, Hoss and iorixxx.

Yes, I probably did oversimplify the use case, which is fairly complicated to
explain. I think I may have found a workaround for the issue, and I am
testing its performance.

Thanks again for your help!





Re: A simple query?

Posted by Chris Hostetter <ho...@fucit.org>.

: This may seem to be an easy one but I have been struggling to get it
: working. To simplify things, let's say I have a field that can contain any
: combination of the 26 alphabetic letters, space delimited:

I suspect you may have "oversimplified" your problem in a way that makes 
some specific solutions for your real use case non-obvious.

The more you can tell us about your *real* use case, the more likely it is 
that people can give you *real* answers.

That said...

: 1. Any doc that contains exactly the 4 letters a b y z (sequence not
: important)
: 2. Any doc that contains only a subset of the 4 letters a b y z (sequence
: not important)
: 
: Note that a doc containing any letter other than a, b, y and z does not
: qualify. So in this case, only the first doc should be returned.

...since the number of values is finite (and small), I would index your 
data as a simple multivalued StrField (or a TextField using a whitespace 
tokenizer) and then write your queries so that they deliberately exclude 
documents matching the inverse set of your input.

i.e., for the input "a b y z" ...

 q = +(a b y z) -(c d e f g h i j k l m n o p q r s t u v w x)
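Writing the exclusion list by hand invites typos, so one option is to generate it from the input letters. The sketch below assumes the field only ever holds the 26 lowercase letters a-z and that the unprefixed clauses inside `+(...)` are ORed (default operator OR). It builds the query string and then models which example docs it would keep. Both helper names are hypothetical and are not part of any Solr client.

```python
import string

def complement_query(search_term: str) -> str:
    """Return '+(wanted letters) -(every other letter)' as a query string."""
    wanted = sorted(set(search_term.split()))
    unwanted = [c for c in string.ascii_lowercase if c not in wanted]
    return f"+({' '.join(wanted)}) -({' '.join(unwanted)})"

def complement_matches(doc_text: str, search_term: str) -> bool:
    """Model the query: at least one wanted letter and no letter outside the set."""
    terms = set(doc_text.split())
    wanted = set(search_term.split())
    return bool(terms & wanted) and terms <= wanted

docs = {1: "a b", 2: "b c", 3: "x y z"}
print(complement_query("a b y z"))
# +(a b y z) -(c d e f g h i j k l m n o p q r s t u v w x)
print([d for d, t in docs.items() if complement_matches(t, "a b y z")])
# [1]
```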

...or leverage the filter cache...

 q = a b y z & fq = -c & fq = -d & fq = -e & fq = -f & ....
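In the filter-cache variant, each excluded letter goes out as its own `fq` parameter. Solr caches each distinct filter independently, so a filter such as `-c` can be reused across many searches. The minimal sketch below builds those request parameters with Python's standard `urllib.parse.urlencode`. It assumes the same a-z alphabet, and it assumes the resulting string would be appended to a `/select?` URL. The helper `filter_cache_params` is hypothetical.

```python
import string
from urllib.parse import urlencode

def filter_cache_params(search_term: str) -> list[tuple[str, str]]:
    """Build q plus one negative fq per letter that is not in the search term."""
    wanted = sorted(set(search_term.split()))
    params = [("q", " ".join(wanted))]
    params += [("fq", f"-{c}") for c in string.ascii_lowercase if c not in wanted]
    return params

params = filter_cache_params("a b y z")
# A sequence of pairs keeps the repeated fq keys, in order.
query_string = urlencode(params)
print(query_string)
```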



-Hoss