Posted to solr-user@lucene.apache.org by "Balaji.A" <re...@gmail.com> on 2010/07/13 14:22:58 UTC

Get only partial match results

Hi All,
   I have a specific requirement, stated below. Kindly suggest whether it can
be achieved and, if so, the steps to achieve it.

I have 2 cores storing different kinds of data.

My search query should return results in the order given below:

1) Exact match results from core1
2) Exact match results from core2
3) Partial match results from core1
4) Partial match results from core2

Note: I don't want exact match results to be duplicated in Partial match
results.

Please suggest!

Thanks.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Get-only-partial-match-results-tp963212p963212.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Get only partial match results

Posted by "Balaji.A" <re...@gmail.com>.
Hi Jonathan,

   Once again many thanks for your guidance. I made it work this time :-)

Thanks,
Balaji.

RE: Get only partial match results

Posted by Jonathan Rochkind <ro...@jhu.edu>.
> 1) While doing a dismax query, I specify the query in double quotes for
> exact match. This works fine, but I don't get any partial matches in the
> search results.

Rather than quoting your query to get 'exact' matches, I was suggesting configuring the analyzers differently for your fields "core1_title_exact" and "core1_title_partial". Oops, except I don't think I meant analyzers; I meant different field class types in Solr.

But again, it depends on what you mean by 'exact'. Do you mean it must match the whole string start to finish? If so, make the *_exact fields in schema.xml use a "string" solr.StrField instead of a "text" solr.TextField; queries will then only match in those fields if they are _exact_, covering the whole indexed string start to finish, with all punctuation, spaces, etc. exactly the same. (To make it a _bit_ more forgiving you could lowercase, remove punctuation, and normalize whitespace, but since StrField values are not analyzed, that would mean a solr.TextField using the KeywordTokenizer plus filters rather than a plain StrField.) No need to quote the query; it will only match if it's exact.

Oops, except I just realized this isn't necessarily true, sorry, because of the way the dismax query parser deals with whitespace in the query. Hmm.
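
To make the distinction concrete, here is a small Python illustration of "whole string must match" versus "every word appears somewhere". It is not Solr code and does not reproduce Solr's analysis chain; the function names normalize, exact_match and partial_match are invented for this sketch, and the dismax whitespace caveat above is left aside.

```python
import re
import string


def normalize(value: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    lowered = value.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", no_punct).strip()


def exact_match(query: str, indexed_value: str) -> bool:
    """True only when the two normalized strings are identical end to end."""
    return normalize(query) == normalize(indexed_value)


def partial_match(query: str, indexed_value: str) -> bool:
    """True when every query word occurs somewhere in the indexed value."""
    indexed_words = set(normalize(indexed_value).split())
    return all(word in indexed_words for word in normalize(query).split())
```

Under these definitions, "Ryder Cup" is an exact match for a title that is just "Ryder Cup", but only a partial match for "The 2010 Ryder Cup results".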

If what you mean by 'exact' is just a phrase search, then you don't need the separate *_exact fields in the first place; you can just use the dismax 'pf' (phrase fields) param with the right boost. ('ps' is the phrase slop, which only controls how loosely the phrase may match.)

Hmm, I think for the first case, where 'exact' really does mean 'exact' (not phrase), you might be able to combine the _exact field configured as a solr.StrField with the 'pf' technique: only mention the _exact fields in the dismax 'pf', not in the dismax 'qf'.

I'm not completely sure any of this will work, just giving you some ideas of how I'd try approaching it if it were me. 
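
As a rough sketch of that last idea, the request parameters might be assembled like this in Python. The dismax phrase-fields parameter is 'pf' (while 'ps' sets the phrase slop), and defType, q, qf and pf are standard request parameters; the specific boost values and the helper name build_params are assumptions made for illustration, and whether a StrField behaves as hoped under 'pf' is untested here.

```python
from urllib.parse import urlencode


def build_params(user_query: str) -> dict:
    """Return dismax parameters that search the partial fields and
    boost documents whose _exact fields match the query as a phrase."""
    return {
        "defType": "dismax",
        "q": user_query,
        # Only the partial (tokenized) fields take part in ordinary matching.
        "qf": "core1_title_partial^10 core2_title_partial",
        # The _exact fields appear only here, as phrase boosts (assumed values).
        "pf": "core1_title_exact^1000 core2_title_exact^100",
    }


params = build_params("Ryder Cup")
query_string = urlencode(params)  # ready to append to a Solr select URL
```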

> If the frequency of the search term is higher in the "core2_content_exact"
> field, even though the search term is present at least once in the field
> "core1_content_exact", I get "core2_content_exact" as my first search result
> item.

I'm surprised this is true with such gigantic boosts, but I'm not sure what to do about it, sorry. Although I guess the boosts I suggested aren't that different from each other: they are all just multiplied by 1000, which doesn't change their size relative to each other. You could try making the boosts even more ridiculously high at each stage than the last, maybe powers of 10: ^1, ^10, ^100, ^1000, ^10000.
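
A minimal Python sketch of generating such a 'qf' value, assuming the fields are listed from highest to lowest priority. The helper name tiered_qf is made up for this example. Wider gaps make it less likely that term frequency in a lower tier outweighs a match in a higher one, but they may not guarantee a strict ordering.

```python
def tiered_qf(fields_high_to_low, base=10):
    """Build a dismax qf string where each tier's boost is `base` times
    the boost of the tier below it; the last field gets a boost of 1."""
    count = len(fields_high_to_low)
    parts = []
    for position, field in enumerate(fields_high_to_low):
        exponent = count - 1 - position
        parts.append(f"{field}^{base ** exponent}")
    return " ".join(parts)


qf = tiered_qf([
    "core1_title_exact",
    "core1_content_exact",
    "core2_title_exact",
    "core2_content_exact",
    "core1_title_partial",
])
```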

Jonathan

Re: Get only partial match results

Posted by "Balaji.A" <re...@gmail.com>.
Thanks Jonathan. I appreciate your reply.

Though I got a few ideas for implementing my requirement, I got stuck on a
few issues. It would be very helpful if you could guide me in resolving them.

As you suggested, I configured a single core with different fields.

For example, the core contains the following fields:

core1_title_exact (type : text_ws)
core1_title_partial (type : text)
core1_content_exact (type : text_ws)
core1_content_partial (type : text)
core2_title_exact (type : text_ws)
core2_title_partial (type : text)
core2_content_exact (type : text_ws)
core2_content_partial (type : text)


Problems
*******
1) While doing a dismax query, I specify the query in double quotes for
exact match. This works fine, but I don't get any partial matches in the
search results.

My query:
q="Ryder Cup"&qf=core1_title_exact^8000 core1_content_exact^7000
core2_title_exact^6000 core2_content_exact^5000 core1_title_partial^4000
core1_content_partial^3000 core2_title_partial^2000
core2_content_partial^1000

2) If the frequency of the search term is higher in the "core2_content_exact"
field, even though the search term is present at least once in the field
"core1_content_exact", I get "core2_content_exact" as my first search result
item.

For example, assume my search term is "Ryder Cup". If Ryder Cup occurs once
in the core1_content_exact field and about 15 times in core2_content_exact,
the search query returns the core2_content_exact match as the first result.

Is this something to do with term frequency? How do I fix this problem? The
core1_content_exact field should be my topmost priority even if it matches
the search term only once.


Thanks,
Balaji

Re: Get only partial match results

Posted by Jonathan Rochkind <ro...@jhu.edu>.
I think you're going to have trouble doing this with separate cores.
With separate cores, you'll need to issue two queries to Solr, one for
each core. Then, to intermingle results from the different cores like
that, you'll need client-side code that is difficult to write (especially
to do at all efficiently). Different cores really are entirely separate.

If you put everything in the same core, but with different solr fields 
used, it will be easier to intermingle.  Conceptually, think of all the 
documents you were thinking of as being indexed in 'core1' as being just 
in the single core, but having their text indexed in a field called, 
say, "text_core1".   Then the 'core2' documents are in that same core 
actually, but indexed under "text_core2".

Now if you just wanted results from core1 followed by results from 
core2, you could use dismax and boost the core1 field a lot, something like:

text_core1^1000   text_core2
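
To illustrate the single-core layout, here is a minimal Python sketch of how documents from the two original sources might be shaped before indexing. The "id" field, the sample texts and the helper name to_solr_doc are assumptions for this example; only the text_core1 and text_core2 field names come from the suggestion above.

```python
def to_solr_doc(doc_id: str, text: str, source: str) -> dict:
    """Put the text under a field named after the source it came from,
    so that a single core can still tell the two sources apart."""
    if source not in ("core1", "core2"):
        raise ValueError(f"unknown source: {source}")
    return {"id": doc_id, f"text_{source}": text}


docs = [
    to_solr_doc("a-1", "Ryder Cup preview", "core1"),
    to_solr_doc("b-1", "Ryder Cup ticket prices", "core2"),
]

# Boost the core1 field heavily so its matches tend to sort first.
QF = "text_core1^1000 text_core2"
```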

Then we add in your requirement for "exact matches" first.  You need to 
be more precise about what you mean by "exact" matches and "partial" 
matches.  By "exact" do you mean phrase searching?  Do you mean it must 
match the entire field exactly start to finish?  Do you mean un-stemmed, 
where "partial" is stemmed?

Once you figure this out, one possible way to approach it is to set up a 
solr field with analyzers such that it will only match "exact" matches. 
For instance, if you really mean exact string match without any 
tokenization, you could use the KeywordTokenizer. If you are able to set 
up a solr field like this, then again using dismax, the solution is 
straightforward, something like:

qf=text_core1_exact^3000  text_core2_exact^2000  text_core1_partial^1000 
text_core2_partial
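
Put together as a request, that might look like the following Python sketch. The host, port and path in SOLR_SELECT_URL are assumptions about a local installation, and search_url is a name invented for this example; the qf value is the one shown above.

```python
from urllib.parse import urlencode

# Assumed location of the single core's select handler; adjust as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

QF = (
    "text_core1_exact^3000 text_core2_exact^2000 "
    "text_core1_partial^1000 text_core2_partial"
)


def search_url(user_query: str, rows: int = 10) -> str:
    """Build one dismax request covering all four tiers at once."""
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": QF,
        "rows": rows,
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = search_url("Ryder Cup")
```

Because this is a single query against a single core, each document appears at most once in the result list, so a document that matches both its exact and its partial field is not listed twice.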

Hope this helps you think about how to approach your problem.

Jonathan
