Posted to dev@lucenenet.apache.org by Max Metral <ma...@artsalliancelabs.com> on 2008/04/20 04:33:13 UTC

quick question I hope

I couldn't find this searching Google, but I'm sure I should've been
able to.  Let's say I have a document called "Bradford Street Play Area"
(because I do!), and I want a search for Bradford Street Park to work.
First, in general, I do an "all terms" search.  That fails, so I do an
OR search.  Problem is a HUGE number of documents have Street in them.
I don't mind that they match so much as that I'd like to have the term
frequency in the corpus influence the scoring.  Is there a Scorer or
query-boosting trick to accomplish this?

 

Thanks

--Max


RE: quick question I hope

Posted by Digy <di...@gmail.com>.
Additionally, some words (street, park, area, etc.) can be defined as stop words.

DIGY
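
To illustrate the stop-word idea above, here is a minimal sketch in plain Python rather than Lucene.Net code. The stop-word list and the analyze helper are hypothetical names made up for this example; a real index would apply the same list in its analyzer at both index and query time.

```python
# Hypothetical stop-word list for a place-name index (an assumption, not from the thread).
STOP_WORDS = {"street", "park", "area"}

def analyze(text, stop_words=STOP_WORDS):
    # Lowercase, split on whitespace, and drop any token that is a stop word.
    return [t for t in text.lower().split() if t not in stop_words]

indexed = analyze("Bradford Street Play Area")
queried = analyze("Bradford Street Park")

# With the generic words removed, an "all terms" match can succeed.
all_terms_match = all(t in indexed for t in queried)
```

One trade-off of this approach: a query consisting only of stop words (for example just "park") would end up with no terms at all.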

-----Original Message-----
From: Michael Garski [mailto:mgarski@myspace.com] 
Sent: Monday, April 21, 2008 8:56 PM
To: lucene-net-dev@incubator.apache.org
Subject: RE: quick question I hope

Max - 

Even with the large number of items with 'Street' in them, I would expect the documents that also match 'Bradford' to have a higher score.  I'd suggest checking out the results using the Searchable.Explain() method to see how the scores are being calculated.

Michael




RE: quick question I hope

Posted by Michael Garski <mg...@myspace.com>.
Max - 

Even with the large number of items with 'Street' in them, I would expect the documents that also match 'Bradford' to have a higher score.  I'd suggest checking out the results using the Searchable.Explain() method to see how the scores are being calculated.

Michael
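
The expectation above follows from inverse document frequency (IDF), which already makes rare terms count for more than common ones. A rough sketch in plain Python, assuming the classic default formula idf = 1 + ln(numDocs / (docFreq + 1)); the corpus statistics are made up, and term frequency, length norms, and coord are left out, so this is not the full scoring formula.

```python
import math

def idf(doc_freq, num_docs):
    # Shape of the classic default IDF: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(query_terms, doc_terms, doc_freqs, num_docs):
    # Simplified OR-query score: sum the IDF of each query term the document contains.
    # Term frequency, length norms, and coord are deliberately ignored.
    return sum(idf(doc_freqs.get(t, 0), num_docs)
               for t in query_terms if t in doc_terms)

# Hypothetical corpus statistics, invented for illustration only.
NUM_DOCS = 10000
DOC_FREQS = {"bradford": 3, "street": 6000, "park": 400}

query = ["bradford", "street", "park"]
target = {"bradford", "street", "play", "area"}  # "Bradford Street Play Area"
other = {"main", "street", "park"}               # some unrelated "Main Street Park"

target_score = score(query, target, DOC_FREQS, NUM_DOCS)
other_score = score(query, other, DOC_FREQS, NUM_DOCS)
print(target_score > other_score)
```

Under these assumed numbers the rare term "bradford" contributes far more than "street", so the target document outranks one that matches only the common terms.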



RE: quick question I hope

Posted by Digy <di...@gmail.com>.
You can write a custom Analyzer (plus a TokenFilter) that converts the input
tokens to their synonyms, for example:
Play area --> Park
Restaurant --> Café
Bar --> Café
Road --> Street etc.

(The code of StandardAnalyzer is a good sample).

DIGY
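
The synonym mapping above could look roughly like this. It is a sketch in plain Python, not a Lucene.Net Analyzer; the SYNONYMS table and the normalize function are hypothetical names, and the table simply mirrors the examples listed in the message.

```python
# Hypothetical synonym table, keyed by token tuples so multi-word phrases can match.
SYNONYMS = {
    ("play", "area"): "park",
    ("restaurant",): "café",
    ("bar",): "café",
    ("road",): "street",
}
MAX_PHRASE_LEN = max(len(key) for key in SYNONYMS)

def normalize(text):
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so "play area" is matched as a unit.
        for n in range(MAX_PHRASE_LEN, 0, -1):
            phrase = tuple(tokens[i:i + n])
            if len(phrase) == n and phrase in SYNONYMS:
                out.append(SYNONYMS[phrase])
                i += n
                break
        else:
            # No synonym matched at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out

indexed = normalize("Bradford Street Play Area")
queried = normalize("Bradford Street Park")
```

Applying the same normalization at index time and at query time maps both strings to the same token list, so the "all terms" search would match.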
