Posted to java-user@lucene.apache.org by Karl Koch <Th...@gmx.net> on 2004/01/16 13:00:36 UTC

Term weighting and Term boost

Hello all,

I am new to the Lucene scene and have a few questions regarding the term
boost philosophy:

Is the term boost equal to a term weight? Example: If I boost a term with
0.2, does this mean the term then has a weight of 0.2?

If this is not the case, how is the term weight of the query calculated?
Is there a formula? Are there parts of it which I cannot influence? Does this
formula depend on the type of Query, or is it independent of it? Maybe somebody
can provide a small code example?

Given the following code:

TermQuery termQuery1 = new TermQuery(new Term("contents", "house"));
TermQuery termQuery2 = new TermQuery(new Term("contents", "tree"));
termQuery2.setBoost( ? );
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(termQuery1, true, false);
finalQuery.add(termQuery2, true, false);

How can I make the term "tree" twice as important for the search as
"house"?

Many questions, I know, but I am sure that the experts here can answer them
easily.

Cheers,
Karl




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Term weighting and Term boost

Posted by Andrzej Bialecki <ab...@getopt.org>.
Karl Koch wrote:
> Hello Andrzej,
> 
> sorry. I mistakenly ran it under Java 1.2.2, which cannot work :-) Then you
> get thread exceptions...
> 
> Anyway, solved now. Thank you,
> Karl

Thanks for the report - it's my bad, too, because the JNLP file 
mistakenly says <j2se version="1.2+" />. I'll correct it.

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)




Re: Term weighting and Term boost

Posted by Karl Koch <Th...@gmx.net>.
Hello Andrzej,

sorry. I mistakenly ran it under Java 1.2.2, which cannot work :-) Then you
get thread exceptions...

Anyway, solved now. Thank you,
Karl

> Karl Koch wrote:
> 
> > Hello and thank you for this link. I think this is a very useful tool to
> > analyse Lucene internals.
> > 
> >> I realize this is not exactly the answer, but you may want to try one of
> >> the new features of Luke (http://www.getopt.org/luke), namely the query
> >> result explanation.
> > 
> > When I start it according to the description on your web site and select the
> > index directory I get an error message "current thread not owner"...
> 
> I.e. Java WebStart, or by getting the jars and starting it from 
> command-line?
> 
> > What does it mean and what am I doing wrong?
> 
> Beats me... I've never seen something like that. Could you please turn 
> on the Java console, and see what kind of exception is thrown, and where?





Re: Term weighting and Term boost

Posted by Andrzej Bialecki <ab...@getopt.org>.
Karl Koch wrote:

> Hello and thank you for this link. I think this is a very useful tool to
> analyse Lucene internals.
> 
> 
>>I realize this is not exactly the answer, but you may want to try one of 
>>the new features of Luke (http://www.getopt.org/luke), namely the query 
>>result explanation.
> 
> 
> When I start it according to the description on your web site and select the
> index directory I get an error message "current thread not owner"...
> 

I.e. Java WebStart, or by getting the jars and starting it from 
command-line?

> What does it mean and what am I doing wrong?

Beats me... I've never seen something like that. Could you please turn 
on the Java console, and see what kind of exception is thrown, and where?

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)




Re: Term weighting and Term boost

Posted by Karl Koch <Th...@gmx.net>.
Hello and thank you for this link. I think this is a very useful tool to
analyse Lucene internals.

> I realize this is not exactly the answer, but you may want to try one of 
> the new features of Luke (http://www.getopt.org/luke), namely the query 
> result explanation.

When I start it according to the description on your web site and select the
index directory I get an error message "current thread not owner"...

What does it mean and what am I doing wrong?

Kind Regards,
Karl


> 
> Currently the best way to start Luke is to use Java WebStart. Then open 
> an already existing index, go to the Search tab, enter a query (use 
> "Update" button to see exactly what it is parsed into), press Search, 
> and then highlight one of the results and press "Explain".
> 
> It was revealing for me to see how weights, boosts, normalizations etc. 
> are applied "under the hood" so to speak, especially for  Fuzzy or 
> Phrase queries.
> 
> After experimenting a little, you may want to consult the classes in 
> org.apache.lucene.search (e.g. Scorer and Similarity) to see the gory 
> details.
> 






Re: Term weighting and Term boost

Posted by Andrzej Bialecki <ab...@getopt.org>.
Karl Koch wrote:

>Hello all,
>
>I am new to the Lucene scene and have a few questions regarding the term
>boost philosophy:
>
>Is the term boost equal to a term weight? Example: If I boost a term with
>0.2, does this mean the term then has a weight of 0.2?
>
>If this is not the case, how is the term weight of the query calculated?
>Is there a formula? Are there parts of it which I cannot influence? Does this
>formula depend on the type of Query, or is it independent of it? Maybe somebody
>can provide a small code example?
>
I realize this is not exactly the answer, but you may want to try one of 
the new features of Luke (http://www.getopt.org/luke), namely the query 
result explanation.

Currently the best way to start Luke is to use Java WebStart. Then open 
an already existing index, go to the Search tab, enter a query (use 
"Update" button to see exactly what it is parsed into), press Search, 
and then highlight one of the results and press "Explain".

It was revealing for me to see how weights, boosts, normalizations etc. 
are applied "under the hood" so to speak, especially for Fuzzy or 
Phrase queries.

After experimenting a little, you may want to consult the classes in 
org.apache.lucene.search (e.g. Scorer and Similarity) to see the gory 
details.

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)





Re: Term weighting and Term boost

Posted by Morus Walter <mo...@tanto-xipolis.de>.
Karl Koch writes:
> 
> If this is not the case, how is the term weight of the query calculated?
> Is there a formula? Are there parts of it which I cannot influence? Does this
> formula depend on the type of Query, or is it independent of it? Maybe somebody
> can provide a small code example?
> 
Scoring is explained in the FAQ:
31. How does Lucene assigns scores to hits ?

http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31
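
To make the relation between boost and weight concrete, here is a minimal
sketch in plain Java, without any Lucene dependency. It assumes that the
per-term part of the scoring formula has roughly the shape
tf * idf^2 * boost * fieldNorm, with tf = sqrt(freq) and
idf = ln(numDocs / (docFreq + 1)) + 1, which is how the default Similarity is
assumed to define them here. Query normalization and the coordination factor
are left out. The class name, the method names and all statistics are made up
for illustration, so treat this as a reading aid and not as Lucene's actual
code.

```java
// Sketch only: plain Java, no Lucene classes. All names are hypothetical.
public class BoostSketch {

    // Term-frequency factor: square root of the raw count (assumed default).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency: ln(numDocs / (docFreq + 1)) + 1 (assumed default).
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Contribution of one query term to a document's score, before query
    // normalization and the coordination factor are applied.
    static double termContribution(int freq, int docFreq, int numDocs,
                                   double fieldNorm, double boost) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost * fieldNorm;
    }

    public static void main(String[] args) {
        // Made-up statistics, identical for both terms so only the boost differs.
        int freq = 4, docFreq = 9, numDocs = 100;
        double fieldNorm = 0.5;

        double house = termContribution(freq, docFreq, numDocs, fieldNorm, 1.0);
        double tree = termContribution(freq, docFreq, numDocs, fieldNorm, 2.0);

        System.out.println("house contribution: " + house);
        System.out.println("tree contribution:  " + tree);
        // With everything else equal, the ratio equals the ratio of the boosts: 2.0
        System.out.println("ratio tree/house:   " + (tree / house));
    }
}
```

Under these assumptions the boost is a multiplier on the term's contribution,
not the weight itself: a boost of 0.2 scales the contribution by 0.2 but does
not make it equal to 0.2. By the same reading, calling
termQuery2.setBoost(2.0f) while leaving termQuery1 at its default boost of
1.0 would double the contribution of "tree" relative to what it would be
without the boost. The final ranking still depends on tf, idf and the field
norm of each term in each document.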
