Posted to java-user@lucene.apache.org by "T. H. Lin" <ea...@gmail.com> on 2008/11/18 18:37:10 UTC

can I set Boost to the term while indexing?

I would like to store a set of keywords in a single field of a document.

For example, I have three keywords, "One", "Two" and "Three",
and I am going to add them to a document.

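First, is this code correct?
/****************************************************************/
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// One Field instance per keyword, all under the same field name.
// UN_TOKENIZED indexes each keyword as a single term
// (Lucene 2.4 renames it Field.Index.NOT_ANALYZED).
// Assumes indexWriter is an already-open IndexWriter.
Document doc = new Document();
String[] keyword = new String[]{"One", "Two", "Three"};
for (int i = 0; i < keyword.length; i++) {
   Field f = new Field("field_name",
                       keyword[i],
                       Field.Store.NO,
                       Field.Index.UN_TOKENIZED,
                       Field.TermVector.YES);
   doc.add(f);
}
indexWriter.addDocument(doc);
/***************************************************************/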

When searching, we can set a boost on a query term.

The question is:
can I set a boost for every keyword/term while indexing?

In the example above, I would like to give the keywords "One", "Two" and
"Three" different weights (boost/relevance) at index time, and the same
term may have a different weight in different documents.

Can I do this?

Thanks. :-)

Re: can I set Boost to the term while indexing?

Posted by "T. H. Lin" <ea...@gmail.com>.
Hi,

Thanks for your suggestions.

Actually, my original idea is that the same term may have a different "weight"
in different documents.
Of course, TF/IDF already captures some notion of a term's relevance to a
document, but I would like to explicitly set a different "weight" for the
same term in different documents.

For instance,

The query is "T1 T2".

Both Doc1 and Doc2 contain T1 and T2. They may even have exactly the same
term frequencies!
But I want to add some "semantic enhancement": I want T1 to have a higher
weight in Doc1 than in Doc2, and T2 to have a higher weight in Doc2 than in
Doc1.

I think setBoost on a whole document, on a field, or on a query term may not
achieve this.

Maybe payloads are the solution; I will take a look!


Lin

2008/11/20 Grant Ingersoll <gs...@apache.org>

> You can do this. It's called adding a Payload. You can add payloads
> during analysis (Token.setPayload()), which means your code below will need
> to be changed so that you use the Field constructor that takes a
> TokenStream wrapping your input tokens. This TokenStream will also need to
> add your payloads.
>
> Then, during search, you can use a BoostingTermQuery to have the payload
> values factor into scoring.
>
> -Grant
>
>
>

2008/11/20 Anshum <an...@gmail.com>

> Hi Lin,
>
> I guess you are looking at document boosting. If I'm right, you could
> conditionally do this:
> doc.setBoost(boostFactor);
> where boostFactor is a float greater than 1.0 that boosts the document by
> that factor.
> You could also use
> field.setBoost(boostValue) to boost a particular field in a document by a
> particular boost factor.
> By default, all boosts are 1.0 in Lucene. field.setBoost() scales the score
> of matches in that field of the document by this factor when relevance is
> calculated.
>
> Hope this solves your issue.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>

Re: can I set Boost to the term while indexing?

Posted by Anshum <an...@gmail.com>.
Hi Lin,

I guess you are looking at document boosting. If I'm right, you could
conditionally do this:
doc.setBoost(boostFactor);
where boostFactor is a float greater than 1.0 that boosts the document by
that factor.
You could also use
field.setBoost(boostValue) to boost a particular field in a document by a
particular boost factor.
By default, all boosts are 1.0 in Lucene. field.setBoost() scales the score
of matches in that field of the document by this factor when relevance is
calculated.
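To see why a document or field boost cannot separate "One" from "Two", it helps to look at where the boost ends up. The following is a rough plain-Java model, assuming Lucene 2.x's DefaultSimilarity: the index-time document boost and field boost are multiplied into a single norm per (document, field), together with a length factor of 1/sqrt(number of terms in the field), and every term in that field is scaled by the same norm. The class `NormSketch` and its methods are hypothetical names, and the real norm is additionally encoded into a single byte (losing precision), which this sketch leaves out.

```java
/**
 * Rough model of how Lucene 2.x (DefaultSimilarity) folds index-time boosts
 * into a field norm. Hypothetical class; the single-byte norm encoding that
 * Lucene applies afterwards is deliberately left out.
 */
public class NormSketch {

    /** Length factor as in DefaultSimilarity: 1 / sqrt(number of terms in the field). */
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** One norm per (document, field); every term in that field is scaled by it. */
    public static float norm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A field holding 4 terms; the boost values are hypothetical.
        float plain = norm(1.0f, 1.0f, 4);
        float boosted = norm(2.0f, 1.5f, 4);
        System.out.println("plain norm   = " + plain);   // 0.5
        System.out.println("boosted norm = " + boosted); // 1.5
        // Every term in the field gets the same factor, so a boost here
        // cannot weight one keyword above another in the same document.
    }
}
```

Because the model has exactly one number per (document, field), it can raise or lower a whole document's keywords together, but not T1 in one document and T2 in another.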

Hope this solves your issue.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............



Re: can I set Boost to the term while indexing?

Posted by Grant Ingersoll <gs...@apache.org>.
You can do this. It's called adding a Payload. You can add payloads
during analysis (Token.setPayload()), which means your code below will
need to be changed so that you use the Field constructor that takes a
TokenStream wrapping your input tokens. This TokenStream will also need
to add your payloads.

Then, during search, you can use a BoostingTermQuery to have the
payload values factor into scoring.
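The data flow described above can be modelled without Lucene. The sketch below is a plain-Java illustration, not Lucene code: `PayloadSketch`, `WeightedToken`, `encodeWeight`, `decodeWeight` and `score` are hypothetical names. At "index time" each keyword carries its own weight as a 4-byte float payload; at "search time" a base term score is multiplied by the decoded weight. In real Lucene 2.4 code the bytes would be attached with `Token.setPayload(new Payload(bytes))` inside the TokenStream given to the Field, and, as far as I know, `BoostingTermQuery` only makes use of them when a custom `Similarity` overrides `scorePayload(...)`, whose default returns 1. `decodeWeight` mirrors the (payload, offset, length) shape of that hook.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Plain-Java model of per-term weights carried as payloads (no Lucene classes).
 * All names here are hypothetical stand-ins, not Lucene's API.
 */
public class PayloadSketch {

    /** "Index time": a keyword plus the bytes that would be its payload. */
    public static final class WeightedToken {
        public final String term;
        public final byte[] payload;

        public WeightedToken(String term, float weight) {
            this.term = term;
            this.payload = encodeWeight(weight);
        }
    }

    /** Encodes a float weight as 4 big-endian bytes. */
    public static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    /** Decodes a weight; a missing or short payload counts as 1.0 (no change). */
    public static float decodeWeight(byte[] payload, int offset, int length) {
        if (payload == null || length < 4) {
            return 1.0f;
        }
        return ByteBuffer.wrap(payload, offset, length).getFloat();
    }

    /** "Search time": sums baseScore * payload weight over matching tokens. */
    public static float score(List<WeightedToken> doc, String queryTerm, float baseScore) {
        float total = 0f;
        for (WeightedToken t : doc) {
            if (t.term.equals(queryTerm)) {
                total += baseScore * decodeWeight(t.payload, 0, t.payload.length);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Same terms, same frequencies; only the per-document weights differ.
        List<WeightedToken> doc1 = new ArrayList<>();
        doc1.add(new WeightedToken("T1", 3.0f));
        doc1.add(new WeightedToken("T2", 1.0f));
        List<WeightedToken> doc2 = new ArrayList<>();
        doc2.add(new WeightedToken("T1", 1.0f));
        doc2.add(new WeightedToken("T2", 3.0f));

        for (String q : new String[] {"T1", "T2"}) {
            System.out.println(q + ": Doc1=" + score(doc1, q, 1.0f)
                    + " Doc2=" + score(doc2, q, 1.0f));
        }
    }
}
```

Running `main` prints `T1: Doc1=3.0 Doc2=1.0` and `T2: Doc1=1.0 Doc2=3.0`: the same term ranks the two documents in opposite orders, which is the per-term, per-document weighting asked about in this thread.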

-Grant


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ










