You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by B0DYLANG <ra...@yahoo.com> on 2009/06/26 10:59:09 UTC

special Type of indexing

dears, 
my problem is that i want to apply a wieght for each word i add to the
lucene document, so that when i want to index a sentence like this "Hello
how you doing" i want to add Hello with a boost equals to 0.75 and how with
boost 0.50 and doing with boost 0.3 is there is a mean of doing that ?
-- 
View this message in context: http://www.nabble.com/special-Type-of-indexing-tp24217071p24217071.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: special Type of indexing

Posted by Simon Willnauer <si...@googlemail.com>.

Hey there,
currently lucene offers several possibilities to do what you want.
1. Use payloads
you can encode arbitrary values per term you index and affect scoring
by overriding Similarity#scorePayloads.
2. use IndexReader#termPosition(term) inside a query
you can access the actual term position during scoring to change a
score of a document according to the terms position inside a
document.The returned TermDocuments provides you <document, frequency,
<position>* > for each document a term is contained in. To use this
for scoring you must write your own query or rather subclass an
existing one.

hope that helps.

simon

On Fri, Jun 26, 2009 at 4:14 PM, B0DYLANG<ra...@yahoo.com> wrote:
>
>
>
>
>
> Thanks for your response, i want to explaing more so that you can help me,
> the code am writing is as follows
> Field termField = new Field("terms",terms, Field.Store.YES,
> Field.Index.TOKENIZED);
> the terms i indexed consists of a concatenated string looks like
> "computer,230#pc,333#lucene,201#"
> and i have a special anlyzer to toknize it, of course the numbers appers
> after words are the word distance and it leads us to the more relevant
> word(not document) as the distance decreases, so when i serarch with a term
> lucene once and then search with computer at the second time this doceuemt
> is returned in both queries but with high rank on the first time as the
> distance is less than the second distance hope every body get it.
>
>
> Anshum-2 wrote:
>>
>> Isn't it better for you to for a query at search time with query term
>> boosts? that way you would have a smaller index (Boost info not getting
>> stored for each doc-field). Also, you would have the flexibility of
>> changing
>> your boost without having to reindex the entire data.
>>
>> --
>> Anshum Gupta
>> Naukri Labs!
>> http://ai-cafe.blogspot.com
>>
>> The facts expressed here belong to everybody, the opinions to me. The
>> distinction is yours to draw............
>>
>>
>> On Fri, Jun 26, 2009 at 2:29 PM, B0DYLANG <ra...@yahoo.com> wrote:
>>
>>>
>>> dears,
>>> my problem is that i want to apply a wieght for each word i add to the
>>> lucene document, so that when i want to index a sentence like this "Hello
>>> how you doing" i want to add Hello with a boost equals to 0.75 and how
>>> with
>>> boost 0.50 and doing with boost 0.3 is there is a mean of doing that ?
>>> --
>>> View this message in context:
>>> http://www.nabble.com/special-Type-of-indexing-tp24217071p24217071.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/special-Type-of-indexing-tp24217071p24221074.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: special Type of indexing

Posted by B0DYLANG <ra...@yahoo.com>.

Thanks for your response, i want to explaing more so that you can help me,
the code am writing is as follows
Field termField = new Field("terms",terms, Field.Store.YES,
Field.Index.TOKENIZED);
the terms i indexed consists of a concatenated string looks like
"computer,230#pc,333#lucene,201#"
and i have a special anlyzer to toknize it, of course the numbers appers
after words are the word distance and it leads us to the more relevant
word(not document) as the distance decreases, so when i serarch with a term
lucene once and then search with computer at the second time this doceuemt
is returned in both queries but with high rank on the first time as the
distance is less than the second distance hope every body get it.

Anshum-2 wrote:
> 
> Isn't it better for you to for a query at search time with query term
> boosts? that way you would have a smaller index (Boost info not getting
> stored for each doc-field). Also, you would have the flexibility of
> changing
> your boost without having to reindex the entire data.
> 
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
> 
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
> 
> 
> On Fri, Jun 26, 2009 at 2:29 PM, B0DYLANG <ra...@yahoo.com> wrote:
> 
>>
>> dears,
>> my problem is that i want to apply a wieght for each word i add to the
>> lucene document, so that when i want to index a sentence like this "Hello
>> how you doing" i want to add Hello with a boost equals to 0.75 and how
>> with
>> boost 0.50 and doing with boost 0.3 is there is a mean of doing that ?
>> --
>> View this message in context:
>> http://www.nabble.com/special-Type-of-indexing-tp24217071p24217071.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/special-Type-of-indexing-tp24217071p24221074.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: special Type of indexing

Posted by Anshum <an...@gmail.com>.

Isn't it better for you to for a query at search time with query term
boosts? that way you would have a smaller index (Boost info not getting
stored for each doc-field). Also, you would have the flexibility of changing
your boost without having to reindex the entire data.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............

On Fri, Jun 26, 2009 at 2:29 PM, B0DYLANG <ra...@yahoo.com> wrote:

>
> dears,
> my problem is that i want to apply a wieght for each word i add to the
> lucene document, so that when i want to index a sentence like this "Hello
> how you doing" i want to add Hello with a boost equals to 0.75 and how with
> boost 0.50 and doing with boost 0.3 is there is a mean of doing that ?
> --
> View this message in context:
> http://www.nabble.com/special-Type-of-indexing-tp24217071p24217071.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>