Posted to java-user@lucene.apache.org by sumittyagi <pi...@gmail.com> on 2008/02/20 17:43:56 UTC

Re: Which file in the lucene package is used to manipulate results..

Hi Mark Harwood,
I know it's been a long time, but until now I was busy developing the
database that stores each keyword, document, and number of clicks on the
document for that keyword, along with their mappings.
Now I want my database to communicate with the Lucene API, and I cannot
figure out where to start.
Please help me out: how can I make my database work with Lucene?
Thanks
Sumit

mark harwood wrote:
> 
> Thanks for the context - much more useful.
> The challenge here is similar to that posed by offering end-user tagging
> of content (see here
> http://www.mail-archive.com/java-user@lucene.apache.org/msg17580.html ).
> The main difference here being that words are added to docs implicitly by
> search click-throughs rather than any explicit tagging action.
> 
> In both cases the challenge is that the user data around documents is
> likely to be updated very often while the documents remain relatively
> static. 
> I suspect some additional things to think about are:
> 1) Cancelling out the "human laziness" bias that favours clicking results
> on page 1. Are clicks on page 2 worth more?
> 2) Spam clicks - detecting deliberate gaming of your re-ranking algorithm.
> 3) Lucene doc IDs are not stable - how will you associate query
> terms/click data with documents and join them at speed?
> 4) Are individual words or phrases the unit of boost? "Paris" means
> different things in "Paris Hilton" and "Paris, France".
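
On point 3, one way to avoid depending on Lucene doc IDs is to key the click data by an identifier you control (for example a database primary key that is also stored in each document). The sketch below is only an illustration under that assumption: ClickStore, recordClick and clickCount are made-up names, and an in-memory map stands in for the real database.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Click counts keyed by (query term, stable document key), never by Lucene doc ID. */
class ClickStore {
    // query term -> (stable document key -> click count); stands in for a database table
    private final Map<String, Map<String, Integer>> clicks = new HashMap<>();

    /** Records one click on the document identified by docKey for the given query term. */
    void recordClick(String term, String docKey) {
        clicks.computeIfAbsent(normalize(term), t -> new HashMap<>())
              .merge(docKey, 1, Integer::sum);
    }

    /** Returns how many clicks were recorded for this term/document pair (0 if none). */
    int clickCount(String term, String docKey) {
        Map<String, Integer> empty = Collections.emptyMap();
        return clicks.getOrDefault(normalize(term), empty).getOrDefault(docKey, 0);
    }

    // Assumption: terms are compared case-insensitively, ignoring surrounding whitespace.
    private static String normalize(String term) {
        return term.trim().toLowerCase(Locale.ROOT);
    }
}
```

Because the key survives re-indexing and segment merges, the click data can be joined back to documents whenever the index is rebuilt.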
> 
> A simple approach might be to re-index your content with all of the
> additional search terms from clicks added to the associated document in a
> "searchClicks" field - the more clicks, the more repetitions of the same
> search words in the document to help with tf (Term Frequency). This
> additional content would need to be capped, to avoid huge documents. This
> has the disadvantage of requiring a re-index though. 
> Another option to avoid reindexing everything is to wrap IndexReader (See
> FilterIndexReader) and implement TermEnum/TermDocs for a fake field called
> "searchClicks". The idea is Lucene looks after the usual, static document
> content while your implementation goes off to your more volatile storage
> (e.g. database/parallel index, custom file structure) to retrieve lists of
> doc ids, term frequencies etc. for this "searchClicks" field. All of the
> Lucene queries you might want to throw at this e.g. PhraseQueries can then
> test both the static Lucene fields and your new volatile "click" fields
> without being aware of this low-level trickery. 
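
For the first (re-index) option, the text of the "searchClicks" field could be produced along these lines. This is a rough sketch, not Lucene code: SearchClicksField, build and maxRepeatsPerTerm are hypothetical names, and the per-term cap is just one possible way to keep the extra content bounded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SearchClicksField {
    /**
     * Builds the text for a "searchClicks" field: each clicked term is repeated
     * once per click, but never more than maxRepeatsPerTerm times.
     */
    static String build(Map<String, Integer> clicksByTerm, int maxRepeatsPerTerm) {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, Integer> e : clicksByTerm.entrySet()) {
            // Clamp to [0, maxRepeatsPerTerm] so heavily clicked terms cannot bloat the document.
            int repeats = Math.min(Math.max(e.getValue(), 0), maxRepeatsPerTerm);
            for (int i = 0; i < repeats; i++) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(e.getKey());
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> clicks = new LinkedHashMap<>();
        clicks.put("paris", 5);
        clicks.put("hotel", 2);
        // prints "paris paris paris hotel hotel"
        System.out.println(build(clicks, 3));
    }
}
```

The resulting string would then be indexed as an extra analyzed field on the document at re-index time, so the repetitions feed into term frequency in the usual way.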
> 
> I'm sure there will be other ways of doing this too but this seems like a
> conceptually clean way of modelling it - just seeing search terms as
> extensions to the document content.
> 
> Cheers
> Mark
> 
> 
> ----- Original Message ----
> From: sumittyagi <pi...@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Sunday, 23 December, 2007 5:30:55 AM
> Subject: Re: Which file in the lucene package is used to manipulate results..
> 
> 
> Actually, what I have to do is:
> 1.) For every query (keyword), among the results obtained, the keyword
> will be mapped to the page clicked, along with the number of clicks for
> that keyword on that page.
> 2.) Next time, for the same query (keyword), the mapped pages will be
> ranked higher, taking the number of clicks into account too.
> 3.) For every new query these steps will be repeated.
> This was a very high-level view. I have written algorithms for these
> modules and am trying to incorporate them into Lucene, but I don't know
> which files I have to edit to make it work.
> Please help me with this; if you need more explanation, please let me
> know.
> thanks 
> Sumit Tyagi
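
The three steps above could be prototyped outside Lucene, as a post-processing pass over the hits a search returns. The following is a minimal sketch under assumptions of my own: ClickReranker, Hit and the boost formula score * (1 + ln(1 + clicks)) are illustrative choices, not anything Lucene prescribes, and docKey is assumed to be a stable application-level identifier.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ClickReranker {
    /** A search hit: a stable document key plus the score the search engine gave it. */
    static final class Hit {
        final String docKey;
        final double score;

        Hit(String docKey, double score) {
            this.docKey = docKey;
            this.score = score;
        }
    }

    /** Assumed boost: the log dampens the effect of very large click counts. */
    static double boosted(double score, int clicks) {
        return score * (1.0 + Math.log1p(Math.max(clicks, 0)));
    }

    /** Returns new hits with boosted scores, sorted from highest to lowest score. */
    static List<Hit> rerank(List<Hit> hits, Map<String, Integer> clicksForQuery) {
        List<Hit> out = new ArrayList<>();
        for (Hit h : hits) {
            int clicks = clicksForQuery.getOrDefault(h.docKey, 0);
            out.add(new Hit(h.docKey, boosted(h.score, clicks)));
        }
        out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return out;
    }
}
```

Documents with no recorded clicks keep their original score, so for a brand-new query the ordering is unchanged.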
> 
> 
> 
> 
> 
> Erick Erickson wrote:
>> 
>> You still haven't explained *why* you want to rerank results. What
>> is the use-case you're trying to implement? Quite often it's turned
>> out for me that when I let folks on the list know what the use
>> case I'm trying to support is, they come up with much more elegant
>> solutions than I was thinking about.
>> 
>> For instance, does the CustomScoreQuery class have any relevance
>> to your problem?
>> 
>> If you're thinking of modifying the core Lucene code for your
>> special purpose, I'd advise against it unless and until you'd exhausted
>> all the other options. It's always a maintenance headache to do this.
>> 
>> Best
>> Erick
>> 
>> On Dec 21, 2007 10:09 AM, sumittyagi <pi...@gmail.com> wrote:
>> 
>>>
>>> Actually, I am writing a module to rerank the results, so I want to edit
>>> the file which arranges the results and gives them ranks.
>>> Or is there any other way I can use my module to rerank the results?
>>>
>>>
>>> markharw00d wrote:
>>> >
>>> > I think you need to describe your "factors" in more detail. Exactly
>>> > what do you want to achieve for your users?
>>> > We could be talking about any number of Lucene functions here.
>>> >
>>> > ----- Original Message ----
>>> > From: sumittyagi <pi...@gmail.com>
>>> > To: java-user@lucene.apache.org
>>> > Sent: Friday, 21 December, 2007 4:51:09 AM
>>> > Subject: Which file in the lucene package is used to manipulate results..
>>> >
>>> >
>>> > Hi, I am using Lucene for the very first time and want to manipulate
>>> > the results by adding some more factors to them. Which file should I
>>> > edit to manipulate the search results?
>>> >
>>> > Thanks
>>> > Sumit Tyagi
>>> > --
>>> > View this message in context:
>>> > http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14450335.html
>>> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >       __________________________________________________________
>>> > Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
>>> >
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>>> >
>>> >
>>> >
>>>
>> 
>> 
> 

-- 
View this message in context: http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p15591566.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

