Posted to solr-user@lucene.apache.org by parnab kumar <pa...@gmail.com> on 2013/04/23 17:51:32 UTC

Solr index searcher to lucene index searcher

Hi,

Can anyone please point out where a Solr search originates, how it passes
to the Lucene IndexSearcher, and how the results come back to Solr? I
actually want to know which class in Solr directly calls the Lucene
IndexSearcher.

Thanks.
Pom

Re: Solr index searcher to lucene index searcher

Posted by Otis Gospodnetic <ot...@gmail.com>.
Perhaps http://search-lucene.com/?q=custom+hits+collector ?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/






Re: Solr index searcher to lucene index searcher

Posted by parnab kumar <pa...@gmail.com>.
Hi,

    Thanks, Chris. For every document that matches the query, I want to be
able to compute the following set of features for each query-document pair:

    LuceneScore (the vector space score that Lucene gives to each doc)
    LinkScore (computed from Nutch)
    OpicScore (computed from Nutch)
    co-rd in title, content, anchor, url
    wt of Entity in title, content, anchor, url
    length of title, content, anchor, url
    sum-of-tf in title, content, anchor, url
    sum-of-norm-tf in title, content, anchor, url
    min-of-tf in title, content, anchor, url
    max-of-tf in title, content, anchor, url
    variance-of-tf in title, content, anchor, url
    sum-of-tf-idf in title, content, anchor, url
    site-reputation-score
    entity-support-score
    domain score
    url-click-count
    query-url-click-count
    num-of-slashes-in-url

Based on these features, I want to build a machine-learned model that
will learn to rank/score the documents. I am trying to understand how to
compute the features efficiently on the fly. Looking into the index and
computing these features seems to be very slow, so for the time being I
wanted to implement this by looking only at the top K documents. A few of
these features have to be computed on the fly, and some of them are computed
while indexing and stored in the index. I need to be able to look at all of
the features to score/rank the final set of documents.
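
As a rough illustration of the per-field tf features in the list above
(sum-of-tf, min-of-tf, max-of-tf, variance-of-tf), here is a minimal sketch
in plain Java. It assumes the term frequencies of the query terms in one
field of one document have already been read into an int array; how they are
fetched from the index is not shown. The TfStats class and its method are
hypothetical names, not part of Solr or Lucene, and the variance is taken as
the population variance, which is an assumption.

```java
/** Summary statistics over the query-term frequencies in one field of one document. */
final class TfStats {
    final double sum;
    final double min;
    final double max;
    final double variance;

    TfStats(double sum, double min, double max, double variance) {
        this.sum = sum;
        this.min = min;
        this.max = max;
        this.variance = variance;
    }

    /**
     * tfs[i] is the frequency of the i-th query term in a single field
     * (for example, title). An empty array yields all zeros.
     */
    static TfStats of(int[] tfs) {
        if (tfs.length == 0) {
            return new TfStats(0, 0, 0, 0);
        }
        double sum = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int tf : tfs) {
            sum += tf;
            min = Math.min(min, tf);
            max = Math.max(max, tf);
        }
        double mean = sum / tfs.length;
        double squaredDeviations = 0;
        for (int tf : tfs) {
            squaredDeviations += (tf - mean) * (tf - mean);
        }
        // Population variance (divide by n); assumed, not specified in the thread.
        return new TfStats(sum, min, max, squaredDeviations / tfs.length);
    }
}
```

Calling TfStats.of once per field (title, content, anchor, url) for each of
the top K documents would give four of the listed feature groups in a single
pass over each array.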

Thanks ,
Pom..


Re: Solr index searcher to lucene index searcher

Posted by Chris Hostetter <ho...@fucit.org>.
: used to call the lucene IndexSearcher . As the documents are collected in
: TopDocs in Lucene , before that is passed back to Nutch , i used to look
: into the top K matching documents , consult some external repository
: and further score the Top K documents and reorder them in the TopDocs array
: . These reordered  TopDocs is passed to Nutch .  All these reordering code
: was implemented by Extending the lucene IndexSearcher class .

1) that's basically the same info you provided before -- it still doesn't
really tell us anything about what your current logic does with the top K
documents and how/why/when you decide to reorder them or by how much --
details that are kind of important in being able to provide you with any
meaningful advice on how to achieve your goal using the plugin hooks
available in Solr.

2) if you only care about re-ordering the Top K documents using some
secret sauce, then i would suggest you just set rows=K and let Solr do
its thing, then post-process the results -- either in your client, or in a
SearchComponent that modifies the SolrDocumentList produced by
QueryComponent.
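
A minimal sketch of the client-side option in point 2, assuming the top K
hits have already been fetched with rows=K and reduced to an id and a score.
The Hit and TopKReranker classes, the external score map, and the linear
weighting are all hypothetical stand-ins for the unspecified reordering
logic; nothing here is Solr API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** One hit as returned by the search, reduced to what re-ranking needs. */
final class Hit {
    final String id;
    final double score;

    Hit(String id, double score) {
        this.id = id;
        this.score = score;
    }
}

final class TopKReranker {
    /**
     * Re-scores each hit as engineScore + weight * externalScore and returns
     * a new list sorted by the combined score, highest first. Ids missing
     * from the external map get an external score of 0. The linear
     * combination is only a placeholder for the real scoring logic.
     */
    static List<Hit> rerank(List<Hit> topK, Map<String, Double> external, double weight) {
        List<Hit> out = new ArrayList<>();
        for (Hit h : topK) {
            double ext = external.getOrDefault(h.id, 0.0);
            out.add(new Hit(h.id, h.score + weight * ext));
        }
        // List.sort is stable, so ties keep the engine's original order.
        out.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return out;
    }
}
```

The same loop could live inside a SearchComponent instead of the client; the
sketch deliberately stays free of Solr classes so that it does not depend on
any particular Solr version.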

: > can you elaborate on what exactly your "some logic" involves?
	...
: > https://people.apache.org/~hossman/#xyproblem
: > XY Problem
: >
: > Your question appears to be an "XY Problem" ... that is: you are dealing
: > with "X", you are assuming "Y" will help you, and you are asking about "Y"
: > without giving more details about the "X" so that we can understand the
: > full issue.  Perhaps the best solution doesn't involve "Y" at all?
: > See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss

Re: Solr index searcher to lucene index searcher

Posted by parnab kumar <pa...@gmail.com>.
Hi,

    Thanks, Chris. I had been using Nutch 1.1. The Nutch IndexSearcher
used to call the Lucene IndexSearcher. As the documents were collected in
TopDocs in Lucene, before that was passed back to Nutch, I used to look
into the top K matching documents, consult an external repository,
further score the top K documents, and reorder them in the TopDocs array.
The reordered TopDocs was then passed to Nutch. All of this reordering code
was implemented by extending the Lucene IndexSearcher class.
    The Lucene core that comes with Solr is a bit different from the one
that used to come with Nutch 1.1. As a result, implementing the same thing
is not straightforward. Moreover, I cannot figure out at which point
exactly the SolrIndexSearcher interacts directly with the Lucene
IndexSearcher.
    With FunctionQuery I lose the flexibility of looking into the
documents before they are passed to the final result set.

    Now I am using Solr 3.4, and I would like to implement the same thing
between Solr and Lucene.

Thanks ,
Pom


Re: Solr index searcher to lucene index searcher

Posted by Chris Hostetter <ho...@fucit.org>.
: > > . For any query it passes through the search handler and solr finally
: > > directs it to lucene Index Searcher. As results are matched and collected
: > > as TopDocs in lucene i want to inspect the top K Docs , reorder them by
: > > some logic and pass the final TopDocs to solr which solr may send as a
: > > response .

can you elaborate on what exactly your "some logic" involves?

instead of writing a custom collector, using a function query may be the 
best solution.

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss

Re: Solr index searcher to lucene index searcher

Posted by Joel Bernstein <jo...@gmail.com>.
As Timothy mentioned, Solr has the PostFilter mechanism, but it's not
really suited for ranking/sorting changes. To affect the ranking you'd need
to work with the TopScoreDocCollector, which Solr does not give you access
to. If you're doing distributed search you'd need to account for the
ranking algorithm at the aggregation step as well.
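
To make the aggregation point concrete, here is a small sketch in plain
Java of merging per-shard results by a re-ranked score rather than by the
original engine score. ShardHit and ShardMerger are hypothetical names, and
this is not how Solr's own distributed merge is implemented; it only
illustrates that the merge step has to order hits by the same score the
custom ranking produced on each shard.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** A hit from one shard, carrying the score produced by the custom ranking. */
final class ShardHit {
    final String id;
    final double rerankedScore;

    ShardHit(String id, double rerankedScore) {
        this.id = id;
        this.rerankedScore = rerankedScore;
    }
}

final class ShardMerger {
    /** Merges per-shard lists and keeps the k best hits by re-ranked score (k >= 0). */
    static List<ShardHit> mergeTopK(List<List<ShardHit>> perShard, int k) {
        List<ShardHit> all = new ArrayList<>();
        for (List<ShardHit> shard : perShard) {
            all.addAll(shard);
        }
        // Highest re-ranked score first; the sort is stable, so ties keep shard order.
        all.sort(Comparator.comparingDouble((ShardHit h) -> h.rerankedScore).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }
}
```

One caveat of this approach: a document that falls outside a shard's local
top K by engine score never reaches the merge, even if the custom ranking
would have promoted it.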

There is a pluggable collectors JIRA issue that builds under Solr 4.1
(SOLR-4465), but it is a proof of concept at this time. You may want to
chime in on this ticket if you find it useful.





-- 
Joel Bernstein
Professional Services LucidWorks

Re: Solr index searcher to lucene index searcher

Posted by Timothy Potter <th...@gmail.com>.
Take a look at Solr's DelegatingCollector - this article might be of
interest too: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html


Re: Solr index searcher to lucene index searcher

Posted by parnab kumar <pa...@gmail.com>.
Hi,

    Timothy, thanks for pointing that out. But I have a specific
requirement. Every query passes through the search handler, and Solr finally
directs it to the Lucene IndexSearcher. As results are matched and collected
as TopDocs in Lucene, I want to inspect the top K docs, reorder them by
some logic, and pass the final TopDocs to Solr, which Solr may send as a
response.

I need to know the point where this interaction between Solr and
Lucene actually takes place.
Can anyone please tell me where to look for this purpose?

Thanks..
Pom


Re: Solr index searcher to lucene index searcher

Posted by Timothy Potter <th...@gmail.com>.
   org.apache.solr.search.SolrIndexSearcher
