
Posted to solr-user@lucene.apache.org by Ali Nazemian <al...@gmail.com> on 2015/03/23 20:07:23 UTC

Custom updateProcessor for purpose of extracting interesting terms at index time

Dear All,
Hi,
I wrote a custom updateProcessorFactory that extracts interesting terms at
index time and puts them in a new field. Since I use MLT (MoreLikeThis)
interesting terms for this, I have to check whether the added document
already exists in the index. If it was indexed before, MLT interesting terms
work without a problem. But if it is a new document, I have to index it
before calling MLT interesting terms.
Here is a small part of my code that ran me into a problem:

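if (!isDocIndexed(cmd.getIndexedId())) {
    // Do not extract keywords yet, since the document is not indexed:
    // pass it down the chain first...
    super.processAdd(cmd);
    // ...then re-enter this method, expecting the document to be visible now.
    processAdd(cmd);
    return;
}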

My problem is that the searcher returned by core.getRealtimeSearcher() does
not change after calling super.processAdd(cmd), so this part of the code
causes an infinite loop! Could you please guide me on how to make sure that
my custom updateProcessorFactory runs at the end of the indexing process, so
that I can use MLT interesting terms without worrying about whether the
document exists in the index?
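A toy model of the loop, with no Solr classes, may make the failure easier to
see. It assumes, as the post describes, that the searcher behind isDocIndexed()
is a point-in-time view that an add alone does not refresh. SnapshotLoopDemo and
its methods are hypothetical names, and the retry is capped so the demo
terminates.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model, not Solr code: the "searcher" is a point-in-time snapshot of the ids
// that were visible when it was opened; adding a document only updates the writer.
class SnapshotLoopDemo {
    private final Set<String> writerIds = new HashSet<>();  // documents added so far
    private Set<String> snapshotIds = new HashSet<>();      // what the current searcher sees

    void add(String id) {
        writerIds.add(id);
    }

    // Hypothetical stand-in for opening a new searcher (e.g. after a commit).
    void reopen() {
        snapshotIds = new HashSet<>(writerIds);
    }

    boolean isDocIndexed(String id) {
        return snapshotIds.contains(id);
    }

    // Mirrors the add-then-retry logic from the post, capped at maxRetries.
    // Returns the attempt on which the document became visible, or -1 if it never did.
    int retriesUntilVisible(String id, int maxRetries, boolean reopenAfterAdd) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (isDocIndexed(id)) {
                return attempt;
            }
            add(id);
            if (reopenAfterAdd) {
                reopen();
            }
        }
        return -1; // without the cap, this is the infinite loop
    }

    public static void main(String[] args) {
        System.out.println(new SnapshotLoopDemo().retriesUntilVisible("doc1", 1000, false)); // -1
        System.out.println(new SnapshotLoopDemo().retriesUntilVisible("doc1", 1000, true));  // 1
    }
}
```

Only a fresh view of the index lets the check succeed. In real Solr that would
mean opening a new searcher for every added document, which would likely be
costly, so doing the work after a commit, as the replies in this thread suggest,
seems the more practical direction.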

Best regards.
-- 
A.Nazemian

Re: Custom updateProcessor for purpose of extracting interesting terms at index time

Posted by Ali Nazemian <al...@gmail.com>.

Dear Alex,
Hi,
I am not sure what the best way of doing such a process would be. Could
you please provide a detailed example of doing it on commit, like the
spell checker you mentioned?
Is it possible to do it using a custom analyzer on a copy field? To use MLT
interesting terms I need access to SolrIndexSearcher, and I am not sure
whether an analyzer can access SolrIndexSearcher.
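On the question of what MLT interesting terms actually need: in Lucene's
MoreLikeThis, terms are ranked roughly by their frequency in the source text
times their inverse document frequency across the index, so only the
document-frequency statistics have to come from the existing index. The sketch
below is a simplified, stdlib-only illustration of that scoring, assuming the
classic idf formula log(numDocs / (docFreq + 1)) + 1. It skips MLT's analyzer,
stop words and min/max frequency filters, and the docFreq map is a hypothetical
stand-in for the index. Lucene's MoreLikeThis also appears to offer a
Reader-based retrieveInterestingTerms variant that analyzes raw text; check the
Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified tf * idf ranking, loosely modelled on MoreLikeThis "interesting terms".
// Term frequencies come from the incoming document's own text; docFreq and numDocs are
// hypothetical statistics from the already-indexed corpus, so the new document does
// not have to be searchable first.
class InterestingTerms {
    static List<String> top(String text, Map<String, Integer> docFreq, int numDocs, int k) {
        // Naive tokenizer: lower-case and split on non-word characters.
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        // Score each term by tf * idf using the classic idf formula.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            scores.put(e.getKey(), e.getValue() * idf);
        }
        // Highest score first; ties broken alphabetically for a stable result.
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            result.add(ranked.get(i).getKey());
        }
        return result;
    }
}
```

With numDocs = 100 and document frequencies the=100, solr=90, mlt=2, the text
"the solr mlt mlt the" ranks mlt first: it is frequent in this document but
rare in the corpus.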

Best regards.


On Mon, Mar 23, 2015 at 11:51 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> So, for a new document, you want to index the document, then read it,
> then add keywords and index again? This does sound like an infinite
> loop. Not sure there is a solution for this approach.
>
> Are you sure you cannot do it the way the spell checker does, by compiling
> a side-car index on commit? Or even with some sort of periodic trigger
> and update commands issued on previously indexed but not yet
> post-processed documents.
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/



-- 
A.Nazemian

Re: Custom updateProcessor for purpose of extracting interesting terms at index time

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

So, for a new document, you want to index the document, then read it,
then add keywords and index again? This does sound like an infinite
loop. Not sure there is a solution for this approach.

Are you sure you cannot do it the way the spell checker does, by compiling
a side-car index on commit? Or even with some sort of periodic trigger
and update commands issued on previously indexed but not yet
post-processed documents.
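A toy, stdlib-only sketch of the periodic-trigger idea: documents are indexed
without keywords, and a separate pass that runs after a commit picks up
documents that are visible but not yet post-processed, adds keywords, and
thereby marks them done. The class names, the "keywords == null means not yet
processed" flag and the extractor function are hypothetical; in Solr the pass
would presumably be a query on such a flag field followed by ordinary update
requests.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model (no Solr APIs) of a periodic post-processing pass over committed documents.
class PeriodicPostProcessor {
    static final class Doc {
        final String id;
        final String text;
        List<String> keywords;   // null means "not post-processed yet" (hypothetical flag)

        Doc(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    private final Map<String, Doc> committed = new LinkedHashMap<>();
    private final Function<String, List<String>> extractor;

    PeriodicPostProcessor(Function<String, List<String>> extractor) {
        this.extractor = extractor;
    }

    // Stand-in for a document becoming visible after a normal commit.
    void commit(Doc doc) {
        committed.put(doc.id, doc);
    }

    // One run of the trigger: update every visible document that still lacks keywords.
    // Returns the number of documents updated in this run.
    int runOnce() {
        int updated = 0;
        for (Doc doc : committed.values()) {
            if (doc.keywords == null) {
                doc.keywords = new ArrayList<>(extractor.apply(doc.text));
                updated++;
            }
        }
        return updated;
    }
}
```

Because each update sets the flag, a second pass finds nothing to do, which is
what keeps this approach from looping. runOnce() could be scheduled with
java.util.concurrent.ScheduledExecutorService.scheduleAtFixedRate; how it would
talk to Solr is left open here.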

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

