Posted to solr-user@lucene.apache.org by Peyman Faratin <pe...@robustlinks.com> on 2013/11/19 03:01:17 UTC

deleting a doc inside a custom UpdateRequestProcessor

Hi

I am building a custom UpdateRequestProcessor to intercept every doc heading to the index. What I want to do is check whether the current index already has a doc with the same title (the unique key is the ID, so I can't rely on that, and the checking logic is a little more complicated anyway). If the incoming doc has a duplicate and some other conditions hold, then one of 2 things can happen:

	1- we don't index the incoming document
	2- we index the incoming and delete the duplicate currently in the index

I think (1) can be done by simply not passing the call up the chain (not calling super.processAdd(cmd)). However, I don't know how to implement the second condition, deleting the duplicate document, inside a custom UpdateRequestProcessor. This thread is the closest to my goal:
http://lucene.472066.n3.nabble.com/SOLR-4-3-0-Migration-How-to-use-DeleteUpdateCommand-td4062454.html

However, I am not clear how to proceed. Code snippets below.

Thank you in advance for your help.

	class isDuplicate extends UpdateRequestProcessor
	{
		public isDuplicate(UpdateRequestProcessor next) {
			super(next);
		}

		@Override
		public void processAdd(AddUpdateCommand cmd) throws IOException {
			try
			{
				// forward the doc along the chain only if it should be indexed
				boolean indexIncomingDoc = checkIfIsDuplicate(cmd);
				if (indexIncomingDoc)
					super.processAdd(cmd);
			} catch (SolrServerException e) {e.printStackTrace();}
			catch (ParseException e) {e.printStackTrace();}
		}

		public boolean checkIfIsDuplicate(AddUpdateCommand cmd) ...{

			SolrInputDocument incomingDoc = cmd.getSolrInputDocument();
			if (incomingDoc == null) return false;
			String title = (String) incomingDoc.getFieldValue("title");
			SolrIndexSearcher searcher = cmd.getReq().getSearcher();
			boolean addIncomingDoc = true;
			// getFirstMatch returns -1 when no doc matches the term
			int idOfDuplicate = searcher.getFirstMatch(new Term("title", title));
			if (idOfDuplicate != -1)
			{
				addIncomingDoc = compareDocs(searcher, incomingDoc, idOfDuplicate, title, addIncomingDoc);
			}
			return addIncomingDoc;
		}

		private boolean compareDocs(.....){
			....
			if( condition 1 )
			{
				--> DELETE DUPLICATE DOC in INDEX <--
				addIncomingDoc = true;
			}
			....
			return addIncomingDoc;
		}
	}

Re: deleting a doc inside a custom UpdateRequestProcessor

Posted by Liu Bo <di...@gmail.com>.
Hi,

    You can try this in your checkIfIsDuplicate(): build a query based on
your title and set it on a delete command:

      // Build your query accordingly. This depends on how your title is
      // indexed, e.g. analyzed or not. Be careful with it and do some tests.
      DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
      cmd.commitWithin = commitWithin;
      cmd.setQuery(query);
      processDelete(cmd);
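
As a rough illustration of the "build your query" step, here is a minimal sketch of a helper that turns a title into a quoted phrase query string that could be passed to setQuery(). The class TitleQuery and the method forTitle are hypothetical names, not part of Solr, and the sketch assumes that an exact phrase match on the title field is what you want, which depends on how that field is analyzed. It only escapes backslashes and double quotes, the two characters that would otherwise break a quoted phrase. SolrJ also has ClientUtils.escapeQueryChars for escaping unquoted terms.

```java
// Hypothetical helper, not part of Solr: builds a phrase query on the
// "title" field for use as a delete-by-query string.
final class TitleQuery {
    private TitleQuery() {}

    /** Returns title:"..." with backslashes and double quotes escaped. */
    static String forTitle(String title) {
        StringBuilder sb = new StringBuilder("title:\"");
        for (int i = 0; i < title.length(); i++) {
            char c = title.charAt(i);
            // Inside a quoted phrase only these two characters need a backslash.
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(forTitle("The \"Big\" Sleep"));
    }
}
```

Whether this matches the stored duplicate still depends on the field type, so it is worth testing against your own schema before relying on it for deletes.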

    Processors are normally chained, so you should make sure that your
processor comes first and can control what happens next based on your
logic.
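
To make the chaining idea concrete, here is a toy model of the two outcomes from the question: dropping the incoming doc, or deleting the stored duplicate and then adding the incoming one. Every class in it (ChainSketch, Processor, IndexStub, DedupProcessor) is a hypothetical stand-in and not Solr's API. The replaceDuplicate flag stands in for the real comparison conditions, which the question does not spell out. In a real processor, the delete would go through a DeleteUpdateCommand as in the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of a processor chain. These classes are hypothetical stand-ins,
// not Solr's UpdateRequestProcessor API.
final class ChainSketch {

    /** Base link: forwards every command to the next link, if any. */
    static class Processor {
        private final Processor next;
        Processor(Processor next) { this.next = next; }
        void processAdd(String id, String title) {
            if (next != null) next.processAdd(id, title);
        }
        void processDelete(String id) {
            if (next != null) next.processDelete(id);
        }
    }

    /** Last link: applies commands to an in-memory map of id -> title. */
    static class IndexStub extends Processor {
        final Map<String, String> docs = new LinkedHashMap<>();
        IndexStub() { super(null); }
        @Override void processAdd(String id, String title) { docs.put(id, title); }
        @Override void processDelete(String id) { docs.remove(id); }
    }

    /** First link: skips the incoming doc or replaces the stored duplicate. */
    static class DedupProcessor extends Processor {
        private final IndexStub index;
        private final boolean replaceDuplicate; // stands in for the real conditions

        DedupProcessor(IndexStub index, boolean replaceDuplicate) {
            super(index);
            this.index = index;
            this.replaceDuplicate = replaceDuplicate;
        }

        @Override void processAdd(String id, String title) {
            // Look for a different doc that already has the same title.
            String duplicateId = null;
            for (Map.Entry<String, String> e : index.docs.entrySet()) {
                if (e.getValue().equals(title) && !e.getKey().equals(id)) {
                    duplicateId = e.getKey();
                    break;
                }
            }
            if (duplicateId != null) {
                if (!replaceDuplicate) return;     // case 1: drop the incoming doc
                super.processDelete(duplicateId);  // case 2: delete the duplicate
            }
            super.processAdd(id, title);           // forward the add down the chain
        }
    }

    public static void main(String[] args) {
        IndexStub index = new IndexStub();
        Processor chain = new DedupProcessor(index, true);
        chain.processAdd("1", "Solr in Action");
        chain.processAdd("2", "Solr in Action");
        System.out.println(index.docs);
    }
}
```

Because the dedup link sits first, nothing after it ever sees a doc that was dropped, and the delete travels down the same chain as the add.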

    You can also try writing your own update request handler instead of a
customized processor.

    You can do a set of operations in your function:

        @Override
        public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
            throws Exception {}

    Get your processor chain in this function and pass a delete command
to it, such as:

      SolrParams params = req.getParams();
      checkParameter(params);
      UpdateRequestProcessorChain processorChain =
          req.getCore().getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));
      UpdateRequestProcessor processor = processorChain.createProcessor(req, rsp);

      DeleteUpdateCommand cmd = new DeleteUpdateCommand(req);
      cmd.commitWithin = commitWithin;
      cmd.setQuery(query);
      processor.processDelete(cmd);

This is what I do when customizing an update request handler: I try
not to touch the original processor chain, but tell Solr what to do through
commands.


-- 
All the best

Liu Bo