Posted to solr-dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2009/02/19 06:45:02 UTC

[jira] Resolved: (SOLR-1004) Optimizing the abort command in delta import

     [ https://issues.apache.org/jira/browse/SOLR-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-1004.
-----------------------------------------

    Resolution: Fixed

Committed revision 745742.

Thanks Marc!

> Optimizing the abort command in delta import
> --------------------------------------------
>
>                 Key: SOLR-1004
>                 URL: https://issues.apache.org/jira/browse/SOLR-1004
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.3
>         Environment: Java - Lucene - Solr - DataImportHandler
>            Reporter: Marc Sturlese
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1004.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> I have seen that when the abort command is called during a deltaImport, the doDelta function in DocBuilder.java only checks for an abort at the beginning of collectDelta, after that function returns, and at the end of collectDelta.
> The problem I have found is that if there is a large number of documents to modify and abort is called in the middle of delta collection, it will not take effect until all the data has been collected.
> The same happens when we start deleting or updating documents. In the update case there is an abort check inside buildDocument but, because it is called inside a "while" loop over all the docs to update, the loop keeps going through all the remaining docs, skipping them one by one.
> I propose adding an abort check inside every data-collection loop and after the call to buildDocument in the doDelta function.
> In the case of modifying documents, the code in DocBuilder.java would look like:
>     while (pkIter.hasNext()) {
>       Map<String, Object> map = pkIter.next();
>       vri.addNamespace(DataConfig.IMPORTER_NS + ".delta", map);
>       buildDocument(vri, null, map, root, true, null);
>       pkIter.remove();
>       // check for abort
>       if (stop.get()) {
>         allPks = null;
>         pkIter = null;
>         return;
>       }
>     }
> In the case of document deletion (the deleteAll function in DocBuilder), just add   if (stop.get()) { break; }   at the end of every loop iteration, and run the following check right after deleteAll is called (in doDelta):
>       if (stop.get()) {
>         allPks = null;
>         deletedKeys = null;
>         return;
>       }
> Finally, in collectDelta:
>       while (true) {
>         // check for abort
>         if (stop.get()) { return myModifiedPks; }
>         Map<String, Object> row = entityProcessor.nextModifiedRowKey();
>         if (row == null)
>           break;
>         ...
> The same applies to delete-query collection and parent-delta-query collection.
> I didn't attach a patch because this is the first time I have opened an issue and I don't know whether you would want it coded the way I did it. I just wanted to explain the idea and how I solved it; I think it can be useful for other users.
>  
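
The abort checks described above can be sketched in isolation. What follows is a minimal, self-contained illustration, not the DocBuilder code and not the committed patch. The class AbortableDeltaSketch, its methods, and the use of integer keys in place of row maps are all hypothetical stand-ins, and the stop flag is assumed to behave like a java.util.concurrent.atomic.AtomicBoolean. Only the pattern itself, polling the flag inside every loop iteration, is taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntConsumer;

/** Hypothetical stand-in for the delta-import loops; not the real DocBuilder. */
final class AbortableDeltaSketch {
    // Shared abort flag (assumption: an AtomicBoolean, like the `stop` polled above).
    private final AtomicBoolean stop = new AtomicBoolean(false);

    /** Requests an abort; safe to call from another thread. */
    void abort() {
        stop.set(true);
    }

    /** Collection phase: poll the flag before fetching each row key. */
    List<Integer> collectDelta(Iterator<Integer> rows) {
        List<Integer> modifiedPks = new ArrayList<>();
        while (true) {
            if (stop.get()) {
                return modifiedPks; // aborted: hand back what was collected so far
            }
            if (!rows.hasNext()) {
                break;
            }
            modifiedPks.add(rows.next());
        }
        return modifiedPks;
    }

    /**
     * Update phase: poll the flag after each document is built, so the loop
     * exits instead of walking the remaining keys. Returns the number of
     * keys that were processed.
     */
    int updateAll(List<Integer> pks, IntConsumer buildDocument) {
        int processed = 0;
        Iterator<Integer> pkIter = pks.iterator();
        while (pkIter.hasNext()) {
            buildDocument.accept(pkIter.next()); // placeholder for buildDocument(...)
            pkIter.remove();
            processed++;
            if (stop.get()) {
                return processed;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        AbortableDeltaSketch sketch = new AbortableDeltaSketch();
        List<Integer> pks = sketch.collectDelta(Arrays.asList(1, 2, 3, 4, 5).iterator());
        // Simulate an abort request arriving while key 2 is being processed.
        int processed = sketch.updateAll(pks, pk -> {
            if (pk == 2) {
                sketch.abort();
            }
        });
        System.out.println("processed=" + processed + ", remaining=" + pks.size());
    }
}
```

Run as written, main prints processed=2, remaining=3: the update loop stops right after the document during which the abort arrived, rather than visiting the three keys still in the list. A later collectDelta call on the same object returns an empty list immediately, because the flag stays set until something resets it.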

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Resolved: (SOLR-1004) Optimizing the abort command in delta import

Posted by Marc Sturlese <ma...@gmail.com>.
Sorry, I couldn't read this yesterday... but that's exactly what I was suggesting,
thank you very much!

-- 
View this message in context: http://www.nabble.com/-jira--Created%3A-%28SOLR-1004%29-Optimizing-the-abort-command-in-delta-import-tp21808783p22096080.html
Sent from the Solr - Dev mailing list archive at Nabble.com.