Posted to solr-dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2009/02/19 06:45:02 UTC
[jira] Resolved: (SOLR-1004) Optimizing the abort command in delta import
[ https://issues.apache.org/jira/browse/SOLR-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shalin Shekhar Mangar resolved SOLR-1004.
-----------------------------------------
Resolution: Fixed
Committed revision 745742.
Thanks Marc!
> Optimizing the abort command in delta import
> --------------------------------------------
>
> Key: SOLR-1004
> URL: https://issues.apache.org/jira/browse/SOLR-1004
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Environment: Java - Lucene - Solr - DataImportHandler
> Reporter: Marc Sturlese
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-1004.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> I have seen that when the abort command is called during a deltaImport, the doDelta function in DocBuilder.java only checks for an abort at the beginning of collectDelta, after that function returns, and at the end of collectDelta.
> The problem I have found is that if there is a large number of documents to modify and abort is called in the middle of delta collection, it will not take effect until all the data has been collected.
> The same happens when we start deleting or updating documents. In the update case there is an abort check inside buildDocument but, since it is called inside a "while" loop over all the docs to update, the loop keeps going through all of them, skipping each one.
> I propose doing an abort check inside every data-collection loop and after calling buildDocument in the doDelta function.
> In the case of modifying documents, the code in DocBuilder.java would look like:
> while (pkIter.hasNext()) {
>   Map<String, Object> map = pkIter.next();
>   vri.addNamespace(DataConfig.IMPORTER_NS + ".delta", map);
>   buildDocument(vri, null, map, root, true, null);
>   pkIter.remove();
>   // check for abort
>   if (stop.get()) {
>     allPks = null;
>     pkIter = null;
>     return;
>   }
> }
> In the case of document deletion (the deleteAll function in DocBuilder), just add if (stop.get()) { break; } at the end of every loop iteration, and add this right after deleteAll is called (in doDelta):
> if (stop.get()) {
>   allPks = null;
>   deletedKeys = null;
>   return;
> }
> Finally, in collectDelta:
> while (true) {
>   // check for abort
>   if (stop.get()) { return myModifiedPks; }
>   Map<String, Object> row = entityProcessor.nextModifiedRowKey();
>   if (row == null)
>     break;
>   ...
> And the same for delete-query collection and parent-delta-query collection.
> I didn't attach the patch because this is the first time I have opened an issue and I don't know if you want it coded the way I did. I just wanted to explain the idea and how I solved it; I think it can be useful for other users.
>
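
For readers who want to try the idea outside Solr, here is a minimal, self-contained sketch of the per-iteration abort check described in the issue. It is not the committed patch or the real DocBuilder code: the class AbortableDeltaSketch, its methods collectKeys, processKeys and abort, and the use of plain Integer keys are all hypothetical stand-ins chosen for illustration. Only the AtomicBoolean flag named stop mirrors the description.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical stand-in for the delta-import loops; NOT the actual DocBuilder code.
class AbortableDeltaSketch {
    // Shared abort flag, playing the role of `stop` in the issue description.
    private final AtomicBoolean stop = new AtomicBoolean(false);

    // Request an abort; safe to call from another thread.
    void abort() {
        stop.set(true);
    }

    // Collection phase: check the flag before pulling each row,
    // so an abort takes effect without draining the whole source.
    List<Integer> collectKeys(Iterator<Integer> source) {
        List<Integer> keys = new ArrayList<>();
        while (source.hasNext()) {
            if (stop.get()) {
                return keys; // partial result; the caller should check the flag as well
            }
            keys.add(source.next());
        }
        return keys;
    }

    // Update phase: build one document per key and check the flag after each one.
    // Returns how many keys were handled before the loop ended or was aborted.
    int processKeys(List<Integer> keys, Consumer<Integer> buildDocument) {
        int processed = 0;
        Iterator<Integer> it = keys.iterator();
        while (it.hasNext()) {
            buildDocument.accept(it.next());
            it.remove();
            processed++;
            if (stop.get()) {
                return processed; // leave the remaining keys untouched
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        AbortableDeltaSketch sketch = new AbortableDeltaSketch();
        List<Integer> keys = sketch.collectKeys(List.of(1, 2, 3, 4, 5).iterator());
        // Simulate an abort request arriving while the third document is being built.
        int processed = sketch.processKeys(keys, key -> {
            if (key == 3) {
                sketch.abort();
            }
        });
        System.out.println("processed " + processed + ", left " + keys.size());
    }
}
```

The point of the sketch is that the check happens once per iteration, so the cost of an abort is bounded by one row or one document rather than by the size of the whole delta.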
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: [jira] Resolved: (SOLR-1004) Optimizing the abort command in delta import
Posted by Marc Sturlese <ma...@gmail.com>.
Sorry, I couldn't read this yesterday... but that's exactly what I was suggesting,
thank you very much!
--
View this message in context: http://www.nabble.com/-jira--Created%3A-%28SOLR-1004%29-Optimizing-the-abort-command-in-delta-import-tp21808783p22096080.html
Sent from the Solr - Dev mailing list archive at Nabble.com.