Posted to solr-dev@lucene.apache.org by "Marc Sturlese (JIRA)" <ji...@apache.org> on 2009/02/03 13:20:00 UTC
[jira] Created: (SOLR-1004) Optimizing the abort command in delta import
Optimizing the abort command in delta import
--------------------------------------------
Key: SOLR-1004
URL: https://issues.apache.org/jira/browse/SOLR-1004
Project: Solr
Issue Type: Improvement
Components: contrib - DataImportHandler
Affects Versions: 1.3
Environment: Java - Lucene - Solr - DataImportHandler
Reporter: Marc Sturlese
Priority: Minor
Fix For: 1.3.1, 1.4
I have seen that when the abort command is called during a delta import, doDelta in DocBuilder.java only checks for the abort at the beginning of collectDelta, after that function returns, and at the end of collectDelta.
The problem I have found is that if there is a large number of documents to modify and abort is called in the middle of delta collection, it will not take effect until all the data has been collected.
The same happens when we start deleting or updating documents. In the update case there is an abort check inside buildDocument but, as it is called inside a "while" loop over all the docs to update, it will keep going through every doc in the loop, skipping each one.
I propose adding an abort check inside every data-collection loop, and after the call to buildDocument in the doDelta function.
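To illustrate why the check inside buildDocument is not enough on its own, here is a minimal, self-contained sketch. It is not the real DocBuilder code: the class AbortCheckSketch, the run method and its parameters are hypothetical stand-ins, and the abort is simulated by setting the flag after a fixed number of documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the delta update loop; not the real DocBuilder.
class AbortCheckSketch {

    /**
     * Walks all primary keys and returns {loop iterations, documents built}.
     * The abort is simulated: the stop flag is set once 'abortAfter'
     * documents have been built.
     */
    static int[] run(List<String> pks, int abortAfter, boolean checkInLoop) {
        AtomicBoolean stop = new AtomicBoolean(false);
        int iterations = 0;
        int built = 0;
        // Copy the list so that Iterator.remove() is supported.
        Iterator<String> pkIter = new ArrayList<>(pks).iterator();
        while (pkIter.hasNext()) {
            pkIter.next();
            iterations++;
            // Stand-in for buildDocument: it skips its work once stop is set,
            // but it has no way to end the caller's loop.
            if (!stop.get()) {
                built++;
            }
            pkIter.remove();
            // Simulated abort command arriving mid-import.
            if (built == abortAfter) {
                stop.set(true);
            }
            // The proposed extra check: leave the loop as soon as stop is set.
            if (checkInLoop && stop.get()) {
                break;
            }
        }
        return new int[] {iterations, built};
    }

    public static void main(String[] args) {
        List<String> pks = Arrays.asList("1", "2", "3", "4", "5");
        int[] before = run(pks, 2, false);
        int[] after = run(pks, 2, true);
        // Without the loop check every key is still visited.
        System.out.println("no loop check: iterations=" + before[0] + " built=" + before[1]);
        // With the loop check the loop ends right after the abort.
        System.out.println("loop check:    iterations=" + after[0] + " built=" + after[1]);
    }
}
```

In this sketch both variants build the same two documents, but only the variant with the loop check stops iterating; with many pending documents that difference is the delay described above.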
In the case of modifying documents, the code in DocBuilder.java would look like:
while (pkIter.hasNext()) {
  Map<String, Object> map = pkIter.next();
  vri.addNamespace(DataConfig.IMPORTER_NS + ".delta", map);
  buildDocument(vri, null, map, root, true, null);
  pkIter.remove();
  // check for abort
  if (stop.get()) {
    allPks = null;
    pkIter = null;
    return;
  }
}
In the case of document deletion (the deleteAll function in DocBuilder): just add if (stop.get()) { break; } at the end of every loop iteration, and run this check just after deleteAll is called (in doDelta):
if (stop.get()) {
  allPks = null;
  deletedKeys = null;
  return;
}
Finally, in collectDelta:
while (true) {
  // check for abort
  if (stop.get()) { return myModifiedPks; }
  Map<String, Object> row = entityProcessor.nextModifiedRowKey();
  if (row == null)
    break;
  ...
And the same for delete-query collection and parent-delta-query collection.
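The collection loop above can be tried in isolation with a rough sketch like the following. Everything here is an assumption made for illustration: CollectDeltaSketch, collect and rowsThatAbortAfter are hypothetical names, and a plain Iterator stands in for entityProcessor.nextModifiedRowKey(), returning null when no rows remain.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a collection loop that honours an abort flag.
class CollectDeltaSketch {

    /** Collects row keys until the source is exhausted or stop is set. */
    static Set<Map<String, Object>> collect(Iterator<Map<String, Object>> rows,
                                            AtomicBoolean stop) {
        Set<Map<String, Object>> myModifiedPks = new LinkedHashSet<>();
        while (true) {
            // Check for abort before fetching the next row.
            if (stop.get()) {
                return myModifiedPks;
            }
            // Stand-in for entityProcessor.nextModifiedRowKey().
            Map<String, Object> row = rows.hasNext() ? rows.next() : null;
            if (row == null) {
                break;
            }
            myModifiedPks.add(row);
        }
        return myModifiedPks;
    }

    /** Test source: serves 'total' rows and sets stop while serving row 'abortAfter'. */
    static Iterator<Map<String, Object>> rowsThatAbortAfter(
            final int total, final int abortAfter, final AtomicBoolean stop) {
        return new Iterator<Map<String, Object>>() {
            private int served = 0;

            @Override
            public boolean hasNext() {
                return served < total;
            }

            @Override
            public Map<String, Object> next() {
                if (served >= total) {
                    throw new NoSuchElementException();
                }
                served++;
                if (served == abortAfter) {
                    stop.set(true); // simulated abort command
                }
                return Collections.<String, Object>singletonMap("id", served);
            }
        };
    }

    public static void main(String[] args) {
        AtomicBoolean stop = new AtomicBoolean(false);
        int aborted = collect(rowsThatAbortAfter(1000, 3, stop), stop).size();
        AtomicBoolean never = new AtomicBoolean(false);
        int full = collect(rowsThatAbortAfter(1000, -1, never), never).size();
        System.out.println("aborted run collected " + aborted + " rows");
        System.out.println("full run collected " + full + " rows");
    }
}
```

Returning the partially filled set keeps the method's return type unchanged; the caller is then expected to check the flag again, as doDelta does, before acting on the keys.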
I didn't attach a patch because this is the first time I have opened an issue and I don't know if you want it coded the way I do. I just wanted to explain the idea and how I solved it; I think it can be useful for other users.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: [jira] Resolved: (SOLR-1004) Optimizing the abort command in delta import
Posted by Marc Sturlese <ma...@gmail.com>.
Sorry, I couldn't read this yesterday... but that's exactly what I was suggesting,
thank you very much!
JIRA jira@apache.org wrote:
>
>
> [
> https://issues.apache.org/jira/browse/SOLR-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> ]
>
> Shalin Shekhar Mangar resolved SOLR-1004.
> -----------------------------------------
>
> Resolution: Fixed
>
> Committed revision 745742.
>
> Thanks Marc!
>
>> Optimizing the abort command in delta import
>> --------------------------------------------
>>
>> Key: SOLR-1004
>> URL: https://issues.apache.org/jira/browse/SOLR-1004
>> Project: Solr
>> Issue Type: Improvement
>> Components: contrib - DataImportHandler
>> Affects Versions: 1.3
>> Environment: Java - Lucene - Solr - DataImportHandler
>> Reporter: Marc Sturlese
>> Assignee: Shalin Shekhar Mangar
>> Priority: Minor
>> Fix For: 1.4
>>
>> Attachments: SOLR-1004.patch
>>
>> Original Estimate: 0.5h
>> Remaining Estimate: 0.5h
>>
--
View this message in context: http://www.nabble.com/-jira--Created%3A-%28SOLR-1004%29-Optimizing-the-abort-command-in-delta-import-tp21808783p22096080.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
[jira] Resolved: (SOLR-1004) Optimizing the abort command in delta import
Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shalin Shekhar Mangar resolved SOLR-1004.
-----------------------------------------
Resolution: Fixed
Committed revision 745742.
Thanks Marc!
> Optimizing the abort command in delta import
> --------------------------------------------
>
> Key: SOLR-1004
> URL: https://issues.apache.org/jira/browse/SOLR-1004
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Environment: Java - Lucene - Solr - DataImportHandler
> Reporter: Marc Sturlese
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-1004.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] Updated: (SOLR-1004) Optimizing the abort command in delta import
Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shalin Shekhar Mangar updated SOLR-1004:
----------------------------------------
Attachment: SOLR-1004.patch
Changes
# Check for abort in nextModifiedRow detection
# Check for abort in nextDeletedRow
# Check in doDelta
# Check getModifiedParentRowKey
Marc, can you see the patch to ensure all your changes got in?
> Optimizing the abort command in delta import
> --------------------------------------------
>
> Key: SOLR-1004
> URL: https://issues.apache.org/jira/browse/SOLR-1004
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Environment: Java - Lucene - Solr - DataImportHandler
> Reporter: Marc Sturlese
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-1004.patch
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
[jira] Updated: (SOLR-1004) Optimizing the abort command in delta import
Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shalin Shekhar Mangar updated SOLR-1004:
----------------------------------------
Fix Version/s: (was: 1.3.1)
Assignee: Shalin Shekhar Mangar
> Optimizing the abort command in delta import
> --------------------------------------------
>
> Key: SOLR-1004
> URL: https://issues.apache.org/jira/browse/SOLR-1004
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.3
> Environment: Java - Lucene - Solr - DataImportHandler
> Reporter: Marc Sturlese
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: 1.4
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>