Posted to dev@lucene.apache.org by "Robert Muir (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 15:36:34 UTC
[jira] [Created] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
remove O(n^2) slow slow indexing defaults in DataImportHandler
--------------------------------------------------------------
Key: SOLR-3142
URL: https://issues.apache.org/jira/browse/SOLR-3142
Project: Solr
Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 3.6, 4.0
By default, DataImportHandler optimizes the entire index when it commits.
This is bad for performance, because it means that by default it's doing a very
heavy index-wide operation even for an incremental update... essentially
O(n^2) indexing.
All that is needed is to set optimize=false by default. If someone wants
to optimize, they can either set optimize=true or explicitly optimize themselves.
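To make the O(n^2) claim concrete, here is a rough sketch of a cost model. It rests on two simplifying assumptions that are not stated in the issue: an optimize rewrites every document currently in the index, and a commit without optimize writes only the new batch (background segment merging is ignored). The function name and parameters are made up for illustration.

```python
def docs_rewritten(batches, batch_size, optimize_each_commit):
    """Count document writes under a simplified (assumed) cost model.

    Assumption: an optimize rewrites every document currently in the index,
    while a commit without optimize only writes the newly added batch.
    """
    total_written = 0
    index_size = 0
    for _ in range(batches):
        index_size += batch_size
        total_written += batch_size      # writing the new batch itself
        if optimize_each_commit:
            total_written += index_size  # optimize rewrites the whole index
    return total_written


if __name__ == "__main__":
    for batches in (10, 100):
        with_optimize = docs_rewritten(batches, 1000, True)
        without_optimize = docs_rewritten(batches, 1000, False)
        print(batches, with_optimize, without_optimize)
```

Under these assumptions, 10 incremental imports of 1000 documents cost 65000 document writes with an optimize on every commit versus 10000 without, and 100 imports cost 5150000 versus 100000. The cost without optimize grows linearly with the number of imports; the cost with optimize grows quadratically.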
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212759#comment-13212759 ]
Robert Muir commented on SOLR-3142:
-----------------------------------
{quote}
(ie: nightly) that it can be handy to have it auto-optimize when you are done.
{quote}
You can still do this, by specifying 'optimize=true' to your full-import.
It's just no longer the default. So we haven't taken away any capabilities here.
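As a rough illustration, a request that asks for the optimize explicitly could be built like this. The `command` and `optimize` request parameters come from this thread; the host, port, and handler path in the example are assumptions about a typical setup, and `dih_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def dih_url(base_url, command, optimize=False, **extra):
    """Build a DataImportHandler request URL with an explicit optimize flag.

    base_url is whatever path the handler is registered under
    (assumed here, not taken from the issue).
    """
    params = {"command": command, "optimize": "true" if optimize else "false"}
    params.update(extra)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed handler location; adjust to the local installation.
    base = "http://localhost:8983/solr/dataimport"
    print(dih_url(base, "full-import", optimize=True))
    print(dih_url(base, "delta-import"))
```

Passing the flag explicitly on every request means the behavior does not depend on which default the running version uses.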
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211380#comment-13211380 ]
Uwe Schindler commented on SOLR-3142:
-------------------------------------
+1. Are there any config files or parsing code to edit? I have it somewhere in my mind that the DIH config also has settings regarding optimize.
[jira] [Updated] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated SOLR-3142:
------------------------------
Attachment: SOLR-3142.patch
patch for the optimize.
I agree about the soft commit: even if it is not the default, it should at least be allowed and configurable... but I just didn't implement this in the patch.
In general whatever options are available for commit should be consistent with what DIH allows, maybe we should open a separate issue to ensure this is the case.
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211373#comment-13211373 ]
Yonik Seeley commented on SOLR-3142:
------------------------------------
+1
Might even make sense for it to be a "soft" commit.
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211382#comment-13211382 ]
Robert Muir commented on SOLR-3142:
-----------------------------------
I think it might be possible to configure this via files (versus the actual command), but I
searched for 'optimize' in the example-dih and found nothing :)
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211494#comment-13211494 ]
Robert Muir commented on SOLR-3142:
-----------------------------------
Unless there are objections I'd like to commit this to
make some progress.
[jira] [Resolved] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved SOLR-3142.
-------------------------------
Resolution: Fixed
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212761#comment-13212761 ]
Hoss Man commented on SOLR-3142:
--------------------------------
Agreed, I was just noting why I _think_ the original default was true.
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212760#comment-13212760 ]
Uwe Schindler commented on SOLR-3142:
-------------------------------------
Optimizing after a full import over a non-empty index is no longer really needed in Lucene (even if you call IndexWriter.deleteAll() first, as the full-import does). When IndexWriter merges (or on close or commit) and detects that a segment consists only of deleted documents, it drops that segment. This was indeed not true in the past, but it has been the case since Lucene 3.1 or thereabouts.
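The behavior described above can be pictured with a toy model. This is not Lucene's API or implementation: `Segment`, `delete_all`, and `commit` are hypothetical names, and the model only tracks document counts per segment.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    doc_count: int
    deleted_count: int = 0

    def fully_deleted(self):
        # True when every document in the segment is marked deleted.
        return self.doc_count > 0 and self.deleted_count == self.doc_count


def delete_all(segments):
    """Mark every document in every existing segment as deleted."""
    for seg in segments:
        seg.deleted_count = seg.doc_count


def commit(segments):
    """Drop segments that contain only deleted documents; keep the rest."""
    return [seg for seg in segments if not seg.fully_deleted()]


if __name__ == "__main__":
    old = [Segment("_0", 500), Segment("_1", 300)]
    delete_all(old)                         # the full import clears the old index
    segments = old + [Segment("_2", 800)]   # newly imported documents
    print([seg.name for seg in commit(segments)])
```

In this model the two old segments disappear at commit without any optimize, because they hold nothing but deleted documents. A segment that still has some live documents is kept as it is.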
[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing
defaults in DataImportHandler
Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212752#comment-13212752 ]
Hoss Man commented on SOLR-3142:
--------------------------------
FWIW: I'm pretty sure the original assumption here was that in the (relatively common) use case of doing a full-import rebuild on a regular basis (ie: nightly), it can be handy to have it auto-optimize when you are done. I think the real problem is that this assumption was never challenged regarding things like delta import.
So an argument could be made that the default should still be optimize=true on full-import and optimize=false on delta import ... but I'm not going to make that argument, I think it's silly to assume true in either case (particularly since a parameterized full import might actually be a rapidly repeating incremental).