Posted to dev@lucene.apache.org by "Robert Muir (Created) (JIRA)" <ji...@apache.org> on 2012/02/19 15:36:34 UTC

[jira] [Created] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

remove O(n^2) slow slow indexing defaults in DataImportHandler
--------------------------------------------------------------

                 Key: SOLR-3142
                 URL: https://issues.apache.org/jira/browse/SOLR-3142
             Project: Solr
          Issue Type: Bug
            Reporter: Robert Muir
            Assignee: Robert Muir
             Fix For: 3.6, 4.0


By default, DataImportHandler optimizes the entire index when it commits.

This is bad for performance, because it means that by default it's doing a very
heavy index-wide operation even for an incremental update... essentially
O(n^2) indexing.

All that is needed is to set optimize=false by default. If someone wants
to optimize, they can either set optimize=true or explicitly optimize themselves.
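
To make the O(n^2) claim concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simplified cost model that is not taken from the patch: an optimize rewrites every document currently in the index, a plain commit writes only the new batch, and ordinary background merges are ignored. The function name and the batch numbers are made up for illustration.

```python
def docs_rewritten(batches: int, batch_size: int, optimize_each_commit: bool) -> int:
    """Count documents written under a simplified cost model.

    Assumption: an optimize rewrites every document currently in the index,
    while a plain commit only writes the newly added batch.
    """
    index_size = 0
    written = 0
    for _ in range(batches):
        index_size += batch_size
        written += batch_size  # the new batch is always written once
        if optimize_each_commit:
            written += index_size  # the optimize rewrites the whole index
    return written


if __name__ == "__main__":
    # 100 incremental updates of 1000 documents each
    print(docs_rewritten(100, 1000, optimize_each_commit=False))
    print(docs_rewritten(100, 1000, optimize_each_commit=True))
```

Under this model the work without optimize grows linearly with the number of batches, while optimizing on every commit adds a term proportional to batches * (batches + 1) / 2, which is where the quadratic behavior comes from.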


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212759#comment-13212759 ] 

Robert Muir commented on SOLR-3142:
-----------------------------------

{quote}
(ie: nightly) that it can be handy to have it auto-optimize when you are done.
{quote}

You can still do this by specifying 'optimize=true' on your full-import.
It's just no longer the default, so we haven't taken away any capabilities here.
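
As a minimal sketch of what opting back in might look like from a client: the helper below only builds the request URL with the `command` and `optimize` parameters mentioned in this thread. The base URL (host, port and the `/dataimport` handler path) is an assumption; it depends on how the handler is registered in your solrconfig.xml. The helper name `dih_url` is hypothetical.

```python
from urllib.parse import urlencode


def dih_url(base: str, command: str, optimize: bool = False) -> str:
    """Build a DataImportHandler request URL.

    `base` is an assumed handler URL; adjust it to your own setup.
    `optimize` defaults to False here, mirroring the new default.
    """
    params = {"command": command, "optimize": str(optimize).lower()}
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr/dataimport"  # hypothetical location
    # Nightly rebuild that explicitly asks for an optimize at the end
    print(dih_url(base, "full-import", optimize=True))
    # Incremental update that leaves the index unoptimized
    print(dih_url(base, "delta-import"))
```

Passing the flag explicitly in both cases keeps the behavior the same regardless of which Solr version's default is in effect.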
                


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211380#comment-13211380 ] 

Uwe Schindler commented on SOLR-3142:
-------------------------------------

+1. Are there any config files or parsing code to edit? I have somewhere in my mind that there are also settings regarding optimize in the DIH config.
                


[jira] [Updated] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Robert Muir (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-3142:
------------------------------

    Attachment: SOLR-3142.patch

Patch for the optimize default.

I agree about the soft commit: if not the default, it should at least be allowable/configurable... but I just didn't implement this in the patch.

In general, whatever options are available for commit should be consistent with what DIH allows. Maybe we should open a separate issue to ensure this is the case.

                


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Yonik Seeley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211373#comment-13211373 ] 

Yonik Seeley commented on SOLR-3142:
------------------------------------

+1

Might even make sense for it to be a "soft" commit.
                


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211382#comment-13211382 ] 

Robert Muir commented on SOLR-3142:
-----------------------------------

I think it might be possible to configure this via files (versus the actual command), but I
searched for 'optimize' in the example-dih and found nothing :)
                


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Robert Muir (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211494#comment-13211494 ] 

Robert Muir commented on SOLR-3142:
-----------------------------------

Unless there are objections I'd like to commit this to 
make some progress.
                


[jira] [Resolved] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Robert Muir (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-3142.
-------------------------------

    Resolution: Fixed
    


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212761#comment-13212761 ] 

Hoss Man commented on SOLR-3142:
--------------------------------

Agreed, I was just noting why I _think_ the original default was true.
                


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Uwe Schindler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212760#comment-13212760 ] 

Uwe Schindler commented on SOLR-3142:
-------------------------------------

Optimizing after a full import over a non-empty index is no longer really needed in Lucene (even if you do an IndexWriter.deleteAll() beforehand, as the full-import does). Once IndexWriter merges (or on close or commit) and detects that a segment consists only of deleted documents, it will drop it. This was indeed not true in the past, but it has been since Lucene 3.1 or thereabouts.
                


[jira] [Commented] (SOLR-3142) remove O(n^2) slow slow indexing defaults in DataImportHandler

Posted by "Hoss Man (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212752#comment-13212752 ] 

Hoss Man commented on SOLR-3142:
--------------------------------

FWIW: I'm pretty sure the original assumption here was that in the (relatively common) use case of doing a full-import rebuild on a regular basis (ie: nightly) that it can be handy to have it auto-optimize when you are done. I think the real problem is that that assumption was never challenged regarding things like delta import.

So an argument could be made that the default should still be optimize=true on full-import and optimize=false on delta-import ... but I'm not going to make that argument; I think it's silly to assume true in either case (particularly since a parameterized full import might actually be a rapidly repeating incremental).
                