Posted to dev@lucene.apache.org by "Noble Paul (JIRA)" <ji...@apache.org> on 2011/06/15 10:42:47 UTC

[jira] [Created] (SOLR-2593) A new command 'split' for splitting index

A new command 'split' for splitting index
-----------------------------------------

                 Key: SOLR-2593
                 URL: https://issues.apache.org/jira/browse/SOLR-2593
             Project: Solr
          Issue Type: New Feature
            Reporter: Noble Paul


If an index is too large/hot it would be desirable to split it out to another core 


There can be multiple strategies:
* random split of x or x% 
* fq="user:johndoe"

example 
example :
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index







--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-2593) A new core admin action 'split' for splitting index

Posted by "Jason Rutherglen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207407#comment-13207407 ] 

Jason Rutherglen commented on SOLR-2593:
----------------------------------------

Is there a patch for this issue available?  If not it's fine.
                
> A new core admin action 'split' for splitting index
> ---------------------------------------------------
>
>                 Key: SOLR-2593
>                 URL: https://issues.apache.org/jira/browse/SOLR-2593
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Noble Paul
>             Fix For: 4.0
>
>
> If an index is too large/hot it would be desirable to split it out to another core .
> This core may eventually be replicated out to another host.
> There can be multiple strategies:
> * random split of x or x% 
> * fq="user:johndoe"
> example :
> action=split&split=20percent&newcore=my_new_index
> or
> action=split&fq=user:johndoe&newcore=john_doe_index



[jira] [Commented] (SOLR-2593) A new core admin action 'split' for splitting index

Posted by "Deepak Kumar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483335#comment-13483335 ] 

Deepak Kumar commented on SOLR-2593:
------------------------------------

I have a situation that requires merging 2 cores, re-creating the data partitions, then splitting and installing the result in 2 (or more) cores, and this issue seems close to that area. There are 2 cores on the same schema, roughly 55G and 35G (and growing), and data is pushed continuously to the 35G core. We can't let it grow indefinitely, so periodically (during an offline/maintenance period) we regenerate both cores by re-indexing into fresh cores with the desired set of data, keyed on some unique key, then discard the old oversized cores and install the fresh ones. Re-indexing is painful, and it eventually creates the same set of documents, except that the older core loses its oldest docs because of the size constraint, and the smaller core shrinks further because it holds fewer documents once docs are shifted to the bigger one. This can be considered a sliding-time-window-based core. The basic steps needed could be:

1.) Merge N cores into 1 big core (high cost).
2.) Scan through all the documents of the big core and create N new cores alongside it (N being the number of cores merged initially), each filled up to its allowed size.
3.) Hot swap the main cores with the fresh ones.
4.) Discard the old cores, probably after backing them up.

Step 1 may be omitted if we can directly scan through the documents of the N cores and keep pushing the docs over to the target cores.
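
The steps above can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not Solr code: plain lists stand in for cores, and the field names "id" and "ts", the function name repartition, and capacities counted in documents (rather than gigabytes) are all assumptions made for the example. It takes the shortcut of scanning the N source cores directly instead of merging them first.

```python
def repartition(source_cores, capacities):
    """Rebuild target cores from source cores (hypothetical sketch).

    source_cores: list of lists of docs, each doc a dict with "id" and "ts".
    capacities:   max number of docs per target core, newest core first.
    Returns (target_cores, dropped_docs).
    """
    # Scan every source core directly and keep the newest version
    # of each document, keyed on its unique key.
    latest = {}
    for core in source_cores:
        for doc in core:
            seen = latest.get(doc["id"])
            if seen is None or doc["ts"] > seen["ts"]:
                latest[doc["id"]] = doc

    # Order newest first so the sliding window keeps recent docs.
    ordered = sorted(latest.values(), key=lambda d: d["ts"], reverse=True)

    # Fill each target core up to its allowed size.
    targets = []
    start = 0
    for cap in capacities:
        targets.append(ordered[start:start + cap])
        start += cap

    # Whatever does not fit is the oldest data and falls out of the window.
    return targets, ordered[start:]
```

Under these assumptions the first target holds the newest documents, later targets hold progressively older ones, and the returned leftovers are the documents that slide out of the window. Swapping the fresh cores in and retiring the old ones would still be separate steps.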
                
> A new core admin action 'split' for splitting index
> ---------------------------------------------------
>
>                 Key: SOLR-2593
>                 URL: https://issues.apache.org/jira/browse/SOLR-2593
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Noble Paul
>             Fix For: 4.1
>
>
> If an index is too large/hot it would be desirable to split it out to another core .
> This core may eventually be replicated out to another host.
> There can be multiple strategies:
> * random split of x or x% 
> * fq="user:johndoe"
> example :
> action=split&split=20percent&newcore=my_new_index
> or
> action=split&fq=user:johndoe&newcore=john_doe_index



[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

Posted by "Peter Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049727#comment-13049727 ] 

Peter Sturge commented on SOLR-2593:
------------------------------------

This is a really great idea, thanks!
If it's possible, it would be cool to have config parameters to:
* create a new core
* overwrite an existing core
* rename an existing core, then create (rolling backup)
* merge with an existing core (ever-growing, but kind of an accessible 'archive' index)





[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050139#comment-13050139 ] 

Hoss Man commented on SOLR-2593:
--------------------------------

one thing to think about when talking about the API is how the implementation will actually work.

the fq type option is basically going to require making a full copy of the index and then deleting by query (unless i'm missing something). but for people who don't care how the index is partitioned a more efficient approach could probably happen by working at the segment level -- let the user say "split off a hunk of at least 20% but no more than 50%" and then you can look at individual segments and doc counts and see if it's possible to just move segments around (and maybe only do the "copy+deleteByQuery" logic on a single segment).
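
To illustrate the segment-level idea, here is a small Python sketch that, given only the doc count of each segment, looks for a set of whole segments whose combined size falls inside the requested range. The function name pick_segments and its percentage parameters are hypothetical, and a real implementation would work on Lucene segment metadata rather than a list of integers.

```python
def pick_segments(doc_counts, min_pct=20, max_pct=50):
    """Return indexes of whole segments to move, or None if no set fits.

    doc_counts: number of docs in each segment.
    The chosen segments together hold between min_pct and max_pct
    percent of all docs (integer percentages avoid float rounding).
    """
    total = sum(doc_counts)

    # reachable maps a combined doc count to one set of segment
    # indexes that adds up to it (classic subset-sum bookkeeping).
    reachable = {0: []}
    for idx, count in enumerate(doc_counts):
        # Iterate over a snapshot so each segment is used at most once.
        for subtotal, members in list(reachable.items()):
            new_total = subtotal + count
            if new_total * 100 <= max_pct * total and new_total not in reachable:
                reachable[new_total] = members + [idx]

    candidates = [s for s in reachable if s > 0 and s * 100 >= min_pct * total]
    if not candidates:
        # No combination of whole segments fits; one segment would
        # have to be split with the copy+deleteByQuery approach.
        return None

    # Prefer the smallest hunk that satisfies the request.
    return reachable[min(candidates)]
```

When the function returns None, that is the case where moving segments around is not enough and the more expensive per-document logic would be needed on at least one segment.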





[jira] [Updated] (SOLR-2593) A new core admin action 'split' for splitting index

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-2593:
-----------------------------

    Description: 
If an index is too large/hot it would be desirable to split it out to another core.
This core may eventually be replicated out to another host.

There can be multiple strategies:
* random split of x or x%
* fq="user:johndoe"

example :
action=split&split=20percent&newcore=my_new_index
or
action=split&fq=user:johndoe&newcore=john_doe_index

  was:
If an index is too large/hot it would be desirable to split it out to another core.
This core may eventually be replicated out to another host.

There can be multiple strategies:
* random split of x or x%
* fq="user:johndoe"

example 
example :
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index

        Summary: A new core admin action 'split' for splitting index  (was: A new core admin command 'split' for splitting index)




[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

Posted by "Koji Sekiguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049713#comment-13049713 ] 

Koji Sekiguchi commented on SOLR-2593:
--------------------------------------

CoreAdminHandler uses action, not command.
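
As an illustration of that point, the two examples from the description could be built as CoreAdmin request URLs like this. It is only a sketch: the split action and its split, fq and newcore parameters are the proposal under discussion rather than an existing Solr feature, and the host, port and helper name split_request are assumptions.

```python
from urllib.parse import urlencode

# Assumed location of the CoreAdmin handler on a local Solr instance.
CORE_ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def split_request(new_core, fq=None, split=None):
    """Build the URL for the proposed (hypothetical) split action."""
    if (fq is None) == (split is None):
        raise ValueError("pass exactly one of fq or split")

    # CoreAdmin requests select the operation with "action", not "command".
    params = {"action": "split"}
    if fq is not None:
        params["fq"] = fq          # e.g. "user:johndoe"
    else:
        params["split"] = split    # e.g. "20percent"
    params["newcore"] = new_core

    # urlencode escapes reserved characters such as ":" in the filter query.
    return CORE_ADMIN_URL + "?" + urlencode(params)


print(split_request("my_new_index", split="20percent"))
print(split_request("john_doe_index", fq="user:johndoe"))
```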




[jira] [Commented] (SOLR-2593) A new core admin action 'split' for splitting index

Posted by "Terrance A. Snyder (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153533#comment-13153533 ] 

Terrance A. Snyder commented on SOLR-2593:
------------------------------------------

@Noble Paul - do you have more information on this? We have a unique requirement that would greatly benefit from being able to take a 'slice' of data a user has modified and persist it in such a fashion.
                



[jira] [Commented] (SOLR-2593) A new core admin action 'split' for splitting index

Posted by "Andrzej Bialecki (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207452#comment-13207452 ] 

Andrzej Bialecki  commented on SOLR-2593:
-----------------------------------------

Jason, see LUCENE-2632 for a possible way to implement this at the Lucene level. Splitting into arbitrary parts has so far required multiple passes over the input data; using the approach of tee/filter codecs, it's possible to do this in one pass over the input data.
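
The difference between the two approaches can be shown with a toy Python sketch. It does not use the codec API from LUCENE-2632; plain lists stand in for index parts, and the names split_multi_pass, split_one_pass and part_of are made up for the illustration.

```python
def split_multi_pass(read_docs, n_parts, part_of):
    """One full scan of the input per output part."""
    parts = []
    for part in range(n_parts):
        # Each part re-reads the whole input and filters it.
        parts.append([doc for doc in read_docs() if part_of(doc) == part])
    return parts


def split_one_pass(read_docs, n_parts, part_of):
    """A single scan that routes ("tees") each doc to its part."""
    parts = [[] for _ in range(n_parts)]
    for doc in read_docs():
        parts[part_of(doc)].append(doc)
    return parts
```

Both functions produce the same parts, but the first calls read_docs once per part while the second calls it once in total, which is the saving described above.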
                



[jira] [Updated] (SOLR-2593) A new command 'split' for splitting index

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-2593:
-----------------------------

    Description: 
If an index is too large/hot it would be desirable to split it out to another core.
This core may eventually be replicated out to another host.

There can be multiple strategies:
* random split of x or x%
* fq="user:johndoe"

example 
example :
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index

  was:
If an index is too large/hot it would be desirable to split it out to another core.

There can be multiple strategies:
* random split of x or x%
* fq="user:johndoe"

example 
example :
command=split&split=20percent&newcore=my_new_index
or
command=split&fq=user:johndoe&newcore=john_doe_index




[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050137#comment-13050137 ] 

Hoss Man commented on SOLR-2593:
--------------------------------

bq. If it's possible, it would be cool to have config parameters to:

...those seem like they should be discrete actions that can be taken after the split has happened.  the simplest thing is to have a "split" action that _just_ creates a new core with the docs selected either using the fq or by random selection, and then use other CoreAdmin actions for the other stuff: rename, swap, swap+delete (the old one), merge ... merge is really the only one we don't have at a "core" level yet (i think)
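
As a rough sketch of the "swap+delete (the old one)" follow-up, the requests below use the existing CoreAdmin SWAP and UNLOAD actions. The host, port, core names and helper names are assumptions for the example, and the split that would have created the fresh core beforehand is not shown.

```python
from urllib.parse import urlencode

# Assumed location of the CoreAdmin handler on a local Solr instance.
CORE_ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def core_admin_url(action, **params):
    """Build a CoreAdmin request URL for the given action."""
    return CORE_ADMIN_URL + "?" + urlencode({"action": action, **params})


def swap_in_and_retire(live_core, fresh_core):
    """Requests to make fresh_core live and then drop the old index."""
    return [
        # SWAP exchanges the two core names, so live_core now serves
        # the fresh index and fresh_core refers to the old one.
        core_admin_url("SWAP", core=live_core, other=fresh_core),
        # UNLOAD then removes the core that now holds the old index.
        core_admin_url("UNLOAD", core=fresh_core),
    ]


for url in swap_in_and_retire("main", "main_split"):
    print(url)
```

Keeping these as separate requests matches the suggestion above: the split action itself stays simple, and the caller composes it with whatever follow-up is needed.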






[jira] [Commented] (SOLR-2593) A new core admin command 'split' for splitting index

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050236#comment-13050236 ] 

Noble Paul commented on SOLR-2593:
----------------------------------

bq. the fq type option is basically going to require making a full copy of the index and then deleting by query...

Lucene does it better. We can pass a filtered index to a new writer, and it creates a new index with only those docs. I was surprised at the speed: it split a dummy 1 million doc index in < 1 sec.
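
Conceptually, the filtered-index approach looks something like the following Python sketch. It is a stand-in only: the class FilteredView and the function split_by_filter are invented for illustration and are not Lucene classes, and lists of dicts play the role of the index and the writer's output.

```python
class FilteredView:
    """Read-only view of an index that exposes only matching docs."""

    def __init__(self, docs, keep):
        self.docs = docs
        self.keep = keep

    def __iter__(self):
        # Docs that fail the filter are never handed to the writer,
        # so no copy-then-delete step is needed.
        return (doc for doc in self.docs if self.keep(doc))


def split_by_filter(source, matches):
    """Write matching docs to a new index and the rest to another."""
    new_index = list(FilteredView(source, matches))
    remaining = list(FilteredView(source, lambda doc: not matches(doc)))
    return new_index, remaining


source = [
    {"id": 1, "user": "johndoe"},
    {"id": 2, "user": "janedoe"},
    {"id": 3, "user": "johndoe"},
]
john, others = split_by_filter(source, lambda doc: doc["user"] == "johndoe")
```

The point of the design is that the writer only ever sees the documents the filter lets through, which avoids the full copy followed by a delete-by-query.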








[jira] [Updated] (SOLR-2593) A new core admin command 'split' for splitting index

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-2593:
----------------------------------------

    Fix Version/s: 4.0




[jira] [Updated] (SOLR-2593) A new core admin command 'split' for splitting index

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-2593:
-----------------------------

    Summary: A new core admin command 'split' for splitting index  (was: A new command 'split' for splitting index)

