Posted to dev@accumulo.apache.org by "Keith Turner (JIRA)" <ji...@apache.org> on 2012/04/21 00:43:33 UTC

[jira] [Created] (ACCUMULO-551) Experiment with multi-node batch writer

Keith Turner created ACCUMULO-551:
-------------------------------------

             Summary: Experiment with multi-node batch writer
                 Key: ACCUMULO-551
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-551
             Project: Accumulo
          Issue Type: Task
            Reporter: Keith Turner
             Fix For: 1.5.0


Accumulo has a batch writer that batches mutations by tablet server for writes.  This works well until there are a lot of tablet servers being written to, at which point only a small amount of data is sent to each tablet server.  Would it be better for the client to batch writes for multiple tablet servers and send them to one server, which then writes directly to the tablet servers?

One possible way to do this is to:
 
 * batch mutations by rack on the client
 * send all of those mutations to one random tablet server on the rack 
 * have the random tablet server write to the other servers on the rack
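
The first two steps above could be sketched roughly as follows. This is only an illustration, not code from Accumulo: RackBatcher, Located, and RackBatch are hypothetical names, mutations are plain strings, and the rack and server of each mutation are assumed to be known already. The relay is picked only from servers that have mutations destined for them, which is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class RackBatcher {
    /** Hypothetical stand-in for a mutation whose rack and tablet server are already known. */
    static final class Located {
        final String rack;
        final String server;
        final String mutation;

        Located(String rack, String server, String mutation) {
            this.rack = rack;
            this.server = server;
            this.mutation = mutation;
        }
    }

    /** One batch per rack: the server chosen to receive it, plus mutations binned by final server. */
    static final class RackBatch {
        final String relay;
        final Map<String, List<String>> byServer;

        RackBatch(String relay, Map<String, List<String>> byServer) {
            this.relay = relay;
            this.byServer = byServer;
        }
    }

    static Map<String, RackBatch> batchByRack(List<Located> input, Random rnd) {
        // Step 1: bin mutations by rack, then by destination server within the rack.
        Map<String, Map<String, List<String>>> binned = new HashMap<>();
        for (Located l : input) {
            binned.computeIfAbsent(l.rack, r -> new HashMap<>())
                  .computeIfAbsent(l.server, s -> new ArrayList<>())
                  .add(l.mutation);
        }
        // Step 2: pick one random server on each rack to receive that rack's whole batch.
        // Assumption: the relay is chosen among servers that have mutations in this batch.
        Map<String, RackBatch> out = new HashMap<>();
        for (Map.Entry<String, Map<String, List<String>>> e : binned.entrySet()) {
            List<String> servers = new ArrayList<>(e.getValue().keySet());
            String relay = servers.get(rnd.nextInt(servers.size()));
            out.put(e.getKey(), new RackBatch(relay, e.getValue()));
        }
        return out;
    }
}
```

With this shape the client opens one connection per rack rather than one per tablet server; the chosen relay still has to deliver each per-server bin.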

This cuts down on the number of direct connections the client has to make.  It could also have the following benefits.

 * Tablet servers can keep connections open to other tablet servers.
 * A write pipeline

It would be interesting to run some tests and see how well this works.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ACCUMULO-551) Experiment with multi-node batch writer

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266004#comment-13266004 ] 

Keith Turner commented on ACCUMULO-551:
---------------------------------------

At first I was thinking of having a batch writer that writes to a batch writer on another node, which would work nicely with a general proxy.  However, I have been thinking about it more and I am not sure this will work well, for a few reasons.  First, when a tablet is not at the expected location and rack, this information needs to propagate back to the source so it can invalidate its metadata cache.  Second, the second-level batch writer will have to rebin the mutations even though the first-level batch writer has already done this.  Third, the second-level batch writer will keep retrying until it gets everything through, even if only a small number of mutations are failing.  This could stall the first-level batch writer unnecessarily.

I am thinking about making a delegating batch writer that bins mutations by rack and server and then sends each rack's bins to a server on that rack.  A specialized proxy would just forward the mutations in parallel and report back failures.  This specialized proxy would use only the locations passed to it by the source; it would not try to determine a mutation's location itself.
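
A rough sketch of what such a forwarding proxy might look like is below. It is not the code on the branch: ForwardingProxy and ServerWriter are hypothetical names, mutations are plain strings, and the transport is left abstract. The point it tries to show is that the proxy neither rebins nor retries; it forwards each per-server bin in parallel and returns whatever failed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ForwardingProxy {
    /** Hypothetical transport: delivers mutations to one server, throwing on failure. */
    interface ServerWriter {
        void write(String server, List<String> mutations) throws Exception;
    }

    /**
     * Forwards each per-server bin in parallel, using only the locations the
     * source supplied. Returns the bins that failed so the source can decide
     * what to do with them.
     */
    static Map<String, List<String>> forward(Map<String, List<String>> binsByServer,
                                             ServerWriter writer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per destination server.
            Map<String, Future<Void>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> bin : binsByServer.entrySet()) {
                Callable<Void> task = () -> {
                    writer.write(bin.getKey(), bin.getValue());
                    return null;
                };
                pending.put(bin.getKey(), pool.submit(task));
            }
            // Wait for every task; no retry and no re-binning, just record failures.
            Map<String, List<String>> failures = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Void>> p : pending.entrySet()) {
                try {
                    p.getValue().get();
                } catch (ExecutionException e) {
                    failures.put(p.getKey(), binsByServer.get(p.getKey()));
                } catch (InterruptedException e) {
                    // Outcome unknown, so report the bin as failed and keep the interrupt flag set.
                    Thread.currentThread().interrupt();
                    failures.put(p.getKey(), binsByServer.get(p.getKey()));
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because failures go back to the source instead of being retried at the proxy, a few failing mutations would not hold up the rest of the batch.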


                


        

[jira] [Work started] (ACCUMULO-551) Experiment with multi-node batch writer

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on ACCUMULO-551 started by Keith Turner.



        

[jira] [Commented] (ACCUMULO-551) Experiment with multi-node batch writer

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268804#comment-13268804 ] 

Keith Turner commented on ACCUMULO-551:
---------------------------------------

It's a work in progress, but I have some working code for this available on GitHub.

https://github.com/keith-turner/accumulo/tree/ACCUMULO-551
                


        

[jira] [Assigned] (ACCUMULO-551) Experiment with multi-node batch writer

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ACCUMULO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Turner reassigned ACCUMULO-551:
-------------------------------------

    Assignee: Keith Turner
    


        

[jira] [Commented] (ACCUMULO-551) Experiment with multi-node batch writer

Posted by "John Vines (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266561#comment-13266561 ] 

John Vines commented on ACCUMULO-551:
-------------------------------------

I'm wondering if we want to try to hit a middle ground. If a tablet migrates, go ahead and send the current batch over, but then notify the client that they need to update their metadata cache. Perhaps even have the proxy, which is aware of what the new location should be (since it's sending the batch to the new loc), notify the writer about the new tablet location.
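
One way to picture the client side of this is sketched below, assuming the proxy's reply carries a map of tablet to new server for anything it had to redirect. TabletLocationCache and applyRelocations are hypothetical names, not Accumulo classes, and tablets and servers are plain strings.

```java
import java.util.HashMap;
import java.util.Map;

final class TabletLocationCache {
    private final Map<String, String> serverByTablet = new HashMap<>();

    public void put(String tablet, String server) {
        serverByTablet.put(tablet, server);
    }

    public String locate(String tablet) {
        return serverByTablet.get(tablet);
    }

    /**
     * Applies relocation notices (tablet -> new server) that a proxy returned
     * alongside a write it redirected. Returns how many cached entries changed.
     */
    public int applyRelocations(Map<String, String> movedTablets) {
        int changed = 0;
        for (Map.Entry<String, String> m : movedTablets.entrySet()) {
            String previous = serverByTablet.put(m.getKey(), m.getValue());
            if (!m.getValue().equals(previous)) {
                changed++;
            }
        }
        return changed;
    }
}
```

The current batch still goes through, and the writer's next batch is binned against the updated locations without a separate metadata lookup.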
                


        

[jira] [Commented] (ACCUMULO-551) Experiment with multi-node batch writer

Posted by "Keith Turner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267669#comment-13267669 ] 

Keith Turner commented on ACCUMULO-551:
---------------------------------------

This model would also be useful for the batch scanner.
                


        

[jira] [Commented] (ACCUMULO-551) Experiment with multi-node batch writer

Posted by "John Vines (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ACCUMULO-551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265988#comment-13265988 ] 

John Vines commented on ACCUMULO-551:
-------------------------------------

This sounds like it would play nicely with the proxy idea we've been talking about.
                
