Posted to issues@hbase.apache.org by "Liyin Tang (JIRA)" <ji...@apache.org> on 2012/05/25 21:08:24 UTC

[jira] [Created] (HBASE-6103) HBaseServer shall read and deserialize data from each connection in parallel

Liyin Tang created HBASE-6103:
---------------------------------

             Summary: HBaseServer shall read and deserialize data from each connection in parallel
                 Key: HBASE-6103
                 URL: https://issues.apache.org/jira/browse/HBASE-6103
             Project: HBase
          Issue Type: Improvement
            Reporter: Liyin Tang
            Assignee: Liyin Tang


Currently HBaseServer runs with a single listener thread, which is responsible for accepting connections, reading data from the network channel, deserializing the data into Writable objects, and handing them over to the IPC handler threads.

So when multiple HBase clients connect to the region server (HBaseServer) and read or write a large set of data, this listener thread becomes a performance bottleneck.

Ideally, the listener thread should only accept the connection and hand it over to the IPC threads directly, so that each IPC thread reads the data from the network channel, deserializes it, and executes the Call.

In this way, HBaseServer can read and deserialize data from each connection in parallel.
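
The hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class `ParallelReadSketch` and its methods are hypothetical names, an in-memory `InputStream` stands in for each accepted network connection, and a length-prefixed UTF-8 string stands in for a serialized Writable. The only point is that the listener does no reading or deserializing itself; it submits each connection to a pool, and the pool threads do that work in parallel.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from HBaseServer.
public class ParallelReadSketch {

    // Encode one request as a length-prefixed UTF-8 frame (stand-in for a serialized Writable).
    static byte[] frame(String request) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] payload = request.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        return bytes.toByteArray();
    }

    // Read and deserialize one frame; this is the work moved off the listener thread.
    static String readCall(InputStream connection) throws IOException {
        DataInputStream in = new DataInputStream(connection);
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    // The "listener" only hands each connection to the pool; the pool threads read and deserialize.
    static List<String> serve(List<InputStream> connections, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (InputStream connection : connections) {
                Callable<String> task = () -> readCall(connection);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order so the output is deterministic.
            List<String> calls = new ArrayList<>();
            for (Future<String> f : pending) {
                calls.add(f.get());
            }
            return calls;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<InputStream> connections = new ArrayList<>();
        for (String request : new String[] {"get row1", "put row2", "scan t1"}) {
            connections.add(new ByteArrayInputStream(frame(request)));
        }
        System.out.println(serve(connections, 2));
    }
}
```

A real server would presumably hand over socket channels rather than in-memory streams and would keep the pool alive across requests; the sketch creates and shuts down the pool per call only to stay self-contained.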

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-6103) HBaseServer shall serialize the data for each connection in parallel

Posted by "Liyin Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-6103:
------------------------------

    Description: 
Currently HBaseServer runs with a single listener thread, which is responsible for accepting connections, reading data from the network channel, deserializing the data into Writable objects, and handing them over to the IPC handler threads.

When multiple HBase clients connect to the region server (HBaseServer) and read or write a large set of data, the listener and the respond thread become performance bottlenecks.

So the solution is for HBaseServer to serialize the data for each connection in parallel.

BTW, this is also one of the reasons that parallel scanning from multiple clients is far slower than the single-client case.

  was:
Currently HBaseServer runs with a single listener thread, which is responsible for accepting connections, reading data from the network channel, deserializing the data into Writable objects, and handing them over to the IPC handler threads.

Also, HBaseServer runs with a single respond thread, which serializes the Writable objects into bytes and sends them back through the connection.

When multiple HBase clients connect to the region server (HBaseServer) and read or write a large set of data, the listener and the respond thread become performance bottlenecks.

So the solution is for HBaseServer to serialize and deserialize the data for each connection in parallel.

BTW, this is also one of the reasons that parallel scanning from multiple clients is far slower than the single-client case.

        Summary: HBaseServer shall serialize the data for each connection in parallel  (was: HBaseServer shall serialize and deserialize data for each connection in parallel)
    


[jira] [Commented] (HBASE-6103) Optimize the HBaseServer to deserialize the data for each ipc connection in parallel

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509311#comment-13509311 ] 

stack commented on HBASE-6103:
------------------------------

https://issues.apache.org/jira/browse/HBASE-2941 added a fixed-size pool of Readers.  It's like this patch, but the pool has a fixed size rather than being sized by the number of processors, and it puts work on a queue if the readers are busy.  It looks like we could pick up a few of Liyin's improvements.  Worth a bit of study before closing this out.
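
To make the two sizing choices mentioned above concrete, here is a rough sketch using only `java.util.concurrent`. It is not the HBASE-2941 code or this patch: `ReaderPoolSketch`, `fixedReaderPool`, `perProcessorReaderPool`, and `readAll` are hypothetical names, and each "read" is a trivial stand-in task. It shows a pool with a fixed number of reader threads backed by a queue, next to the same pool sized by the number of available processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; class and method names are illustrative, not HBase code.
public class ReaderPoolSketch {

    // Fixed number of reader threads; work that arrives while all are busy waits in the queue.
    static ThreadPoolExecutor fixedReaderPool(int readers) {
        return new ThreadPoolExecutor(readers, readers, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Alternative sizing: one reader thread per available processor.
    static ThreadPoolExecutor perProcessorReaderPool() {
        return fixedReaderPool(Runtime.getRuntime().availableProcessors());
    }

    // Submit one stand-in read task per connection id and wait for all of them.
    static List<String> readAll(ThreadPoolExecutor pool, int connections) throws Exception {
        List<Future<String>> pending = new ArrayList<>();
        for (int i = 0; i < connections; i++) {
            final int id = i;
            Callable<String> read = () -> "call-from-connection-" + id;
            pending.add(pool.submit(read));
        }
        List<String> calls = new ArrayList<>();
        for (Future<String> f : pending) {
            calls.add(f.get());
        }
        return calls;
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = fixedReaderPool(2);
        try {
            System.out.println(readAll(pool, 5));
            System.out.println("largest pool size: " + pool.getLargestPoolSize());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the fixed pool, five connections are served by at most two threads and the rest wait in the queue; swapping in `perProcessorReaderPool()` changes only how many threads drain that queue.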
                


[jira] [Updated] (HBASE-6103) Optimize the HBaseServer to deserialize the data for each ipc connection in parallel

Posted by "Liyin Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-6103:
------------------------------

    Description: 
Currently HBaseServer runs with a single listener thread, which is responsible for accepting connections, reading data from the network channel, deserializing the data into Writable objects, and handing them over to the IPC handler threads.

When multiple HBase clients connect to the region server (HBaseServer) and read or write a large set of data, the listener and the respond thread become performance bottlenecks.

So the solution is for HBaseServer to deserialize the data for each IPC connection in parallel.

BTW, this is also one of the reasons that parallel scanning from multiple clients is far slower than the single-client case.

  was:
Currently HBaseServer runs with a single listener thread, which is responsible for accepting connections, reading data from the network channel, deserializing the data into Writable objects, and handing them over to the IPC handler threads.

When multiple HBase clients connect to the region server (HBaseServer) and read or write a large set of data, the listener and the respond thread become performance bottlenecks.

So the solution is for HBaseServer to serialize the data for each connection in parallel.

BTW, this is also one of the reasons that parallel scanning from multiple clients is far slower than the single-client case.

        Summary: Optimize the HBaseServer to deserialize the data for each ipc connection in parallel  (was: HBaseServer shall serialize the data for each connection in parallel)
    


[jira] [Commented] (HBASE-6103) Optimize the HBaseServer to deserialize the data for each ipc connection in parallel

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467132#comment-13467132 ] 

stack commented on HBASE-6103:
------------------------------

HBASE-6619 overwrites this patch and technique.  Confirm after HBASE-6619 goes in and if so resolve as won't fix.
                


[jira] [Updated] (HBASE-6103) Optimize the HBaseServer to deserialize the data for each ipc connection in parallel

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-6103:
-------------------------

      Component/s: performance
         Priority: Critical  (was: Major)
    Fix Version/s: 0.96.0

Bringing into 0.96.  See if we can forward port this Liyin fix.
                


[jira] [Updated] (HBASE-6103) HBaseServer shall serialize and deserialize data for each connection in parallel

Posted by "Liyin Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-6103:
------------------------------

    Description: 
Currently HBaseServer runs with a single listener thread, which is responsible for accepting connections, reading data from the network channel, deserializing the data into Writable objects, and handing them over to the IPC handler threads.

Also, HBaseServer runs with a single respond thread, which serializes the Writable objects into bytes and sends them back through the connection.

When multiple HBase clients connect to the region server (HBaseServer) and read or write a large set of data, the listener and the respond thread become performance bottlenecks.

So the solution is for HBaseServer to serialize and deserialize the data for each connection in parallel.

BTW, this is also one of the reasons that parallel scanning from multiple clients is far slower than the single-client case.

  was:
Currently HBaseServer runs with a single listener thread, which is responsible for accepting connections, reading data from the network channel, deserializing the data into Writable objects, and handing them over to the IPC handler threads.

So when multiple HBase clients connect to the region server (HBaseServer) and read or write a large set of data, this listener thread becomes a performance bottleneck.

Ideally, the listener thread should only accept the connection and hand it over to the IPC threads directly, so that each IPC thread reads the data from the network channel, deserializes it, and executes the Call.

In this way, HBaseServer can read and deserialize data from each connection in parallel.

        Summary: HBaseServer shall serialize and deserialize data for each connection in parallel  (was: HBaseServer shall read and deserialize data from each connection in parallel)
    


[jira] [Updated] (HBASE-6103) Optimize the HBaseServer to deserialize the data for each ipc connection in parallel

Posted by "Liyin Tang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-6103:
------------------------------

    Attachment: HBASE-6103-fb-89.patch

Here is the revision for this patch:
https://reviews.facebook.net/D3435

I shall port this patch to apache-trunk soon.

(Not sure why reviews.facebook.net did not work with JIRA this time.)
                
