Posted to dev@phoenix.apache.org by "Kadir Ozdemir (Jira)" <ji...@apache.org> on 2022/06/08 20:56:00 UTC

[jira] [Updated] (PHOENIX-6677) Parallelism within a batch of mutations

     [ https://issues.apache.org/jira/browse/PHOENIX-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kadir Ozdemir updated PHOENIX-6677:
-----------------------------------
    Fix Version/s:     (was: 4.17.0)
                       (was: 5.2.0)

> Parallelism within a batch of mutations 
> ----------------------------------------
>
>                 Key: PHOENIX-6677
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6677
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir OZDEMIR
>            Priority: Major
>
> Currently, the Phoenix client simply passes the batches of row mutations from the application to the HBase client without any parallelism or intelligent grouping (except grouping mutations for the same row).
> Assume that the application creates batches of 10000 row mutations for a given table. The Phoenix client divides these rows, in arrival order, into HBase batches of n (e.g., 100) rows according to the configured batch size, i.e., the limits on the number of rows and bytes. Phoenix then calls the HBase batch API one batch at a time (i.e., serially). The HBase client further divides each batch of rows into smaller batches based on their regions. This means that a large batch created by the application is divided into many tiny batches that are executed mostly serially. For salted tables, this results in even smaller batches.
> We can greatly improve the current implementation if we group the rows of the batch prepared by the application into sub-batches based on table region boundaries and then execute these sub-batches in parallel.
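
The proposal above (group rows by region boundary, then run the sub-batches in parallel) can be illustrated with a minimal, self-contained sketch. It is not Phoenix or HBase code: the class RegionBatcher, its methods, the use of String row keys, and the set of region start keys are all hypothetical stand-ins, and the "write" is a placeholder that only counts rows. Real HBase row keys are byte arrays compared bytewise, and a real client would look up region boundaries from the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, for illustration only (not part of Phoenix or HBase).
final class RegionBatcher {

    // Groups row keys into sub-batches keyed by the start key of the region
    // that contains them. regionStartKeys is an assumed, sorted set of region
    // start keys in which the first region starts at the empty string.
    static Map<String, List<String>> groupByRegion(List<String> rowKeys,
                                                   NavigableSet<String> regionStartKeys) {
        Map<String, List<String>> subBatches = new TreeMap<>();
        for (String rowKey : rowKeys) {
            // The containing region is the one with the greatest start key <= rowKey.
            String regionStart = regionStartKeys.floor(rowKey);
            if (regionStart == null) {
                throw new IllegalArgumentException("No region covers row key: " + rowKey);
            }
            subBatches.computeIfAbsent(regionStart, k -> new ArrayList<>()).add(rowKey);
        }
        return subBatches;
    }

    // Submits every sub-batch to a thread pool and waits for all of them.
    // Returns the total number of rows "written".
    static int executeInParallel(Map<String, List<String>> subBatches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> subBatch : subBatches.values()) {
                // Placeholder for a real per-region write; here it only counts rows.
                Callable<Integer> task = () -> subBatch.size();
                futures.add(pool.submit(task));
            }
            int written = 0;
            for (Future<Integer> future : futures) {
                written += future.get();
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for sub-batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A sub-batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With region start keys "", "g", and "n", the row keys apple, zebra, grape, banana, melon would be grouped into three sub-batches (apple and banana; grape and melon; zebra), each of which could then be sent to its region concurrently instead of one arrival-order batch at a time.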



--
This message was sent by Atlassian Jira
(v8.20.7#820007)