Posted to issues@phoenix.apache.org by "chenglei (Jira)" <ji...@apache.org> on 2019/12/04 03:13:00 UTC

[jira] [Comment Edited] (PHOENIX-5494) Batched, mutable Index updates are unnecessarily run one-by-one

    [ https://issues.apache.org/jira/browse/PHOENIX-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987473#comment-16987473 ] 

chenglei edited comment on PHOENIX-5494 at 12/4/19 3:12 AM:
------------------------------------------------------------

[~kozdemir], thank you very much for the perf test. It seems that this patch did not improve performance as much as expected. What were the byte size and mutation count of each batch in your test? I noticed that the latency is very long.


was (Author: comnetwork):
[~kozdemir], thank you very much for the perf test. It seems that this patch did not improve performance as much as expected.

> Batched, mutable Index updates are unnecessarily run one-by-one
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-5494
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5494
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.15.0, 5.1.0
>            Reporter: Lars Hofhansl
>            Assignee: chenglei
>            Priority: Major
>              Labels: performance
>             Fix For: 4.15.0, 5.1.0
>
>         Attachments: 5494-4.x-HBase-1.5.txt, PHOENIX-5494-4.x-HBase-1.4.patch, PHOENIX-5494.master.001.patch, PHOENIX-5494.master.002.patch, PHOENIX-5494.master.003.patch, PHOENIX-5494_v9-4.x-HBase-1.4.patch, PHOENIX-5494_v9-master.patch, Screenshot_20191110_160243.png, Screenshot_20191110_160351.png, Screenshot_20191110_161453.png
>
>          Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> I just noticed that index updates on mutable tables retrieve their deletes (to invalidate the old index entry) one-by-one.
> For batches, this can be *the* major time spent during an index update. The cost is mostly incurred by the repeated setup (and seeking) of a new region scanner for each row.
> We can instead do a skip scan and get all updates in a single scan per region.
> (Logically that is simple, but it will require some refactoring)
> I won't be getting to this, but recording it here in case someone feels inclined.
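
The difference described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Phoenix or HBase code: the `Region` class, `open_scanner`, and both fetch functions are hypothetical names, and a sorted in-memory list stands in for a region. It counts how many scanners are opened when rows are fetched one by one versus with a single forward-seeking pass over the sorted keys, which is roughly what a skip scan does.

```python
from bisect import bisect_left


class Region:
    """Hypothetical stand-in for a region: row keys kept in sorted order."""

    def __init__(self, rows):
        self.keys = sorted(rows)
        self.rows = dict(rows)
        self.scanners_opened = 0  # proxy for the per-scanner setup cost

    def open_scanner(self, start_key):
        """Simulate scanner setup plus the initial seek to start_key."""
        self.scanners_opened += 1
        return bisect_left(self.keys, start_key)


def fetch_one_by_one(region, wanted):
    """Open a new scanner for every requested row."""
    found = {}
    for key in wanted:
        pos = region.open_scanner(key)
        if pos < len(region.keys) and region.keys[pos] == key:
            found[key] = region.rows[key]
    return found


def fetch_with_skip_scan(region, wanted):
    """Open one scanner and seek forward through the sorted target keys."""
    found = {}
    targets = sorted(set(wanted))
    if not targets:
        return found
    pos = region.open_scanner(targets[0])
    for key in targets:
        # Skip ahead inside the same scanner instead of opening a new one.
        pos = bisect_left(region.keys, key, pos)
        if pos == len(region.keys):
            break
        if region.keys[pos] == key:
            found[key] = region.rows[key]
    return found
```

For a batch of N rows in one region, the first function pays the setup cost N times, while the second pays it once and only seeks forward. Real scanners have costs this model ignores, so it shows the shape of the saving rather than its size.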



--
This message was sent by Atlassian Jira
(v8.3.4#803005)