Posted to dev@pegasus.apache.org by Smilencer <sm...@qq.com> on 2021/04/22 08:32:45 UTC

Re: [DISCUSS] Question about the performance of scanning with hash key prefix

Hi, thanks for bringing this issue to us. The discussion has been recorded on GitHub (https://github.com/apache/incubator-pegasus/issues/723), and we can continue it there.


------------------ Original Message ------------------
From: "dev" <palomino219@gmail.com>
Sent: Wednesday, April 21, 2021, 6:20 PM
To: "dev" <dev@pegasus.apache.org>

Subject: [DISCUSS] Question about the performance of scanning with hash key prefix



Hi, Pegasus Community. Our company uses Pegasus a lot, and we are now facing a
timeout problem when we try to purge some data from the cluster.

Our keys look like this:

p_123#m_123

To avoid hot spots, we use this key as both the hash key and the sort key. We
want to purge all the data starting with 'p_123#', so we use
getUnorderedScanners with the hash key prefix 'p_123#'. We then found that
this easily times out as the table gets larger.
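For concreteness, our purge job follows roughly this pattern. The sketch below runs against a toy in-memory stand-in for the client; the real code uses the Pegasus Java client's getUnorderedScanners, and FakeScanner, purge_with_prefix, and the per-partition key lists here are made up purely for illustration:

```python
class FakeScanner:
    """Toy stand-in for a per-partition Pegasus scanner (illustration only)."""
    def __init__(self, keys, prefix):
        # Yield only keys whose hash key starts with the prefix, in sorted order.
        self._it = iter(k for k in sorted(keys) if k.startswith(prefix))

    def next(self):
        return next(self._it, None)  # None signals end of partition

def purge_with_prefix(partitions, prefix):
    # One scanner per partition, as getUnorderedScanners would return.
    deleted = []
    for keys in partitions:
        scanner = FakeScanner(keys, prefix)
        while (k := scanner.next()) is not None:
            deleted.append(k)  # real code would issue a delete for each key here
    return deleted
```

The timeouts happen inside each scanner.next() call when a partition holds many rows but few (or no) matches.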

Generally I think there are two problems:

1. Since we store the length of the hash key in front, scanning with a hash
key prefix on a partition is not a simple range scan. For example, if I use
the prefix 'aaa', the actual start key will be '03aaa', where 03 is the length
of the hash key. And you cannot stop once the prefix changes, because there
could be stored keys beginning with '04aaa', '05aaa', and so on. So usually
you need to scan all the remaining rows to find all the matches, which can be
really slow.
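The ordering problem can be reproduced in a few lines. This is a simplified sketch: a 1-byte length prefix is assumed here purely for illustration (the actual on-disk encoding differs), but the interleaving is the same:

```python
def stored_key(hash_key: bytes) -> bytes:
    # Assumption for illustration: a single length byte prepended to the hash key.
    return bytes([len(hash_key)]) + hash_key

hash_keys = [b"aaa", b"aaab", b"aab", b"zzz"]
on_disk = sorted(stored_key(k) for k in hash_keys)
# The length byte dominates the sort order, so every 3-byte key (including
# 'zzz', which does not match) sorts before the 4-byte match 'aaab'.

matches = [k for k in on_disk if k[1:].startswith(b"aaa")]
# A range scan starting at b'\x03aaa' cannot stop at the first non-matching
# key: it must keep going all the way to b'\x04aaab'.
```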

2. Although we have a batch size to limit the scan time, it does not help
when the matching data is sparse. In the case above, we may need to scan
almost the whole partition even though no row matches the prefix, so the
request easily times out.

For #1, I'm not sure what the best solution is; we may need to change the
storage format or the comparator, but either way would cause compatibility
issues.

For #2, I think we should implement something like 'always return once we
have spent half of the RPC timeout scanning', or 'return after we have
scanned 1k rows', even if the result set is empty, to avoid timeouts on the
client side.
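A rough sketch of this second proposal (all names and parameter values here are made up, not an actual Pegasus API): the server counts scanned rows and checks a time budget, then returns a possibly-empty partial batch plus a resume cursor instead of scanning until something matches:

```python
import time

def scan_batch(rows, start_index, prefix, batch_size=3,
               max_scanned=1000, time_budget_s=0.5):
    """Scan from start_index, but stop after max_scanned rows or when the
    time budget runs out, even if nothing matched yet. Returning a cursor
    lets the client resume instead of timing out."""
    deadline = time.monotonic() + time_budget_s
    results, scanned, i = [], 0, start_index
    while i < len(rows) and len(results) < batch_size:
        if scanned >= max_scanned or time.monotonic() >= deadline:
            break  # return early with whatever we have (possibly nothing)
        if rows[i].startswith(prefix):
            results.append(rows[i])
        scanned += 1
        i += 1
    next_cursor = i if i < len(rows) else None  # None means scan complete
    return results, next_cursor
```

The client then drives the scan as a loop over cursors, treating an empty batch with a non-None cursor as progress rather than an error.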

Thanks. Looking forward to your response.

Re: [DISCUSS] Question about the performance of scanning with hash key prefix

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
Thank you. Let's continue on the github issue.

On Thursday, April 22, 2021 at 4:32 PM, Smilencer <sm...@qq.com> wrote:

> Hi, we are glad you to give this issue to us. Related discussions have
> been recorded on Github(
> https://github.com/apache/incubator-pegasus/issues/723), and we can
> continue our discussion here.
>