Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2021/04/22 08:03:09 UTC

[GitHub] [incubator-pegasus] Smityz opened a new issue #723: Proposal: Redesign of Pegasus Scanner

Smityz opened a new issue #723:
URL: https://github.com/apache/incubator-pegasus/issues/723


   ## Proposal: Redesign of Pegasus Scanner
   
   ### Background
   
   Pegasus provides three interfaces, `on_get_scanner`, `on_scan`, and `on_clear_scanner`, for clients to execute scanning tasks.
   
   To scan a whole table, the client first calls `on_get_scanner` on each partition. Each partition returns a `context_id`, a random number generated by the server, which identifies the recorded parameters of the scan, such as `hash_key_filter_type` and `batch_size`, together with the context of the scanning task.
   
   Next, the client passes this `context_id` to `on_scan` and scans the corresponding partitions one after another. The server scans all the data of the table on disk and returns the matching values to the client in batches.
   
   When the task ends or an error occurs, the client calls `on_clear_scanner` to clear its `context_id` on the server.
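The flow above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: `PartitionServer` and `full_scan` are hypothetical names, the scan state is reduced to a position and a batch size, and nothing here is the real Pegasus RPC code.

```python
import random


class PartitionServer:
    """Toy stand-in for one partition; not the real Pegasus server."""

    def __init__(self, rows):
        self.rows = sorted(rows)  # (key, value) pairs, kept in key order
        self.contexts = {}        # context_id -> scan state

    def on_get_scanner(self, batch_size):
        # The server picks a random id and records the scan parameters.
        context_id = random.getrandbits(63)
        self.contexts[context_id] = {"pos": 0, "batch_size": batch_size}
        return context_id

    def on_scan(self, context_id):
        # Return the next batch and advance the stored position.
        ctx = self.contexts[context_id]
        batch = self.rows[ctx["pos"]:ctx["pos"] + ctx["batch_size"]]
        ctx["pos"] += len(batch)
        return batch

    def on_clear_scanner(self, context_id):
        self.contexts.pop(context_id, None)


def full_scan(partitions, batch_size=2):
    # Step 1: open a scanner on every partition.
    sessions = [(p, p.on_get_scanner(batch_size)) for p in partitions]
    results = []
    try:
        # Step 2: drain the partitions one after another.
        for server, context_id in sessions:
            while True:
                batch = server.on_scan(context_id)
                if not batch:
                    break
                results.extend(batch)
    finally:
        # Step 3: always clear the contexts, on success or on error.
        for server, context_id in sessions:
            server.on_clear_scanner(context_id)
    return results
```

The `finally` clause mirrors the rule that the context is cleared whether the task finishes or fails.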
   
   ### Problem Statement
   
   In practice, this design causes some problems.
   
   1. **prefix scan is too slow**
   
   If we execute this scanning task:
   
   ```shell
   full_scan --hash_key_filter_type prefix --hash_key_filter_pattern 2021-04-21
   ```
   
   The server scans all the data in the table and then returns the keys that match the prefix pattern. We can speed this up by using the prefix seek feature of RocksDB.
   
   2. **scanning tasks fail easily**
   
   Although a batch size limits the scan time, it does not help when the matching data is sparse. In the case above, we need to scan almost the whole partition, and it is possible that no row matches the prefix, so the request can easily time out.
   
   ### Proposal
   
   **For problem 1**
   
   1. The key schema Pegasus stores in RocksDB is `[hashkey_len(2bytes)][hashkey][sortkey]`, so we can't directly use prefix seeking on the hashkey. But we can prefix seek `[01][prefix_pattern]`, `[02][prefix_pattern]`, `[03][prefix_pattern]` ... `[65535][prefix_pattern]` in RocksDB, one seek per possible hashkey length.
   2. The client can scan all the partitions in parallel instead of one by one.
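As a rough illustration of item 1, the sketch below emulates the per-length seeks on a sorted list of encoded keys. It assumes the 2-byte length is stored big-endian, uses `bisect` as a stand-in for a RocksDB iterator seek, and the names `encode_key` and `prefix_scan` are hypothetical.

```python
import bisect
import struct


def encode_key(hash_key: bytes, sort_key: bytes) -> bytes:
    # [hashkey_len(2 bytes)][hashkey][sortkey]; big-endian is an assumption.
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


def prefix_scan(sorted_keys, prefix: bytes, max_hashkey_len: int):
    """Return stored keys whose hashkey starts with `prefix`.

    `sorted_keys` must be in byte-wise order, as with the default comparator.
    """
    matches = []
    # A hashkey shorter than the prefix can never match, so start there.
    for length in range(len(prefix), max_hashkey_len + 1):
        seek_target = struct.pack(">H", length) + prefix
        # Emulates seeking an iterator to the first key >= seek_target.
        i = bisect.bisect_left(sorted_keys, seek_target)
        while i < len(sorted_keys) and sorted_keys[i].startswith(seek_target):
            matches.append(sorted_keys[i])
            i += 1
    return matches
```

Each length costs one seek even when no key of that length exists, so how far `max_hashkey_len` can be bounded in practice decides whether this beats a full scan.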
   
   **For problem 2**
   
   1. We can add a `HeartbeatCheck` during scanning like [HBase StoreScanner](https://github.com/apache/hbase/blob/048ca4e43fdf8b341c9ade5a9d455f627fc76041/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L591): the Pegasus server sends heartbeat packets periodically to avoid timeouts, so the scan behaves like a stream.
   
   2. We can change how the batch size is counted: number of matching values -> number of values already scanned.
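The difference between the two ways of counting can be sketched as follows. This is a minimal model, not the server code: `scan_batch`, `matches`, and `count_scanned` are hypothetical names, and rows are plain strings.

```python
def scan_batch(rows, start, batch_size, matches, count_scanned):
    """Scan from `start` and return (matched_rows, next_start).

    count_scanned=False: the budget is spent only by matching rows,
    which is how the issue describes the current counting.
    count_scanned=True: every visited row spends budget (the proposal).
    """
    out = []
    budget = batch_size
    pos = start
    while pos < len(rows) and budget > 0:
        row = rows[pos]
        pos += 1
        hit = matches(row)
        if hit:
            out.append(row)
        if hit or count_scanned:
            budget -= 1
    return out, pos
```

With 1000 non-matching rows followed by a single match and `batch_size=100`, the first mode walks all 1001 rows in one request, while the second returns after 100 rows with an empty batch and a position the client can resume from.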
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] Apache9 commented on issue #723: Proposal: Redesign of Pegasus Scanner

Posted by GitBox <gi...@apache.org>.
Apache9 commented on issue #723:
URL: https://github.com/apache/incubator-pegasus/issues/723#issuecomment-827432388


   So let's change the comparator and check the performance impact first?




[GitHub] [incubator-pegasus] shenxingwuying edited a comment on issue #723: Proposal: Redesign of Pegasus Scanner

Posted by GitBox <gi...@apache.org>.
shenxingwuying edited a comment on issue #723:
URL: https://github.com/apache/incubator-pegasus/issues/723#issuecomment-826267742


   Regarding the redesign of the Pegasus scanner to solve the scan timeout problem:
   In my opinion, the root cause of the problem is the way the data is sorted.
   RocksDB's data should use a customized Comparator that keeps the data sorted by user key (hash_key, sort_key); then
   the prefix filter should be very fast.
   
   Why did we use the default ByteWiseComparator in the beginning?
   Maybe Pegasus can now switch to a new (customized) comparator.
   To avoid data incompatibility, we can support two comparators (adding the new one), with new Pegasus clusters using the new comparator.
   
   1. For postfix filters, all data must still be scanned, so the cost is the same as before; the filter may not matter here.
   2. For prefix filters, we need not scan all data, so speed increases because fewer rows are scanned.
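To make the ordering argument concrete, here is a small sketch that compares byte-wise order on the encoded key with an order on the decoded (hash_key, sort_key) pair. It only models sort order in Python; the big-endian length and the helper names are assumptions, and a real change would be a custom RocksDB comparator.

```python
import struct


def encode_key(hash_key: bytes, sort_key: bytes) -> bytes:
    # [hashkey_len(2 bytes)][hashkey][sortkey]; big-endian is an assumption.
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


def decode_key(raw: bytes):
    (n,) = struct.unpack(">H", raw[:2])
    return raw[2:2 + n], raw[2 + n:]


def bytewise_order(pairs):
    # Default comparator: compare the raw encoded bytes.
    raw = sorted(encode_key(h, s) for h, s in pairs)
    return [decode_key(k)[0] for k in raw]


def custom_order(pairs):
    # Customized comparator: compare (hash_key, sort_key) after decoding.
    raw = sorted((encode_key(h, s) for h, s in pairs), key=decode_key)
    return [decode_key(k)[0] for k in raw]
```

For the hashkeys `ab`, `b`, `abc`, `a`, byte-wise order groups keys by length first, so the hashkeys starting with `a` are split around `b`. Under the decoded order they form one contiguous range, which is what makes a single prefix seek possible.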




[GitHub] [incubator-pegasus] Shuo-Jia edited a comment on issue #723: Proposal: Redesign of Pegasus Scanner

Posted by GitBox <gi...@apache.org>.
Shuo-Jia edited a comment on issue #723:
URL: https://github.com/apache/incubator-pegasus/issues/723#issuecomment-842253291


   > 1. We can set a `HeartbeatCheck` during scanning like [Hbase StoreScanner](https://github.com/apache/hbase/blob/048ca4e43fdf8b341c9ade5a9d455f627fc76041/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L591), pegasus sever sends heartbeat packets periodically to avoid timeout, which performed like a stream
   
   @Apache9 @Smityz  https://github.com/XiaoMi/pegasus-java-client/pull/156 and https://github.com/XiaoMi/pegasus-go-client/pull/86 have fixed `next retry failed when timeout`, so you can resolve the problem before `refactor scanner`.




[GitHub] [incubator-pegasus] neverchanje edited a comment on issue #723: Proposal: Redesign of Pegasus Scanner

Posted by GitBox <gi...@apache.org>.
neverchanje edited a comment on issue #723:
URL: https://github.com/apache/incubator-pegasus/issues/723#issuecomment-826645906


   First of all, we use the default ByteWiseComparator because we designed the key schema based on it.
   We put the hashkey length ahead of the hashkey bytes in order to prevent key conflicts like:
   
   1. hashkey = a, sortkey = xxx
   
   2. hashkey = ax, sortkey = xx
   
   With the default comparator, the two keys are seen as distinct:
   
   ```
   01axxx
   02axxx
   ```
   
   So we chose this method, but didn't consider that one day we would need prefix filtering on the hashkey. So now the problem is:
   how can we upgrade our key schema version to support efficient hashkey prefix filtering, or find another workaround that does not modify the key schema (and also give up support for hashkey sorting), like the solution above that @Smityz came up with?
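The conflict described above can be reproduced in a few lines. This is a sketch with hypothetical helper names; the 2-byte big-endian length is an assumption about the on-disk format.

```python
import struct


def encode_plain(hash_key: bytes, sort_key: bytes) -> bytes:
    # No length prefix: hashkey and sortkey are simply concatenated.
    return hash_key + sort_key


def encode_with_length(hash_key: bytes, sort_key: bytes) -> bytes:
    # Length prefix first, as in the key schema discussed in this thread.
    return struct.pack(">H", len(hash_key)) + hash_key + sort_key


key_1 = (b"a", b"xxx")
key_2 = (b"ax", b"xx")

# Both plain encodings are b"axxx", so the two rows would collide.
collides_plain = encode_plain(*key_1) == encode_plain(*key_2)

# With the length prefix the encodings differ in their second byte.
collides_prefixed = encode_with_length(*key_1) == encode_with_length(*key_2)
```

Here `collides_plain` is `True` and `collides_prefixed` is `False`, which is the property the length prefix was introduced for.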




[GitHub] [incubator-pegasus] Apache9 commented on issue #723: Proposal: Redesign of Pegasus Scanner

Posted by GitBox <gi...@apache.org>.
Apache9 commented on issue #723:
URL: https://github.com/apache/incubator-pegasus/issues/723#issuecomment-826269189


   Changing the comparator will be a pain, as all the old data can no longer be read. How about introducing a table-level flag to indicate whether we should use the customized comparator? We also need to test the performance impact of using a customized comparator.




[GitHub] [incubator-pegasus] Smityz commented on issue #723: Proposal: Redesign of Pegasus Scanner

Posted by GitBox <gi...@apache.org>.
Smityz commented on issue #723:
URL: https://github.com/apache/incubator-pegasus/issues/723#issuecomment-834036108


   If there are no compatibility issues, I think changing the comparator is feasible. Looking forward to your PR, @shenxingwuying.



