Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2020/09/18 09:03:51 UTC

[GitHub] [incubator-pegasus] hycdong commented on a change in pull request #603: feat(hotkey detection): build a fundamental framework of hotkey detection

hycdong commented on a change in pull request #603:
URL: https://github.com/apache/incubator-pegasus/pull/603#discussion_r490799358



##########
File path: src/server/hotkey_coarse_data_collector.h
##########
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "base/pegasus_utils.h"
+
+namespace pegasus {
+namespace server {
+
+// hotkey_coarse_data_collector handles the first procedure (COARSE) of hotkey detection.
+// It captures the data without recording them, but simply divides the incoming requests
+// into a number of buckets and counts the accessed times of each bucket.
+// If the variance among the buckets exceeds the threshold, the most frequently accessed bucket
+// is regarded to contain the hotkey.
+//
+// This technique intends to reduce the load of data recording during FINE procedure,
+// filtering what's unnecessary to catch.
+class hotkey_coarse_data_collector
+{
+public:
+    // capture `hash_key` into the internal bucket
+    void capture_data(const dsn::blob &hash_key, uint64_t size);

Review comment:
       What is `size` used for?

##########
File path: src/server/hotkey_coarse_data_collector.cpp
##########
@@ -0,0 +1,18 @@
+#include "hotkey_coarse_data_collector.h"

Review comment:
       I suggest adding a constructor, the same as in the other classes.
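A minimal sketch of what such a constructor might look like, assuming the bucket count is passed in as a constructor argument (`bucket_num`, `bucket_count()` and `bucket_value()` are hypothetical names, not from this PR). It also shows a detail worth getting right with `std::vector<std::atomic<uint64_t>>`: since `std::atomic` is neither copyable nor movable, the vector has to be sized when it is constructed.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: `bucket_num` is an illustrative constructor parameter,
// not part of the code under review.
class hotkey_coarse_data_collector
{
public:
    explicit hotkey_coarse_data_collector(std::size_t bucket_num)
        // std::atomic is neither copyable nor movable, so the vector has to be
        // sized here: resize() and push_back() would not compile.
        : _hash_buckets(bucket_num)
    {
        // Make the zero starting state explicit; a reset path could reuse this loop.
        for (auto &bucket : _hash_buckets) {
            bucket.store(0, std::memory_order_relaxed);
        }
    }

    std::size_t bucket_count() const { return _hash_buckets.size(); }

    uint64_t bucket_value(std::size_t i) const
    {
        return _hash_buckets[i].load(std::memory_order_relaxed);
    }

private:
    std::vector<std::atomic<uint64_t>> _hash_buckets;
};
```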

##########
File path: src/server/hotkey_collector.h
##########
@@ -0,0 +1,53 @@
+#pragma once
+
+#include <dsn/utility/string_view.h>
+#include <rrdb/rrdb_types.h>
+
+namespace pegasus {
+namespace server {
+
+class hotkey_coarse_data_collector;
+class hotkey_fine_data_collector;
+
+// hotkey_collector is responsible for finding the hot keys after the partition
+// has been detected as hot. The two types of hotkey, READ & WRITE, are detected
+// separately.
+class hotkey_collector

Review comment:
       I suggest adding a diagram to this class's comment that shows the process of finding the hotkey.
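A rough sketch of the kind of process diagram being asked for, written as a small state machine. Only `collector_state::COARSE` and `collector_state::FINE` come from the PR; `STOPPED`, `FINISHED`, the `hotkey_flow_sketch` class and the transition conditions are assumptions for illustration. `analyse_data` here takes the stage results as arguments purely so the flow can be exercised without the real collectors.

```cpp
#include <optional>
#include <string>

// Hypothetical sketch of the detection flow:
//
//   STOPPED --start--> COARSE --hot bucket found--> FINE --hotkey found--> FINISHED
//
// stop() returns to STOPPED from any state. Only COARSE and FINE come from the
// PR; STOPPED, FINISHED and the transition conditions are assumptions.
enum class collector_state
{
    STOPPED,
    COARSE,
    FINE,
    FINISHED
};

class hotkey_flow_sketch
{
public:
    void start()
    {
        if (_state == collector_state::STOPPED) {
            _state = collector_state::COARSE;
        }
    }

    void stop()
    {
        _state = collector_state::STOPPED;
        _hot_bucket = -1;
        _hotkey.reset();
    }

    // Periodic task. `coarse_result` stands in for get_hotest_bucket()
    // (-1 means no hot bucket yet); `fine_result` stands in for the FINE
    // stage's result (empty means no hotkey yet).
    void analyse_data(int coarse_result, const std::optional<std::string> &fine_result)
    {
        switch (_state) {
        case collector_state::COARSE:
            if (coarse_result >= 0) {
                _hot_bucket = coarse_result; // FINE only captures keys in this bucket
                _state = collector_state::FINE;
            }
            break;
        case collector_state::FINE:
            if (fine_result) {
                _hotkey = fine_result;
                _state = collector_state::FINISHED;
            }
            break;
        default:
            break; // nothing to analyse in STOPPED or FINISHED
        }
    }

    collector_state state() const { return _state; }
    int hot_bucket() const { return _hot_bucket; }
    const std::optional<std::string> &hotkey() const { return _hotkey; }

private:
    collector_state _state = collector_state::STOPPED;
    int _hot_bucket = -1;
    std::optional<std::string> _hotkey;
};
```

Keeping the state explicit like this also makes the comment on `analyse_data` easier to state precisely: it only does work in COARSE or FINE.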

##########
File path: src/server/hotkey_collector.h
##########
@@ -0,0 +1,53 @@
+#pragma once
+
+#include <dsn/utility/string_view.h>
+#include <rrdb/rrdb_types.h>
+
+namespace pegasus {
+namespace server {
+
+class hotkey_coarse_data_collector;
+class hotkey_fine_data_collector;
+
+// hotkey_collector is responsible for finding the hot keys after the partition
+// has been detected as hot. The two types of hotkey, READ & WRITE, are detected
+// separately.
+class hotkey_collector
+{
+public:
+    // size: the cu size of raw_key/hash_key calculated by `capacity_unit_calculator`

Review comment:
       I think `size` is not a suitable name for a CU (capacity unit) value.

##########
File path: src/server/hotkey_coarse_data_collector.h
##########
@@ -0,0 +1,49 @@
+#include "base/pegasus_utils.h"
+
+namespace pegasus {
+namespace server {
+
+// hotkey_coarse_data_collector handles the first procedure (COARSE) of hotkey detection.
+// It captures the data without recording them, but simply divides the incoming requests
+// into a number of buckets and counts the accessed times of each bucket.
+// If the variance among the buckets exceeds the threshold, the most frequently accessed bucket
+// is regarded to contain the hotkey.
+//
+// This technique intends to reduce the load of data recording during FINE procedure,
+// filtering what's unnecessary to catch.
+class hotkey_coarse_data_collector
+{
+public:
+    // capture `hash_key` into the internal bucket
+    void capture_data(const dsn::blob &hash_key, uint64_t size);
+
+    // Periodically analyzes the data in the internal storage structure.
+    // returns: id of the most accessed bucket.
+    //          -1 if no hot bucket is found.
+    int get_hotest_bucket();
+
+private:
+    // internal storage structure of hotkey_coarse_data_collector
+    // store the captured key into a hash table
+    // key:hash(hash_key)->value:count of this hash bucket
+    std::vector<std::atomic<uint64_t>> _hash_buckets;

Review comment:
       It seems `_hash_buckets` should be a map, not a vector.
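One possible reading of the current design, offered as a sketch rather than the PR's implementation: because bucket ids are dense integers in `[0, N)`, a fixed-size `std::vector<std::atomic<uint64_t>>` already behaves as a map from bucket id to count, and it lets concurrent `capture_data` calls increment counters without a lock. A `std::map` or `std::unordered_map` could not be updated concurrently without external synchronization. The bucket function (`std::hash` over `std::string_view` in place of `dsn::blob`), treating `size` as a weight added to the bucket, and the `coarse_bucket_counter`/`count_of` names are all assumptions.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string_view>
#include <vector>

// Hypothetical sketch; the bucket function and weighting by `size` are assumptions.
class coarse_bucket_counter
{
public:
    explicit coarse_bucket_counter(std::size_t bucket_num) : _hash_buckets(bucket_num) {}

    // Lock-free: each call touches exactly one pre-allocated atomic counter.
    void capture_data(std::string_view hash_key, uint64_t size)
    {
        _hash_buckets[bucket_of(hash_key)].fetch_add(size, std::memory_order_relaxed);
    }

    uint64_t count_of(std::string_view hash_key) const
    {
        return _hash_buckets[bucket_of(hash_key)].load(std::memory_order_relaxed);
    }

private:
    std::size_t bucket_of(std::string_view hash_key) const
    {
        return std::hash<std::string_view>{}(hash_key) % _hash_buckets.size();
    }

    // Index == bucket id, value == accumulated weight for that bucket.
    std::vector<std::atomic<uint64_t>> _hash_buckets;
};
```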

##########
File path: src/server/hotkey_coarse_data_collector.h
##########
@@ -0,0 +1,49 @@
+#include "base/pegasus_utils.h"
+
+namespace pegasus {
+namespace server {
+
+// hotkey_coarse_data_collector handles the first procedure (COARSE) of hotkey detection.
+// It captures the data without recording them, but simply divides the incoming requests
+// into a number of buckets and counts the accessed times of each bucket.
+// If the variance among the buckets exceeds the threshold, the most frequently accessed bucket
+// is regarded to contain the hotkey.
+//
+// This technique intends to reduce the load of data recording during FINE procedure,
+// filtering what's unnecessary to catch.
+class hotkey_coarse_data_collector
+{
+public:
+    // capture `hash_key` into the internal bucket
+    void capture_data(const dsn::blob &hash_key, uint64_t size);
+
+    // Periodically analyzes the data in the internal storage structure.
+    // returns: id of the most accessed bucket.

Review comment:
       What is the `the most accessed bucket`? Is it the hotest bucket?

##########
File path: src/server/hotkey_collector.h
##########
@@ -0,0 +1,53 @@
+#pragma once
+
+#include <dsn/utility/string_view.h>
+#include <rrdb/rrdb_types.h>
+
+namespace pegasus {
+namespace server {
+
+class hotkey_coarse_data_collector;
+class hotkey_fine_data_collector;
+
+// hotkey_collector is responsible for finding the hot keys after the partition
+// has been detected as hot. The two types of hotkey, READ & WRITE, are detected
+// separately.
+class hotkey_collector
+{
+public:
+    // size: the cu size of raw_key/hash_key calculated by `capacity_unit_calculator`
+    void capture_raw_key(const dsn::blob &raw_key, uint64_t size);
+    void capture_hash_key(const dsn::blob &hash_key, uint64_t size);
+    // analyse_data is a periodic task; it is only valid when _state is
+    // collector_state::COARSE or collector_state::FINE
+    void analyse_data();
+    bool handle_operation(dsn::apps::hotkey_detect_action::type action, std::string &err_hint);
+    // find outlier in both coarse capture and fine capture
+    static int cal_outlier(const std::vector<uint64_t> &data_samples, int threshold);
+    // unify the hash way in both coarse capture and fine capture
+    static int get_bucket_id(dsn::string_view data);
+

Review comment:
       I could not figure out what these functions are used for from their definitions and comments. I suggest removing some of them from this pull request; otherwise, please explain clearly what each one is used for.

##########
File path: src/server/hotkey_fine_data_collector.h
##########
@@ -0,0 +1,52 @@
+#include "base/pegasus_utils.h"
+#include <readerwriterqueue/readerwriterqueue.h>
+
+namespace pegasus {
+namespace server {
+
+typedef std::vector<moodycamel::ReaderWriterQueue<std::pair<dsn::blob, uint64_t>>>
+    lockfree_capture_queues;
+
+// hotkey_fine_data_collector handles the second procedure (FINE) of hotkey detection.
+// It captures only the data mapping to the "hot" bucket.
+//
+// To prevent locking on the read path, we create one queue per thread of THREAD_POOL_LOCAL_APP.
+// The read request is captured right inside its execution thread.
+//
+// For writes we do not apply this optimization.
+class hotkey_fine_data_collector
+{
+public:
+    // capture `hash_key` into internal storage structure
+    void capture_data(const dsn::blob &hash_key, uint64_t size);

Review comment:
       All three classes have a `capture_data` or `capture_hash_key` function. What is the difference between them?
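One plausible division of work, sketched under assumptions: `hotkey_collector::capture_hash_key` would be the public entry point that forwards to whichever stage is active; the coarse `capture_data` only bumps a bucket counter; the fine `capture_data` records exact keys, but only those in the hot bucket, into a per-thread buffer so the read path takes no lock. The `fine_collector_sketch` class, the `thread_index` parameter, `kBucketNum` and `get_bucket_id` below are hypothetical. The PR uses one `moodycamel::ReaderWriterQueue` per thread so analysis can drain the queues while workers keep writing; the plain vectors here are only safe to read after the writers have stopped.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t kBucketNum = 37; // hypothetical bucket count

inline int get_bucket_id(std::string_view key)
{
    return static_cast<int>(std::hash<std::string_view>{}(key) % kBucketNum);
}

// Hypothetical FINE-stage sketch: exact keys are recorded, but only those in
// the hot bucket. Each worker thread owns one slot, so capture takes no lock.
class fine_collector_sketch
{
public:
    fine_collector_sketch(std::size_t thread_num, int hot_bucket)
        : _per_thread(thread_num), _hot_bucket(hot_bucket)
    {
    }

    void capture_data(std::size_t thread_index, std::string_view hash_key, uint64_t size)
    {
        if (get_bucket_id(hash_key) != _hot_bucket) {
            return; // filtered out: not in the bucket COARSE flagged as hot
        }
        _per_thread[thread_index].emplace_back(std::string(hash_key), size);
    }

    // Merge all per-thread records into exact per-key totals.
    std::unordered_map<std::string, uint64_t> aggregate() const
    {
        std::unordered_map<std::string, uint64_t> totals;
        for (const auto &records : _per_thread) {
            for (const auto &[key, size] : records) {
                totals[key] += size;
            }
        }
        return totals;
    }

private:
    std::vector<std::vector<std::pair<std::string, uint64_t>>> _per_thread;
    int _hot_bucket;
};
```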




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org