You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2022/07/13 05:35:14 UTC

[GitHub] [skywalking] wankai123 opened a new pull request, #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

wankai123 opened a new pull request, #9339:
URL: https://github.com/apache/skywalking/pull/9339

   - [ ] If this is non-trivial feature, paste the links/URLs to the design doc.
   - [ ] Update the documentation to include this new feature.
   - [X] Tests(including UT, IT, E2E) are added to verify the new feature.
   - [ ] If it's UI related, attach the screenshots below.
   
   - [ ] If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #<issue number>.
   - [X] Update the [`CHANGES` log](https://github.com/apache/skywalking/blob/master/docs/en/changes/changes.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920610926


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,49 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();

Review Comment:
   You could use `if/else` or `return` fast in these two cases to avoid unnecessary costs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919693168


##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,13 +303,33 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
-        indexName = indexNameConverter.apply(indexName);
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIdsGroup key: indexName, value: ids list
+     * @return Documents
+     */
+    public Optional<Documents> ids(Map<String, List<String>> indexIdsGroup) {
+        Map<String, List<String>> map = new HashMap<>();
+        indexIdsGroup.forEach((indexName, ids) -> {
+            map.put(indexNameConverter.apply(indexName), ids);
+        });
+        return es.get().documents().mGet(TYPE, map);
+    }
+
+    /**
+     * Query by ids with index alias. When can not locate the physical index.
+     * @param indexAlias Index alias name
+     * @param ids Query ids
+     * @return SearchResponse
+     */
+    public SearchResponse idsWithIndexAlias(String indexAlias, Iterable<String> ids) {

Review Comment:
   If users can locate the physical index, I prefer to recommend users use `_mGet` way,  I just want to use the method and parameter name to remind users. WDYT @wu-sheng 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920066379


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/query/MetricsQueryEsDAO.java:
##########
@@ -77,10 +82,10 @@ public long readMetricsValue(final MetricsCondition condition,
 
         sourceBuilder.aggregation(entityIdAggregation);
 
-        final String index =
-            IndexController.LogicIndicesRegister.getPhysicalTableName(condition.getName());
-
-        final SearchResponse response = getClient().search(index, sourceBuilder.build());
+        final SearchResponse response = getClient().search(new TimeRangeIndexNameGenerator(

Review Comment:
   yes, it's work. This API supported https://github.com/apache/skywalking/blob/bf74b667639489e51db016aad575d8b8c4247e26/oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java#L251



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] kezhenxu94 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919774472


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,47 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)
         List<Metrics> result = new ArrayList<>(metrics.size());
-        groupIndices.forEach((tableName, metricList) -> {
+        groupAliasIndices.forEach((tableName, metricList) -> {
             List<String> ids = metricList.stream()
                                          .map(item -> IndexController.INSTANCE.generateDocId(model, item.id()))
                                          .collect(Collectors.toList());
-            final SearchResponse response = getClient().ids(tableName, ids);
+            final SearchResponse response = getClient().idsWithIndexAlias(tableName, ids);
             response.getHits().getHits().forEach(hit -> {
                 Metrics source = storageBuilder.storage2Entity(new HashMapConverter.ToEntity(hit.getSource()));
                 result.add(source);
             });
         });
 
+        Map<String, List<String>> indexIdsGroup = new HashMap<>();

Review Comment:
   > `_mget` doesn't support alias if there multi indexes... Do you have a better idea?
   
   Well. Then renaming the method to what @wu-sheng suggest is ok to me



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919703394


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,47 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)
         List<Metrics> result = new ArrayList<>(metrics.size());
-        groupIndices.forEach((tableName, metricList) -> {
+        groupAliasIndices.forEach((tableName, metricList) -> {
             List<String> ids = metricList.stream()
                                          .map(item -> IndexController.INSTANCE.generateDocId(model, item.id()))
                                          .collect(Collectors.toList());
-            final SearchResponse response = getClient().ids(tableName, ids);
+            final SearchResponse response = getClient().idsWithIndexAlias(tableName, ids);

Review Comment:
   @wu-sheng wu



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920606387


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,49 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)

Review Comment:
   I think this comment is not correct anymore. The alias is only used for `!isTimeRelativeID` entities only, right? It doesn't matter at the day edge or not anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920061411


##########
docs/en/changes/changes.md:
##########
@@ -21,6 +21,7 @@
 * Remove legacy OAL `percentile` functions, `p99`, `p95`, `p90`, `p75`, `p50` func(s).
 * Revert [#8066](https://github.com/apache/skywalking/pull/8066). Keep all metrics persistent even it is default value.
 * Skip loading UI templates if folder is empty or doesn't exist.
+* Optimize elasticsearch query performance by using `_mGet` and physical index name.

Review Comment:
   ```suggestion
   * Optimize ElasticSearch query performance by using `_mGet` and physical index name rather than alias in these scenarios,  (a) Metrics aggregation (b) Zipkin query (c) Metrics query
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920088031


##########
docs/en/changes/changes.md:
##########
@@ -21,6 +21,7 @@
 * Remove legacy OAL `percentile` functions, `p99`, `p95`, `p90`, `p75`, `p50` func(s).
 * Revert [#8066](https://github.com/apache/skywalking/pull/8066). Keep all metrics persistent even it is default value.
 * Skip loading UI templates if folder is empty or doesn't exist.
+* Optimize ElasticSearch query performance by using `_mGet` and physical index name rather than alias in these scenarios,  (a) Metrics aggregation (b) Zipkin query (c) Metrics query

Review Comment:
   ```suggestion
   * Optimize ElasticSearch query performance by using `_mGet` and physical index name rather than alias in these scenarios,  (a) Metrics aggregation (b) Zipkin query (c) Metrics query (d) Log query
   ```
   
   Sorry, I missed one too in the last comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919712824


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,47 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)
         List<Metrics> result = new ArrayList<>(metrics.size());
-        groupIndices.forEach((tableName, metricList) -> {
+        groupAliasIndices.forEach((tableName, metricList) -> {
             List<String> ids = metricList.stream()
                                          .map(item -> IndexController.INSTANCE.generateDocId(model, item.id()))
                                          .collect(Collectors.toList());
-            final SearchResponse response = getClient().ids(tableName, ids);
+            final SearchResponse response = getClient().idsWithIndexAlias(tableName, ids);
             response.getHits().getHits().forEach(hit -> {
                 Metrics source = storageBuilder.storage2Entity(new HashMapConverter.ToEntity(hit.getSource()));
                 result.add(source);
             });
         });
 
+        Map<String, List<String>> indexIdsGroup = new HashMap<>();

Review Comment:
   We don't have to use `mget`. Like https://github.com/apache/skywalking/pull/9339#discussion_r919702835 mentioned, we could keep alias.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#issuecomment-1182915159

   Let's locate why e2e are failing. It seems there are unexpected effects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920611572


##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,7 +303,28 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIds key: indexName, value: ids list
+     * @return Documents
+     */
+    public Optional<Documents> ids(Map<String, List<String>> indexIds) {
+        Map<String, List<String>> map = new HashMap<>();
+        indexIds.forEach((indexName, ids) -> {
+            map.put(indexNameConverter.apply(indexName), ids);
+        });
+        return es.get().documents().mGet(TYPE, map);
+    }
+
+    /**
+     * Search by ids with index alias.
+     * When can not locate the physical index. Otherwise, recommend use method {@link #ids}
+     * @param indexName Index alias name or physical name
+     * @param ids ids
+     * @return SearchResponse

Review Comment:
   ```suggestion
        * Search by ids with index alias, when can not locate the physical index. 
        * Otherwise, recommend use method {@link #ids}
        * @param indexName Index alias name or physical name
        * @param ids ID list
        * @return SearchResponse
        * @since 9.2.0 this method was ids
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920062113


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/query/MetricsQueryEsDAO.java:
##########
@@ -77,10 +82,10 @@ public long readMetricsValue(final MetricsCondition condition,
 
         sourceBuilder.aggregation(entityIdAggregation);
 
-        final String index =
-            IndexController.LogicIndicesRegister.getPhysicalTableName(condition.getName());
-
-        final SearchResponse response = getClient().search(index, sourceBuilder.build());
+        final SearchResponse response = getClient().search(new TimeRangeIndexNameGenerator(

Review Comment:
   If this works, then `AggregationQueryEsDAO` should be changed in a similar way too.
   
   Both are aggregation queries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919702835


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,47 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)
         List<Metrics> result = new ArrayList<>(metrics.size());
-        groupIndices.forEach((tableName, metricList) -> {
+        groupAliasIndices.forEach((tableName, metricList) -> {
             List<String> ids = metricList.stream()
                                          .map(item -> IndexController.INSTANCE.generateDocId(model, item.id()))
                                          .collect(Collectors.toList());
-            final SearchResponse response = getClient().ids(tableName, ids);
+            final SearchResponse response = getClient().idsWithIndexAlias(tableName, ids);

Review Comment:
   When the metric id is not TimeRelative. Could not locate the physical index.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 merged pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 merged PR #9339:
URL: https://github.com/apache/skywalking/pull/9339


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920608805


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,49 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();

Review Comment:
   First of all, we don't need to create two Map(s) always. For a particular model, it either uses `alias` or `mget`.  `model#isTimeRelativeID()` is the condition, so, wouldn't be affected by the metrics list.
   
   Also, I noticed, that for alias usage, there is no point to use a Map, as you are going to have one alias(`IndexController.INSTANCE.getTableName(model)` returns consistent).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] sonatype-lift[bot] commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
sonatype-lift[bot] commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920639478


##########
oap-server/server-library/library-elasticsearch-client/src/main/java/org/apache/skywalking/library/elasticsearch/requests/factory/v7/V7DocumentFactory.java:
##########
@@ -88,6 +90,30 @@ public HttpRequest mget(String index, String type, Iterable<String> ids) {
                           .build();
     }
 
+    @SneakyThrows
+    @Override
+    public HttpRequest mget(final String type, final Map<String, List<String>> indexIds) {
+        checkArgument(!isNullOrEmpty(type), "type cannot be null or empty");
+        checkArgument(indexIds != null && !indexIds.isEmpty(), "ids cannot be null or empty");
+        final List<Map<String, String>> indexIdList = new ArrayList<>();
+        indexIds.forEach((index, ids) -> {
+            checkArgument(ids != null && !isEmpty(ids), "ids cannot be null or empty");
+            ids.forEach(id -> {
+                indexIdList.add(ImmutableMap.of("_index", index, "_id", id));
+            });
+        });
+        final Map<String, Iterable<Map<String, String>>> m = ImmutableMap.of("docs", indexIdList);
+        final byte[] content = version.codec().encode(m);
+        if (log.isDebugEnabled()) {
+            log.debug("mget indexIds request: {}", new String(content));

Review Comment:
   *[DefaultCharset](https://errorprone.info/bugpattern/DefaultCharset):*  Implicit use of the platform default charset, which can result in differing behaviour between JVM executions or incorrect behavior if the encoding of the data source doesn't match expectations.
   
   
   ```suggestion
               log.debug("mget indexIds request: {}", new String(content, Charset.defaultCharset()));
   ```
   
   
   
   Reply with *"**@sonatype-lift help**"* for info about LiftBot commands.
   Reply with *"**@sonatype-lift ignore**"* to tell LiftBot to leave out the above finding from this PR.
   Reply with *"**@sonatype-lift ignoreall**"* to tell LiftBot to leave out all the findings from this PR and from the status bar in Github.
   
   When talking to LiftBot, you need to **refresh** the page to see its response. [Click here](https://help.sonatype.com/lift/talking-to-lift) to get to know more about LiftBot commands.
   
   ---
   
   Was this a good recommendation?
   [ [🙁 Not relevant](https://www.sonatype.com/lift-comment-rating?comment=297463968&lift_comment_rating=1) ] - [ [😕 Won't fix](https://www.sonatype.com/lift-comment-rating?comment=297463968&lift_comment_rating=2) ] - [ [😑 Not critical, will fix](https://www.sonatype.com/lift-comment-rating?comment=297463968&lift_comment_rating=3) ] - [ [🙂 Critical, will fix](https://www.sonatype.com/lift-comment-rating?comment=297463968&lift_comment_rating=4) ] - [ [😊 Critical, fixing now](https://www.sonatype.com/lift-comment-rating?comment=297463968&lift_comment_rating=5) ]



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920060646


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/query/MetricsQueryEsDAO.java:
##########
@@ -77,10 +82,10 @@ public long readMetricsValue(final MetricsCondition condition,
 
         sourceBuilder.aggregation(entityIdAggregation);
 
-        final String index =
-            IndexController.LogicIndicesRegister.getPhysicalTableName(condition.getName());
-
-        final SearchResponse response = getClient().search(index, sourceBuilder.build());
+        final SearchResponse response = getClient().search(new TimeRangeIndexNameGenerator(

Review Comment:
   If the time range crosses days, then the query would apply for multiple physical indices names. Does this work?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920611736


##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,7 +303,28 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIds key: indexName, value: ids list
+     * @return Documents

Review Comment:
   ```suggestion
        * Provide to get documents from multi indices by IDs.
        * @param indexIds key: indexName, value: ids list
        * @return Documents
        * @since 9.2.0
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920612734


##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,7 +303,28 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIds key: indexName, value: ids list
+     * @return Documents
+     */
+    public Optional<Documents> ids(Map<String, List<String>> indexIds) {

Review Comment:
   Why do other search APIs return `SearchResponse`, but here returning `Optional<Documents>`. Is this on purpose?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920613592


##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,7 +303,28 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIds key: indexName, value: ids list
+     * @return Documents
+     */
+    public Optional<Documents> ids(Map<String, List<String>> indexIds) {

Review Comment:
   The origin ES API returned differently



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r920613592


##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,7 +303,28 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIds key: indexName, value: ids list
+     * @return Documents
+     */
+    public Optional<Documents> ids(Map<String, List<String>> indexIds) {

Review Comment:
   The origin ES API returned differently, this from document API, not search



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] kezhenxu94 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919671464


##########
oap-server/server-library/library-elasticsearch-client/src/main/java/org/apache/skywalking/library/elasticsearch/requests/factory/DocumentFactory.java:
##########
@@ -39,6 +40,11 @@ public interface DocumentFactory {
      */
     HttpRequest mget(String index, String type, Iterable<String> ids);
 
+    /**
+     * Returns a request to get multiple documents of {@code indexIds}.
+     */
+    HttpRequest mget(final String type, final Map<String, List<String>> indexIdsGroup);

Review Comment:
   You wrote `indexIds` in JavaDoc and `Group` is ok to remove as `Map` type can indicate that.
   
   ```suggestion
       HttpRequest mget(final String type, final Map<String, List<String>> indexIds);
   ```



##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/TimeSeriesUtils.java:
##########
@@ -80,6 +82,33 @@ public static String[] superDatasetIndexNames(String indexName, long startSecond
         }
     }
 
+    public static String queryIndexName(String tableName,
+                                        long pointOfTB,
+                                        Step step,
+                                        boolean isRecord,
+                                        boolean isSuperDataSet) {
+        if (StringUtil.isBlank(tableName) || pointOfTB <= 0) {
+            throw new IllegalArgumentException(
+                "Arguments [tableName]: " + tableName + " can not be blank and [pointOfTB]: " + pointOfTB + " can not <= 0");
+        }
+        if (isRecord && isSuperDataSet) {
+            return tableName + Const.LINE + compressTimeBucket(pointOfTB / 1000000, SUPER_DATASET_DAY_STEP);
+        } else {
+            switch (step) {
+                case DAY:
+                    return tableName + Const.LINE + compressTimeBucket(pointOfTB, DAY_STEP);
+                case HOUR:
+                    return tableName + Const.LINE + compressTimeBucket(pointOfTB / 100, DAY_STEP);
+                case MINUTE:
+                    return tableName + Const.LINE + compressTimeBucket(pointOfTB / 10000, DAY_STEP);
+                case SECOND:
+                    return tableName + Const.LINE + compressTimeBucket(pointOfTB / 1000000, DAY_STEP);
+            }
+        }

Review Comment:
   ```suggestion
           }
           switch (step) {
               case DAY:
                   return tableName + Const.LINE + compressTimeBucket(pointOfTB, DAY_STEP);
               case HOUR:
                   return tableName + Const.LINE + compressTimeBucket(pointOfTB / 100, DAY_STEP);
               case MINUTE:
                   return tableName + Const.LINE + compressTimeBucket(pointOfTB / 10000, DAY_STEP);
               case SECOND:
                   return tableName + Const.LINE + compressTimeBucket(pointOfTB / 1000000, DAY_STEP);
           }
   ```



##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,13 +303,33 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
-        indexName = indexNameConverter.apply(indexName);
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIdsGroup key: indexName, value: ids list
+     * @return Documents
+     */
+    public Optional<Documents> ids(Map<String, List<String>> indexIdsGroup) {
+        Map<String, List<String>> map = new HashMap<>();
+        indexIdsGroup.forEach((indexName, ids) -> {
+            map.put(indexNameConverter.apply(indexName), ids);
+        });
+        return es.get().documents().mGet(TYPE, map);
+    }
+
+    /**
+     * Query by ids with index alias. When can not locate the physical index.
+     * @param indexAlias Index alias name
+     * @param ids Query ids
+     * @return SearchResponse
+     */
+    public SearchResponse idsWithIndexAlias(String indexAlias, Iterable<String> ids) {

Review Comment:
   What about just keep the method name `ids` and parameter `index`? This method still can query by both physical index or alias. Just update the java doc `@param index Index name or alias name`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wankai123 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wankai123 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919709309


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,47 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)
         List<Metrics> result = new ArrayList<>(metrics.size());
-        groupIndices.forEach((tableName, metricList) -> {
+        groupAliasIndices.forEach((tableName, metricList) -> {
             List<String> ids = metricList.stream()
                                          .map(item -> IndexController.INSTANCE.generateDocId(model, item.id()))
                                          .collect(Collectors.toList());
-            final SearchResponse response = getClient().ids(tableName, ids);
+            final SearchResponse response = getClient().idsWithIndexAlias(tableName, ids);
             response.getHits().getHits().forEach(hit -> {
                 Metrics source = storageBuilder.storage2Entity(new HashMapConverter.ToEntity(hit.getSource()));
                 result.add(source);
             });
         });
 
+        Map<String, List<String>> indexIdsGroup = new HashMap<>();

Review Comment:
   `_mget` doesn't support alias if there multi indexes... Do you have a better idea?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] kezhenxu94 commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919706158


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,47 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)
         List<Metrics> result = new ArrayList<>(metrics.size());
-        groupIndices.forEach((tableName, metricList) -> {
+        groupAliasIndices.forEach((tableName, metricList) -> {
             List<String> ids = metricList.stream()
                                          .map(item -> IndexController.INSTANCE.generateDocId(model, item.id()))
                                          .collect(Collectors.toList());
-            final SearchResponse response = getClient().ids(tableName, ids);
+            final SearchResponse response = getClient().idsWithIndexAlias(tableName, ids);
             response.getHits().getHits().forEach(hit -> {
                 Metrics source = storageBuilder.storage2Entity(new HashMapConverter.ToEntity(hit.getSource()));
                 result.add(source);
             });
         });
 
+        Map<String, List<String>> indexIdsGroup = new HashMap<>();

Review Comment:
   I think we can merge `groupAliasIndices` into `groupIndices` and call `getClient().ids` once, then remove `idsWithIndexAlias`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919700305


##########
oap-server/server-library/library-client/src/main/java/org/apache/skywalking/oap/server/library/client/elasticsearch/ElasticSearchClient.java:
##########
@@ -300,13 +303,33 @@ public boolean existDoc(String indexName, String id) {
         return es.get().documents().exists(indexName, TYPE, id);
     }
 
-    public SearchResponse ids(String indexName, Iterable<String> ids) {
-        indexName = indexNameConverter.apply(indexName);
+
+    /**
+     * @since 9.2.0 Provide to get documents from multi indices by ids.
+     * @param indexIdsGroup key: indexName, value: ids list
+     * @return Documents
+     */
+    public Optional<Documents> ids(Map<String, List<String>> indexIdsGroup) {
+        Map<String, List<String>> map = new HashMap<>();
+        indexIdsGroup.forEach((indexName, ids) -> {
+            map.put(indexNameConverter.apply(indexName), ids);
+        });
+        return es.get().documents().mGet(TYPE, map);
+    }
+
+    /**
+     * Query by ids with index alias. When can not locate the physical index.
+     * @param indexAlias Index alias name
+     * @param ids Query ids
+     * @return SearchResponse
+     */
+    public SearchResponse idsWithIndexAlias(String indexAlias, Iterable<String> ids) {

Review Comment:
   Where would we use `idsWithIndexAlias`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [skywalking] wu-sheng commented on a diff in pull request #9339: Optimize elasticsearch query performance by using `_mGet` and physical index name

Posted by GitBox <gi...@apache.org>.
wu-sheng commented on code in PR #9339:
URL: https://github.com/apache/skywalking/pull/9339#discussion_r919712245


##########
oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/MetricsEsDAO.java:
##########
@@ -52,50 +52,47 @@ protected MetricsEsDAO(ElasticSearchClient client,
 
     @Override
     public List<Metrics> multiGet(Model model, List<Metrics> metrics) {
-        Map<String, List<Metrics>> groupIndices
-            = metrics.stream()
-                     .collect(
-                         groupingBy(metric -> {
+        Map<String, List<Metrics>> groupAliasIndices = new HashMap<>();
+        Map<String, List<Metrics>> groupIndices = new HashMap<>();
+        metrics.forEach(metric -> {
                              if (model.isTimeRelativeID()) {
                                  // Try to use with timestamp index name(write index),
-                                 // if latest cache shows this name doesn't exist,
-                                 // then fail back to template alias name.
-                                 // This should only happen in very rare case, such as this is the time to create new index
-                                 // as a new day comes, and the index cache is  pseudo real time.
-                                 // This case doesn't affect the result, just has lower performance due to using the alias name.
-                                 // Another case is that a removed index showing existing also due to latency,
-                                 // which could cause multiGet fails
-                                 // but this should not happen in the real runtime, TTL timer only removed the oldest indices,
-                                 // which should not have an update/insert.
                                  String indexName = TimeSeriesUtils.writeIndexName(model, metric.getTimeBucket());
-                                 // Format the name to follow the global physical index naming policy.
-                                 if (!IndicesMetadataCache.INSTANCE.isExisting(
-                                     getClient().formatIndexName(indexName))) {
-                                     indexName = IndexController.INSTANCE.getTableName(model);
-                                 }
-                                 return indexName;
+                                 groupIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              } else {
                                  // Metadata level metrics, always use alias name, due to the physical index of the records
                                  // can't be located through timestamp.
-                                 return IndexController.INSTANCE.getTableName(model);
+                                 String indexName = IndexController.INSTANCE.getTableName(model);
+                                 groupAliasIndices.computeIfAbsent(indexName, v -> new ArrayList<>()).add(metric);
                              }
-                         })
-                     );
-
+                         });
         // The groupIndices mostly include one or two group,
         // the current day and the T-1 day(if at the edge between days)
         List<Metrics> result = new ArrayList<>(metrics.size());
-        groupIndices.forEach((tableName, metricList) -> {
+        groupAliasIndices.forEach((tableName, metricList) -> {
             List<String> ids = metricList.stream()
                                          .map(item -> IndexController.INSTANCE.generateDocId(model, item.id()))
                                          .collect(Collectors.toList());
-            final SearchResponse response = getClient().ids(tableName, ids);
+            final SearchResponse response = getClient().idsWithIndexAlias(tableName, ids);

Review Comment:
   I think renaming `idsWithIndexAlias` to `searchIDs` would make things more clear.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org