You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by "Jibing-Li (via GitHub)" <gi...@apache.org> on 2023/06/16 08:33:36 UTC

[GitHub] [doris] Jibing-Li opened a new pull request, #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Jibing-Li opened a new pull request, #20906:
URL: https://github.com/apache/doris/pull/20906

   <!--Describe your changes.-->
   
   The current column statistic cache loader is to load data from column_statistics olap table.
   This pr is to change the cache loader logic to First load from column_statistics olap table, if no data was loaded, then load from table metadata. This is mainly to support fetch statistics data for external catalog using HMS or Iceberg api.
   This is the first PR, next pr will implement the fetch logic for different external catalogs.
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Jibing-Li commented on a diff in pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "Jibing-Li (via GitHub)" <gi...@apache.org>.
Jibing-Li commented on code in PR #20906:
URL: https://github.com/apache/doris/pull/20906#discussion_r1233448879


##########
fe/fe-common/src/main/java/org/apache/doris/common/Config.java:
##########
@@ -1753,7 +1753,7 @@ public class Config extends ConfigBase {
      * Otherwise, use external catalog metadata.
      */
     @ConfField(mutable = true)
-    public static boolean collect_external_table_stats_by_sql = false;
+    public static boolean collect_external_table_stats_by_sql = true;

Review Comment:
   Will remove this in next PR. Next PR will move the code of fetching statistics from data source to TableIf, and will remove this config.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Jibing-Li commented on pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "Jibing-Li (via GitHub)" <gi...@apache.org>.
Jibing-Li commented on PR #20906:
URL: https://github.com/apache/doris/pull/20906#issuecomment-1598005196

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Jibing-Li commented on pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "Jibing-Li (via GitHub)" <gi...@apache.org>.
Jibing-Li commented on PR #20906:
URL: https://github.com/apache/doris/pull/20906#issuecomment-1596575044

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20906:
URL: https://github.com/apache/doris/pull/20906#issuecomment-1596529962

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Jibing-Li commented on pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "Jibing-Li (via GitHub)" <gi...@apache.org>.
Jibing-Li commented on PR #20906:
URL: https://github.com/apache/doris/pull/20906#issuecomment-1596700445

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] Jibing-Li commented on a diff in pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "Jibing-Li (via GitHub)" <gi...@apache.org>.
Jibing-Li commented on code in PR #20906:
URL: https://github.com/apache/doris/pull/20906#discussion_r1233562058


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/ColumnStatisticsCacheLoader.java:
##########
@@ -42,20 +43,39 @@ public class ColumnStatisticsCacheLoader extends StatisticsCacheLoader<Optional<
 
     @Override
     protected Optional<ColumnStatistic> doLoad(StatisticsCacheKey key) {
+        // Load from statistics table.
+        Optional<ColumnStatistic> columnStatistic = loadFromStatsTable(String.valueOf(key.tableId),
+                String.valueOf(key.idxId), key.colName);
+        if (columnStatistic.isPresent()) {
+            return columnStatistic;
+        }
+        // Load from data source metadata
+        try {
+            TableIf table = Env.getCurrentEnv().getCatalogMgr().getCatalog(key.catalogId)

Review Comment:
   内表如果走到这里不会去查,直接返回Optional.empty();



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman commented on a diff in pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on code in PR #20906:
URL: https://github.com/apache/doris/pull/20906#discussion_r1232368298


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/ColumnStatisticsCacheLoader.java:
##########
@@ -42,20 +43,38 @@ public class ColumnStatisticsCacheLoader extends StatisticsCacheLoader<Optional<
 
     @Override
     protected Optional<ColumnStatistic> doLoad(StatisticsCacheKey key) {
+        // Load from statistics table.
+        Optional<ColumnStatistic> columnStatistic = loadFromStatsTable(String.valueOf(key.tableId),
+                String.valueOf(key.idxId), key.colName);
+        if (columnStatistic.isPresent()) {
+            return columnStatistic;
+        }
+        // Load from data source metadata
+        try {
+            TableIf table = Env.getCurrentEnv().getCatalogMgr().getCatalog(key.catalogId)
+                    .getDbOrMetaException(key.dbId).getTableOrMetaException(key.tableId);
+            columnStatistic = table.getColumnStatistic();
+        } catch (Exception e) {
+            LOG.warn("Exception to get column statistics by metadata.", e);

Review Comment:
   better print catalog,db,table info



##########
fe/fe-common/src/main/java/org/apache/doris/common/Config.java:
##########
@@ -1753,7 +1753,7 @@ public class Config extends ConfigBase {
      * Otherwise, use external catalog metadata.
      */
     @ConfField(mutable = true)
-    public static boolean collect_external_table_stats_by_sql = false;
+    public static boolean collect_external_table_stats_by_sql = true;

Review Comment:
   How about just remove this config?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] morningman merged pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman merged PR #20906:
URL: https://github.com/apache/doris/pull/20906


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #20906:
URL: https://github.com/apache/doris/pull/20906#issuecomment-1596529924

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] englefly commented on a diff in pull request #20906: [Improvement](multi catalog, statistics)Support two level external statistics cache loader.

Posted by "englefly (via GitHub)" <gi...@apache.org>.
englefly commented on code in PR #20906:
URL: https://github.com/apache/doris/pull/20906#discussion_r1233548073


##########
fe/fe-core/src/main/java/org/apache/doris/statistics/ColumnStatisticsCacheLoader.java:
##########
@@ -42,20 +43,39 @@ public class ColumnStatisticsCacheLoader extends StatisticsCacheLoader<Optional<
 
     @Override
     protected Optional<ColumnStatistic> doLoad(StatisticsCacheKey key) {
+        // Load from statistics table.
+        Optional<ColumnStatistic> columnStatistic = loadFromStatsTable(String.valueOf(key.tableId),
+                String.valueOf(key.idxId), key.colName);
+        if (columnStatistic.isPresent()) {
+            return columnStatistic;
+        }
+        // Load from data source metadata
+        try {
+            TableIf table = Env.getCurrentEnv().getCatalogMgr().getCatalog(key.catalogId)

Review Comment:
   内表的column 会跑到外表查一次吗?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org