Posted to commits@doris.apache.org by "morningman (via GitHub)" <gi...@apache.org> on 2023/06/28 08:16:55 UTC

[GitHub] [doris] morningman opened a new pull request, #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

morningman opened a new pull request, #21283:
URL: https://github.com/apache/doris/pull/21283

   ## Proposed changes
   
   When a Hive catalog is created or refreshed, the HiveMetaStore cache is refreshed as well,
   which calls "FileInputFormat.setInputPaths()".
   That method creates a new FileSystem instance and stores it in the FileSystem cache.
   So if the catalog is refreshed frequently, FileSystem instances accumulate in the cache, causing an OOM.
   
   This PR disables the FileSystem cache.
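The fix relies on a real Hadoop behavior: `FileSystem.get(URI, Configuration)` reads the boolean property `fs.<scheme>.impl.disable.cache`, and when it is true it creates a new instance without storing it in the shared cache. The sketch below is a simplified, hypothetical model of that check (`FsCacheSketch`, `FakeFileSystem` and `simulate` are illustrative names, not Doris or Hadoop code). It assumes each refresh produces a distinct cache key; the PR does not say which part of Hadoop's real cache key changes between refreshes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a FileSystem cache. It is NOT Hadoop's
// FileSystem.Cache; it only mirrors the "fs.<scheme>.impl.disable.cache"
// check that Hadoop's FileSystem.get() performs before using its cache.
public class FsCacheSketch {

    // Stand-in for a real FileSystem instance.
    static final class FakeFileSystem {
        final String scheme;

        FakeFileSystem(String scheme) {
            this.scheme = scheme;
        }
    }

    private final Map<String, FakeFileSystem> cache = new HashMap<>();

    // Returns a file system for the scheme. When the per-scheme disable flag
    // is "true", a new instance is created and never stored in the cache.
    FakeFileSystem get(String scheme, String cacheKey, Map<String, String> conf) {
        String disableKey = "fs." + scheme + ".impl.disable.cache";
        if (Boolean.parseBoolean(conf.getOrDefault(disableKey, "false"))) {
            return new FakeFileSystem(scheme);
        }
        return cache.computeIfAbsent(cacheKey, k -> new FakeFileSystem(scheme));
    }

    int cachedCount() {
        return cache.size();
    }

    // Simulates `refreshes` catalog refreshes. Assumption: every refresh yields
    // a distinct cache key, so an enabled cache keeps one instance per refresh.
    static int simulate(int refreshes, boolean disableCache) {
        FsCacheSketch fs = new FsCacheSketch();
        Map<String, String> conf = new HashMap<>();
        if (disableCache) {
            conf.put("fs.hdfs.impl.disable.cache", "true");
        }
        for (int i = 0; i < refreshes; i++) {
            fs.get("hdfs", "hdfs#refresh-" + i, conf);
        }
        return fs.cachedCount();
    }

    public static void main(String[] args) {
        System.out.println("cache enabled:  " + simulate(1000, false) + " cached instances");
        System.out.println("cache disabled: " + simulate(1000, true) + " cached instances");
    }
}
```

Under these assumptions, `main` reports 1000 retained instances with the cache enabled and 0 with it disabled. The trade-off is that every call now builds a fresh instance, which costs some construction time but keeps memory bounded.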
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] yiguolei merged pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei merged PR #21283:
URL: https://github.com/apache/doris/pull/21283




[GitHub] [doris] hello-stephen commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1611172665

   TeamCity pipeline, clickbench performance test result:
    the sum of best hot time: 35.84 seconds
    stream load tsv:          451 seconds loaded 74807831229 Bytes, about 158 MB/s
    stream load json:         23 seconds loaded 2358488459 Bytes, about 97 MB/s
    stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
    stream load parquet:      29 seconds loaded 861443392 Bytes, about 28 MB/s
    insert into select:       70.1 seconds inserted 10000000 Rows, about 142K ops/s
    https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230628104032_clickbench_pr_169312.html




[GitHub] [doris] wsjz commented on a diff in pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on code in PR #21283:
URL: https://github.com/apache/doris/pull/21283#discussion_r1244864299


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java:
##########
@@ -328,6 +329,17 @@ private FileCacheValue loadFiles(FileCacheKey key) {
         try {
             Thread.currentThread().setContextClassLoader(ClassLoader.getSystemClassLoader());
             String finalLocation = S3Util.convertToS3IfNecessary(key.location);
+            // disable the fs cache in FileSystem, or it will always create a new FileSystem
+            // and save it in the cache when calling FileInputFormat.setInputPaths().
+            try {
+                Path path = new Path(finalLocation);
+                URI uri = path.toUri();
+                if (uri.getScheme() != null) {
+                    updateJobConf("fs." + uri.getScheme() + ".impl.disable.cache", "true");

Review Comment:
   s3 scheme: s3 and s3a
   cos scheme: cosn





[GitHub] [doris] morningman commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1611093924

   run buildall




[GitHub] [doris] github-actions[bot] commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1612291112

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] github-actions[bot] commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1610991475

   PR approved by anyone and no changes requested.




[GitHub] [doris] morningman commented on a diff in pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on code in PR #21283:
URL: https://github.com/apache/doris/pull/21283#discussion_r1244962808


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java:
##########
@@ -328,6 +329,17 @@ private FileCacheValue loadFiles(FileCacheKey key) {
         try {
             Thread.currentThread().setContextClassLoader(ClassLoader.getSystemClassLoader());
             String finalLocation = S3Util.convertToS3IfNecessary(key.location);
+            // disable the fs cache in FileSystem, or it will always create a new FileSystem
+            // and save it in the cache when calling FileInputFormat.setInputPaths().
+            try {
+                Path path = new Path(finalLocation);
+                URI uri = path.toUri();
+                if (uri.getScheme() != null) {
+                    updateJobConf("fs." + uri.getScheme() + ".impl.disable.cache", "true");

Review Comment:
   It needs to follow the same logic as `FileInputFormat.setInputPaths()`, which uses the `scheme` of the file location.
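To illustrate the point in this exchange: because the property name is built from the location's own scheme, `s3`, `s3a` and `cosn` locations each get the matching `fs.<scheme>.impl.disable.cache` key without a hard-coded scheme list. The sketch below is a hypothetical helper (`DisableCacheKey` and `disableCacheKey` are illustrative names). It uses `java.net.URI` instead of Hadoop's `Path` to stay dependency-free, skips the `S3Util.convertToS3IfNecessary()` rewrite the PR applies first, and assumes well-formed locations, since `URI.create` throws `IllegalArgumentException` on malformed input.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper: derives the Hadoop property that disables the
// FileSystem cache for whatever scheme the file location actually uses.
public class DisableCacheKey {

    static Optional<String> disableCacheKey(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            // No scheme (e.g. a bare path): nothing to set, like the PR's null check.
            return Optional.empty();
        }
        return Optional.of("fs." + scheme + ".impl.disable.cache");
    }

    public static void main(String[] args) {
        // Example locations are made up for illustration.
        String[] locations = {
            "s3://bucket/warehouse/t1",
            "s3a://bucket/warehouse/t1",
            "cosn://bucket-1250000000/warehouse/t1",
            "hdfs://nameservice1/user/hive/warehouse/t1",
            "/user/hive/warehouse/t1"
        };
        for (String location : locations) {
            System.out.println(location + " -> "
                    + disableCacheKey(location).orElse("(no scheme, skipped)"));
        }
    }
}
```

Deriving the key from the path keeps the setting aligned with the scheme that `FileInputFormat.setInputPaths()` will actually resolve, rather than with a list that has to be maintained by hand.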


