Posted to commits@doris.apache.org by "morningman (via GitHub)" <gi...@apache.org> on 2023/06/28 08:16:55 UTC
[GitHub] [doris] morningman opened a new pull request, #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
morningman opened a new pull request, #21283:
URL: https://github.com/apache/doris/pull/21283
## Proposed changes
Creating a new Hive catalog or refreshing an existing one refreshes the HiveMetaStore cache,
which in turn calls "FileInputFormat.setInputPaths()".
This method creates a new FileSystem instance and stores it in FileSystem's cache.
So if the catalog is refreshed frequently, FileSystem instances accumulate in the cache and can eventually cause an OOM.
This PR disables the FileSystem cache.
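
As a rough illustration of the approach (not the actual patch), the sketch below derives the Hadoop property name `fs.<scheme>.impl.disable.cache` from the scheme of a file location and sets it to `"true"`. The class name `FsCacheKeySketch`, the helper names, the example locations, and the plain `Map` standing in for the job configuration are all assumptions made for this example; the real change applies the property through `updateJobConf` in `HiveMetaStoreCache`.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class FsCacheKeySketch {
    // Builds the property name "fs.<scheme>.impl.disable.cache" for a location.
    // Returns empty when the location has no scheme (for example, a bare path).
    static Optional<String> disableCacheKey(String location) {
        String scheme = URI.create(location).getScheme();
        return scheme == null
                ? Optional.empty()
                : Optional.of("fs." + scheme + ".impl.disable.cache");
    }

    // Hypothetical stand-in for the job configuration: collects one
    // "disable cache" entry per distinct scheme found in the locations.
    static Map<String, String> confFor(String... locations) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String location : locations) {
            disableCacheKey(location).ifPresent(key -> conf.put(key, "true"));
        }
        return conf;
    }

    public static void main(String[] args) {
        // Example locations are made up; only the scheme matters here.
        System.out.println(confFor(
                "s3a://example-bucket/warehouse/t1",
                "cosn://example-bucket/warehouse/t2",
                "/tmp/local"));
    }
}
```

Deriving the key from each location's own scheme means no scheme list has to be hard-coded; a location without a scheme is simply skipped.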
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [doris] yiguolei merged pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
Posted by "yiguolei (via GitHub)" <gi...@apache.org>.
yiguolei merged PR #21283:
URL: https://github.com/apache/doris/pull/21283
[GitHub] [doris] hello-stephen commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
Posted by "hello-stephen (via GitHub)" <gi...@apache.org>.
hello-stephen commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1611172665
TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 35.84 seconds
stream load tsv: 451 seconds loaded 74807831229 Bytes, about 158 MB/s
stream load json: 23 seconds loaded 2358488459 Bytes, about 97 MB/s
stream load orc: 58 seconds loaded 1101869774 Bytes, about 18 MB/s
stream load parquet: 29 seconds loaded 861443392 Bytes, about 28 MB/s
insert into select: 70.1 seconds inserted 10000000 Rows, about 142K ops/s
https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/tmp/20230628104032_clickbench_pr_169312.html
[GitHub] [doris] wsjz commented on a diff in pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on code in PR #21283:
URL: https://github.com/apache/doris/pull/21283#discussion_r1244864299
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java:
##########
@@ -328,6 +329,17 @@ private FileCacheValue loadFiles(FileCacheKey key) {
try {
Thread.currentThread().setContextClassLoader(ClassLoader.getSystemClassLoader());
String finalLocation = S3Util.convertToS3IfNecessary(key.location);
+ // Disable the fs cache in FileSystem, or it will always create a new FileSystem
+ // and save it in the cache when calling FileInputFormat.setInputPaths().
+ try {
+ Path path = new Path(finalLocation);
+ URI uri = path.toUri();
+ if (uri.getScheme() != null) {
+ updateJobConf("fs." + uri.getScheme() + ".impl.disable.cache", "true");
Review Comment:
S3 schemes: `s3` and `s3a`
COS scheme: `cosn`
[GitHub] [doris] morningman commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1611093924
run buildall
[GitHub] [doris] github-actions[bot] commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1612291112
PR approved by at least one committer and no changes requested.
[GitHub] [doris] github-actions[bot] commented on pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #21283:
URL: https://github.com/apache/doris/pull/21283#issuecomment-1610991475
PR approved by anyone and no changes requested.
[GitHub] [doris] morningman commented on a diff in pull request #21283: [fix](catalog) disable FileSystem Cache to avoid too many fs cache
Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on code in PR #21283:
URL: https://github.com/apache/doris/pull/21283#discussion_r1244962808
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/hive/HiveMetaStoreCache.java:
##########
@@ -328,6 +329,17 @@ private FileCacheValue loadFiles(FileCacheKey key) {
try {
Thread.currentThread().setContextClassLoader(ClassLoader.getSystemClassLoader());
String finalLocation = S3Util.convertToS3IfNecessary(key.location);
+ // Disable the fs cache in FileSystem, or it will always create a new FileSystem
+ // and save it in the cache when calling FileInputFormat.setInputPaths().
+ try {
+ Path path = new Path(finalLocation);
+ URI uri = path.toUri();
+ if (uri.getScheme() != null) {
+ updateJobConf("fs." + uri.getScheme() + ".impl.disable.cache", "true");
Review Comment:
It does need to use the same logic as `FileInputFormat.setInputPaths()`, which is to use the scheme of the file location.