Posted to commits@doris.apache.org by "wsjz (via GitHub)" <gi...@apache.org> on 2023/06/27 07:30:00 UTC

[GitHub] [doris] wsjz opened a new pull request, #21238: [fix](multi-catalog)fix obj file cache

wsjz opened a new pull request, #21238:
URL: https://github.com/apache/doris/pull/21238

   ## Proposed changes
   
   Issue Number: close #xxx
   
   <!--Describe your changes.-->
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] wsjz commented on pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on PR #21238:
URL: https://github.com/apache/doris/pull/21238#issuecomment-1614274408

   run buildall



[GitHub] [doris] wsjz commented on a diff in pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on code in PR #21238:
URL: https://github.com/apache/doris/pull/21238#discussion_r1246169572


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/dlf/DLFCatalog.java:
##########
@@ -38,4 +47,26 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
         String tableName = tableIdentifier.name();
         return new DLFTableOperations(this.conf, this.clients, this.fileIO, this.uid, dbName, tableName);
     }
+
+    protected FileIO initializeFileIO(Map<String, String> properties, Configuration hadoopConf) {
+        // read from converted properties or default by old s3 aws properties
+        String endpoint = properties.getOrDefault(Constants.ENDPOINT_KEY, properties.get(S3Properties.Env.ENDPOINT));
+        CloudCredential credential = new CloudCredential();
+        credential.setAccessKey(properties.getOrDefault(OssProperties.ACCESS_KEY,
+                    properties.get(S3Properties.Env.ACCESS_KEY)));
+        credential.setSecretKey(properties.getOrDefault(OssProperties.SECRET_KEY,
+                    properties.get(S3Properties.Env.SECRET_KEY)));
+        if (properties.containsKey(OssProperties.SESSION_TOKEN)
+                || properties.containsKey(S3Properties.Env.TOKEN)) {
+            credential.setSessionToken(properties.getOrDefault(OssProperties.SESSION_TOKEN,
+                    properties.get(S3Properties.Env.TOKEN)));
+        }
+        String region = properties.getOrDefault(OssProperties.REGION, properties.get(S3Properties.Env.REGION));
+        // s3 file io just supports s3-like endpoint
+        String s3Endpoint = endpoint.replace(region, "s3." + region);
+        URI endpointUri = URI.create(s3Endpoint);
+        FileIO io = new S3FileIO(() -> S3Util.buildS3Client(endpointUri, region, credential));

Review Comment:
   I find that S3FileIO is faster than HadoopFileIO.
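
   The property fallback and endpoint rewrite in the hunk above can be sketched in isolation. This is a minimal sketch, not the code from the PR: the class name, the property keys, and the example endpoint and region values are all hypothetical, and the null check and the "already rewritten" guard are additions that the hunk does not contain.

```java
import java.util.Map;

// Illustrative sketch only; none of these names come from Doris or Iceberg.
final class EndpointSketch {
    // Hypothetical property keys: a converted key and an older fallback key.
    static final String NEW_ENDPOINT = "oss.endpoint";
    static final String OLD_ENDPOINT = "AWS_ENDPOINT";
    static final String NEW_REGION = "oss.region";
    static final String OLD_REGION = "AWS_REGION";

    // Prefer the converted key and fall back to the legacy key.
    static String firstNonNull(Map<String, String> props, String preferred, String legacy) {
        String value = props.get(preferred);
        return value != null ? value : props.get(legacy);
    }

    // Rewrite "<region>.<domain>" into "s3.<region>.<domain>".
    static String toS3StyleEndpoint(Map<String, String> props) {
        String endpoint = firstNonNull(props, NEW_ENDPOINT, OLD_ENDPOINT);
        String region = firstNonNull(props, NEW_REGION, OLD_REGION);
        if (endpoint == null || region == null) {
            // Assumption: fail early instead of hitting a NullPointerException in replace().
            throw new IllegalArgumentException("endpoint and region are required");
        }
        String s3Region = "s3." + region;
        if (endpoint.contains(s3Region)) {
            // Assumption: leave an endpoint that is already s3-style untouched.
            return endpoint;
        }
        return endpoint.replace(region, s3Region);
    }
}
```

   With the hypothetical values `oss.endpoint=oss-cn-beijing.aliyuncs.com` and `oss.region=oss-cn-beijing`, the method returns `s3.oss-cn-beijing.aliyuncs.com`. Whether real catalog properties carry the region in that form is an assumption of this sketch.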



##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/HiveCompatibleCatalog.java:
##########
@@ -57,7 +57,7 @@ public void initialize(String name, FileIO fileIO,
     protected FileIO initializeFileIO(Map<String, String> properties, Configuration hadoopConf) {
         String fileIOImpl = properties.get(CatalogProperties.FILE_IO_IMPL);
         if (fileIOImpl == null) {
-            FileIO io = new S3FileIO();
+            FileIO io = new HadoopFileIO(hadoopConf);

Review Comment:
   S3FileIO needs some custom configuration, so the superclass uses HadoopFileIO by default. Derived classes can provide a better implementation, as the DLF catalog does.
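
   The "generic default in the superclass, specialized override in the derived class" arrangement described in this comment might look roughly like the following. It is a sketch with stand-in types: `SketchFileIO`, `BaseCatalogSketch`, `ObjectStoreCatalogSketch`, and the property keys are hypothetical and are not the Iceberg or Doris classes.

```java
import java.util.Map;

// Stand-in for a file IO abstraction; this is not Iceberg's FileIO interface.
interface SketchFileIO {
    String name();
}

class BaseCatalogSketch {
    // Hypothetical key that lets a user name an IO implementation explicitly.
    static final String IO_IMPL_KEY = "io-impl";

    // Default: use the explicitly requested implementation if there is one,
    // otherwise fall back to a generic IO that needs no extra configuration.
    protected SketchFileIO initializeFileIO(Map<String, String> properties) {
        String impl = properties.get(IO_IMPL_KEY);
        if (impl != null) {
            return () -> impl;
        }
        return () -> "generic-io";
    }
}

class ObjectStoreCatalogSketch extends BaseCatalogSketch {
    // Hypothetical key for the object-store endpoint.
    static final String ENDPOINT_KEY = "endpoint";

    // Override: this catalog knows its endpoint, so it can build a specialized IO.
    @Override
    protected SketchFileIO initializeFileIO(Map<String, String> properties) {
        String endpoint = properties.get(ENDPOINT_KEY);
        if (endpoint == null) {
            // Without the custom configuration, keep the superclass behavior.
            return super.initializeFileIO(properties);
        }
        return () -> "object-store-io@" + endpoint;
    }
}
```

   The point of the split is that the base class never has to guess object-store settings, and only a catalog that actually has them opts into the specialized implementation.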




[GitHub] [doris] morningman commented on pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman commented on PR #21238:
URL: https://github.com/apache/doris/pull/21238#issuecomment-1613370501

   run buildall



[GitHub] [doris] github-actions[bot] commented on pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #21238:
URL: https://github.com/apache/doris/pull/21238#issuecomment-1613372002

   PR approved by anyone and no changes requested.



[GitHub] [doris] wsjz commented on a diff in pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on code in PR #21238:
URL: https://github.com/apache/doris/pull/21238#discussion_r1246173988


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/HiveCompatibleCatalog.java:
##########
@@ -57,7 +57,7 @@ public void initialize(String name, FileIO fileIO,
     protected FileIO initializeFileIO(Map<String, String> properties, Configuration hadoopConf) {
         String fileIOImpl = properties.get(CatalogProperties.FILE_IO_IMPL);
         if (fileIOImpl == null) {
-            FileIO io = new S3FileIO();
+            FileIO io = new HadoopFileIO(hadoopConf);

Review Comment:
   Note: S3FileIO needs some custom configuration, so the superclass uses HadoopFileIO by default. Derived classes can provide a better implementation, as the DLF catalog does.




[GitHub] [doris] wsjz commented on a diff in pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on code in PR #21238:
URL: https://github.com/apache/doris/pull/21238#discussion_r1246169572


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/dlf/DLFCatalog.java:
##########
@@ -38,4 +47,26 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
         String tableName = tableIdentifier.name();
         return new DLFTableOperations(this.conf, this.clients, this.fileIO, this.uid, dbName, tableName);
     }
+
+    protected FileIO initializeFileIO(Map<String, String> properties, Configuration hadoopConf) {
+        // read from converted properties or default by old s3 aws properties
+        String endpoint = properties.getOrDefault(Constants.ENDPOINT_KEY, properties.get(S3Properties.Env.ENDPOINT));
+        CloudCredential credential = new CloudCredential();
+        credential.setAccessKey(properties.getOrDefault(OssProperties.ACCESS_KEY,
+                    properties.get(S3Properties.Env.ACCESS_KEY)));
+        credential.setSecretKey(properties.getOrDefault(OssProperties.SECRET_KEY,
+                    properties.get(S3Properties.Env.SECRET_KEY)));
+        if (properties.containsKey(OssProperties.SESSION_TOKEN)
+                || properties.containsKey(S3Properties.Env.TOKEN)) {
+            credential.setSessionToken(properties.getOrDefault(OssProperties.SESSION_TOKEN,
+                    properties.get(S3Properties.Env.TOKEN)));
+        }
+        String region = properties.getOrDefault(OssProperties.REGION, properties.get(S3Properties.Env.REGION));
+        // s3 file io just supports s3-like endpoint
+        String s3Endpoint = endpoint.replace(region, "s3." + region);
+        URI endpointUri = URI.create(s3Endpoint);
+        FileIO io = new S3FileIO(() -> S3Util.buildS3Client(endpointUri, region, credential));

Review Comment:
   Note: I find that S3FileIO is faster than HadoopFileIO.




[GitHub] [doris] AshinGau commented on pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "AshinGau (via GitHub)" <gi...@apache.org>.
AshinGau commented on PR #21238:
URL: https://github.com/apache/doris/pull/21238#issuecomment-1614283561

   LGTM



[GitHub] [doris] wsjz commented on a diff in pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on code in PR #21238:
URL: https://github.com/apache/doris/pull/21238#discussion_r1246169572


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/dlf/DLFCatalog.java:
##########
@@ -38,4 +47,26 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
         String tableName = tableIdentifier.name();
         return new DLFTableOperations(this.conf, this.clients, this.fileIO, this.uid, dbName, tableName);
     }
+
+    protected FileIO initializeFileIO(Map<String, String> properties, Configuration hadoopConf) {
+        // read from converted properties or default by old s3 aws properties
+        String endpoint = properties.getOrDefault(Constants.ENDPOINT_KEY, properties.get(S3Properties.Env.ENDPOINT));
+        CloudCredential credential = new CloudCredential();
+        credential.setAccessKey(properties.getOrDefault(OssProperties.ACCESS_KEY,
+                    properties.get(S3Properties.Env.ACCESS_KEY)));
+        credential.setSecretKey(properties.getOrDefault(OssProperties.SECRET_KEY,
+                    properties.get(S3Properties.Env.SECRET_KEY)));
+        if (properties.containsKey(OssProperties.SESSION_TOKEN)
+                || properties.containsKey(S3Properties.Env.TOKEN)) {
+            credential.setSessionToken(properties.getOrDefault(OssProperties.SESSION_TOKEN,
+                    properties.get(S3Properties.Env.TOKEN)));
+        }
+        String region = properties.getOrDefault(OssProperties.REGION, properties.get(S3Properties.Env.REGION));
+        // s3 file io just supports s3-like endpoint
+        String s3Endpoint = endpoint.replace(region, "s3." + region);
+        URI endpointUri = URI.create(s3Endpoint);
+        FileIO io = new S3FileIO(() -> S3Util.buildS3Client(endpointUri, region, credential));

Review Comment:
   Note: I find that S3FileIO is faster than HadoopFileIO.
   The reason may be that HadoopFileIO loads the Hadoop configuration each time it creates an input file.




[GitHub] [doris] morningman merged pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "morningman (via GitHub)" <gi...@apache.org>.
morningman merged PR #21238:
URL: https://github.com/apache/doris/pull/21238




[GitHub] [doris] wsjz commented on a diff in pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on code in PR #21238:
URL: https://github.com/apache/doris/pull/21238#discussion_r1246169572


##########
fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/dlf/DLFCatalog.java:
##########
@@ -38,4 +47,26 @@ protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
         String tableName = tableIdentifier.name();
         return new DLFTableOperations(this.conf, this.clients, this.fileIO, this.uid, dbName, tableName);
     }
+
+    protected FileIO initializeFileIO(Map<String, String> properties, Configuration hadoopConf) {
+        // read from converted properties or default by old s3 aws properties
+        String endpoint = properties.getOrDefault(Constants.ENDPOINT_KEY, properties.get(S3Properties.Env.ENDPOINT));
+        CloudCredential credential = new CloudCredential();
+        credential.setAccessKey(properties.getOrDefault(OssProperties.ACCESS_KEY,
+                    properties.get(S3Properties.Env.ACCESS_KEY)));
+        credential.setSecretKey(properties.getOrDefault(OssProperties.SECRET_KEY,
+                    properties.get(S3Properties.Env.SECRET_KEY)));
+        if (properties.containsKey(OssProperties.SESSION_TOKEN)
+                || properties.containsKey(S3Properties.Env.TOKEN)) {
+            credential.setSessionToken(properties.getOrDefault(OssProperties.SESSION_TOKEN,
+                    properties.get(S3Properties.Env.TOKEN)));
+        }
+        String region = properties.getOrDefault(OssProperties.REGION, properties.get(S3Properties.Env.REGION));
+        // s3 file io just supports s3-like endpoint
+        String s3Endpoint = endpoint.replace(region, "s3." + region);
+        URI endpointUri = URI.create(s3Endpoint);
+        FileIO io = new S3FileIO(() -> S3Util.buildS3Client(endpointUri, region, credential));

Review Comment:
   Note: I find that S3FileIO is faster than HadoopFileIO.
   The reason may be that HadoopFileIO loads the Hadoop configuration.




[GitHub] [doris] github-actions[bot] commented on pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #21238:
URL: https://github.com/apache/doris/pull/21238#issuecomment-1613371882

   PR approved by at least one committer and no changes requested.



[GitHub] [doris] wsjz commented on pull request #21238: [fix](multi-catalog)fix obj file cache and dlf iceberg catalog

Posted by "wsjz (via GitHub)" <gi...@apache.org>.
wsjz commented on PR #21238:
URL: https://github.com/apache/doris/pull/21238#issuecomment-1614072709

   run p0

