You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@paimon.apache.org by "zhangjun0x01 (via GitHub)" <gi...@apache.org> on 2023/03/19 16:06:24 UTC

[GitHub] [incubator-paimon] zhangjun0x01 opened a new pull request, #646: [core] add read batch size option

zhangjun0x01 opened a new pull request, #646:
URL: https://github.com/apache/incubator-paimon/pull/646

   related to https://github.com/apache/incubator-paimon/issues/641


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] zhangjun0x01 commented on a diff in pull request #646: [core] add read batch size option

Posted by "zhangjun0x01 (via GitHub)" <gi...@apache.org>.
zhangjun0x01 commented on code in PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#discussion_r1141548782


##########
paimon-format/src/main/java/org/apache/paimon/format/orc/OrcFileFormat.java:
##########
@@ -62,13 +61,15 @@ public class OrcFileFormat extends FileFormat {
     private final Properties orcProperties;
     private final org.apache.hadoop.conf.Configuration readerConf;
     private final org.apache.hadoop.conf.Configuration writerConf;
+    private final int readBatchSize;
 
-    public OrcFileFormat(Options formatOptions) {
+    public OrcFileFormat(Options formatOptions, int readBatchSize) {

Review Comment:
   Yes, we can get `readBatchSize` by `option.get('read. batch-size')`. But I don't think we should do this way. We should use `CoreOptions#READ_BATCH_SIZE.key()`, but the `paimon-format` module does not reference the `paimon-core` module, so I pass this parameter from `paimon-core` to `paimon-format`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] zhangjun0x01 closed pull request #646: [core] add read batch size option

Posted by "zhangjun0x01 (via GitHub)" <gi...@apache.org>.
zhangjun0x01 closed pull request #646: [core] add read batch size option
URL: https://github.com/apache/incubator-paimon/pull/646


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] zhangjun0x01 closed pull request #646: [core] add read batch size option

Posted by "zhangjun0x01 (via GitHub)" <gi...@apache.org>.
zhangjun0x01 closed pull request #646: [core] add read batch size option
URL: https://github.com/apache/incubator-paimon/pull/646


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi commented on a diff in pull request #646: [core] add read batch size option

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#discussion_r1142151726


##########
paimon-format/src/main/java/org/apache/paimon/format/orc/OrcFileFormat.java:
##########
@@ -63,13 +62,15 @@ public class OrcFileFormat extends FileFormat {
     private final Properties orcProperties;
     private final org.apache.hadoop.conf.Configuration readerConf;
     private final org.apache.hadoop.conf.Configuration writerConf;
+    private final int readBatchSize;
 
-    public OrcFileFormat(Options formatOptions) {
+    public OrcFileFormat(Options formatOptions, int readBatchSize) {

Review Comment:
   We can just create constructor of `FormatContext`.
   We can create a `FormatContext.copy(Options newOptions)`.



##########
paimon-core/src/main/java/org/apache/paimon/CoreOptions.java:
##########
@@ -573,6 +580,15 @@ public String partitionDefaultName() {
         return options.get(PARTITION_DEFAULT_NAME);
     }
 
+    public static FileFormat fromTableOptions(Options options, ConfigOption<String> formatOption) {

Review Comment:
   `createFileFormat`, this is in `CoreOptions` now.



##########
paimon-format/src/main/java/org/apache/paimon/format/parquet/ParquetFileFormat.java:
##########
@@ -38,10 +38,12 @@
 public class ParquetFileFormat extends FileFormat {
 
     private final Options formatOptions;
+    private final int readBatchSize;
 
-    public ParquetFileFormat(Options formatOptions) {
+    public ParquetFileFormat(Options formatOptions, int readBatchSize) {

Review Comment:
   ditto



##########
paimon-common/src/main/java/org/apache/paimon/format/FileFormat.java:
##########
@@ -99,12 +98,17 @@ public static FileFormat fromIdentifier(String identifier, Options options) {
     }
 
     private static Optional<FileFormat> fromIdentifier(
-            String formatIdentifier, Options formatOptions, ClassLoader classLoader) {
+            String formatIdentifier,
+            FileFormatFactory.FormatContext context,
+            ClassLoader classLoader) {
         ServiceLoader<FileFormatFactory> serviceLoader =
                 ServiceLoader.load(FileFormatFactory.class, classLoader);
         for (FileFormatFactory factory : serviceLoader) {
             if (factory.identifier().equals(formatIdentifier.toLowerCase())) {
-                return Optional.of(factory.create(formatOptions));
+                return Optional.of(
+                        factory.create(
+                                new FileFormatFactory.FormatContext(

Review Comment:
   why need to copy `FormatContext `?



##########
paimon-common/src/main/java/org/apache/paimon/format/FileFormat.java:
##########
@@ -75,20 +75,19 @@ public Optional<FileStatsExtractor> createStatsExtractor(RowType type) {
         return Optional.empty();
     }
 
-    /** Create a {@link FileFormat} from table options. */
-    public static FileFormat fromTableOptions(
-            Options tableOptions, ConfigOption<String> formatOption) {
-        String formatIdentifier = tableOptions.get(formatOption);
-        return fromIdentifier(formatIdentifier, tableOptions.removePrefix(formatIdentifier + "."));
+    @VisibleForTesting
+    public static FileFormat fromIdentifier(String identifier, Options options) {
+        return fromIdentifier(identifier, new FileFormatFactory.FormatContext(options, 1024));
     }
 
     /** Create a {@link FileFormat} from format identifier and format options. */
-    public static FileFormat fromIdentifier(String identifier, Options options) {
+    public static FileFormat fromIdentifier(

Review Comment:
   import `FileFormatFactory.FormatContext`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi commented on a diff in pull request #646: [core] add read batch size option

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#discussion_r1141657676


##########
paimon-common/src/main/java/org/apache/paimon/format/FileFormat.java:
##########
@@ -77,18 +78,35 @@ public Optional<FileStatsExtractor> createStatsExtractor(RowType type) {
 
     /** Create a {@link FileFormat} from table options. */
     public static FileFormat fromTableOptions(
-            Options tableOptions, ConfigOption<String> formatOption) {
+            Options tableOptions,
+            ConfigOption<String> formatOption,
+            ConfigOption<Integer> readBatchSizeOption) {

Review Comment:
   Maybe we can remove this method.
   Provide a util static method in `CoreOptions`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] zhangjun0x01 closed pull request #646: [core] add read batch size option

Posted by "zhangjun0x01 (via GitHub)" <gi...@apache.org>.
zhangjun0x01 closed pull request #646: [core] add read batch size option
URL: https://github.com/apache/incubator-paimon/pull/646


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi commented on pull request #646: [core] add read batch size option

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#issuecomment-1477305368

   I accidentally merged this PR.
   @zhangjun0x01 You can add case in another PR, It is better to add case in `FileFormatTest`, test `CoreOptions.createFileFormat`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi merged pull request #646: [core] add read batch size option

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi merged PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi commented on a diff in pull request #646: [core] add read batch size option

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#discussion_r1141658611


##########
paimon-common/src/main/java/org/apache/paimon/format/FileFormat.java:
##########
@@ -77,18 +78,35 @@ public Optional<FileStatsExtractor> createStatsExtractor(RowType type) {
 
     /** Create a {@link FileFormat} from table options. */
     public static FileFormat fromTableOptions(
-            Options tableOptions, ConfigOption<String> formatOption) {
+            Options tableOptions,
+            ConfigOption<String> formatOption,
+            ConfigOption<Integer> readBatchSizeOption) {
         String formatIdentifier = tableOptions.get(formatOption);
-        return fromIdentifier(formatIdentifier, tableOptions.removePrefix(formatIdentifier + "."));
+        int readBatchSize = tableOptions.get(readBatchSizeOption);
+        return fromIdentifier(
+                formatIdentifier, readBatchSize, tableOptions.removePrefix(formatIdentifier + "."));
     }
 
-    /** Create a {@link FileFormat} from format identifier and format options. */
+    @VisibleForTesting
     public static FileFormat fromIdentifier(String identifier, Options options) {
+        return fromIdentifier(identifier, 1000, options);
+    }
+
+    /** Create a {@link FileFormat} from format identifier and format options. */
+    public static FileFormat fromIdentifier(String identifier, int readBatchSize, Options options) {

Review Comment:
   fromIdentifier(String identifier, FormatContext context)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] zhuangchong commented on a diff in pull request #646: [core] add read batch size option

Posted by "zhuangchong (via GitHub)" <gi...@apache.org>.
zhuangchong commented on code in PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#discussion_r1141518178


##########
paimon-format/src/main/java/org/apache/paimon/format/orc/OrcFileFormat.java:
##########
@@ -62,13 +61,15 @@ public class OrcFileFormat extends FileFormat {
     private final Properties orcProperties;
     private final org.apache.hadoop.conf.Configuration readerConf;
     private final org.apache.hadoop.conf.Configuration writerConf;
+    private final int readBatchSize;
 
-    public OrcFileFormat(Options formatOptions) {
+    public OrcFileFormat(Options formatOptions, int readBatchSize) {

Review Comment:
   I have a question? Why the `readBatchSize` attribute is not obtained from options.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] zhangjun0x01 commented on pull request #646: [core] add read batch size option

Posted by "zhangjun0x01 (via GitHub)" <gi...@apache.org>.
zhangjun0x01 commented on PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#issuecomment-1477331938

   > I accidentally merged this PR. @zhangjun0x01 You can add case in another PR, It is better to add case in `FileFormatTest`, test `CoreOptions.createFileFormat`.
   
   ok,I will add later 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-paimon] JingsongLi commented on a diff in pull request #646: [core] add read batch size option

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.
JingsongLi commented on code in PR #646:
URL: https://github.com/apache/incubator-paimon/pull/646#discussion_r1141655103


##########
paimon-common/src/main/java/org/apache/paimon/format/FileFormatFactory.java:
##########
@@ -25,5 +25,5 @@ public interface FileFormatFactory {
 
     String identifier();
 
-    FileFormat create(Options formatOptions);
+    FileFormat create(Options formatOptions, int readBatchSize);

Review Comment:
   Maybe just create a class `FormatContext`:
   ```
   FormatContext {
       Options formatOptions();
       int readBatchSize();
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org