Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/10/17 00:46:59 UTC

[GitHub] [flink] JingGe opened a new pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

JingGe opened a new pull request #17501:
URL: https://github.com/apache/flink/pull/17501


   ## What is the purpose of the change
   
    The goal of this PR is to provide an AvroParquetRecordFormat implementation that reads Avro GenericRecords from Parquet files via the new Flink FileSource.
   
    This is a draft PR; one unit test case still fails and is commented out for now with a TODO marker. I am still working on it.
   
   ## Brief change log
   
    - Create the RecordFormat interface, which focuses on Path, provides default method implementations, and delegates createReader() calls to the overloaded methods of StreamFormat, which focuses on FSDataInputStream.
    - Create the AvroParquetRecordFormat implementation. This version only supports reading Avro GenericRecords from a Parquet file or stream; support for other Avro record types will follow later.
   - Splitting is not supported in this version. 
   
   ## Open Questions
   
    Some background: the original idea was to let AvroParquetRecordFormat implement FileRecordFormat. Since FileRecordFormat and StreamFormat have too much in common, and StreamFormat offers more built-in features such as compression support (via StreamFormatAdapter), the current design is based on StreamFormat instead. To keep the interfaces cleanly segregated, a two-level interface design has been chosen: StreamFormat focuses on the abstract input stream, while RecordFormat deals with the concrete file system, i.e. the Path. RecordFormat provides default implementations for the overloaded createReader(...) methods, so subclasses are not forced to implement them. A condensed sketch of this delegation is shown below.
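    
    To illustrate the two-level design, here is a condensed sketch of the Path-focused default method (the real RecordFormat in this PR contains more, e.g. a restoreReader(...) counterpart; this version is simplified for discussion):
    
    ```java
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.connector.file.src.reader.StreamFormat;
    import org.apache.flink.core.fs.FSDataInputStream;
    import org.apache.flink.core.fs.FileStatus;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;
    
    import java.io.IOException;
    
    import static org.apache.flink.util.Preconditions.checkNotNull;
    
    /** Simplified sketch: a Path-focused interface on top of the stream-focused StreamFormat. */
    public interface RecordFormat<T> extends StreamFormat<T> {
    
        /** Opens the file behind the given Path and delegates to the stream-based createReader. */
        default StreamFormat.Reader<T> createReader(
                Configuration config, Path filePath, long splitOffset, long splitLength)
                throws IOException {
    
            checkNotNull(filePath, "filePath");
    
            final FileSystem fileSystem = filePath.getFileSystem();
            final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
            final FSDataInputStream inputStream = fileSystem.open(filePath);
            inputStream.seek(splitOffset);
    
            // Delegate to the overloaded stream-based method defined in StreamFormat.
            return createReader(
                    config, inputStream, fileStatus.getLen(), splitOffset + splitLength);
        }
    }
    ```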
   
   Following are some questions open for discussion:
   
    1. Compared to the two-level interface design, another option would be to merge all the default methods implemented in RecordFormat into StreamFormat. Such a design keeps all createReader(...) methods in one interface (StreamFormat only, no additional RecordFormat), which is in some ways easier to use. The downside is that it violates interface segregation. Based on these considerations, I chose the current two-level API design.
    2. After weighing priorities, splitting is currently not supported, since there is no strong requirement from the business side yet. It will be implemented later when needed.
    3. If this design works well, the next step should be to consider replacing FileRecordFormat with RecordFormat; that way the duplicated code and JavaDocs could be avoided. This question is slightly out of scope for this PR, but since it relates to the topic, it can be discussed here too.
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
      New unit tests validate that:
    - A reader can be created and restored correctly via Path.
    - Constraint checks for unsupported splitting, a null path, and a missing restoredOffset work as expected.
    - GenericRecords can be read correctly from a Parquet file (see the usage sketch below).
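    
    A minimal end-to-end usage sketch of what these tests exercise (illustrative only: it assumes the public Schema-based constructor from this PR, and the job setup and file path are hypothetical, not part of the test code):
    
    ```java
    import org.apache.avro.Schema;
    import org.apache.avro.SchemaBuilder;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.file.src.FileSource;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    
    public class AvroParquetReadExample {
    
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
            // Avro schema matching the GenericRecords stored in the Parquet file.
            Schema schema =
                    SchemaBuilder.record("User")
                            .fields()
                            .requiredString("name")
                            .requiredInt("age")
                            .endRecord();
    
            // AvroParquetRecordFormat is a StreamFormat, so it plugs directly
            // into FileSource.forRecordStreamFormat(...).
            FileSource<GenericRecord> source =
                    FileSource.forRecordStreamFormat(
                                    new AvroParquetRecordFormat(schema),
                                    new Path("/tmp/users.parquet")) // hypothetical input file
                            .build();
    
            DataStream<GenericRecord> records =
                    env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");
    
            records.print();
            env.execute("AvroParquet read example");
        }
    }
    ```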
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (**yes** / no)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (**yes** / no / don't know)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (**yes** / no)
     - If yes, how is the feature documented? (not applicable / docs / **JavaDocs** / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732516736



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Why was this change needed?
   
    Since we rely on the default value of relativePath, the whole line could even be omitted. This change follows the official documentation: http://maven.apache.org/ref/3.3.9/maven-model/maven.html#class_parent
    Btw, IntelliJ IDEA points out the issue too.
   







[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733730691



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} to read Avro {@link GenericRecord}s from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       > Can we also support custom types or only `GenericRecord`?
   
    Please refer to the PR description at the beginning.







[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733731918



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} to read Avro {@link GenericRecord}s from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible
+     * with the Parquet abstractions. Please refer to the inner classes {@link
+     * GenericRecordReader}, {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for
+     * details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since the current version does not
+     * support splitting, the stream is simply sought to the restored offset before a new
+     * reader is created.
+     */
+    @Override
+    public Reader<GenericRecord> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        // the current version ignores the splitOffset and uses the restoredOffset instead
+        stream.seek(restoredOffset);
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;

Review comment:
       > Do you plan to support the splits later? I saw that you can pass a file range.
   > 
   > ```java
   >                 AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
   >                         .withFileRange()
   > ```
   
   Ditto
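    
    For completeness: checkNotSplit(...) itself is not shown in the quoted diff. A plausible sketch of that guard (an assumption; the actual body may differ) looks like this:
    
    ```java
    // Plausible sketch of the guard referenced above; the real method body is
    // not part of the quoted diff and may differ.
    private static void checkNotSplit(long fileLen, long splitEnd) {
        if (splitEnd != fileLen) {
            throw new IllegalArgumentException(
                    String.format(
                            "Current version of AvroParquetRecordFormat is not splittable, "
                                    + "but found split end (%d) different from file length (%d)",
                            splitEnd, fileLen));
        }
    }
    ```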
   







[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737416907



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} to read Avro {@link GenericRecord}s from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible
+     * with the Parquet abstractions. Please refer to the inner classes {@link
+     * GenericRecordReader}, {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for
+     * details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(

Review comment:
    It will be renamed once more record types are supported.







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 4e3d378a6a50f02b99a903dc106a2ad9f931066f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280) 
   * 84e5e702fc9aaeeb14a7c445a14ea443141f9e9a UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa UNKNOWN
   





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765032174



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} to read Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro records from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible
+     * with the Parquet abstractions. Please refer to the inner classes {@link
+     * AvroParquetRecordReader}, {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for
+     * details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. This is in fact identical to {@link
+     * #createReader}, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
    This is a question of API design: do we want to make an internally used third-party class visible at the place where AvroParquetRecordFormat is constructed, i.e. in the factory methods? Since the constructor of AvroParquetRecordFormat is package-private, that would be acceptable. But it is generally recommended to encapsulate such information within the class, so that low-level changes in the third-party library do not force signature changes at the higher level, i.e. to avoid violating the Open/Closed Principle. A hypothetical factory sketch follows.
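    
    For illustration, factory methods along these lines would keep the Avro data-model types out of the public construction API (the class and method names here are hypothetical, not part of this diff; the factory would have to live in the same package as the package-private constructor):
    
    ```java
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.specific.SpecificRecordBase;
    import org.apache.flink.connector.file.src.reader.StreamFormat;
    import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
    import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
    
    /** Hypothetical factory: callers pick a record flavor, the data model stays internal. */
    public final class AvroParquetReaders {
    
        /** Reads GenericRecords; getDataModel() resolves to GenericData internally. */
        public static StreamFormat<GenericRecord> forGenericRecord(Schema schema) {
            return new AvroParquetRecordFormat<>(new GenericRecordAvroTypeInfo(schema));
        }
    
        /** Reads generated specific records; getDataModel() resolves to SpecificData. */
        public static <T extends SpecificRecordBase> StreamFormat<T> forSpecificRecord(
                Class<T> typeClass) {
            return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass));
        }
    
        private AvroParquetReaders() {}
    }
    ```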







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115) 
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r735440587



##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       ditto




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740224352



##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -45,6 +45,14 @@ under the License.
 			<scope>provided</scope>
 		</dependency>
 
+		<!-- Flink-avro -->
+
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-avro</artifactId>
+			<version>${project.version}</version>

Review comment:
       sure, thanks @echauchot for the hint.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream, while {@link RecordFormat} pays attention to
+ * the concrete FileSystem. This format is for cases where the readers need access to the file
+ * directly or need to create a custom stream. For readers that can work directly on input streams,
+ * consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {
+
+        checkNotNull(filePath, "filePath");
+
+        final FileSystem fileSystem = filePath.getFileSystem();
+        final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+        final FSDataInputStream inputStream = fileSystem.open(filePath);
+
+        if (isSplittable()) {
+            inputStream.seek(splitOffset);
+        }

Review comment:
       This should actually be checked in `createReader(Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)`; I will add the check there to make it more robust.
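
       For illustration, a minimal sketch of such a guard (a static helper mirroring the `checkNotSplit` method quoted below in this thread; the name and exact placement are assumptions, not code from the PR):

       ```java
       /** Rejects split boundaries that do not cover the whole file for non-splittable formats. */
       static void checkSplitArguments(boolean splittable, long fileLen, long splitEnd) {
           if (!splittable && splitEnd != fileLen) {
               throw new IllegalArgumentException(
                       String.format(
                               "Format is not splittable, but found split end (%d) "
                                       + "different from file length (%d)",
                               splitEnd, fileLen));
           }
       }
       ```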

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation to read avro {@link GenericRecord} from parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link GenericRecordReader},
+     * {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since the current version does not
+     * support splitting, the given {@code restoredOffset} is applied directly to the stream.
+     */
+    @Override
+    public Reader<GenericRecord> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        // the current version just ignores the splitOffset and uses the restoredOffset
+        stream.seek(restoredOffset);
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<GenericRecord> getProducedType() {
+        return new GenericRecordAvroTypeInfo(schema);
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link RecordFormat.Reader} implementation. It uses {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from a parquet {@link InputFile}.
+     */
+    private static class GenericRecordReader implements RecordFormat.Reader<GenericRecord> {

Review comment:
       The name comes from the generic type `GenericRecord`. I will upgrade the reader to support custom types and rename the class accordingly.
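
       For reference, a hedged sketch of the factory surface this upgrade could lead to (`forSpecificRecord` is quoted later in this thread; the other two entry points and the `Address`/`MyPojo` classes are assumptions for illustration, with `schema` assumed to be defined):

       ```java
       // read GenericRecord with an explicit schema
       StreamFormat<GenericRecord> generic = AvroParquetReaders.forGenericRecord(schema);

       // read a generated Avro SpecificRecord class (Address is illustrative)
       StreamFormat<Address> specific = AvroParquetReaders.forSpecificRecord(Address.class);

       // read a plain POJO via Avro reflection (MyPojo is illustrative)
       StreamFormat<MyPojo> reflect = AvroParquetReaders.forReflectRecord(MyPojo.class);
       ```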

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation to read avro {@link GenericRecord} from parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link GenericRecordReader},
+     * {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since the current version does not
+     * support splitting, the given {@code restoredOffset} is applied directly to the stream.
+     */
+    @Override
+    public Reader<GenericRecord> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        // the current version just ignores the splitOffset and uses the restoredOffset
+        stream.seek(restoredOffset);
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;

Review comment:
       Splitting will be supported in a follow-up ticket.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*

Review comment:
       Sure, I will merge the methods into the `StreamFormat`.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation to read avro {@link GenericRecord} from parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       We are using Avro 1.10.0 now, where the `Schema` is serializable. Just out of curiosity, since the transient keyword has been used, I'd like to understand the Flink concept and therefore the reason why we care about serializability here: will the Format object be created locally on each TM, or will it be created once and then transferred to the TMs via the network, which would require ser/de? Thanks.
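
       (For context: the format object is created on the client and shipped to the TaskManagers as part of the serialized source definition, which is why its fields must either be serializable or be rebuilt after deserialization. A minimal sketch of the usual workaround for a non-serializable schema field; this is an illustration, not the code in this PR:)

       ```java
       import org.apache.avro.Schema;

       /** Sketch of the lazy-rebuild pattern for a schema field. */
       class SchemaHolder implements java.io.Serializable {

           // keep the JSON form, which is always serializable
           private final String schemaString;

           // rebuilt lazily on the task side after deserialization
           private transient Schema schema;

           SchemaHolder(Schema schema) {
               this.schemaString = schema.toString();
           }

           Schema getSchema() {
               if (schema == null) {
                   schema = new Schema.Parser().parse(schemaString);
               }
               return schema;
           }
       }
       ```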

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>

Review comment:
       ok, I will create a new ticket for it.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream, while {@link RecordFormat} pays attention to
+ * the concrete FileSystem. This format is for cases where the readers need access to the file
+ * directly or need to create a custom stream. For readers that can work directly on input streams,
+ * consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(

Review comment:
       The original idea was to replace the `FileRecordFormat` with this `RecordFormat`. Merging it into the `StreamFormat` is also fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740460859



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       After talking with @StephanEwen, the recommendation is to use `StreamFormat`.
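
       (A short usage sketch of the `StreamFormat` path for readers following along: `FileSource.forRecordStreamFormat` is the existing factory in flink-connector-files, while `AvroParquetReaders.forGenericRecord` assumes the factory class quoted later in this thread; the schema and file path are illustrative.)

       ```java
       import org.apache.avro.Schema;
       import org.apache.avro.generic.GenericRecord;
       import org.apache.flink.api.common.eventtime.WatermarkStrategy;
       import org.apache.flink.connector.file.src.FileSource;
       import org.apache.flink.core.fs.Path;
       import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
       import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

       Schema schema = new Schema.Parser().parse(
               "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
                       + "[{\"name\":\"name\",\"type\":\"string\"}]}");

       FileSource<GenericRecord> source =
               FileSource.forRecordStreamFormat(
                               AvroParquetReaders.forGenericRecord(schema),
                               new Path("file:///tmp/users.parquet"))
                       .build();

       env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");
       ```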




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * efdd53bb50c9b8712b87f7d24495e20fef5b78f5 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765) 
   * b48104914a31c05ad902cb0c36aef52b2b4093a8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770496071



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       I'm still not sure these changes provide any benefit to the user. If the framework uses them (I currently can't find any caller), we can also move them to the call site. We should only add things to user-facing interfaces if they help the user (and not the framework).
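
       (To make the alternative concrete: a hedged sketch of moving the logic to the call site, mirroring the default implementation quoted earlier in this thread; the helper's name and home are assumptions.)

       ```java
       /** Hypothetical framework-side helper: the caller, not the format, opens the stream. */
       static <T> StreamFormat.Reader<T> openReader(
               StreamFormat<T> format,
               Configuration config,
               Path filePath,
               long splitOffset,
               long splitLength)
               throws IOException {

           final FileSystem fileSystem = filePath.getFileSystem();
           final long fileLen = fileSystem.getFileStatus(filePath).getLen();
           final FSDataInputStream stream = fileSystem.open(filePath);

           // splittable formats expect the stream positioned at the split start
           if (format.isSplittable()) {
               stream.seek(splitOffset);
           }
           return format.createReader(config, stream, fileLen, splitOffset + splitLength);
       }
       ```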

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -46,7 +46,8 @@
      */

Review comment:
       Commit message: [FLINK-21406][parquet] Rename ParquetAvroWriters to AvroParquetWriters

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -46,7 +46,8 @@
      */
     public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
             final Class<T> typeClass) {
-        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass), SpecificData.get());
+        return new AvroParquetRecordFormat<>(

Review comment:
       These changes should be in the previous commit.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -50,11 +51,12 @@
 
     private final TypeInformation<E> type;
 
-    private final GenericData dataModel;
+    private final SerializableSupplier<GenericData> dataModelSupplier;
 
-    AvroParquetRecordFormat(TypeInformation<E> type, GenericData dataModel) {

Review comment:
       These changes should be in the previous commit.
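
       (On the change itself: `GenericData` does not implement `Serializable`, while `org.apache.flink.util.function.SerializableSupplier` does, so the format stays serializable and obtains the data model lazily on the task side. A hedged sketch of the resulting call sites follows; the `forGenericRecord` shape is an assumption based on the quoted `forSpecificRecord`.)

       ```java
       // factory: pass a serializable method reference instead of the GenericData instance
       public static AvroParquetRecordFormat<GenericRecord> forGenericRecord(final Schema schema) {
           return new AvroParquetRecordFormat<>(
                   new GenericRecordAvroTypeInfo(schema), GenericData::get);
       }

       // inside the format: materialize the data model only when the reader is created
       GenericData dataModel = dataModelSupplier.get();
       ```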




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122) 
   * 3a547a5b480efdb2d61a2bddcb053dd6a8ee61be Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] fapaul commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
fapaul commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733491320



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation for reading Avro {@link GenericRecord} records from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       Can we also support custom types or only `GenericRecord`?
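
    For illustration, supporting a generated specific-record type would presumably only need a different Avro data model on the same builder. A hedged sketch (untested; `Address` stands for any generated `SpecificRecord` class, and `inputFile` for the PR's `ParquetInputFile`):

    ```java
    // assumes org.apache.avro.specific.SpecificData, org.apache.parquet.avro.AvroParquetReader,
    // and org.apache.parquet.hadoop.ParquetReader are on the classpath
    ParquetReader<Address> reader =
            AvroParquetReader.<Address>builder(inputFile)
                    .withDataModel(SpecificData.get()) // SpecificData extends GenericData
                    .build();
    ```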

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation for reading Avro {@link GenericRecord} records from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created so that the Flink abstraction becomes
+     * compatible with the Parquet abstraction. Please refer to the inner classes {@link
+     * GenericRecordReader}, {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since the current version does not
+     * support splitting, the split offset is ignored and the stream is sought directly to the
+     * restored offset.
+     */
+    @Override
+    public Reader<GenericRecord> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        // the current version just ignores the splitOffset and uses the restoredOffset
+        stream.seek(restoredOffset);
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;

Review comment:
       Do you plan to support splits later? I saw that you can pass a file range. 
   
   ```java
                   AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
                           .withFileRange()
   ```
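
    For reference, a splittable variant could presumably hand the split boundaries straight to that builder. A hedged sketch (untested; `ParquetInputFile`, `splitOffset`, and `splitEnd` are taken from the surrounding PR code):

    ```java
    ParquetReader<GenericRecord> reader =
            AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
                    .withDataModel(GenericData.get())
                    // restrict reading to the row groups that fall into this byte range
                    .withFileRange(splitOffset, splitEnd)
                    .build();
    ```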
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-950315970


   R:@echauchot
   @JingGe thanks a lot for your work! If I may, I'd like to review this PR as well, since I was the author of the `ParquetAvroInputFormat` for the older source.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-950315970


   R:@echauchot
   @JingGe thanks a lot for your work! If I may, I'd like to review this PR, since I was the author of the `ParquetAvroInputFormat` for the older source.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737422936



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > > Why was this change needed?
   > 
   > Since we use the default value of relativePath, the whole line can be removed. This change follows the official Maven documentation (http://maven.apache.org/ref/3.3.9/maven-model/maven.html#class_parent). Btw, IntelliJ IDEA points out the issue too.
   
   @echauchot for the "relativePath" question, please refer to this answer.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-952858917


   @echauchot many thanks for your comments, I will try to answer them in each thread.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r735440587



##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       Please do not change the Maven conf.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765640737



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       It will be called by a `FormatAdapter` in the framework, because the user has no access to any `StreamFormat`.
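
    To make that concrete, user code only constructs the format and hands it to the source; the framework's adapter then invokes `createReader` internally. A hedged sketch of the user-facing side, based on the existing `FileSource` API (`schema` and `env` are assumed to exist):

    ```java
    FileSource<GenericRecord> source =
            FileSource.forRecordStreamFormat(
                            new AvroParquetRecordFormat(schema), new Path("/path/to/data.parquet"))
                    .build();

    // createReader()/restoreReader() are never called by user code.
    DataStream<GenericRecord> stream =
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");
    ```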




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740328273



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream, while {@link RecordFormat} pays attention
+ * to the concrete {@link FileSystem}. This format is for cases where the readers need access to
+ * the file directly or need to create a custom stream. For readers that can work directly on
+ * input streams, consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {
+
+        checkNotNull(filePath, "filePath");
+
+        final FileSystem fileSystem = filePath.getFileSystem();
+        final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+        final FSDataInputStream inputStream = fileSystem.open(filePath);
+
+        if (isSplittable()) {
+            inputStream.seek(splitOffset);
+        }

Review comment:
       These must be checked in `createReader(Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)`. Adding this code here would make the logic redundant.
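
    In other words, the Path-based default should only adapt and delegate; a condensed, hedged sketch of the delegation being discussed (no checks in the default method itself):

    ```java
    default StreamFormat.Reader<T> createReader(
            Configuration config, Path filePath, long splitOffset, long splitLength)
            throws IOException {
        final FileSystem fileSystem = filePath.getFileSystem();
        final long fileLen = fileSystem.getFileStatus(filePath).getLen();
        final FSDataInputStream inputStream = fileSystem.open(filePath);
        if (isSplittable()) {
            inputStream.seek(splitOffset);
        }
        // all constraint checks happen in the stream-based createReader
        return createReader(config, inputStream, fileLen, splitOffset + splitLength);
    }
    ```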




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r743200328



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation for reading Avro {@link GenericRecord} records from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       Thanks for the comprehensive explanation! I will change it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 95ad572805a3aa45e9c78e07f83cd820abe00a52 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r745787705



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation for reading Avro {@link GenericRecord} records from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       All three Avro record types (generic, specific, and reflect) are now supported.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733745562



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation for reading Avro {@link GenericRecord} records from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       No one said we only support `GenericRecord`; you can find all the info in the PR description at the beginning. Would you mind having a look there?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740259768



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} implementation for reading Avro {@link GenericRecord} records from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       We are using Avro 1.10.0 now, where `Schema` is serializable. Just out of curiosity, since the transient keyword has been used, I'd like to take this opportunity to understand the Flink concept and the reason why we care about serializability here: will the Format object be created locally on each TM, or will it be created once and transferred to the TMs over the network, which would require ser/de? Thanks.
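
    For context, in older Avro releases where `Schema` was not serializable, the usual workaround was to ship the schema as its JSON string and re-parse it lazily on the task manager. A minimal sketch of that pattern (hypothetical field names, not the PR's code):

    ```java
    public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

        private final String schemaString; // JSON form, trivially serializable
        private transient Schema schema;   // rebuilt lazily after deserialization

        public AvroParquetRecordFormat(Schema schema) {
            this.schemaString = schema.toString();
        }

        private Schema getSchema() {
            if (schema == null) {
                schema = new Schema.Parser().parse(schemaString);
            }
            return schema;
        }
    }
    ```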




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-952742574


   > @echauchot
   > 
   > > Sure, I saw the comments about split and data types etc... But I feel uncomfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod. @JingGe @fapaul WDYT?
   > 
   > You are right, that was the idea of the draft PR. Speaking of splitting support specifically, which would make the implementation considerably more complicated, this PR might be merged without it, because we didn't get any requirement for it from the business side. If you have any strong requirement w.r.t. splitting, we'd like to know and reconsider it.
   
   I think splitting is mandatory: if you read a big Parquet file with no split support, all the content will end up in a single task manager, which will lead to an OOM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737431535



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
   > +1 on what Fabian said. IMHO, I think this avro/parquet format should implement `BulkFormat` directly, cf. [this discussion](https://lists.apache.org/thread.html/re3a5724ba3e68d63cd2b83d9d14c41cdcb7547e7c46c6c5e5b7aeb73%40%3Cdev.flink.apache.org%3E) we had with Jingsong. Regarding block/row-group support with BulkFormat: could it be done by implementing `BulkFormat.Reader#readBatch`?
   
   I could only find the statement/conclusion about using BulkFormat in that discussion. Could you share the reason why we should implement BulkFormat directly? BTW, there was a similar discussion at #17520.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765) 
   * b48104914a31c05ad902cb0c36aef52b2b4093a8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1e9ac381ab9fc13d738a5bbd4be8207232240a0a Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 1e9ac381ab9fc13d738a5bbd4be8207232240a0a Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345) 
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] fapaul commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
fapaul commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r731846389



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API: {@link
+ * StreamFormat} focuses on the abstract input stream, while {@link RecordFormat} pays attention
+ * to the concrete {@link FileSystem}. This format is for cases where the readers need access to
+ * the file directly or need to create a custom stream. For readers that can directly work on
+ * input streams, consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {

Review comment:
       Is this interface really necessary? I am finding it hard to understand the difference to `FileRecordFormat`.

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       Why was this change needed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 3a547a5b480efdb2d61a2bddcb053dd6a8ee61be Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182) 
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115) 
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768389182



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the Parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}, since only {@link
+     * CheckpointedPosition#NO_OFFSET} is supported as the {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       Yes, I'd do it here since the constructor is not public and we can adjust if needed (not that I expect it since the whole model abstraction in Avro is very stable). Your point is very valid on all public accessors though. 
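   
       For illustration, a hypothetical usage sketch (the constructor is package-private, so the direct calls below are illustrative only; `schema` is an assumed Avro `Schema` and `PojoUser` an assumed plain POJO): deriving the model from the produced type means the caller only has to pick the `TypeInformation`:
   
   ```java
   // Hypothetical sketch of which GenericData model getDataModel() resolves for
   // different produced types. 'Address' is the generated specific record used in
   // the tests; 'PojoUser' is an assumed POJO; 'schema' is an assumed Avro Schema.
   StreamFormat<GenericRecord> generic =
           new AvroParquetRecordFormat<>(new GenericRecordAvroTypeInfo(schema)); // -> GenericData
   StreamFormat<Address> specific =
           new AvroParquetRecordFormat<>(TypeInformation.of(Address.class));     // -> SpecificData
   StreamFormat<PojoUser> reflective =
           new AvroParquetRecordFormat<>(TypeInformation.of(PojoUser.class));    // -> ReflectData
   ```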




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770565214



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       This is about the feature of working with the Flink `Path`, as provided by `FileRecordFormat`. The default implementation will be provided here so that we can deprecate `FileRecordFormat` without losing the feature. The caller could be a `FormatAdapter`, similar to `FileRecordFormatAdapter`, that chooses `Path` over `Stream`. A minimal sketch follows.
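   
       For illustration, the default implementation could look roughly like this (not the committed code; it assumes `checkNotNull` is statically imported from Flink's `Preconditions` and that the split end is `splitOffset + splitLength`):
   
   ```java
   // Sketch: adapt the Path-based arguments to the stream-based createReader(...).
   default StreamFormat.Reader<T> createReader(
           Configuration config, Path filePath, long splitOffset, long splitLength)
           throws IOException {
       checkNotNull(filePath, "filePath");
       final FileSystem fileSystem = filePath.getFileSystem();
       final long fileLength = fileSystem.getFileStatus(filePath).getLen();
       final FSDataInputStream inputStream = fileSystem.open(filePath);
       if (isSplittable()) {
           // A splittable format expects the stream at the beginning of its split.
           inputStream.seek(splitOffset);
       }
       return createReader(config, inputStream, fileLength, splitOffset + splitLength);
   }
   ```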




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765642743



##########
File path: flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormatTest.java
##########
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.serialization.BulkWriter;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.formats.parquet.ParquetWriterFactory;
+import org.apache.flink.formats.parquet.generated.Address;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+/**
+ * Unit test for {@link AvroParquetRecordFormat} and {@link
+ * org.apache.flink.connector.file.src.reader.StreamFormat}.
+ */
+class AvroParquetRecordFormatTest {

Review comment:
       I have the same consideration. The question is: since the logic for specific and reflective records has already been covered by the unit tests, and the integration test logic has been exercised with generic records, what is the purpose of testing them again in the ITCase?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765238437



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the Parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}, since only {@link
+     * CheckpointedPosition#NO_OFFSET} is supported as the {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       Thanks for your effort in researching and providing the detailed code logic. Yes, I am also unsatisfied with the recovery part. Using the low-level API `ParquetFileReader` is another option I considered, but I came to the conclusion that it is better suited to `BulkFormat`, as `ParquetVectorizedInputFormat` shows, than to `StreamFormat`, for the following reasons:
   
   - The read logic is built into the internal low-level class `InternalParquetRecordReader`, which has package-private visibility in the parquet-hadoop lib and internally uses another low-level class, `ParquetFileReader`. This would make a `StreamFormat` implementation very complicated. The design idea of `StreamFormat` is to simplify the implementation; the two do not seem to work together.
   
   - `ParquetFileReader` reads data in batch mode, i.e. `PageReadStore pages = reader.readNextFilteredRowGroup();`. If we built this logic into a `StreamFormat` (`AvroParquetRecordFormat` in this case), `AvroParquetRecordFormat` would have to take over the role of `InternalParquetRecordReader`, including but not limited to:
         1. reading the `PageReadStore` in batch mode,
         2. managing the `PageReadStore`, i.e. reading the next page once all records in the current page have been consumed, and caching it,
         3. managing the read index within the current `PageReadStore`, because `StreamFormat` has its own setting for the read size, etc.
   
   All of this effectively turns `AvroParquetRecordFormat` into a `BulkFormat` instead of a `StreamFormat` (see the sketch at the end of this comment).
   
   - `StreamFormat` can only be used via the `StreamFormatAdapter`, which means everything we do with the low-level parquet-hadoop APIs must not conflict with the built-in logic provided by the `StreamFormatAdapter`.
   
   Now we can see that if we built this logic into a `StreamFormat` implementation, i.e. `AvroParquetRecordFormat`, all the convenient built-in logic provided by the `StreamFormatAdapter` would turn into obstacles. It would also violate the single responsibility principle, because `AvroParquetRecordFormat` would take over some responsibilities of `BulkFormat`. I guess these were the reasons why `ParquetVectorizedInputFormat` implemented `BulkFormat` instead of `StreamFormat`.
   
   In order to build a unified Parquet implementation for both the Table API and the DataStream API, it makes more sense to build this code into a `BulkFormat` implementation class. Speaking of "solving both things at once": since the output data types are different (`RowData` vs. Avro), extra converter logic would have to be introduced into the architecture design. This is beyond the scope of this PR, so I would suggest opening another ticket to focus on it. Depending on how complicated the issue turns out to be and how big its impact on the current code base is, a new FLIP might be required. I am keen to work on that as the next step.
   
   The current implementation follows the design idea of `StreamFormat` and keeps everything high-level and simple. It is therefore easy to implement and easy for users to understand, and it is a good fit for simple use cases. It does not conflict with the other solution, in which the low-level code mentioned above teams up with `BulkFormat`.
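   To make the batch-reading point concrete, here is a minimal sketch of the row-group loop that such a low-level reader has to manage itself. It is an illustration only, not code from this PR: it materializes parquet example `Group`s instead of Avro records, and `inputFile` stands for any parquet `InputFile`.
   
   ```java
   import org.apache.parquet.column.page.PageReadStore;
   import org.apache.parquet.example.data.Group;
   import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.io.ColumnIOFactory;
   import org.apache.parquet.io.InputFile;
   import org.apache.parquet.io.MessageColumnIO;
   import org.apache.parquet.io.RecordReader;
   import org.apache.parquet.schema.MessageType;
   
   import java.io.IOException;
   
   class LowLevelParquetLoop {
   
       static void readAll(InputFile inputFile) throws IOException {
           try (ParquetFileReader reader = ParquetFileReader.open(inputFile)) {
               MessageType schema = reader.getFooter().getFileMetaData().getSchema();
               MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
   
               PageReadStore pages;
               // 1. read one row group (a PageReadStore) at a time, in batch mode
               while ((pages = reader.readNextRowGroup()) != null) {
                   RecordReader<Group> recordReader =
                           columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
                   // 2./3. cache the row group and track the read index ourselves
                   for (long i = 0, rows = pages.getRowCount(); i < rows; i++) {
                       Group record = recordReader.read();
                       // emit the record downstream ...
                   }
               }
           }
       }
   }
   ```
   
   None of this row-group caching and index bookkeeping has a natural place in a `StreamFormat`; it is exactly the shape of a `BulkFormat`. For comparison, the simple use case the current `StreamFormat`-based implementation targets looks like this from the user's perspective (assuming an Avro `schema` is at hand):
   
   ```java
   FileSource<GenericRecord> source =
           FileSource.forRecordStreamFormat(
                           AvroParquetReaders.forGenericRecord(schema),
                           new Path("/path/to/data.parquet"))
                   .build();
   ```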





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765639277



##########
File path: flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/Datum.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import java.io.Serializable;
+
+/** Test datum. */
+public class Datum implements Serializable {

Review comment:
       It is copied from `AvroStreamingFileSinkITCase` and turned into a normal class, since it can be used in different test classes. I think, in general, it is not a bad idea to make it serializable, since the data will be read from a parquet file.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770015031



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} implementation for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible with
+     * the parquet abstractions. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}, since only {@link
+     * CheckpointedPosition#NO_OFFSET} is supported as the {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       Thanks for the info. The 3rd option should be acceptable for now, and the 4th option looks much more interesting; I will try that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765017804



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.specific.SpecificRecordBase;
+
+/**
+ * Convenience builder to create {@link AvroParquetRecordFormat} instances for the different Avro
+ * types.
+ */
+public class AvroParquetReaders {
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro {@link
+     * org.apache.avro.specific.SpecificRecord SpecificRecords}.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     */
+    public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
+            final Class<T> typeClass) {
+        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass));
+    }
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro records
+     * via reflection.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * <p>To read into Avro {@link org.apache.avro.specific.SpecificRecord SpecificRecords}, use the
+     * {@link #forSpecificRecord(Class)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     * @see #forSpecificRecord(Class)
+     */
+    public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
+        if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            throw new IllegalArgumentException(
+                    "Please use AvroParquetReaders.forSpecificRecord(Class<T>) for SpecificRecord.");

Review comment:
       Do you mean calling `forSpecificRecord(typeClass)` implicitly in this case and logging a warning?
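   
   If so, a possible shape would be the following sketch. It assumes the class keeps a `LOG` field and that the reflect branch creates the format via `TypeExtractor.createTypeInfo(typeClass)`; both are assumptions for illustration, not decisions:
   
   ```java
   public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
       if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
           // instead of throwing, warn and delegate to the specific-record factory
           LOG.warn(
                   "{} is a SpecificRecord, delegating to forSpecificRecord(Class).",
                   typeClass.getName());
           @SuppressWarnings("unchecked")
           AvroParquetRecordFormat<T> delegated =
                   (AvroParquetRecordFormat<T>)
                           forSpecificRecord(typeClass.asSubclass(SpecificRecordBase.class));
           return delegated;
       }
       return new AvroParquetRecordFormat<>(TypeExtractor.createTypeInfo(typeClass));
   }
   ```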




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768390250



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned at the beginning of the file split; otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       Do you mean `StreamFormatAdapter`? I don't see any changes in this PR that call this method. To me, this currently looks like dead code.
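   
   For context, the default implementation described in the javadoc above would roughly adapt the `Path` to an `FSDataInputStream` as in the following sketch (a paraphrase of the javadoc, not necessarily the exact code in the PR):
   
   ```java
   default StreamFormat.Reader<T> createReader(
           Configuration config, Path filePath, long splitOffset, long splitLength)
           throws IOException {
       final FileSystem fileSystem = filePath.getFileSystem();
       final long fileLen = fileSystem.getFileStatus(filePath).getLen();
       final FSDataInputStream inputStream = fileSystem.open(filePath);
       if (isSplittable()) {
           // position the stream at the beginning of the file split
           inputStream.seek(splitOffset);
       }
       return createReader(config, inputStream, fileLen, splitOffset + splitLength);
   }
   ```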




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28341) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732878761



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       There is plenty of logic already implemented in the StreamFormatAdapter; as I mentioned in the "open questions" section, why should I write my own implementation from BulkFormat again instead of reusing it? The design idea, afaik, is to let BulkFormat handle batch and let StreamFormat/FileRecordFormat handle streaming. Your question actually leads to a more fundamental one: why do we need StreamFormat/FileRecordFormat at all if, to quote your words, everything can be implemented from the BulkFormat "which supports both batch and streaming"? I didn't see any reference supporting that conclusion.
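
       To make the reuse concrete, a minimal sketch, assuming a hypothetical `AvroParquetRecordFormat(schema)` constructor: wrapping the StreamFormat in the existing adapter yields a BulkFormat, so the batching, fetch-size and decompression logic in StreamFormatAdapter does not have to be rewritten.

    ```java
    // StreamFormatAdapter (org.apache.flink.connector.file.src.impl) implements
    // BulkFormat<T, FileSourceSplit> by driving any StreamFormat<T>.
    // The format constructor below is illustrative only.
    BulkFormat<GenericRecord, FileSourceSplit> bulkFormat =
            new StreamFormatAdapter<>(new AvroParquetRecordFormat(schema));
    ```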





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737419289



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       Sure; as mentioned at the beginning of the PR description, more record types will be supported later.





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732509201



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface together build a 2-levels API. {@link
+ * StreamFormat} focuses on abstract input stream and {@link RecordFormat} pays attention to the
+ * concrete FileSystem. This format is for cases where the readers need access to the file directly
+ * or need to create a custom stream. For readers that can directly work on input streams, consider
+ * using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {

Review comment:
       Exactly. It is actually also hard to tell the difference between StreamFormat and FileRecordFormat. I've explained the intention behind this new interface in the "open questions" section above; please have a look.
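
       To illustrate, a minimal sketch of the 2-levels idea (not the PR's exact code): `RecordFormat` adds a Path-based factory method whose default implementation resolves the file and delegates to the stream-based method inherited from `StreamFormat`.

    ```java
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FSDataInputStream;
    import org.apache.flink.core.fs.FileStatus;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    import java.io.IOException;

    public interface RecordFormat<T> extends StreamFormat<T> {

        // default implementation: open the file, then delegate to the
        // stream-based createReader(...) declared on StreamFormat
        default Reader<T> createReader(Configuration config, Path filePath) throws IOException {
            final FileSystem fileSystem = filePath.getFileSystem();
            final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
            final FSDataInputStream inputStream = fileSystem.open(filePath);
            // no splitting support yet, so the "split" spans the whole file
            return createReader(config, inputStream, fileStatus.getLen(), fileStatus.getLen());
        }
    }
    ```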





[GitHub] [flink] fapaul commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
fapaul commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733739540



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       I can't see a reason why only `GenericRecord`s are supported. AFAICT we also want to allow custom record types where possible, and provide only one format implementation.





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733820258



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       The batch and stream we are talking about here refer to reading data, not execution. I think it is an interesting discussion whether BulkFormat alone is a good fit for running streaming execution on top of batch reads; the latency could be an issue.
   
       The `RecordFormat` is motivated from the architecture perspective, because `FileRecordFormat` and `StreamFormat` are very similar. Moreover, more features have been built for StreamFormat than for FileRecordFormat; compression was only one (admittedly poor) example that does not apply to parquet, and there could be others. We should keep an eye on DRY.





[GitHub] [flink] fapaul commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
fapaul commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733480109



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > where is the reference to tell us that BulkFormat support streaming? Afaik, all javadocs about BulkFormat are only talking about batch, please refer to the javadoc of BulkFormat itself and the javadoc of FileSource.
   
    In general, all formats should support both batch and streaming execution. As an example that `BulkFormat`s are also applicable to streaming execution, take a look at this docstring [1]: it mentions checkpoints and how the last offset/position is tracked, and checkpointing is not supported in batch execution.
    The difference between `BulkFormat` and `FileRecordFormat` is how the underlying reader interacts with the filesystem. A `BulkFormat` usually reads batches of data, i.e. a parquet reader always reads whole blocks/rowgroups, whereas a `FileRecordFormat` usually reads the file record by record.
    
    After looking through the `AvroParquetReader` I think your assumption is right. We cannot implement a bulk format here because the reader does not expose any information about the underlying block/rowgroup structure.
    
    I am still a bit unsure about the newly introduced `RecordFormat`. You have only mentioned that we use the `StreamFormat` to support compression, but I think the right way to support compression for ParquetAvro would be to configure it with a codec factory:
   
   ```java
                   AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
                           .withCodecFactory(...)
   ```
   [1] https://github.com/apache/flink/blob/34de7d1038f1078980cc539273b724ce7c85696a/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/BulkFormat.java#L56
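
    For reference, a fleshed-out version of that builder call might look like the sketch below; `ParquetInputFile` is assumed to be the PR's `InputFile` wrapper around the Flink stream, and the data model choice is illustrative only.

    ```java
    // assumed to run inside a method that may throw IOException
    ParquetReader<GenericRecord> reader =
            AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
                    .withDataModel(GenericData.get())
                    // .withCodecFactory(...) would plug in custom decompression, as suggested above
                    .build();

    GenericRecord record;
    while ((record = reader.read()) != null) {
        // hand each record to the caller of StreamFormat.Reader#read
    }
    reader.close();
    ```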





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   * f9f06300555f2f98b148bc6eb46172ec07dd3065 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732516736



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Why was this change needed?
    Since `../pom.xml` is the default value of `relativePath`, the whole line could even be removed. This change follows the official documentation: http://maven.apache.org/ref/3.3.9/maven-model/maven.html#class_parent
    Btw, IntelliJ IDEA points out the issue too.
   





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732506246



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Thanks for providing this pull request but I have a few preliminary questions about the design.
   > 
   > Every time I read something about parquet formats I always think the format should be based on the `BulkFormat` interface. Why did you base your implementation on the StreamFormat?
   > 
   > As a second point, I'd like to see an IT case using the new format with the `FileSource`. Did you already test this?
   
   
   
   
    Thanks for asking. Using StreamFormat enables streaming processing for the parquet file source. Furthermore, the same implementation can be used for batch processing via an adapter; please refer to e.g. StreamFormatAdapter. Afaik, this is one of the good designs that comes with the new FileSource.
    
    The logic has been tested in the UT. Regarding the second point about an IT case, it is a good question to discuss here, and I am open to the decision. Question 1: a format works more like a factory; do we really need an IT for a factory? Question 2: since BulkFormat is the only format used internally to create the FileSource, we could consider building the IT for the BulkFormat with the FileSource instead of the StreamFormat.
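
    As a usage sketch of that flow (the `AvroParquetRecordFormat(schema)` constructor and the path are assumptions for illustration): `FileSource.forRecordStreamFormat` wraps the `StreamFormat` into the `StreamFormatAdapter` internally, so the same format serves both streaming and batch execution.

    ```java
    final FileSource<GenericRecord> source =
            FileSource.forRecordStreamFormat(
                            new AvroParquetRecordFormat(schema),
                            new Path("/path/to/data.parquet"))
                    .build();

    // env is a StreamExecutionEnvironment
    final DataStream<GenericRecord> records =
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");
    ```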





[GitHub] [flink] echauchot commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-960615936


   > > > > > @echauchot
   > > > > > > Sure, I saw the comments about split and data types etc... But I feel uncomfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod @JingGe @fapaul WDYT ?
   > > > > > 
   > > > > > 
   > > > > > you are right, that was the idea of the draft PR. Speaking of the splitting support specifically, which will make the implementation way more complicated, this PR might be merged without it, because we didn't get any requirement for it from the business side. If you have any strong requirement w.r.t. the splitting, we'd like to know and reconsider it.
   > > > > 
   > > > > 
   > > > > I think splitting is mandatory because if you read a big parquet file with no split support, then all the content will end up in a single task manager which will lead to OOM
   > > > 
   > > > 
   > > > I agree with you, I actually have the same concern, especially from the SQL perspective. I didn't really understand your concern about OOM, because on the upper side we can control it via `StreamFormat.FETCH_IO_SIZE` and on the lower side `ParquetFileReader` will be used, so we will not read the whole parquet file into memory.
   > > > The goal of this PR is to make everyone on the same page w.r.t. the implementation design. Once the design is settled down, the splitting support as a feature could be easily done in a follow-up PR. That is why I wrote in the PR description explicitly at the beginning that "Splitting is not supported in this version". I will update it with more background info.
   > > 
   > > 
   > > Yes I see that there is a countermeasure regarding possible OOM (fetch size) but still, for performance reasons, the split is important. Otherwise the parallelism is sub-optimal and Flink focuses on performance. I'm not a committer on the Flink project so it is not my decision to merge this PR without split but I would tend not to merge without split support to avoid that a user suffers from this lack of performance which seems to not meet project quality standards.
   > > @AHeise WDYT ?
   > 
   > I'm fine with a follow-up ticket/PR on that one to keep things going. Having any support for AvroParquet is better than having none. But it should be done before 1.15 release anyhow, such that end-users see only the splittable version.
   > 
   > We plan to support splitting for all formats with sync marks, but in general the role of splitting has shifted, since big data processing as a whole has moved from block-based storage to cloud storage. Earlier, splitting was also needed for data locality, which doesn't apply anymore. Now it's only needed to speed up ingestion (you can always rebalance after the source), so it is necessary only for the most basic pipelines.
   > 
   > TL;DR while splitting is still a should-have feature, the days of must-have are gone.
   
    Having the split in before any user could see the non-split version was all I was interested in, so delivering it before the 1.15 release looks perfect!
    Thanks a lot Arvid for the info about the relative importance of splitting, which I did not have; I had only the rebalance in mind!
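
    For readers following the OOM discussion above: the fetch-size countermeasure mentioned there is an ordinary config option. A minimal sketch, assuming the option is set on the configuration the source picks up:

    ```java
    final Configuration config = new Configuration();
    // caps how many bytes one StreamFormat fetch cycle pulls from the input stream;
    // the StreamFormatAdapter reads this option when creating readers
    config.set(StreamFormat.FETCH_IO_SIZE, MemorySize.ofMebiBytes(2));
    ```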
   



[GitHub] [flink] AHeise commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-957254354


   > > > > @echauchot
   > > > > > Sure, I saw the comments about split and data types etc... But I feel uncomfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod @JingGe @fapaul WDYT ?
   > > > > 
   > > > > 
   > > > > you are right, that was the idea of the draft PR. Speaking of the splitting support specifically, which will make the implementation way more complicated, this PR might be merged without it, because we didn't get any requirement for it from the business side. If you have any strong requirement w.r.t. the splitting, we'd like to know and reconsider it.
   > > > 
   > > > 
   > > > I think splitting is mandatory because if you read a big parquet file with no split support, then all the content will end up in a single task manager which will lead to OOM
   > > 
   > > 
   > > I agree with you, I actually have the same concern, especially from the SQL perspective. I didn't really understand your concern about OOM, because on the upper side we can control it via `StreamFormat.FETCH_IO_SIZE` and on the lower side `ParquetFileReader` will be used, so we will not read the whole parquet file into memory.
   > > The goal of this PR is to make everyone on the same page w.r.t. the implementation design. Once the design is settled down, the splitting support as a feature could be easily done in a follow-up PR. That is why I wrote in the PR description explicitly at the beginning that "Splitting is not supported in this version". I will update it with more background info.
   > 
   > Yes I see that there is a countermeasure regarding possible OOM (fetch size) but still, for performance reasons, the split is important. Otherwise the parallelism is sub-optimal and Flink focuses on performance. I'm not a committer on the Flink project so it is not my decision to merge this PR without split but I would tend not to merge without split support to avoid that a user suffers from this lack of performance which seems to not meet project quality standards.
   > 
   > @AHeise WDYT ?
   
    I'm fine with a follow-up ticket/PR on that one to keep things going. Having any support for AvroParquet is better than having none. But it should be done before the 1.15 release anyhow, such that end-users only ever see the splittable version.
    
    We plan to support splitting for all formats with sync marks, but in general the role of splitting has shifted, since big data processing as a whole has moved from block-based storage to cloud storage. Earlier, splitting was also needed for data locality, which doesn't apply anymore. Now it's only needed to speed up ingestion (you can always rebalance after the source), so it is necessary only for the most basic pipelines.
   
   TL;DR while splitting is still a should-have feature, the days of must-have are gone.
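
    To make the "rebalance after the source" remark concrete, a sketch (reusing the hypothetical `source` and `env` from the snippets earlier in this thread):

    ```java
    // without splits the source effectively reads with parallelism 1; rebalance()
    // redistributes the records round-robin across all downstream subtasks
    env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source")
            .rebalance()
            .map(GenericRecord::toString)
            .print();
    ```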



[GitHub] [flink] AHeise commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740847332



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       The format will be created in the `main` of the entry point. That could be a local client or, in recent setups, the job manager. The job manager will create a `JobGraph` and chop it into `Task`s to send to the task managers, where each task carries the serialized configuration. So while the `AvroParquetRecordFormat` class is in the user jar that every task manager independently has access to, the specific format instance is serialized on the JM, sent to the TM, and deserialized there. Hence, every user function needs to be `Serializable` (e.g. `MapFunction`) or needs to be created by a serializable factory (e.g. `BulkWriter.Factory`, or `StreamFormat` as a factory for the `Reader`, `Source`, `Sink`). In this case, the `AvroParquetRecordFormat` would be serialized without the schema information on the JM, so the TM wouldn't know the schema at all and would fail.
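
       A minimal sketch of the resulting pattern, not the PR's actual code: carry the schema as its JSON string, which always serializes, and rebuild the Avro `Schema` lazily on the TM.

    ```java
    import java.io.Serializable;

    import org.apache.avro.Schema;

    final class SchemaCarryingFormat implements Serializable {
        private final String schemaString;  // travels JM -> TM via Java serialization
        private transient Schema schema;    // rebuilt on demand after deserialization

        SchemaCarryingFormat(Schema schema) {
            this.schemaString = schema.toString();  // Avro schemas round-trip via JSON
        }

        Schema schema() {
            if (schema == null) {
                schema = new Schema.Parser().parse(schemaString);
            }
            return schema;
        }
    }
    ```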





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740259768



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       We are using 1.10.0 now, where `Schema` is serializable. Just out of curiosity, since the `transient` keyword has been used, I'd like to take this opportunity to understand the underlying Flink concept and the reason why we care about serializability here. Will the Format object be created locally on each TM, or will it be created once and then transferred to the TMs over the network, which would require ser/de? Thanks.
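
    For illustration, here is a minimal, hypothetical sketch (not code from this PR; the class and method names are assumptions) of the common pattern for keeping an Avro `Schema` reachable from a serializable format object: store the schema as its JSON string and re-parse it lazily after deserialization on the TM side.

    ```java
    import java.io.Serializable;

    import org.apache.avro.Schema;

    // Hypothetical sketch: hold the Avro Schema as a JSON string so that the
    // enclosing format object stays serializable, and rebuild the Schema
    // lazily after the object has been deserialized on a TaskManager.
    public class SchemaHoldingFormat implements Serializable {

        private static final long serialVersionUID = 1L;

        private final String schemaJson; // always serializable
        private transient Schema schema; // rebuilt on first access after deserialization

        public SchemaHoldingFormat(Schema schema) {
            this.schemaJson = schema.toString(); // Avro schemas round-trip via their JSON form
            this.schema = schema;
        }

        public Schema getSchema() {
            if (schema == null) {
                schema = new Schema.Parser().parse(schemaJson);
            }
            return schema;
        }
    }
    ```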




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * efdd53bb50c9b8712b87f7d24495e20fef5b78f5 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208) 
   * 1e9ac381ab9fc13d738a5bbd4be8207232240a0a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1e9ac381ab9fc13d738a5bbd4be8207232240a0a Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345) 
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b48104914a31c05ad902cb0c36aef52b2b4093a8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982) 
   * 8b1a7f936d0cb2e54ae4fb364075d72273386623 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168) 
   * 95ad572805a3aa45e9c78e07f83cd820abe00a52 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 95ad572805a3aa45e9c78e07f83cd820abe00a52 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184) 
   * efdd53bb50c9b8712b87f7d24495e20fef5b78f5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737404542



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link GenericRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.

Review comment:
       > I guess it is only because this PR is a draft, but of course splitting is mandatory in production. If you need some pointers on how to implement split with Parquet, you can take a look at `ParquetColumnarRowSplitReader`.
   
   Thanks for the hint. I have answered the question about splitting support in another comment; a usage sketch follows below.
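
    For context, a hedged usage sketch of how such a non-splittable format could be plugged into the new `FileSource` (assuming, as in this PR's design, that `RecordFormat` extends `StreamFormat`; the schema string is a made-up example). Even without intra-file splitting, parallel readers still scale out across files, since each split is a whole file:

    ```java
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.file.src.FileSource;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.formats.parquet.avro.AvroParquetRecordFormat;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class AvroParquetSourceExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Made-up schema for the example.
            Schema schema =
                    new Schema.Parser()
                            .parse(
                                    "{\"type\":\"record\",\"name\":\"Datum\","
                                            + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

            // One whole parquet file per split; no intra-file splitting required.
            FileSource<GenericRecord> source =
                    FileSource.forRecordStreamFormat(
                                    new AvroParquetRecordFormat(schema),
                                    new Path("file:///tmp/parquet-input"))
                            .build();

            DataStream<GenericRecord> records =
                    env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");

            records.print();
            env.execute("AvroParquet FileSource example");
        }
    }
    ```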




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182",
       "triggerID" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "triggerType" : "PUSH"
     }, {
       "hash" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215",
       "triggerID" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262",
       "triggerID" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274",
       "triggerID" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280",
       "triggerID" : "0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280) 
   * b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765238437



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * new reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
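+            // incrementPosition() advances skipCount so that getCheckpointedPosition()
+            // can report how many records to skip after a restore.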
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       Thanks for your effort in researching and providing the detailed code logic. Yes, I am also unsatisfied with the recovery part. Using the low-level API `ParquetFileReader` is another option I've considered, and I came to the conclusion that it is better used with `BulkFormat` directly, as `ParquetVectorizedInputFormat` does, rather than with `StreamFormat`, for the following reasons:
   
   -  the read logic is built into the internal low-level class `InternalParquetRecordReader`, which has package-private visibility in the parquet-hadoop lib and internally uses another low-level class, `ParquetFileReader`. This makes a `StreamFormat` implementation very complicated. I think the design idea of `StreamFormat` is to simplify the implementation; the two do not seem to work together.
   
   -  `ParquetFileReader` reads data in batch mode, i.e. `PageReadStore pages = reader.readNextFilteredRowGroup();`. If we build this logic into a `StreamFormat` (`AvroParquetRecordFormat` in this case), `AvroParquetRecordFormat` has to take over the role `InternalParquetRecordReader` plays (see the sketch at the end of this comment), including but not limited to:
         1. reading each `PageReadStore` in batch mode,
         2. managing the `PageReadStore`, i.e. reading the next row group once all records in the current one have been consumed, and caching it,
         3. managing the read index within the current `PageReadStore`, because `StreamFormat` has its own setting for the read size, etc.
   
      All of this effectively turns `AvroParquetRecordFormat` into a `BulkFormat` instead of a `StreamFormat`.
   
   -  `StreamFormat` can only be used via the `StreamFormatAdapter`, which means everything we do with the low-level APIs of the parquet-hadoop lib must not conflict with the built-in logic provided by the `StreamFormatAdapter`.
   
   Now we can see that if we built this logic into a `StreamFormat` implementation, i.e. `AvroParquetRecordFormat`, all the convenient built-in logic provided by the `StreamFormatAdapter` would turn into obstacles. There would also be a violation of the single responsibility principle, i.e. `AvroParquetRecordFormat` would take over part of the responsibility of a `BulkFormat`. I guess these were the reasons why `ParquetVectorizedInputFormat` implemented `BulkFormat` instead of `StreamFormat`.
   
   In order to build a unified Parquet implementation for both the Table API and the DataStream API, it makes more sense to consider building this code into a `BulkFormat` implementation class. Speaking of "solving both things at once": since the output data types differ, `RowData` vs. Avro, extra converter logic would have to be introduced into the architecture design. That is beyond the scope of this PR; I would suggest opening another ticket to focus on it. Depending on how complicated the issue turns out to be and how big an impact it has on the current code base, a new FLIP might be required. I am keen to work on that as the next step.
   
   The current implementation follows the design idea of `StreamFormat` and keeps everything high-level and simple. It is therefore easy to implement and easy for users to understand, and it is a good fit for simple use cases. It does not conflict with the solution outlined above, in which the low-level code teams up with `BulkFormat`.
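   
   To make the page-management point concrete, below is a minimal sketch (not part of this PR; `inputFile` and `materializer` are assumed to be prepared elsewhere, e.g. the materializer via `AvroReadSupport`) of the row-group loop that `InternalParquetRecordReader` hides and that a low-level implementation would have to replicate:
   
   ```java
   import org.apache.avro.generic.GenericRecord;
   import org.apache.parquet.column.page.PageReadStore;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.io.ColumnIOFactory;
   import org.apache.parquet.io.InputFile;
   import org.apache.parquet.io.MessageColumnIO;
   import org.apache.parquet.io.RecordReader;
   import org.apache.parquet.io.api.RecordMaterializer;
   import org.apache.parquet.schema.MessageType;
   
   import java.io.IOException;
   
   class LowLevelParquetReadSketch {
   
       void readAll(InputFile inputFile, RecordMaterializer<GenericRecord> materializer)
               throws IOException {
           try (ParquetFileReader reader = ParquetFileReader.open(inputFile)) {
               MessageType schema = reader.getFooter().getFileMetaData().getSchema();
               MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
   
               PageReadStore pages;
               while ((pages = reader.readNextRowGroup()) != null) { // 1. batch read of a row group
                   long rowCount = pages.getRowCount();              // 2. manage the cached group
                   RecordReader<GenericRecord> recordReader =
                           columnIO.getRecordReader(pages, materializer);
                   for (long i = 0; i < rowCount; i++) {             // 3. manage the read index
                       GenericRecord next = recordReader.read();
                       // hand each record over to the surrounding BulkFormat batch
                   }
               }
           }
       }
   }
   ```
   
   For comparison, here is a minimal usage sketch of the high-level `StreamFormat` path this PR takes. The factory method `AvroParquetReaders.forGenericRecord(schema)` is assumed (the format's constructor is package-private), and the file path is illustrative:
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericRecord;
   import org.apache.flink.api.common.eventtime.WatermarkStrategy;
   import org.apache.flink.connector.file.src.FileSource;
   import org.apache.flink.core.fs.Path;
   import org.apache.flink.formats.parquet.avro.AvroParquetReaders; // assumed factory
   import org.apache.flink.streaming.api.datastream.DataStream;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   
   class AvroParquetSourceSketch {
   
       DataStream<GenericRecord> build(StreamExecutionEnvironment env, Schema schema) {
           FileSource<GenericRecord> source =
                   FileSource.forRecordStreamFormat(
                                   AvroParquetReaders.forGenericRecord(schema),
                                   new Path("/path/to/records.parquet"))
                           .build();
           return env.fromSource(
                   source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");
       }
   }
   ```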








[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>








[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   * 29acfe668f883ccc8792a4ae8d08329e43c1be7a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768830307



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created so that the Flink abstractions become
+     * compatible with the Parquet abstractions. Please refer to the inner classes {@link
+     * AvroParquetRecordReader}, {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for
+     * details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to a freshly
+     * created reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;

Review comment:
       sure, updated.
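
    For context, a minimal usage sketch of this format with the new FileSource API (a sketch only; `schema`, `env`, and the path are placeholders, not code from this PR):

    ```
    // Hypothetical wiring: reading Avro GenericRecords from a Parquet file.
    final FileSource<GenericRecord> source =
            FileSource.forRecordStreamFormat(
                            AvroParquetReaders.forGenericRecord(schema),
                            new Path("/tmp/data.parquet"))
                    .build();
    env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-file-source");
    ```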




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182",
       "triggerID" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "triggerType" : "PUSH"
     }, {
       "hash" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215",
       "triggerID" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262",
       "triggerID" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274",
       "triggerID" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297592


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa (Tue Dec 14 09:07:22 UTC 2021)
   
   **Warnings:**
    * **2 pom.xml files were touched**: Check for build and licensing issues.
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
    * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
     The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182",
       "triggerID" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "triggerType" : "PUSH"
     }, {
       "hash" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215",
       "triggerID" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262",
       "triggerID" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274",
       "triggerID" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "4e3d378a6a50f02b99a903dc106a2ad9f931066f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "4e3d378a6a50f02b99a903dc106a2ad9f931066f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 4e3d378a6a50f02b99a903dc106a2ad9f931066f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768389463



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.specific.SpecificRecordBase;
+
+/**
+ * Convenience builder to create {@link AvroParquetRecordFormat} instances for the different Avro
+ * types.
+ */
+public class AvroParquetReaders {
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro {@link
+     * org.apache.avro.specific.SpecificRecord SpecificRecords}.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     */
+    public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
+            final Class<T> typeClass) {
+        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass));
+    }
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro records
+     * via reflection.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * <p>To read into Avro {@link org.apache.avro.specific.SpecificRecord SpecificRecords}, use the
+     * {@link #forSpecificRecord(Class)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     * @see #forSpecificRecord(Class)
+     */
+    public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
+        if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            throw new IllegalArgumentException(
+                    "Please use AvroParquetReaders.forSpecificRecord(Class<T>) for SpecificRecord.");

Review comment:
       Yes, I'm also torn on this one and I'd go with any of the two options.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182",
       "triggerID" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "triggerType" : "PUSH"
     }, {
       "hash" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215",
       "triggerID" : "51d9550ffc3102a2d12b183a3a41100621ba90b3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262",
       "triggerID" : "1364bd658c0d829857bb27ab9899ce9cef8db077",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274",
       "triggerID" : "b8a96d9c1dc6facf2708deb6d74a20afd4183da9",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280",
       "triggerID" : "0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28341",
       "triggerID" : "b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28341) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r764680732



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created so that the Flink abstractions become
+     * compatible with the Parquet abstractions. Please refer to the inner classes {@link
+     * AvroParquetRecordReader}, {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for
+     * details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to a freshly
+     * created reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;

Review comment:
       When is offset != 0?

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created so that the Flink abstractions become
+     * compatible with the Parquet abstractions. Please refer to the inner classes {@link
+     * AvroParquetRecordReader}, {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for
+     * details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to a freshly
+     * created reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       We should inject the model with the factory methods. There we already know whether it's a specific, generic, or reflective type, so we don't need to infer that here.
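
    A rough sketch of that injection (hypothetical signatures, not the PR code; since StreamFormat is Serializable, a production version would likely inject a serializable supplier rather than the GenericData instance itself):

    ```
    // Sketch: each factory method already knows the record flavor and injects the model.
    public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
            final Class<T> typeClass) {
        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass), SpecificData.get());
    }

    // The format then simply stores what it was given, without instanceof checks.
    private final GenericData dataModel;

    AvroParquetRecordFormat(TypeInformation<E> type, GenericData dataModel) {
        this.type = type;
        this.dataModel = dataModel;
    }
    ```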

##########
File path: flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/Datum.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import java.io.Serializable;
+
+/** Test datum. */
+public class Datum implements Serializable {

Review comment:
       Does this even need to be serializable?

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.specific.SpecificRecordBase;
+
+/**
+ * Convenience builder to create {@link AvroParquetRecordFormat} instances for the different Avro
+ * types.
+ */
+public class AvroParquetReaders {
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro {@link
+     * org.apache.avro.specific.SpecificRecord SpecificRecords}.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     */
+    public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
+            final Class<T> typeClass) {
+        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass));
+    }
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro records
+     * via reflection.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * <p>To read into Avro {@link org.apache.avro.specific.SpecificRecord SpecificRecords}, use the
+     * {@link #forSpecificRecord(Class)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     * @see #forSpecificRecord(Class)
+     */
+    public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
+        if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            throw new IllegalArgumentException(
+                    "Please use AvroParquetReaders.forSpecificRecord(Class<T>) for SpecificRecord.");

Review comment:
       I wonder if we can do it directly and just log a warning. It's not like `GenericRecord`, where we simply lack information.
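
    For illustration, a minimal sketch of the warn-and-delegate option (hypothetical; the PR currently throws, and LOG is assumed to be the class logger):

    ```
    @SuppressWarnings("unchecked")
    public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
        if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
            // We do have the information here, so warn and delegate instead of rejecting.
            LOG.warn(
                    "{} is a SpecificRecord; delegating to forSpecificRecord(Class).",
                    typeClass.getName());
            return (AvroParquetRecordFormat<T>)
                    forSpecificRecord(typeClass.asSubclass(SpecificRecordBase.class));
        }
        return new AvroParquetRecordFormat<>(TypeExtractor.createTypeInfo(typeClass));
    }
    ```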

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created so that the Flink abstractions become
+     * compatible with the Parquet abstractions. Please refer to the inner classes {@link
+     * AvroParquetRecordReader}, {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for
+     * details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to a freshly
+     * created reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       I'm not fully satisfied with our ability to recover here. So I looked into what the AvroParquetReader is doing internally:
   
    ```
                // Injected
                GenericData model = GenericData.get();
                org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();

                // Low level reader - fetch metadata; inputFile is the Parquet InputFile being read
                ParquetFileReader reader = ParquetFileReader.open(inputFile);
                MessageType fileSchema = reader.getFileMetaData().getSchema();
                Map<String, String> metaData = reader.getFileMetaData().getKeyValueMetaData();

                // init Avro specific things
                AvroReadSupport<T> readSupport = new AvroReadSupport<>(model);
                ReadSupport.ReadContext readContext =
                        readSupport.init(
                                new InitContext(
                                        conf,
                                        metaData.entrySet().stream()
                                                .collect(Collectors.toMap(
                                                        Map.Entry::getKey,
                                                        e -> Collections.singleton(e.getValue()))),
                                        fileSchema));
                RecordMaterializer<T> recordMaterializer =
                        readSupport.prepareForRead(conf, metaData, fileSchema, readContext);
                MessageType requestedSchema = readContext.getRequestedSchema();

                // prepare record reader
                ColumnIOFactory columnIOFactory = new ColumnIOFactory(reader.getFileMetaData().getCreatedBy());
                MessageColumnIO columnIO = columnIOFactory.getColumnIO(requestedSchema, fileSchema, true);

                // for recovery: skip the row groups that were already consumed
                // (rowGroupsToSkip would come from the restored position)
                while (rowGroupsToSkip-- > 0) {
                    reader.skipNextRowGroup();
                }

                // for reading
                PageReadStore pages;
                for (int block = 0; (pages = reader.readNextRowGroup()) != null; block++) {
                    RecordReader<T> recordReader = columnIO.getRecordReader(pages, recordMaterializer);
                    for (long i = 0; i < pages.getRowCount(); i++) {
                        T record = recordReader.read();
                        // emit record downstream
                    }
                }
    ```
   
   Here we can easily track the block. Even better, most of that snippet is already implemented in `ParquetVectorizedInputFormat`, so we may be able to solve both things at once. WDYT?
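   
   For illustration, a minimal sketch of what block-aware recovery could look like, assuming the checkpoint stored a row-group index plus a number of records to replay within that group (`checkpointedRowGroup` and `recordsToSkipInGroup` are hypothetical names; `inputFile`, `columnIO` and `recordMaterializer` as in the snippet above):
   
   ```
           ParquetFileReader reader = ParquetFileReader.open(inputFile);
           // skip the row groups that were fully consumed before the checkpoint
           for (int i = 0; i < checkpointedRowGroup; i++) {
               reader.skipNextRowGroup();
           }
           // re-enter the partially consumed row group and replay already-emitted records
           PageReadStore pages = reader.readNextRowGroup();
           RecordReader<T> recordReader = columnIO.getRecordReader(pages, recordMaterializer);
           for (long i = 0; i < recordsToSkipInGroup; i++) {
               recordReader.read();
           }
   ```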

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details.
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       Who is invoking the new methods? Is it the framework or the user?

##########
File path: flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormatTest.java
##########
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.serialization.BulkWriter;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.formats.parquet.ParquetWriterFactory;
+import org.apache.flink.formats.parquet.generated.Address;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+/**
+ * Unit test for {@link AvroParquetRecordFormat} and {@link
+ * org.apache.flink.connector.file.src.reader.StreamFormat}.
+ */
+class AvroParquetRecordFormatTest {

Review comment:
       Good test coverage. Should the ITCase also have 1 test for specific and reflective?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-951026654


   @echauchot 
   > Sure, I saw the comments about split and data types etc... But I feel uncomfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod @JingGe @fapaul WDYT ?
   
   You are right, that was the idea of the draft PR. Speaking of the splitting support specifically, which would make the implementation considerably more complicated: this PR might be merged without it, because we did not get any requirement for it from the business side. If you have any strong requirement w.r.t. splitting, we would like to know and will reconsider it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] fapaul commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
fapaul commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732562392



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Thanks for asking. Using StreamFormat will enable streaming processing for the parquet file source. Furthermore, the same implementation can be used in batch processing via an adapter too, please refer to e.g. StreamFormatAdapter. Afaik, this is one of the good designs coming with the new FileSource.
   
   I do not think that `StreamFormat` has a relation to batch or stream execution. It should only be based on the different format implementations. The `BulkFormat` was specifically designed to support the Orc and Parquet use cases in batch and streaming. Therefore introducing a new `StreamFormat` to support parquet feels wrong. Can you elaborate more on why you chose the `StreamFormat` instead of implementing the `BulkFormat`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * efdd53bb50c9b8712b87f7d24495e20fef5b78f5 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208) 
   * 1e9ac381ab9fc13d738a5bbd4be8207232240a0a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737414033



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       > nit: please rename to `ParquetAvroRecordFormat` for consistency with existing (Flink 1.13) `ParquetAvroInputFormat`
   
   This is the same question @fapaul asked. Here is a copy of my original answer: "I named it after the naming convention of the apache parquet lib, e.g. AvroParquetReader. It sounds natural for "reading avro from parquet". I would suggest we change ParquetAvroWriters to AvroParquetWriters." AvroParquetWriters stands for "writing avro into parquet". Since the only class using "ParquetAvro" instead of "AvroParquet" is `ParquetAvroWriters` for the upcoming version 1.15, it is good timing to make this change, before the gap between Flink's naming convention and the official parquet lib's grows too big, which would confuse users.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737418008



##########
File path: flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormatTest.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.serialization.BulkWriter;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.formats.parquet.ParquetWriterFactory;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+/**
+ * Unit test for {@link AvroParquetRecordFormat} and {@link
+ * org.apache.flink.connector.file.src.reader.RecordFormat}.
+ */
+class AvroParquetRecordFormatTest {
+
+    private static final String USER_PARQUET_FILE = "user.parquet";
+
+    private static Path path;
+    private static Schema schema;
+    private static List<GenericRecord> records = new ArrayList<>(3);
+
+    @TempDir static java.nio.file.Path temporaryFolder;
+
+    /**
+     * Creates a parquet file in the {@code temporaryFolder} directory.
+     *
+     * @throws IOException if the new file cannot be created.
+     */
+    @BeforeAll
+    static void setup() throws IOException {
+        schema =
+                new Schema.Parser()
+                        .parse(
+                                "{\"type\": \"record\", "
+                                        + "\"name\": \"User\", "
+                                        + "\"fields\": [\n"
+                                        + "        {\"name\": \"name\", \"type\": \"string\" },\n"
+                                        + "        {\"name\": \"favoriteNumber\",  \"type\": [\"int\", \"null\"] },\n"
+                                        + "        {\"name\": \"favoriteColor\", \"type\": [\"string\", \"null\"] }\n"
+                                        + "    ]\n"
+                                        + "    }");
+
+        records.add(createUser("Peter", 1, "red"));
+        records.add(createUser("Tom", 2, "yellow"));
+        records.add(createUser("Jack", 3, "green"));
+
+        path = new Path(temporaryFolder.resolve(USER_PARQUET_FILE).toUri());
+
+        ParquetWriterFactory<GenericRecord> writerFactory =
+                ParquetAvroWriters.forGenericRecord(schema);
+        BulkWriter<GenericRecord> writer =
+                writerFactory.create(
+                        path.getFileSystem().create(path, FileSystem.WriteMode.OVERWRITE));
+
+        for (GenericRecord record : records) {
+            writer.addElement(record);
+        }
+
+        writer.flush();
+        writer.finish();
+    }
+
+    @Test
+    void testCreateReader() throws IOException {
+        StreamFormat.Reader<GenericRecord> reader =
+                new AvroParquetRecordFormat(schema)
+                        .createReader(
+                                new Configuration(),
+                                path,
+                                0,
+                                path.getFileSystem().getFileStatus(path).getLen());
+        for (GenericRecord record : records) {
+            assertUserEquals(Objects.requireNonNull(reader.read()), record);
+        }
+    }
+
+    /** Expect exception since splitting is not supported now. */
+    @Test
+    void testCreateReaderWithSplitting() {
+        assertThrows(
+                IllegalArgumentException.class,
+                () ->
+                        new AvroParquetRecordFormat(schema)
+                                .createReader(new Configuration(), path, 5, 5));
+    }
+
+    @Test
+    void testCreateReaderWithNullPath() {
+        assertThrows(
+                NullPointerException.class,
+                () ->
+                        new AvroParquetRecordFormat(schema)
+                                .createReader(new Configuration(), (Path) null, 0, 0));
+    }
+
+    @Test
+    void testRestoreReaderWithNoOffset() {
+        assertThrows(
+                IllegalArgumentException.class,
+                () ->
+                        new AvroParquetRecordFormat(schema)
+                                .restoreReader(
+                                        new Configuration(),
+                                        path,
+                                        CheckpointedPosition.NO_OFFSET,
+                                        0,
+                                        path.getFileSystem().getFileStatus(path).getLen()));
+    }
+
+    @Test
+    void testRestoreReader() throws IOException {

Review comment:
       source better, thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r738073990



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream and {@link RecordFormat} pays attention to the
+ * concrete FileSystem. This format is for cases where the readers need access to the file directly
+ * or need to create a custom stream. For readers that can directly work on input streams, consider
+ * using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details.
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {
+
+        checkNotNull(filePath, "filePath");
+
+        final FileSystem fileSystem = filePath.getFileSystem();
+        final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+        final FSDataInputStream inputStream = fileSystem.open(filePath);
+
+        if (isSplittable()) {
+            inputStream.seek(splitOffset);
+        }

Review comment:
       else  `checkArgument(splitOffset == 0)` + `checkArgument(splitLength == fileStatus.getLen())`  ?
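   
   I.e., roughly (a sketch of the suggested change, using the names from the diff above):
   
   ```
           if (isSplittable()) {
               inputStream.seek(splitOffset);
           } else {
               // a non-splittable format must always receive the whole file as a single split
               checkArgument(splitOffset == 0, "Non-splittable format, but split offset is %s", splitOffset);
               checkArgument(
                       splitLength == fileStatus.getLen(),
                       "Non-splittable format, but split length (%s) differs from file length (%s)",
                       splitLength,
                       fileStatus.getLen());
           }
   ```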

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       I agree that this is confusing for folks that have used the `AvroParquetWriter`. `ParquetAvroWriters` is, however, easier to find for Flink users without prior knowledge. So I'm a bit torn here. Let's include @StephanEwen in the discussion.
   
   If we go with consistency with Parquet, then we should deprecate the existing `ParquetAvroWriters`, move the code to `AvroParquetWriters`, and use deprecated forward functions from old to new in a separate commit.
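   
   The forwarding could look roughly like this (a hypothetical sketch; only one factory method shown):
   
   ```
   /** @deprecated Use {@link AvroParquetWriters} instead, which follows the parquet-mr naming. */
   @Deprecated
   public class ParquetAvroWriters {
   
       public static ParquetWriterFactory<GenericRecord> forGenericRecord(Schema schema) {
           // forward to the renamed class so existing user code keeps compiling
           return AvroParquetWriters.forGenericRecord(schema);
       }
   
       private ParquetAvroWriters() {}
   }
   ```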

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*

Review comment:
       Please change commit message to something like
   ```
   [FLINK-21406][file connector] Added RecordFormat
   
   1. Create RecordFormat interface which focuses on Path with default methods implementation and delegate createReader() calls to the overloaded methods from StreamFormat that focuses on FDDataInputStream.
   2. Create AvroParquetRecordFormat implementation. Only reading avro GenericRecord from parquet file or stream is supported in this version.
   3. Splitting is not supported in this version.
   ```
   The list of components is not that clear (you can look at JIRA or usually a git blame helps) but it should give other Flink devs a rough idea on which module was changed. We also should always provide a crisp first line description as that's the only thing that will be shown by default in many tools.
   
   We could expand the description even to `Added RecordFormat as an opinionated StreamFormat` or so.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with the
+     * parquet abstraction. Please refer to the inner classes {@link GenericRecordReader}, {@link
+     * ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since current version does not support
+     * splitting,
+     */
+    @Override
+    public Reader<GenericRecord> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        // the current version just ignores the splitOffset and uses the restoredOffset
+        stream.seek(restoredOffset);
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<GenericRecord> getProducedType() {
+        return new GenericRecordAvroTypeInfo(schema);
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link RecordFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class GenericRecordReader implements RecordFormat.Reader<GenericRecord> {

Review comment:
       How would a `SpecificRecordReader` look different? I'm suspecting that we can probably use the same class for all flavors.
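   
   For what it's worth, the flavors should only differ in the Avro data model handed to the builder, so a single generic reader class plus a model lookup would likely suffice, e.g. (a sketch, with `getDataModel()` selecting GenericData/SpecificData/ReflectData based on the produced type):
   
   ```
           ParquetReader<E> parquetReader =
                   AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
                           .withDataModel(getDataModel())
                           .build();
           return new AvroParquetRecordReader<>(parquetReader);
   ```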

##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -45,6 +45,14 @@ under the License.
 			<scope>provided</scope>
 		</dependency>
 
+		<!-- Flink-avro -->
+
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-avro</artifactId>
+			<version>${project.version}</version>

Review comment:
       Yes, please make it `optional`, i.e. add `<optional>true</optional>` to the dependency declaration. For non-avro parquet use cases, the user will then receive smaller user jars.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       The commit is mixing concerns as also seen in your commit message: `RecordFormat` does not depend on `AvroParquetRecordFormat`, so I'd split the two things:
   1. First a commit with `RecordFormat` that introduces and motivates it in the commit messages. It also needs test coverage.
   2. `AvroParquetRecordFormat` as is with tests.

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>

Review comment:
       Please don't change the project setup stuff just for one module. We usually value consistency over correctness. If there is a general problem, pull it into a separate commit/PR and solve it for all modules.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream and {@link RecordFormat} pays attention to the
+ * concrete FileSystem. This format is for cases where the readers need access to the file directly
+ * or need to create a custom stream. For readers that can directly work on input streams, consider
+ * using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details.
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(

Review comment:
       Where is this method called? It feels like this is mostly for testing. I think we should just add these methods directly to `StreamFormat<T>` instead of creating a 4th interface. Or if it's for testing only, we could add a wrapper class around it.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       Without a serializable Schema, this field will be `null` on all TMs. You can try to use `SerializableAvroSchema` or we go for Avro 1.10.X where Schema is serializable. The writer part uses a string representation of the schema to circumvent the issue.
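   
   The string-based workaround could look like this (a sketch, keeping the current constructor signature):
   
   ```
       // Avro's Schema is not serializable (before Avro 1.10), so keep its JSON form
       private final String schemaString;
   
       public AvroParquetRecordFormat(Schema schema) {
           this.schemaString = schema.toString();
       }
   
       private Schema getSchema() {
           // re-parse lazily on the task managers
           return new Schema.Parser().parse(schemaString);
       }
   ```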




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732707127



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with the
+     * parquet abstraction. Please refer to the inner classes {@link GenericRecordReader}, {@link
+     * ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.

Review comment:
       I guess it is only because this PR is a draft, but of course splitting is mandatory in production. If you need some pointers on how to implement split with Parquet, you can take a look at `ParquetColumnarRowSplitReader`.
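   
   The usual approach is to assign each row group to the split that contains its starting offset, roughly (a sketch; `reader` is an open `ParquetFileReader`):
   
   ```
           // a row group belongs to the split that contains its first byte
           List<BlockMetaData> blocksForSplit = new ArrayList<>();
           for (BlockMetaData block : reader.getFooter().getBlocks()) {
               long blockStart = block.getStartingPos();
               if (blockStart >= splitOffset && blockStart < splitOffset + splitLength) {
                   blocksForSplit.add(block);
               }
           }
   ```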

##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       ditto

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with the
+     * parquet abstraction. Please refer to the inner classes {@link GenericRecordReader}, {@link
+     * ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(

Review comment:
       I would avoid creating this abstraction unless it is needed. This class is just a passthrough wrapper around `ParquetReader`. I think it is better to call `ParquetReader` directly.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} records from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link GenericRecordReader}, {@link
+     * ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since current version does not support
+     * splitting,

Review comment:
       missing the end of the comment?

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > The batch and stream we are talking about here refer to reading data, not the execution. I think it is an interesting discussion whether BulkFormat alone is a good fit to run streaming execution with batch reads; the latency could be an issue.
   > 
   > The `RecordFormat` is considered from the architecture perspective, because `FileRecordFormat` and `StreamFormat` are very similar. And currently, more features have been built for StreamFormat than for FileRecordFormat; compression is only one (wrong, since it is not used for parquet) example, and there could be others. We should keep an eye on DRY.
   
   I was about to say that there was indeed some confusion in the discussion between a Java stream and streaming-mode execution of the pipeline.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       +1 to supporting other types, as `GenericRecord` holds the Avro schema, which can be costly in memory use and serialisation/deserialisation.
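   
   As an illustration, parquet-avro can already produce specific records; a sketch assuming a generated class `User` (hypothetical) compiled from the same schema, and an `inputFile` built as in the PR's wrapper:
   
   ```java
   ParquetReader<User> reader =
           AvroParquetReader.<User>builder(inputFile)
                   .withDataModel(SpecificData.get())
                   .build();
   User user;
   while ((user = reader.read()) != null) {
       // typed accessors; no full schema carried along with every record
       String name = user.getName();
   }
   ```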

##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -45,6 +45,14 @@ under the License.
 			<scope>provided</scope>
 		</dependency>
 
+		<!-- Flink-avro -->
+
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-avro</artifactId>
+			<version>${project.version}</version>

Review comment:
       maybe mark this dependency `<optional>true</optional>`, as was done for the older flink-parquet component (see https://github.com/apache/flink/pull/15156#pullrequestreview-672808431)

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       
   > In general, all formats should support batch and streaming execution. As an example that `BulkFormat`s are also applicable to streaming executions you can take a look at this docstring [1]. The docstring mentions checkpoints and how the last offset/position is tracked. Checkpointing is not supported in batch execution. The difference between `BulkFormat` and `FileRecordFormat` is how the underlying reader interacts with the filesystem. `BulkFormats` usually always read batches of data i.e. parquet reader always reads blocks/rowgroups as on the other hand `FileRecordFormat` usually reads the file line by line.
   > 
   > After looking through the `AvroParquetReader` I think your assumption is right. We cannot implement a bulk format here because the reader does not expose any information about the underlying block/rowgroup structure.
   
   +1 on what Fabian said. IMHO this avro/parquet format should implement `BulkFormat` directly, cf. [this discussion](https://lists.apache.org/thread.html/re3a5724ba3e68d63cd2b83d9d14c41cdcb7547e7c46c6c5e5b7aeb73%40%3Cdev.flink.apache.org%3E) we had with Jingsong. Regarding block/rowgroup support with BulkFormat: could it be done by implementing `BulkFormat.Reader#readBatch`?
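   
   Roughly, a sketch of that idea (not working code: the `parquetFileReader` field of type `ParquetFileReader` and the `rowGroupToRecords` helper are assumptions, not part of this PR):
   
   ```java
   @Nullable
   @Override
   public BulkFormat.RecordIterator<GenericRecord> readBatch() throws IOException {
       PageReadStore rowGroup = parquetFileReader.readNextRowGroup();
       if (rowGroup == null) {
           return null; // no more row groups in this split
       }
       // hypothetical helper that materializes the group's records
       Iterator<GenericRecord> records = rowGroupToRecords(rowGroup);
       return new IteratorResultIterator<>(records, /* offset */ 0L, /* skipCount */ 0L);
   }
   ```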

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Why was this change needed?
   
   Like @fapaul, I don't think changing the relativePath is needed, but maybe I missed something.
   

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
       nit: please rename to `ParquetAvroRecordFormat` for consistency with the existing (Flink 1.13) `ParquetAvroInputFormat`

##########
File path: flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormatTest.java
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.serialization.BulkWriter;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.formats.parquet.ParquetWriterFactory;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.io.TempDir;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertThrows;
+
+/**
+ * Unit test for {@link AvroParquetRecordFormat} and {@link
+ * org.apache.flink.connector.file.src.reader.RecordFormat}.
+ */
+class AvroParquetRecordFormatTest {
+
+    private static final String USER_PARQUET_FILE = "user.parquet";
+
+    private static Path path;
+    private static Schema schema;
+    private static List<GenericRecord> records = new ArrayList<>(3);
+
+    @TempDir static java.nio.file.Path temporaryFolder;
+
+    /**
+     * Create a parquet file in the {@code temporaryFolder} directory.
+     *
+     * @throws IOException if the new file cannot be created.
+     */
+    @BeforeAll
+    static void setup() throws IOException {
+        schema =
+                new Schema.Parser()
+                        .parse(
+                                "{\"type\": \"record\", "
+                                        + "\"name\": \"User\", "
+                                        + "\"fields\": [\n"
+                                        + "        {\"name\": \"name\", \"type\": \"string\" },\n"
+                                        + "        {\"name\": \"favoriteNumber\",  \"type\": [\"int\", \"null\"] },\n"
+                                        + "        {\"name\": \"favoriteColor\", \"type\": [\"string\", \"null\"] }\n"
+                                        + "    ]\n"
+                                        + "    }");
+
+        records.add(createUser("Peter", 1, "red"));
+        records.add(createUser("Tom", 2, "yellow"));
+        records.add(createUser("Jack", 3, "green"));
+
+        path = new Path(temporaryFolder.resolve(USER_PARQUET_FILE).toUri());
+
+        ParquetWriterFactory<GenericRecord> writerFactory =
+                ParquetAvroWriters.forGenericRecord(schema);
+        BulkWriter<GenericRecord> writer =
+                writerFactory.create(
+                        path.getFileSystem().create(path, FileSystem.WriteMode.OVERWRITE));
+
+        for (GenericRecord record : records) {
+            writer.addElement(record);
+        }
+
+        writer.flush();
+        writer.finish();
+    }
+
+    @Test
+    void testCreateReader() throws IOException {
+        StreamFormat.Reader<GenericRecord> reader =
+                new AvroParquetRecordFormat(schema)
+                        .createReader(
+                                new Configuration(),
+                                path,
+                                0,
+                                path.getFileSystem().getFileStatus(path).getLen());
+        for (GenericRecord record : records) {
+            assertUserEquals(Objects.requireNonNull(reader.read()), record);
+        }
+    }
+
+    /** Expect exception since splitting is not supported now. */
+    @Test
+    void testCreateReaderWithSplitting() {
+        assertThrows(
+                IllegalArgumentException.class,
+                () ->
+                        new AvroParquetRecordFormat(schema)
+                                .createReader(new Configuration(), path, 5, 5));
+    }
+
+    @Test
+    void testCreateReaderWithNullPath() {
+        assertThrows(
+                NullPointerException.class,
+                () ->
+                        new AvroParquetRecordFormat(schema)
+                                .createReader(new Configuration(), (Path) null, 0, 0));
+    }
+
+    @Test
+    void testRestoreReaderWithNoOffset() {
+        assertThrows(
+                IllegalArgumentException.class,
+                () ->
+                        new AvroParquetRecordFormat(schema)
+                                .restoreReader(
+                                        new Configuration(),
+                                        path,
+                                        CheckpointedPosition.NO_OFFSET,
+                                        0,
+                                        path.getFileSystem().getFileStatus(path).getLen()));
+    }
+
+    @Test
+    void testRestoreReader() throws IOException {

Review comment:
       nit: please rename to `testReadWithRestoredReader`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r743200328



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       Thanks for the comprehensive explanation! I will change it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b48104914a31c05ad902cb0c36aef52b2b4093a8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982) 
   * 8b1a7f936d0cb2e54ae4fb364075d72273386623 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "29acfe668f883ccc8792a4ae8d08329e43c1be7a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "29acfe668f883ccc8792a4ae8d08329e43c1be7a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   * 29acfe668f883ccc8792a4ae8d08329e43c1be7a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740259768



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       We are using 1.10.0 now; the `Schema` is serializable. Just out of curiosity, since the transient keyword has been used, I'd like to take this opportunity to understand the Flink concept and therefore the reason why we care about serializability here. Will the Format object be created locally on each TM, or will it be created and transferred to the TMs via the network, which would need ser/de? Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740460859



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       @echauchot 
   After talking with @StephanEwen, it is recommended to use StreamFormat.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-954722957


   > > > @echauchot
   > > > > Sure, I saw the comments about split and data types etc... But I feel uncomfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod @JingGe @fapaul WDYT ?
   > > > 
   > > > 
   > > > you are right, that was the idea of the draft PR. Speaking of the splitting support specifically, which will make the implementation way more complicated, this PR might be merged without it, because we didn't get any requirement for it from the business side. If you have any strong requirement w.r.t. the splitting, we'd like to know and reconsider it.
   > > 
   > > 
   > > I think splitting is mandatory, because if you read a big parquet file with no split support, then all the content will end up in a single task manager, which will lead to OOM.
   > 
   > I agree with you; I actually have the same concern, especially from the SQL perspective. I didn't really understand your concern about OOM, because on the upper side we can control it via `StreamFormat.FETCH_IO_SIZE`, and on the lower side `ParquetFileReader` will be used, so we will not read the whole parquet file into memory.
   > 
   > The goal of this PR is to make everyone on the same page w.r.t. the implementation design. Once the design is settled down, the splitting support as a feature could be easily done in a follow-up PR. That is why I wrote in the PR description explicitly at the beginning that "Splitting is not supported in this version". I will update it with more background info.
   
   Yes, I see that there is a countermeasure regarding possible OOM (the fetch size), but the split is still important for performance reasons; otherwise the parallelism is sub-optimal, and Flink focuses on performance. I'm not a committer on the Flink project, so it is not my decision whether to merge this PR without split support, but I would tend not to merge without it, to avoid users suffering from a lack of performance that does not seem to meet the project's quality standards.
   
   @AHeise WDYT ?
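   
   For reference, the fetch-size knob mentioned above is a regular config option; a small sketch of how a user would cap it:
   
   ```java
   Configuration config = new Configuration();
   // limit how much data the StreamFormatAdapter fetches per I/O request
   config.set(StreamFormat.FETCH_IO_SIZE, MemorySize.ofMebiBytes(4));
   ```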
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] echauchot commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r735440587



##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       please do not change maven conf




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740847332



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       The format will be created in the `main` of the entry point. That could be a local client or, in recent setups, the job manager. The job manager will create a `JobGraph` and chop it into `Task`s to send to the task managers, where each task carries the serialized configuration. So while the `AvroParquetRecordFormat` class is in the user jar that every task manager independently has access to, the specific format instance is serialized on the JM, sent to the TM, and deserialized there. Hence, every user function needs to be `Serializable` (e.g. `MapFunction`) or needs to be created by a serializable factory (e.g. `BulkWriter.Factory`, or `StreamFormat` as a factory for the `Reader`, `Source`, `Sink`). In this case, the `AvroParquetRecordFormat` is serialized without the schema information on the JM, so the TM doesn't know the schema at all and will fail.
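   
   A common way to make that work, sketched here (not necessarily the PR's final code), is to store the schema's JSON representation, which is `Serializable`, and rebuild the Avro `Schema` lazily on the TM:
   
   ```java
   public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
   
       private final String schemaString;  // serializable representation of the schema
       private transient Schema schema;    // rebuilt after deserialization on the TM
   
       public AvroParquetRecordFormat(Schema schema) {
           this.schemaString = schema.toString();
       }
   
       private Schema getSchema() {
           if (schema == null) {
               schema = new Schema.Parser().parse(schemaString);
           }
           return schema;
       }
   
       // createReader(...) / restoreReader(...) elided; they would call getSchema()
   }
   ```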




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * b48104914a31c05ad902cb0c36aef52b2b4093a8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982) 
   * 8b1a7f936d0cb2e54ae4fb364075d72273386623 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8b1a7f936d0cb2e54ae4fb364075d72273386623 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168) 
   * 95ad572805a3aa45e9c78e07f83cd820abe00a52 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe removed a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe removed a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-952858917


   @echauchot many thanks for your comments; I will try to answer them in each thread.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 47ecd809d099606b06d8e3bc526d9735ae8c3321 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107) 
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r733820258



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       The batch and stream we are talking about here refer to how the data is read, not to the execution mode. I think it is an interesting discussion whether `BulkFormat` alone is a good fit for running streaming execution with batch-style reads; the latency could be an issue.
   
   The `RecordFormat` is considered from the architecture perspective, because `FileRecordFormat` and `StreamFormat` are very similar. Currently, more features have been built for `StreamFormat` than for `FileRecordFormat`; compression was only used as a (wrong) example, since it does not apply to Parquet, and there could be more. We should keep an eye on DRY.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182",
       "triggerID" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3a547a5b480efdb2d61a2bddcb053dd6a8ee61be Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r769936294



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible with
+     * the Parquet abstractions. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       Unfortunately, after injecting the model, we would have to hold it as an instance variable of type `GenericData`, which is not `Serializable`. One solution might be to override the `readObject(...)`/`writeObject(...)` methods, but that logic would not be simpler than the old one in `getDataModel()`. Another option is to define a new enum for Generic, Specific, and Reflect as a workaround (see the sketch below). WDYT?
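   A minimal sketch of the enum workaround (names are hypothetical): the serializable enum is stored in the format, and the non-serializable `GenericData` model is resolved from it on demand.
   
   ```java
   import org.apache.avro.generic.GenericData;
   import org.apache.avro.reflect.ReflectData;
   import org.apache.avro.specific.SpecificData;
   
   // Hedged sketch: a Serializable stand-in for the non-serializable data model.
   enum AvroDataModel {
       GENERIC, SPECIFIC, REFLECT;
   
       // SpecificData and ReflectData both extend GenericData, so GenericData
       // is a valid common return type.
       GenericData toDataModel() {
           switch (this) {
               case SPECIFIC:
                   return SpecificData.get();
               case REFLECT:
                   return ReflectData.get();
               default:
                   return GenericData.get();
           }
       }
   }
   ```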




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770011202



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls
+     * {@link #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       For example, `FileRecordFormatAdapter` or any other new `FormatAdapter` that wants to work with the Flink `Path` (see the sketch below).
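   To make the delegation concrete, here is a minimal sketch of what such a Path-focused default method could look like (an illustration based on Flink's `FileSystem`/`Path` APIs from `org.apache.flink.core.fs`, not necessarily the exact code of this PR):
   
   ```java
   // Hedged sketch of the Path-based default method: open the file via Flink's
   // FileSystem abstraction and delegate to the stream-based createReader(...).
   default StreamFormat.Reader<T> createReader(
           Configuration config, Path filePath, long splitOffset, long splitLength)
           throws IOException {
   
       final FileSystem fileSystem = filePath.getFileSystem();
       final long fileLength = fileSystem.getFileStatus(filePath).getLen();
       final FSDataInputStream inputStream = fileSystem.open(filePath);
   
       if (isSplittable()) {
           // position the stream at the beginning of the assigned split
           inputStream.seek(splitOffset);
       }
       return createReader(config, inputStream, fileLength, splitOffset + splitLength);
   }
   ```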




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115) 
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765614731



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible with
+     * the Parquet abstractions. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;

Review comment:
       Unfortunately, the offset should always be `CheckpointedPosition.NO_OFFSET`. We are not able to manage the offset as long as the high-level API `ParquetReader` is used.
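   Conceptually, restoring then boils down to record skipping rather than byte seeking. A hedged sketch of the idea (`recordsToSkip` would correspond to `CheckpointedPosition#getRecordsAfterOffset()`):
   
   ```java
   // Hedged sketch: without a usable byte offset, restore means recreating the
   // reader from the beginning and discarding the records already emitted.
   StreamFormat.Reader<GenericRecord> reader =
           format.createReader(config, stream, fileLen, splitEnd);
   for (long i = 0; i < recordsToSkip; i++) {
       reader.read(); // skip records up to the checkpointed position
   }
   ```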




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765032174



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible with
+     * the Parquet abstractions. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       This is a question of API design: do we want to make the internally used third-party class visible in the place where `AvroParquetRecordFormat` is constructed, i.e. the factory methods? Since the constructor of `AvroParquetRecordFormat` is package-private, doing so would be acceptable. But it is generally recommended to encapsulate such information within the class, so that low-level changes in the third-party lib do not force signature changes on the higher level, i.e. to avoid violating the Open/Closed Principle (see the sketch below).
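   A hedged sketch of such encapsulating factory methods (class and method names here are hypothetical, not necessarily the final API of this PR):
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.generic.GenericRecord;
   import org.apache.avro.specific.SpecificRecordBase;
   import org.apache.flink.api.java.typeutils.TypeExtractor;
   import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
   
   // Hedged sketch: the Avro data model never leaks out of the package; callers
   // only hand in a Schema or a generated record class.
   public final class AvroParquetFormats {
   
       public static AvroParquetRecordFormat<GenericRecord> forGenericRecord(Schema schema) {
           return new AvroParquetRecordFormat<>(new GenericRecordAvroTypeInfo(schema));
       }
   
       public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
               Class<T> typeClass) {
           return new AvroParquetRecordFormat<>(TypeExtractor.createTypeInfo(typeClass));
       }
   
       private AvroParquetFormats() {}
   }
   ```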




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765238437



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord}s from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible with
+     * the Parquet abstractions. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       Thanks for your effort in researching and providing the detailed code logic. Yes, using the low-level API `ParquetFileReader` is another option I considered, and I eventually came to the conclusion that it is better used with `BulkFormat` directly, like `ParquetVectorizedInputFormat` does, rather than with `StreamFormat`, for the following reasons:
   
   -  The read logic is built into the internal low-level class `InternalParquetRecordReader`, which has package-private visibility in the parquet-hadoop lib and which in turn uses another low-level class, `ParquetFileReader`. This makes a `StreamFormat` implementation very complicated. I think the design idea of `StreamFormat` is to simplify the implementation. They do not seem to work together.
   
   -  `ParquetFileReader` reads data in batch mode, i.e. `PageReadStore pages = reader.readNextFilteredRowGroup();`. If we built this logic into a `StreamFormat` (`AvroParquetRecordFormat` in this case), `AvroParquetRecordFormat` would have to take over the role that `InternalParquetRecordReader` plays, including but not limited to: 1. reading each `PageReadStore` in batch mode; 2. managing the `PageReadStore`, i.e. reading the next page once all records in the current page have been consumed, and caching it; 3. managing the read index within the current `PageReadStore`, because `StreamFormat` has its own setting for the read size. All of this would turn `AvroParquetRecordFormat` into a `BulkFormat` rather than a `StreamFormat`.
   
   -  `StreamFormat` can only be used via `StreamFormatAdapter`, which means everything we do with the low-level APIs of the parquet-hadoop lib must not conflict with the built-in logic provided by `StreamFormatAdapter`.
   
   We can now see that if we built this logic into a `StreamFormat` implementation, i.e. `AvroParquetRecordFormat`, all the convenient built-in logic provided by the `StreamFormatAdapter` would turn into obstacles. There would also be a violation of the single responsibility principle, i.e. `AvroParquetRecordFormat` would take over some of the responsibility of `BulkFormat`. I guess these were the reasons why `ParquetVectorizedInputFormat` implemented `BulkFormat` instead of `StreamFormat`.
   
   In order to build a unified Parquet implementation, it makes more sense to consider building this code into a `BulkFormat` implementation class. Speaking of "solve both things at once": since the output data types are different, `RowData` vs. Avro, extra converter logic would have to be introduced into the architecture. That is beyond the scope of this PR; I would suggest opening another ticket to focus on it. Depending on how complicated the issue turns out to be, a new FLIP might be required.
   
   The current implementation follows the design idea of `StreamFormat` and keeps everything high-level and simple. It is therefore easy to implement and easy for users to understand, which makes it a good fit for simple use cases. It does not conflict with the other solution, in which the low-level code described above teams up with `BulkFormat`.
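   
   For orientation, here is a minimal usage sketch of the simple use case this design targets. The factory method `AvroParquetReaders.forGenericRecord` is an assumption about the eventual public entry point; `FileSource.forRecordStreamFormat` and the surrounding DataStream API are existing Flink APIs:
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.SchemaBuilder;
   import org.apache.avro.generic.GenericRecord;
   import org.apache.flink.api.common.eventtime.WatermarkStrategy;
   import org.apache.flink.connector.file.src.FileSource;
   import org.apache.flink.core.fs.Path;
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   
   public class AvroParquetReadSketch {
       public static void main(String[] args) throws Exception {
           StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
   
           // Avro schema of the records stored in the Parquet file.
           Schema schema =
                   SchemaBuilder.record("User").fields().requiredString("name").endRecord();
   
           // Wire the StreamFormat-based format into the new FileSource.
           FileSource<GenericRecord> source =
                   FileSource.forRecordStreamFormat(
                                   AvroParquetReaders.forGenericRecord(schema), // assumed factory
                                   new Path("/path/to/data.parquet"))
                           .build();
   
           env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source").print();
           env.execute("read-avro-generic-records-from-parquet");
       }
   }
   ```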







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770560805



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -50,11 +51,12 @@
 
     private final TypeInformation<E> type;
 
-    private final GenericData dataModel;
+    private final SerializableSupplier<GenericData> dataModelSupplier;
 
-    AvroParquetRecordFormat(TypeInformation<E> type, GenericData dataModel) {

Review comment:
       sure
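   
   For context, a hedged sketch of the supplier pattern adopted in this hunk; `GenericRecordAvroTypeInfo` comes from flink-avro, the package-private constructor call is illustrative only, and `schema` is an assumed Avro `Schema` instance:
   
   ```java
   // GenericData models are not serializable, so the format keeps a
   // SerializableSupplier and defers creating the data model until the
   // reader is built on the task manager.
   AvroParquetRecordFormat<GenericRecord> format =
           new AvroParquetRecordFormat<>(
                   new GenericRecordAvroTypeInfo(schema), // TypeInformation<GenericRecord>
                   GenericData::get);                     // SerializableSupplier<GenericData>
   ```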







[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770565214



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details.
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       This is about the feature of working with the Flink `Path`, as provided by `FileRecordFormat`. The default implementation is provided here so that we can deprecate `FileRecordFormat` without losing the feature.
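   
   For reference, a sketch of what the described delegation can look like; the body is assumed (based on the javadoc in the hunk above), not quoted from the PR:
   
   ```java
   // Inside StreamFormat<T> (imports: org.apache.flink.core.fs.{Path, FileSystem,
   // FileStatus, FSDataInputStream} and java.io.IOException).
   default StreamFormat.Reader<T> createReader(
           Configuration config, Path filePath, long splitOffset, long splitLength)
           throws IOException {
       final FileSystem fileSystem = filePath.getFileSystem();
       final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
       final FSDataInputStream inputStream = fileSystem.open(filePath);
   
       // Splittable formats expect the stream positioned at the beginning of the split.
       if (isSplittable()) {
           inputStream.seek(splitOffset);
       }
   
       // Delegate to the overloaded stream-based variant; the split end is the
       // absolute position where this split stops.
       return createReader(config, inputStream, fileStatus.getLen(), splitOffset + splitLength);
   }
   ```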







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 4e3d378a6a50f02b99a903dc106a2ad9f931066f UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 4e3d378a6a50f02b99a903dc106a2ad9f931066f UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28341) 
   





[GitHub] [flink] flinkbot commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212









[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737403581



##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -45,6 +45,14 @@ under the License.
 			<scope>provided</scope>
 		</dependency>
 
+		<!-- Flink-avro -->
+
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-avro</artifactId>
+			<version>${project.version}</version>

Review comment:
       hmmm, should be fine. We had an internal discussion about this and decided to leave it non-optional for now. But making it optional is no doubt an option. @AHeise do you have any concerns about making it optional?
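   
   For reference, the optional variant discussed here would only add one element to the declaration shown in the diff (standard Maven semantics, sketched for illustration):
   
   ```xml
   <dependency>
   	<groupId>org.apache.flink</groupId>
   	<artifactId>flink-avro</artifactId>
   	<version>${project.version}</version>
   	<!-- optional: the dependency is not passed on transitively; users who
   	     need the Avro support must declare flink-avro themselves -->
   	<optional>true</optional>
   </dependency>
   ```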








[GitHub] [flink] AHeise commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-957254354


   > > > > @echauchot
   > > > > > Sure, I saw the comments about split and data types etc... But I feel uncomfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and to add extra commits until the PR is ready for prod @JingGe @fapaul WDYT?
   > > > > 
   > > > > 
   > > > > You are right, that was the idea of the draft PR. Speaking of the splitting support specifically, which would make the implementation considerably more complicated, this PR might be merged without it, because we didn't get any requirement for it from the business side. If you have any strong requirement w.r.t. splitting, we'd like to know and will reconsider it.
   > > > 
   > > > 
   > > > I think splitting is mandatory, because if you read a big parquet file with no split support, then all the content will end up in a single task manager, which will lead to OOM
   > > 
   > > 
   > > I agree with you; I actually have the same concern, especially from the SQL perspective. Still, I didn't really understand your concern about OOM, because on the upper side we can control memory via `StreamFormat.FETCH_IO_SIZE`, and on the lower side `ParquetFileReader` will be used, so we will not read the whole parquet file into memory.
   > > The goal of this PR is to get everyone on the same page w.r.t. the implementation design. Once the design is settled, the splitting support could easily be added in a follow-up PR. That is why I wrote explicitly at the beginning of the PR description that "Splitting is not supported in this version". I will update it with more background info.
   > 
   > Yes, I see that there is a countermeasure regarding possible OOM (the fetch size), but the split is still important for performance reasons. Otherwise the parallelism is sub-optimal, and Flink focuses on performance. I'm not a committer on the Flink project, so it is not my decision to merge this PR without split support, but I would tend not to merge without it, to avoid users suffering from this lack of performance, which would not seem to meet project quality standards.
   > 
   > @AHeise WDYT?
   
   I'm fine with a follow-up ticket/PR on that one to keep things going. Having any support for AvroParquet is better than having none. But it should be done before the 1.15 release anyhow, so that end users only ever see the splittable version.
   
   We plan to support splitting for all formats with sync marks, but in general the role of splitting has shifted since big data processing moved from block-based storage to cloud storage. Earlier, splitting was also needed to support data locality, which doesn't apply anymore. So now it's only needed to speed up ingestion (you can always rebalance after the source), which makes it necessary only for the most basic pipelines.
   
   TL;DR: while splitting is still a should-have feature, the days of must-have are gone.
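   
   As a side note on the "rebalance after the source" remark, a hedged fragment of that pattern:
   
   ```java
   // `source` is assumed to be a FileSource<GenericRecord> built with the
   // non-splittable AvroParquetRecordFormat, as sketched earlier in this thread.
   StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
   
   env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source")
           // a single source subtask reads the whole file; rebalance redistributes
           // the records round-robin across all downstream subtasks
           .rebalance()
           .print();
   ```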





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb UNKNOWN
   





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765238437



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} implementation for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstractions compatible
+     * with the Parquet abstractions. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       Thanks for your effort in researching and providing the detailed code logic. Yes, I am also unsatisfied with the recovery part. Using the low-level API `ParquetFileReader` is another option I considered, but I came to the conclusion that it is better used with `BulkFormat` directly, as `ParquetVectorizedInputFormat` does, than with `StreamFormat`, for the following reasons:
   
   -  The read logic lives in the internal low-level class `InternalParquetRecordReader`, which has package-private visibility in the parquet-hadoop lib and internally uses another low-level class, `ParquetFileReader`. This makes a `StreamFormat` implementation very complicated. The design idea of `StreamFormat` is to simplify the implementation; the two do not seem to work together.
   
   -  `ParquetFileReader` reads data in batch mode, i.e. `PageReadStore pages = reader.readNextFilteredRowGroup();`. If we built this logic into a `StreamFormat` (`AvroParquetRecordFormat` in this case), `AvroParquetRecordFormat` would have to take over the role of `InternalParquetRecordReader` (sketched below), including but not limited to:
         1. reading the `PageReadStore` in batch mode;
         2. managing the `PageReadStore`, i.e. reading the next page once all records in the current page have been consumed, and caching it;
         3. managing the read index within the current `PageReadStore`, because `StreamFormat` has its own setting for the read size, etc.
      All of this would turn `AvroParquetRecordFormat` into a `BulkFormat` instead of a `StreamFormat`.
   
   -  `StreamFormat` can only be used via the `StreamFormatAdapter`, which means anything we do with the low-level APIs of the parquet-hadoop lib must not conflict with the built-in logic provided by the `StreamFormatAdapter`.
   
   We can now see that if we built this logic into a `StreamFormat` implementation, i.e. `AvroParquetRecordFormat`, all the convenient built-in logic provided by the `StreamFormatAdapter` would turn into obstacles. There would also be a violation of the single responsibility principle, since `AvroParquetRecordFormat` would take over part of the responsibility of `BulkFormat`. I guess these are the reasons why `ParquetVectorizedInputFormat` implements `BulkFormat` instead of `StreamFormat`.
   
   In order to build a unified parquet implementation for both the Table API and the DataStream API, it makes more sense to build this code into a `BulkFormat` implementation class. Speaking of "solve both things at once": since the output data types differ, `RowData` vs. Avro, extra converter logic would have to be introduced into the architecture design. That is beyond the scope of this PR, so I would suggest opening another ticket to focus on it. Depending on how complicated the issue turns out to be and how big an impact it has on the current code base, a new FLIP might be required. I am keen to work on that as the next step.
   
   The current implementation follows the design idea of `StreamFormat` and keeps everything high-level and simple. It is therefore easy to implement and easy for users to understand, and a good fit for simple use cases. It does not conflict with the alternative solution sketched above, which pairs the low-level code with `BulkFormat`.
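   
   To make the second point concrete, here is a rough, hypothetical sketch of the row-group loop that `InternalParquetRecordReader` encapsulates and that a format built directly on `ParquetFileReader` would have to reimplement. This is illustrative only; it uses the `GroupRecordConverter` example materializer rather than the Avro one, and it is not code from this PR:
   ```java
   import org.apache.parquet.column.page.PageReadStore;
   import org.apache.parquet.example.data.Group;
   import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.io.ColumnIOFactory;
   import org.apache.parquet.io.InputFile;
   import org.apache.parquet.io.MessageColumnIO;
   import org.apache.parquet.io.RecordReader;
   import org.apache.parquet.schema.MessageType;
   
   import java.io.IOException;
   
   class RowGroupLoopSketch {
       static void readAll(InputFile file) throws IOException {
           try (ParquetFileReader reader = ParquetFileReader.open(file)) {
               MessageType schema = reader.getFooter().getFileMetaData().getSchema();
               MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
   
               PageReadStore pages;
               // (1) a whole row group is fetched at once -- inherently batch-style
               while ((pages = reader.readNextRowGroup()) != null) {
                   RecordReader<Group> recordReader =
                           columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
                   // (3) the read index inside the current PageReadStore must be tracked
                   for (long row = 0; row < pages.getRowCount(); row++) {
                       Group record = recordReader.read();
                       // emit record ...
                   }
                   // (2) the loop condition advances to the next PageReadStore once the
                   //     current one is fully consumed
               }
           }
       }
   }
   ```
   This bookkeeping is essentially what `BulkFormat.Reader#readBatch` is meant to host, which is why leaving it to a future `BulkFormat` implementation seems cleaner.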




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r769957779



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} to read Avro records from Parquet files or input streams. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read Avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to {@code
+     * createReader}, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       Er, tbh I forgot to check whether `GenericData` is serializable. 
   
   While `readObject` is the most canonical solution, it's also the least developer-friendly one (there are surprisingly many devs who don't know how it works correctly).
   So from your options, I'd prefer the enum.
   
   I'd add two more options: 3) revert to the state before my comment (duplicated logic); 4) use `SerializableSupplier` (I'm turning it into `PublicEvolving` in another PR so that this will also work if we outsource formats).
   So you have
   ```
   AvroParquetRecordFormat(TypeInformation<E> type, SerializableSupplier<GenericData> modelSupplier) {
     this.modelSupplier = modelSupplier;
     ...
   }
   ```
   and
   ```
   return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass), () -> GenericData.get());
   ```
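   
   For clarity, a minimal end-to-end sketch of option 4 (assuming `SerializableSupplier` from `org.apache.flink.util.function`; the field name is made up, and the remaining `StreamFormat` methods are omitted, hence the `abstract`):
   ```java
   import org.apache.avro.generic.GenericData;
   import org.apache.flink.api.common.typeinfo.TypeInformation;
   import org.apache.flink.connector.file.src.reader.StreamFormat;
   import org.apache.flink.util.function.SerializableSupplier;
   
   public abstract class AvroParquetRecordFormat<E> implements StreamFormat<E> {
   
       private final TypeInformation<E> type;
       // a serializable supplier avoids shipping the (possibly non-serializable) GenericData itself
       private final SerializableSupplier<GenericData> dataModelSupplier;
   
       AvroParquetRecordFormat(
               TypeInformation<E> type, SerializableSupplier<GenericData> dataModelSupplier) {
           this.type = type;
           this.dataModelSupplier = dataModelSupplier;
       }
   
       GenericData getDataModel() {
           // each factory method passes the matching model, so no instanceof branching is needed
           return dataModelSupplier.get();
       }
   
       @Override
       public TypeInformation<E> getProducedType() {
           return type;
       }
   }
   ```
   The factory side then stays a one-liner per record flavor, e.g. passing `GenericData::get` for generic records.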




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982",
       "triggerID" : "b48104914a31c05ad902cb0c36aef52b2b4093a8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168",
       "triggerID" : "8b1a7f936d0cb2e54ae4fb364075d72273386623",
       "triggerType" : "PUSH"
     }, {
       "hash" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184",
       "triggerID" : "95ad572805a3aa45e9c78e07f83cd820abe00a52",
       "triggerType" : "PUSH"
     }, {
       "hash" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208",
       "triggerID" : "efdd53bb50c9b8712b87f7d24495e20fef5b78f5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26345",
       "triggerID" : "1e9ac381ab9fc13d738a5bbd4be8207232240a0a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437",
       "triggerID" : "b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3b1d71a530fe2a8f262d49b984632743e81934f8",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516",
       "triggerID" : "968676739",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521",
       "triggerID" : "9d5ac3c1e54935cba4d90307be1aa50ef6c050aa",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109",
       "triggerID" : "d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115",
       "triggerID" : "1e7d015f8af6c7528eb626b60d64a3143f8c3033",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122",
       "triggerID" : "1b28a0b7c702da551a283cc16fd4ea9dfa03eea3",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3a547a5b480efdb2d61a2bddcb053dd6a8ee61be",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122) 
   * 3a547a5b480efdb2d61a2bddcb053dd6a8ee61be UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r769940141



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.specific.SpecificRecordBase;
+
+/**
+ * Convenience builder to create {@link AvroParquetRecordFormat} instances for the different Avro
+ * types.
+ */
+public class AvroParquetReaders {
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro {@link
+     * org.apache.avro.specific.SpecificRecord SpecificRecords}.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     */
+    public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
+            final Class<T> typeClass) {
+        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass));
+    }
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro records
+     * via reflection.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * <p>To read into Avro {@link org.apache.avro.specific.SpecificRecord SpecificRecords}, use the
+     * {@link #forSpecificRecord(Class)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     * @see #forSpecificRecord(Class)
+     */
+    public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
+        if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            throw new IllegalArgumentException(
+                    "Please use AvroParquetReaders.forSpecificRecord(Class<T>) for SpecificRecord.");

Review comment:
       Yes it's easier to start with the current solution and later make it more lenient than vice versa. So 👍 to leave as-is.
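   
   As a quick usage illustration (the POJO is hypothetical and not part of this PR):
   ```java
   import org.apache.flink.connector.file.src.reader.StreamFormat;
   
   // a plain class read via Avro reflection; it needs a public no-arg constructor
   public class Datum {
       public String word;
       public int count;
   
       public Datum() {}
   }
   
   StreamFormat<Datum> format = AvroParquetReaders.forReflectRecord(Datum.class);
   
   // passing a SpecificRecordBase subclass here would throw the
   // IllegalArgumentException above, steering users to forSpecificRecord(Class)
   ```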




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732506246



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Thanks for providing this pull request but I have a few preliminary questions about the design.
   > 
   > Every time I read something about parquet formats I always think the format should be based on the `BulkFormat` interface. Why did you base your implementation on the StreamFormat?
   > 
   > As a second point, I'd like to see an IT case using the new format with the `FileSource`. Did you already test this?
   
   
   Thanks for asking. Using StreamFormat enables streaming processing for the parquet file source. Furthermore, the same implementation can be reused for batch processing via an adapter; please refer to e.g. StreamFormatAdapter. Afaik, this is one of the good designs that comes with the new FileSource.
   
   The logic has been tested in the unit tests. Regarding the second question about the IT case, it is a good point to discuss here, and I am open to the decision. Question 1: a format works more like a factory; do we really need an IT case for a factory? Question 2: since BulkFormat is the only format used internally to create the FileSource, we could consider building the IT case for the BulkFormat with the FileSource instead of the StreamFormat.
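
   To make this concrete, here is a minimal sketch (not part of this PR; the schema and path are hypothetical, purely for illustration) of how a StreamFormat-based format plugs into the new FileSource, which wraps it in a StreamFormatAdapter (a BulkFormat) internally:

       import org.apache.avro.Schema;
       import org.apache.avro.generic.GenericRecord;
       import org.apache.flink.api.common.eventtime.WatermarkStrategy;
       import org.apache.flink.connector.file.src.FileSource;
       import org.apache.flink.core.fs.Path;
       import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
       import org.apache.flink.streaming.api.datastream.DataStream;
       import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

       public class AvroParquetFileSourceExample {
           public static void main(String[] args) throws Exception {
               StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

               // Hypothetical Avro schema, only for illustration.
               Schema schema =
                       new Schema.Parser()
                               .parse(
                                       "{\"type\":\"record\",\"name\":\"Rec\","
                                               + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

               // The StreamFormat is handed to the FileSource builder; internally the
               // FileSource wraps it in a StreamFormatAdapter, i.e. a BulkFormat.
               FileSource<GenericRecord> source =
                       FileSource.forRecordStreamFormat(
                                       AvroParquetReaders.forGenericRecord(schema),
                                       new Path("/path/to/records.parquet"))
                               .build();

               DataStream<GenericRecord> stream =
                       env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");
               stream.print();
               env.execute("read avro from parquet");
           }
       }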







[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732865726



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       Where is the reference that tells us BulkFormat supports streaming? Afaik, all javadocs about BulkFormat only talk about batch; please refer to the javadoc of BulkFormat itself and the javadoc of FileSource.







[GitHub] [flink] JingGe commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-948663153


   > Thanks for the nice discussion! I think I now better understand the contribution.
   > 
   > Can you rename the class `AvroParquetRecordFormat` to `ParquetAvro...`? The codebase already has a `ParquetAvroWriters` class and it would be nice to keep it consistent.
   
   I named it after the naming convention of the apache parquet lib, e.g. `AvroParquetReader`. It sounds natural for "reading avro from parquet". I would suggest we change ParquetAvroWriters to AvroParquetWriters instead.
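
   For reference, a minimal sketch of the upstream naming this follows, using parquet-avro directly (the path is hypothetical):

       import org.apache.avro.generic.GenericRecord;
       import org.apache.hadoop.conf.Configuration;
       import org.apache.parquet.avro.AvroParquetReader;
       import org.apache.parquet.hadoop.ParquetReader;
       import org.apache.parquet.hadoop.util.HadoopInputFile;

       public class UpstreamNamingExample {
           public static void main(String[] args) throws Exception {
               // "Avro" (the record model) comes first, "Parquet" (the storage format)
               // second: AvroParquetReader reads avro records from a parquet file.
               try (ParquetReader<GenericRecord> reader =
                       AvroParquetReader.<GenericRecord>builder(
                                       HadoopInputFile.fromPath(
                                               new org.apache.hadoop.fs.Path("/path/to/records.parquet"),
                                               new Configuration()))
                               .build()) {
                   GenericRecord record;
                   while ((record = reader.read()) != null) {
                       System.out.println(record);
                   }
               }
           }
       }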





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 4e3d378a6a50f02b99a903dc106a2ad9f931066f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-997062683


   @flinkbot run azure





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * d3ffea2e3b98c4c481715d63c3ea5d05c9313c0a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770562481



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -46,7 +46,8 @@
      */

Review comment:
       done







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r771626315



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java
##########
@@ -157,6 +166,88 @@
             long splitEnd)
             throws IOException;
 
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls
+     * {@link #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {

Review comment:
       The commit adding the two default methods has been dropped.
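
       For context, here is a rough reconstruction of what such a Path-focused default method could have looked like, based on the javadoc above (it assumes the surrounding StreamFormat interface and Flink's core Configuration, Path, FileSystem, and FSDataInputStream types; this is not the dropped commit itself):

           // Reconstructed sketch: adapt the Path-based arguments to an FSDataInputStream
           // and delegate to the stream-based createReader(...).
           default StreamFormat.Reader<T> createReader(
                   Configuration config, Path filePath, long splitOffset, long splitLength)
                   throws IOException {
               final FileSystem fileSystem = filePath.getFileSystem();
               final long fileLen = fileSystem.getFileStatus(filePath).getLen();
               final FSDataInputStream inputStream = fileSystem.open(filePath);
               if (isSplittable()) {
                   // position the stream at the beginning of the file split
                   inputStream.seek(splitOffset);
               }
               return createReader(config, inputStream, fileLen, splitOffset + splitLength);
           }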







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>








[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   * 29acfe668f883ccc8792a4ae8d08329e43c1be7a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] echauchot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-950676384


   > > R:@echauchot @JingGe thanks a lot for your work ! If I may, I'd like to review this PR as I was the author of the `ParquetAvroInputFormat` for the older source.
   > 
   > @echauchot Thanks for your interest and effort. It would be great if you would review this PR. Appreciate it. Please be aware of the "draft" of the current PR status. There are a lot information written in the PR description at the beginning that might hopefully give you the background info for the review.
   
   Sure, I saw the comments about splits and data types etc. But I feel uncomfortable about draft PRs because they usually cannot be merged as-is. In the case of your PR, merging it without split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod. @JingGe @fapaul WDYT?





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737431535



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > +1 on what Fabian said. IMHO, I think this avro/parquet format should implement `BulkFormat` directly cf [this discussion](https://lists.apache.org/thread.html/re3a5724ba3e68d63cd2b83d9d14c41cdcb7547e7c46c6c5e5b7aeb73%40%3Cdev.flink.apache.org%3E) we had with Jingsong. Regarding block/rowgroup support with BulkFormat: could it be done by implementing `BulkFormat.Reader#readBatch `?
   
    I could only find the statement about using BulkFormat in that discussion. Could you share the reason why we should implement BulkFormat directly? BTW, there was a similar discussion at #17520. A hypothetical skeleton of that approach follows below.
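
    To clarify the question, here is a hypothetical skeleton (not from this PR) of what implementing BulkFormat directly would mean; the row-group batching mentioned in the quoted comment would live in readBatch():

        import org.apache.avro.generic.GenericRecord;
        import org.apache.flink.configuration.Configuration;
        import org.apache.flink.connector.file.src.FileSourceSplit;
        import org.apache.flink.connector.file.src.reader.BulkFormat;

        import java.io.IOException;

        public abstract class RowGroupBulkFormatSketch
                implements BulkFormat<GenericRecord, FileSourceSplit> {

            @Override
            public Reader<GenericRecord> createReader(Configuration config, FileSourceSplit split)
                    throws IOException {
                // A direct implementation would open the parquet file for this split here
                // and return a Reader whose readBatch() yields one row group per call.
                throw new UnsupportedOperationException("sketch only");
            }

            @Override
            public Reader<GenericRecord> restoreReader(Configuration config, FileSourceSplit split)
                    throws IOException {
                // Would re-open the split and seek to the checkpointed position.
                throw new UnsupportedOperationException("sketch only");
            }

            @Override
            public boolean isSplittable() {
                // Parquet row groups map naturally onto file splits.
                return true;
            }
        }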
   







[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r769936294



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} implementation for reading Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the
+     * {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       Unfortunately, after injecting the model, we have to hold it as an instance variable of type `GenericData`, which is not `Serializable`. One solution might be to override the readObject(...)/writeObject(...) methods, but that logic would not be simpler than the current one in getDataModel(). Another option is to define a new enum for the generic case as a workaround. WDYT?
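
       To make the first option concrete, a rough sketch with a hypothetical field name (this is not code from the PR):

           // Keep the non-Serializable model transient and re-derive it on deserialization.
           private transient GenericData dataModel;

           private void readObject(java.io.ObjectInputStream in)
                   throws java.io.IOException, ClassNotFoundException {
               in.defaultReadObject();
               // GenericData is not Serializable, so rebuild it from the produced type
               // instead of writing it to the stream.
               this.dataModel = getDataModel();
           }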







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 3a547a5b480efdb2d61a2bddcb053dd6a8ee61be Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28182) 
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r770561049



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -46,7 +46,8 @@
      */
     public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
             final Class<T> typeClass) {
-        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass), SpecificData.get());
+        return new AvroParquetRecordFormat<>(

Review comment:
       thanks







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 UNKNOWN
   



[GitHub] [flink] AHeise closed pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise closed pull request #17501:
URL: https://github.com/apache/flink/pull/17501


   





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765238437



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} implementation to read Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. This is in fact identical to creating a
+     * new reader, since only {@link CheckpointedPosition#NO_OFFSET} as the {@code restoredOffset}
+     * is supported.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       Thanks for your effort researching and providing the detailed code logic. Yes, I am also unsatisfied with the recovery part. Using the low-level API ParquetFileReader is another option I've considered, but I came to the conclusion that it is better used with `BulkFormat` directly, like 'ParquetVectorizedInputFormat' does, than with `StreamFormat`, for the following reasons:
   
   -  the read logic is built into the internal low-level class `InternalParquetRecordReader`, which has package-private visibility in the parquet-hadoop lib and internally uses another low-level class, `ParquetFileReader`. This makes a StreamFormat implementation very complicated. I think the design idea of StreamFormat is to simplify the implementation. They do not seem to work together.
   
   -  `ParquetFileReader` reads data in batch mode, i.e. `PageReadStore pages = reader.readNextFilteredRowGroup();`. If we build this logic into a StreamFormat (`AvroParquetRecordFormat` in this case), `AvroParquetRecordFormat` has to take over the role `InternalParquetRecordReader` plays, including but not limited to:
         1. reading `PageReadStore`s in batch mode.
         2. managing the `PageReadStore`, i.e. reading the next page once all records in the current page have been consumed, and caching it.
         3. managing the read index within the current `PageReadStore`, because StreamFormat has its own setting for read size, etc. All of this turns `AvroParquetRecordFormat` into a `BulkFormat` rather than a `StreamFormat`.
   
   - `StreamFormat` can only be used via `StreamFormatAdapter`, which means everything we do with the low-level APIs of the parquet-hadoop lib must not conflict with the built-in logic provided by `StreamFormatAdapter`.
   
   Now we can see that if we build this logic into a `StreamFormat` implementation, i.e. `AvroParquetRecordFormat`, all the convenient built-in logic provided by the `StreamFormatAdapter` turns into an obstacle. There is also a violation of the single responsibility principle, i.e. `AvroParquetRecordFormat` would take over some of the responsibility of `BulkFormat`. I guess this is the reason why 'ParquetVectorizedInputFormat' implemented `BulkFormat` instead of `StreamFormat`.
   
   In order to build a unified parquet implementation for both the Table API and the DataStream API, it makes more sense to consider building this code into a `BulkFormat` implementation class. Speaking of "solving both things at once": since the output data types are different, `RowData` vs. `Avro`, extra converter logic would have to be introduced into the architecture design. This is beyond the scope of this PR. I would suggest opening another ticket to focus on it. Depending on how complicated the issue turns out to be and how big its impact on the current code base is, a new FLIP might be required. I'd really like to work on that as the next step.
   
   The current implementation follows the design idea of `StreamFormat` and keeps everything high-level and simple. It is therefore easy to implement and easy for users to understand, and a good fit for simple use cases. It does not conflict with the other solution sketched above, which would team the low-level code up with `BulkFormat`.
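
   For context, a rough sketch of what driving `ParquetFileReader` directly would look like, assembled from the public parquet-hadoop API for illustration; it is not code from this PR:

   ```java
   import org.apache.parquet.column.page.PageReadStore;
   import org.apache.parquet.example.data.Group;
   import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.io.ColumnIOFactory;
   import org.apache.parquet.io.InputFile;
   import org.apache.parquet.io.MessageColumnIO;
   import org.apache.parquet.io.RecordReader;
   import org.apache.parquet.schema.MessageType;

   import java.io.IOException;

   class RowGroupScan {
       /** Reads a parquet file row group by row group, i.e. the batch mode described above. */
       static void scan(InputFile file) throws IOException {
           try (ParquetFileReader reader = ParquetFileReader.open(file)) {
               MessageType schema = reader.getFooter().getFileMetaData().getSchema();
               PageReadStore pages;
               while ((pages = reader.readNextRowGroup()) != null) {
                   // The caller must cache the current page store and track the read
                   // position within it, which is exactly what the package-private
                   // InternalParquetRecordReader does inside parquet-hadoop.
                   MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
                   RecordReader<Group> recordReader =
                           columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
                   for (long i = 0, rows = pages.getRowCount(); i < rows; i++) {
                       Group record = recordReader.read();
                       // convert / emit the record here
                   }
               }
           }
       }
   }
   ```

   All of this bookkeeping would have to live inside the format and cooperate with `StreamFormatAdapter`'s own batching, which is the overlap described above.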







[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765017804



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.specific.SpecificRecordBase;
+
+/**
+ * Convenience builder to create {@link AvroParquetRecordFormat} instances for the different Avro
+ * types.
+ */
+public class AvroParquetReaders {
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro {@link
+     * org.apache.avro.specific.SpecificRecord SpecificRecords}.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     */
+    public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
+            final Class<T> typeClass) {
+        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass));
+    }
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro records
+     * via reflection.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * <p>To read into Avro {@link org.apache.avro.specific.SpecificRecord SpecificRecords}, use the
+     * {@link #forSpecificRecord(Class)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     * @see #forSpecificRecord(Class)
+     */
+    public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
+        if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            throw new IllegalArgumentException(
+                    "Please use AvroParquetReaders.forSpecificRecord(Class<T>) for SpecificRecord.");

Review comment:
       This comes down to the design question of how strictly we should control the type. Doing it this way makes users aware that they are actually using a SpecificRecord rather than the normal POJO they thought they had. Do you mean calling `forSpecificRecord(typeClass)` implicitly in this case and logging a warning instead of throwing the exception? That might look convenient for users, but we would be taking over the users' responsibility that way.
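
   For illustration, the intended call sites then look roughly like this; `Address` stands in for a generated SpecificRecord class and `Datum` for a plain POJO, both hypothetical:

   ```java
   // Generated Avro class: must go through the dedicated factory method.
   AvroParquetRecordFormat<Address> specific =
           AvroParquetReaders.forSpecificRecord(Address.class);

   // Plain POJO, read via Avro reflection.
   AvroParquetRecordFormat<Datum> reflect =
           AvroParquetReaders.forReflectRecord(Datum.class);

   // Passing a SpecificRecord class to forReflectRecord(...) fails fast with an
   // IllegalArgumentException instead of silently falling back to reflection.
   AvroParquetReaders.forReflectRecord(Address.class); // throws
   ```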







[GitHub] [flink] JingGe commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-952861699


   > > @echauchot
   > > > Sure, I saw the comments about split and data types etc... But I feel unconfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod @JingGe @fapaul WDYT ?
   > > 
   > > 
   > > you are right, that was the idea of the draft PR. Speaking of the splitting support specifically, which will make the implementation way more complicated, this PR might be merged without it, because we didn't get any requirement for it from the business side. If you have any strong requirement w.r.t. the splitting, we'd like to know and reconsider it.
   > 
   > I think splitting is mandatory because if you read a big parquet file with no split support, then all the content will end up in a single task manager which will lead to OOM
   
   I agree with you; I actually have the same concern, especially from the SQL perspective. I didn't really understand your concern about OOM, though, because on the upper side we can control it via `StreamFormat.FETCH_IO_SIZE`, and on the lower side `ParquetFileReader` will be used, so we will not read the whole parquet file into memory.
   
   The goal of this PR is to get everyone on the same page w.r.t. the implementation design. Once the design is settled, splitting support could easily be added as a feature in a follow-up PR. That is why I wrote explicitly at the beginning of the PR description that "Splitting is not supported in this version". I will update it with more background info.
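
   For reference, a small sketch of the knob mentioned above, assuming the `StreamFormat.FETCH_IO_SIZE` option of the file source:

   ```java
   // Assuming org.apache.flink.configuration.{Configuration, MemorySize} and
   // org.apache.flink.connector.file.src.reader.StreamFormat are imported.
   Configuration config = new Configuration();
   // Each fetch then reads at most ~4 MiB before records are handed downstream,
   // so even a large, unsplit parquet file is consumed in small batches.
   config.set(StreamFormat.FETCH_IO_SIZE, MemorySize.ofMebiBytes(4));
   ```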





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732516736



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Why was this change needed?
   since we use the default value of relativePath, the whole line could be omitted. This change follows the official documentation: http://maven.apache.org/ref/3.3.9/maven-model/maven.html#class_parent 
   Btw, IntelliJ IDEA points out the issue too. 
   







[GitHub] [flink] JingGe commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-950667366


   > R:@echauchot @JingGe thanks a lot for your work ! If I may, I'd like to review this PR as I was the author of the `ParquetAvroInputFormat` for the older source.
   
   @echauchot Thanks for your interest and effort. It would be great if you would review this PR; I appreciate it. Please be aware of the "draft" status of the current PR. There is a lot of information in the PR description at the beginning that should hopefully give you the background for the review.





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732881890



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       Furthermore, I am not aware that BulkFormat was "specifically" designed to support ORC and Parquet. The javadoc tells us that "The BulkFormat reads and decodes **batches** of records at a time. **Examples of bulk formats** are formats like ORC or Parquet."







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   * 99dac03879947eb2a9e3774c0c6170f9389e5185 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760) 
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765) 
   



[GitHub] [flink] AHeise commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-957254354


   > > > > @echauchot
   > > > > > Sure, I saw the comments about split and data types etc... But I feel unconfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without the split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for prod @JingGe @fapaul WDYT ?
   > > > > 
   > > > > 
   > > > > you are right, that was the idea of the draft PR. Speaking of the splitting support specifically, which will make the implementation way more complicated, this PR might be merged without it, because we didn't get any requirement for it from the business side. If you have any strong requirement w.r.t. the splitting, we'd like to know and reconsider it.
   > > > 
   > > > 
   > > > I think splitting is mandatory because if you read a big parquet file with no split support, then all the content will end up in a single task manager which will lead to OOM
   > > 
   > > 
   > > I agree with you, I have actually the same concern, especially from the SQL perspective. I didn't really understand your concern about OOM, because on the upper side we can control it via `StreamFormat.FETCH_IO_SIZ` and on the under side, `ParquetFileReader` will be used, we will not read the whole parquet file into memory.
   > > The goal of this PR is to make everyone on the same page w.r.t. the implementation design. Once the design is settled down, the splitting support as a feature could be easily done in a follow-up PR. That is why I wrote in the PR description explicitly at the beginning that "Splitting is not supported in this version". I will update it with more background info.
   > 
   > Yes I see that there is a countermeasure regarding possible OOM (fetch size) but still, for performance reasons, the split is important. Otherwise the parallelism is sub-optimal and Flink focuses on performance. I'm not a committer on the Flink project so it is not my decision to merge this PR without split but I would tend not to merge without split support to avoid that a user suffers from this lack of performance which seems to not meet project quality standards.
   > 
   > @AHeise WDYT ?
   
   I'm fine with a follow-up ticket/PR on that one to keep things going. Having any support for AvroParquet is better than having none. But it should be done before the 1.15 release anyhow, so that end users see only the splittable version.
   
   We plan to support splitting for all formats with sync marks, but in general the role of splitting has shifted since big data processing as a whole moved from block-based storage to cloud storage. Earlier, splitting was also needed to support data locality, which doesn't apply anymore. Now it's only needed to speed up ingestion (you can always rebalance after the source), so it is necessary only for the most basic pipelines.
   
   TL;DR: while splitting is still a should-have feature, the days of must-have are gone.





[GitHub] [flink] AHeise commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740847332



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       The format will be created in the `main` of the entry point. That could be a local client or, in recent setups, the job manager. The job manager will create a `JobGraph` and chop it into `Task`s to send to the task managers, where each task has a configuration that contains the serialized format. So while the `AvroParquetRecordFormat` class is in the user jar that every task manager independently has access to, the specific format instance is serialized on the JM, sent to the TM, and deserialized there. Hence, every user function needs to be `Serializable` (e.g. `MapFunction`) or needs to be created by a serializable factory (e.g. `BulkWriter.Factory`, `StreamFormat` as a factory for the `Reader`, `Source`, `Sink`). In this case, the `AvroParquetRecordFormat` is serialized without the schema information on the JM, so the TM doesn't know the schema at all and will fail.
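
   For illustration, the usual workaround is to carry the schema as its JSON representation and re-parse it lazily on the task manager. A minimal sketch; the class name is hypothetical:

   ```java
   import org.apache.avro.Schema;

   import java.io.Serializable;

   /** Hypothetical holder, since Avro's Schema itself is not Serializable. */
   final class SerializableSchema implements Serializable {
       private static final long serialVersionUID = 1L;

       private final String schemaJson;  // survives JM -> TM serialization
       private transient Schema schema;  // rebuilt lazily on the TM

       SerializableSchema(Schema schema) {
           this.schemaJson = schema.toString(); // Avro schemas serialize to JSON
           this.schema = schema;
       }

       Schema get() {
           if (schema == null) {
               schema = new Schema.Parser().parse(schemaJson);
           }
           return schema;
       }
   }
   ```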







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     }, {
       "hash" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25760",
       "triggerID" : "99dac03879947eb2a9e3774c0c6170f9389e5185",
       "triggerType" : "PUSH"
     }, {
       "hash" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765",
       "triggerID" : "bbb03788c8414baff5c1ffc4ff214a0d129b6bfb",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * bbb03788c8414baff5c1ffc4ff214a0d129b6bfb Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25765) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r740224352



##########
File path: flink-formats/flink-parquet/pom.xml
##########
@@ -45,6 +45,14 @@ under the License.
 			<scope>provided</scope>
 		</dependency>
 
+		<!-- Flink-avro -->
+
+		<dependency>
+			<groupId>org.apache.flink</groupId>
+			<artifactId>flink-avro</artifactId>
+			<version>${project.version}</version>

Review comment:
       sure, thanks @echauchot for the hint.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream and {@link RecordFormat} pays attention to
+ * the concrete FileSystem. This format is for cases where the readers need access to the file
+ * directly or need to create a custom stream. For readers that can directly work on input
+ * streams, consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {
+
+        checkNotNull(filePath, "filePath");
+
+        final FileSystem fileSystem = filePath.getFileSystem();
+        final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+        final FSDataInputStream inputStream = fileSystem.open(filePath);
+
+        if (isSplittable()) {
+            inputStream.seek(splitOffset);
+        }

Review comment:
       This should actually be checked in `createReader(Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)`; I will add it there to make it more robust.
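
       For concreteness, a sketch of the kind of guard meant here (my wording; the PR's actual check may differ), living in the stream-based method so that the Path-based default implementation stays thin:

       // Hypothetical guard inside createReader(Configuration, FSDataInputStream, long, long):
       // a non-splittable format must be handed the whole file as a single split.
       private static void checkWholeFileSplit(long fileLen, long splitEnd) {
           if (splitEnd != fileLen) {
               throw new IllegalArgumentException(
                       "Expected the split to cover the whole file (length " + fileLen
                               + "), but the split ends at " + splitEnd);
           }
       }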

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the Parquet abstraction. Please refer to the inner classes {@link GenericRecordReader},
+     * {@link ParquetInputFile} and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since the current version does not
+     * support splitting, the split offset is ignored and the stream is positioned at the
+     * restored offset.
+     */
+    @Override
+    public Reader<GenericRecord> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        // the current version just ignores the splitOffset and uses the restoredOffset
+        stream.seek(restoredOffset);
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<GenericRecord> getProducedType() {
+        return new GenericRecordAvroTypeInfo(schema);
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link RecordFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class GenericRecordReader implements RecordFormat.Reader<GenericRecord> {

Review comment:
       The name comes from the generic type `GenericRecord`. I will upgrade the reader to support custom types and rename the class.

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;
+
+    public AvroParquetRecordFormat(Schema schema) {
+        this.schema = schema;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the Parquet abstraction. Please refer to the inner classes {@link GenericRecordReader},
+     * {@link ParquetInputFile} and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<GenericRecord> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new GenericRecordReader(
+                AvroParquetReader.<GenericRecord>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(GenericData.get())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. Since the current version does not
+     * support splitting, the split offset is ignored and the stream is positioned at the
+     * restored offset.
+     */
+    @Override
+    public Reader<GenericRecord> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        // the current version just ignores the splitOffset and uses the restoredOffset
+        stream.seek(restoredOffset);
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;

Review comment:
       Splitting will be supported in a follow-up ticket.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*

Review comment:
       sure, I will merge the methods into the `StreamFormat`

##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {
+
+    private final transient Schema schema;

Review comment:
       We are using Avro 1.10.0 now, where `Schema` is serializable. Just out of curiosity, since the transient keyword has been used, I'd like to take this opportunity to understand the Flink concept and the reason why serializability matters here. Will the format object be created locally on each TM, or will it be created once and transferred to the TMs over the network, which would require ser/de? Thanks.
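
       As a self-contained illustration of the failure mode behind this question (demo code of mine, not Flink code): a transient field does not survive Java serialization, which is exactly the JM-to-TM transfer described in the earlier review comment.

       import java.io.ByteArrayInputStream;
       import java.io.ByteArrayOutputStream;
       import java.io.ObjectInputStream;
       import java.io.ObjectOutputStream;
       import java.io.Serializable;

       public class TransientDemo {
           static class Format implements Serializable {
               private static final long serialVersionUID = 1L;
               final transient String schema; // transient: skipped during serialization

               Format(String schema) {
                   this.schema = schema;
               }
           }

           public static void main(String[] args) throws Exception {
               ByteArrayOutputStream bos = new ByteArrayOutputStream();
               ObjectOutputStream out = new ObjectOutputStream(bos);
               out.writeObject(new Format("my-schema"));
               out.flush();

               ObjectInputStream in =
                       new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
               Format restored = (Format) in.readObject();
               System.out.println(restored.schema); // prints "null" - the schema is lost
           }
       }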

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>

Review comment:
       ok, I will create a new ticket for it.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream and {@link RecordFormat} pays attention to
+ * the concrete FileSystem. This format is for cases where the readers need access to the file
+ * directly or need to create a custom stream. For readers that can directly work on input
+ * streams, consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(

Review comment:
       The original idea was to replace the `FileRecordFormat` with this `RecordFormat`. Merging it into the `StreamFormat` is also fine.

##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface to build a two-level API. {@link
+ * StreamFormat} focuses on the abstract input stream and {@link RecordFormat} pays attention to
+ * the concrete FileSystem. This format is for cases where the readers need access to the file
+ * directly or need to create a custom stream. For readers that can directly work on input
+ * streams, consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> - The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the
+     * information given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {
+
+        checkNotNull(filePath, "filePath");
+
+        final FileSystem fileSystem = filePath.getFileSystem();
+        final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+        final FSDataInputStream inputStream = fileSystem.open(filePath);
+
+        if (isSplittable()) {
+            inputStream.seek(splitOffset);
+        }

Review comment:
       These must be checked in `createReader(Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)`. Adding this code here would make the logic redundant.

##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       @echauchot 
   After talking with @StephanEwen, it is recommended to use StreamFormat.








[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107",
       "triggerID" : "47ecd809d099606b06d8e3bc526d9735ae8c3321",
       "triggerType" : "PUSH"
     }, {
       "hash" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170",
       "triggerID" : "92b897b4f14d61bf0414fe0d0f95771bdf5f05b5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 47ecd809d099606b06d8e3bc526d9735ae8c3321 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25107) 
   * 92b897b4f14d61bf0414fe0d0f95771bdf5f05b5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25170) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768387505



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the Parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile} and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the {@code
+     * restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       Yes, I think ultimately both `AvroParquetRecordFormat` and `ParquetVectorizedInputFormat` should share a common ancestor and be bulk-based. However, that is strictly speaking a performance improvement, and it probably improves maintainability in the long run (one base format where all the logic resides). For now, your solution already closes some gaps in the feature matrix and has good value on its own. I'd pursue the other option later in a separate ticket/PR.
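
       As a side note on the checkpoint semantics: with `NO_OFFSET`, resuming means re-opening the reader at the beginning of the file and skipping the records that were already emitted. A minimal sketch of that idea (my own illustration, inferred from the reader's `skipCount` field, not code from the PR):

       // Hypothetical helper: fast-forward a freshly opened reader past the records
       // that were already emitted before the checkpoint (records-to-skip semantics).
       static <E> void skipAlreadyEmitted(StreamFormat.Reader<E> reader, long skipCount)
               throws IOException {
           for (long i = 0; i < skipCount; i++) {
               if (reader.read() == null) {
                   break; // stream exhausted earlier than expected
               }
           }
       }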







[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768390735



##########
File path: flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/Datum.java
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import java.io.Serializable;
+
+/** Test datum. */
+public class Datum implements Serializable {

Review comment:
       I guess in the end, it doesn't really matter since this is only in test scope.
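
       For context, a reflect-style test datum usually looks like the sketch below (the quoted file is cut off above, so the fields and constructors here are assumptions, not the actual test class):

       import java.io.Serializable;

       /** Test datum (sketch). Avro reflection needs a public no-arg constructor. */
       public class Datum implements Serializable {
           public String a;
           public int b;

           public Datum() {}

           public Datum(String a, int b) {
               this.a = a;
               this.b = b;
           }
       }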







[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768433756



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the Parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile} and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical to creating a
+     * fresh reader, since only {@link CheckpointedPosition#NO_OFFSET} is supported as the {@code
+     * restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       fair enough, I will change it.
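
       For reference, one possible shape of that change (an assumption on my side, not the PR's final code): inject the data model from the caller instead of deriving it reflectively in `getDataModel()`. Note that the model would then have to be provided in a serializable way (e.g. via a serializable supplier), given the JM-to-TM serialization discussed earlier in the thread.

       // Hypothetical variant: the caller (e.g. the AvroParquetReaders factory methods)
       // decides which GenericData model to use and passes it in, removing the
       // instanceof cascade from the format itself.
       AvroParquetRecordFormat(TypeInformation<E> type, GenericData dataModel) {
           this.type = type;
           this.dataModel = dataModel; // hypothetical field replacing getDataModel()
       }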







[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768682464



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetReaders.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.formats.avro.typeutils.AvroTypeInfo;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.specific.SpecificRecordBase;
+
+/**
+ * Convenience builder to create {@link AvroParquetRecordFormat} instances for the different Avro
+ * types.
+ */
+public class AvroParquetReaders {
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro {@link
+     * org.apache.avro.specific.SpecificRecord SpecificRecords}.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     */
+    public static <T extends SpecificRecordBase> AvroParquetRecordFormat<T> forSpecificRecord(
+            final Class<T> typeClass) {
+        return new AvroParquetRecordFormat<>(new AvroTypeInfo<>(typeClass));
+    }
+
+    /**
+     * Creates a new {@link AvroParquetRecordFormat} that reads the parquet file into Avro records
+     * via reflection.
+     *
+     * <p>To read into Avro {@link GenericRecord GenericRecords}, use the {@link
+     * #forGenericRecord(Schema)} method.
+     *
+     * <p>To read into Avro {@link org.apache.avro.specific.SpecificRecord SpecificRecords}, use the
+     * {@link #forSpecificRecord(Class)} method.
+     *
+     * @see #forGenericRecord(Schema)
+     * @see #forSpecificRecord(Class)
+     */
+    public static <T> AvroParquetRecordFormat<T> forReflectRecord(final Class<T> typeClass) {
+        if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            throw new IllegalArgumentException(
+                    "Please use AvroParquetReaders.forSpecificRecord(Class<T>) for SpecificRecord.");

Review comment:
       I would suggest letting users be aware of it. We could provide the current solution as the first version and change it to the solution you suggested at any time, should users prefer convenience over transparency.
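
       To make the intended usage concrete, a short wiring sketch (API shape assumed from the builder methods quoted above and the standard `FileSource` entry points):

       import org.apache.avro.Schema;
       import org.apache.avro.generic.GenericRecord;
       import org.apache.flink.api.common.eventtime.WatermarkStrategy;
       import org.apache.flink.connector.file.src.FileSource;
       import org.apache.flink.core.fs.Path;
       import org.apache.flink.streaming.api.datastream.DataStream;
       import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

       // Read Avro GenericRecords from Parquet files with the new FileSource.
       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
       Schema schema = new Schema.Parser().parse(schemaJson); // schemaJson: your Avro schema JSON (assumed to be defined)

       FileSource<GenericRecord> source =
               FileSource.forRecordStreamFormat(
                               AvroParquetReaders.forGenericRecord(schema),
                               new Path("/path/to/parquet-files"))
                       .build();

       DataStream<GenericRecord> records =
               env.fromSource(source, WatermarkStrategy.noWatermarks(), "avro-parquet-source");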







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115) 
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28122) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765032174



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} to read Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical, since only {@link
+     * CheckpointedPosition#NO_OFFSET} is supported as the {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }

Review comment:
       This is a question of API design: do we want to make the internally used third-party class visible at the place where AvroParquetRecordFormat is constructed, i.e. the factory methods? Since the constructor of AvroParquetRecordFormat is package-private, it would be acceptable to do so. But it is generally recommended to encapsulate such information within the class, so that low-level changes in the third-party lib do not force signature changes at the higher level, i.e. to avoid violating the Open/Closed Principle.
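   To make the encapsulation argument concrete, here is a minimal sketch; the standalone class name `DataModelLookup` is made up, while the mapping itself mirrors the `getDataModel()` shown in the diff above:

   ```java
   import org.apache.avro.generic.GenericData;
   import org.apache.avro.generic.GenericRecord;
   import org.apache.avro.reflect.ReflectData;
   import org.apache.avro.specific.SpecificData;
   import org.apache.avro.specific.SpecificRecordBase;

   final class DataModelLookup {
       // The record-class-to-data-model mapping stays a private implementation
       // detail of the format; the public factory methods never mention GenericData,
       // so a low-level change in the avro lib cannot force a signature change upstream.
       static GenericData dataModelFor(Class<?> typeClass) {
           if (SpecificRecordBase.class.isAssignableFrom(typeClass)) {
               return SpecificData.get();
           } else if (GenericRecord.class.isAssignableFrom(typeClass)) {
               return GenericData.get();
           } else {
               return ReflectData.get();
           }
       }
   }
   ```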







[GitHub] [flink] JingGe commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r765238437



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} to read Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical, since only {@link
+     * CheckpointedPosition#NO_OFFSET} is supported as the {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;
+        private long skipCount;
+        private final boolean checkpointed;
+
+        private AvroParquetRecordReader(ParquetReader<E> parquetReader) {
+            this(parquetReader, CheckpointedPosition.NO_OFFSET, 0, false);
+        }
+
+        private AvroParquetRecordReader(
+                ParquetReader<E> parquetReader, long offset, long skipCount, boolean checkpointed) {
+            this.parquetReader = parquetReader;
+            this.offset = offset;
+            this.skipCount = skipCount;
+            this.checkpointed = checkpointed;
+        }
+
+        @Nullable
+        @Override
+        public E read() throws IOException {
+            E record = parquetReader.read();
+            incrementPosition();
+            return record;
+        }
+
+        @Override
+        public void close() throws IOException {
+            parquetReader.close();
+        }
+
+        @Nullable
+        @Override
+        public CheckpointedPosition getCheckpointedPosition() {
+            return checkpointed ? new CheckpointedPosition(offset, skipCount) : null;

Review comment:
       Thanks for your effort in researching this and providing the detailed code logic. Yes, using the low-level API ParquetFileReader is another option I considered, and I eventually came to the conclusion that it is better used directly with a `BulkFormat`, like 'ParquetVectorizedInputFormat' does, than with a `StreamFormat`, for the following reasons:
   
   -  the read logic is built into the internal low-level class `InternalParquetRecordReader`, which has package-private visibility in the parquet-hadoop lib and which internally uses another low-level class, `ParquetFileReader`. This makes a StreamFormat implementation very complicated. The design idea of StreamFormat is to simplify the implementation; the two do not seem to work together.
   
   -  `ParquetFileReader` reads data in batch mode, i.e. `PageReadStore pages = reader.readNextFilteredRowGroup();` (see the sketch after this comment). If we built this logic into a StreamFormat (`AvroParquetRecordFormat` in this case), `AvroParquetRecordFormat` would have to take over the role that `InternalParquetRecordReader` plays, including but not limited to:
         1. reading the `PageReadStore` in batch mode;
         2. managing the `PageReadStore`, i.e. reading the next page once all records in the current page have been consumed, and caching it;
         3. managing the read index within the current `PageReadStore`, because StreamFormat has its own setting for the read size, etc. All of this turns `AvroParquetRecordFormat` into a `BulkFormat` rather than a `StreamFormat`.
   
   - `StreamFormat` can only be used via `StreamFormatAdapter`, which means everything we do with the low-level APIs of the parquet-hadoop lib must not conflict with the built-in logic provided by `StreamFormatAdapter`.
   
   Now we can see that if we built this logic into a `StreamFormat` implementation, i.e. `AvroParquetRecordFormat`, all the convenient built-in logic provided by the `StreamFormatAdapter` would turn into obstacles. There would also be a violation of the single responsibility principle: `AvroParquetRecordFormat` would take over some of the responsibility of a `BulkFormat`. I guess these were the reasons why 'ParquetVectorizedInputFormat' implemented `BulkFormat` instead of `StreamFormat`.
   
   In order to build a unified parquet implementation for both the Table API and the DataStream API, it makes more sense to build this code into a `BulkFormat` implementation class. Speaking of "solving both things at once": since the output data types differ (`RowData` vs. Avro records), extra converter logic would have to be introduced into the architecture design. That is beyond the scope of this PR; I would suggest opening another ticket to focus on it. Depending on how complicated the issue turns out to be and how big an impact it has on the current code base, a new FLIP might be required.
   
   The current implementation follows the design idea of `StreamFormat` and keeps everything high-level and simple. It is therefore easy to implement and easy for users to understand, which makes it a good fit for simple use cases. And it does not conflict with the other solution mentioned above that teams up with `BulkFormat`.
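   To make the batch-mode point concrete, here is a rough sketch using the standard parquet-hadoop APIs; the class and method names `RowGroupScan`/`scan` are made up for illustration, this is not code from this PR:

   ```java
   import org.apache.parquet.column.page.PageReadStore;
   import org.apache.parquet.example.data.Group;
   import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
   import org.apache.parquet.hadoop.ParquetFileReader;
   import org.apache.parquet.io.ColumnIOFactory;
   import org.apache.parquet.io.InputFile;
   import org.apache.parquet.io.MessageColumnIO;
   import org.apache.parquet.io.RecordReader;
   import org.apache.parquet.schema.MessageType;

   import java.io.IOException;

   final class RowGroupScan {
       // Reads all records of `file`, one row group (batch) at a time.
       static void scan(InputFile file) throws IOException {
           try (ParquetFileReader reader = ParquetFileReader.open(file)) {
               MessageType schema = reader.getFooter().getFileMetaData().getSchema();
               MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);

               PageReadStore pages;
               while ((pages = reader.readNextRowGroup()) != null) { // batch mode
                   RecordReader<Group> recordReader =
                           columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
                   for (long i = 0; i < pages.getRowCount(); i++) {
                       // consume one record of the currently cached batch
                       Group record = recordReader.read();
                   }
               }
           }
       }
   }
   ```

   The `pages` batch and the row index are exactly the state a `StreamFormat` reader would have to carry across `read()` calls, which is the bookkeeping described in points 1-3 above.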







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115) 
   * 1b28a0b7c702da551a283cc16fd4ea9dfa03eea3 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] AHeise commented on a change in pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r768387795



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.annotation.VisibleForTesting;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.StreamFormat;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.avro.specific.SpecificData;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+
+/** A {@link StreamFormat} to read Avro records from Parquet files. */
+public class AvroParquetRecordFormat<E> implements StreamFormat<E> {
+
+    private static final long serialVersionUID = 1L;
+
+    static final Logger LOG = LoggerFactory.getLogger(AvroParquetRecordFormat.class);
+
+    private final TypeInformation<E> type;
+
+    AvroParquetRecordFormat(TypeInformation<E> type) {
+        this.type = type;
+    }
+
+    /**
+     * Creates a new reader to read avro {@link GenericRecord} from a Parquet input stream.
+     *
+     * <p>Several wrapper classes have been created to make the Flink abstraction compatible with
+     * the parquet abstraction. Please refer to the inner classes {@link AvroParquetRecordReader},
+     * {@link ParquetInputFile}, and {@link FSDataInputStreamAdapter} for details.
+     */
+    @Override
+    public Reader<E> createReader(
+            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        return new AvroParquetRecordReader<E>(
+                AvroParquetReader.<E>builder(new ParquetInputFile(stream, fileLen))
+                        .withDataModel(getDataModel())
+                        .build());
+    }
+
+    /**
+     * Restores the reader from a checkpointed position. It is in fact identical, since only {@link
+     * CheckpointedPosition#NO_OFFSET} is supported as the {@code restoredOffset}.
+     */
+    @Override
+    public Reader<E> restoreReader(
+            Configuration config,
+            FSDataInputStream stream,
+            long restoredOffset,
+            long fileLen,
+            long splitEnd)
+            throws IOException {
+
+        // current version does not support splitting.
+        checkNotSplit(fileLen, splitEnd);
+
+        checkArgument(
+                restoredOffset == CheckpointedPosition.NO_OFFSET,
+                "The restoredOffset should always be NO_OFFSET");
+
+        return createReader(config, stream, fileLen, splitEnd);
+    }
+
+    @VisibleForTesting
+    GenericData getDataModel() {
+        Class<E> typeClass = getProducedType().getTypeClass();
+        if (org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(typeClass)) {
+            return SpecificData.get();
+        } else if (org.apache.avro.generic.GenericRecord.class.isAssignableFrom(typeClass)) {
+            return GenericData.get();
+        } else {
+            return ReflectData.get();
+        }
+    }
+
+    /** Current version does not support splitting. */
+    @Override
+    public boolean isSplittable() {
+        return false;
+    }
+
+    /**
+     * Gets the type produced by this format. This type will be the type produced by the file source
+     * as a whole.
+     */
+    @Override
+    public TypeInformation<E> getProducedType() {
+        return type;
+    }
+
+    private static void checkNotSplit(long fileLen, long splitEnd) {
+        if (splitEnd != fileLen) {
+            throw new IllegalArgumentException(
+                    String.format(
+                            "Current version of AvroParquetRecordFormat is not splittable, "
+                                    + "but found split end (%d) different from file length (%d)",
+                            splitEnd, fileLen));
+        }
+    }
+
+    /**
+     * {@link StreamFormat.Reader} implementation. Using {@link ParquetReader} internally to read
+     * avro {@link GenericRecord} from parquet {@link InputFile}.
+     */
+    private static class AvroParquetRecordReader<E> implements StreamFormat.Reader<E> {
+
+        private final ParquetReader<E> parquetReader;
+
+        private final long offset;

Review comment:
       Then we should remove the offset here and rather use the constant where applicable.
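   A minimal sketch of that suggestion (not the committed code), dropping the stored `offset` field and using the constant directly at the one place it is needed:

   ```java
   @Nullable
   @Override
   public CheckpointedPosition getCheckpointedPosition() {
       // The reader never checkpoints a real stream offset, so the constant is
       // used directly instead of a stored `offset` field.
       return checkpointed
               ? new CheckpointedPosition(CheckpointedPosition.NO_OFFSET, skipCount)
               : null;
   }
   ```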







[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 9d5ac3c1e54935cba4d90307be1aa50ef6c050aa Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26521) 
   * d6ef4f4b2b8f1bfbe4493c2f6904d495e610e6db Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28109) 
   * 1e7d015f8af6c7528eb626b60d64a3143f8c3033 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28115) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 51d9550ffc3102a2d12b183a3a41100621ba90b3 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28215) 
   * 1364bd658c0d829857bb27ab9899ce9cef8db077 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28262) 
   * b8a96d9c1dc6facf2708deb6d74a20afd4183da9 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28274) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 95ad572805a3aa45e9c78e07f83cd820abe00a52 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26184) 
   * efdd53bb50c9b8712b87f7d24495e20fef5b78f5 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26208) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b48104914a31c05ad902cb0c36aef52b2b4093a8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982) 
   * 8b1a7f936d0cb2e54ae4fb364075d72273386623 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26168) 
   * 95ad572805a3aa45e9c78e07f83cd820abe00a52 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b3c4a89278761f5d995bb7e7c03c0bc80e0d8cfe Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26437) 
   * 3b1d71a530fe2a8f262d49b984632743e81934f8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=26516) 
   





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r749116740



##########
File path: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/RecordFormat.java
##########
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.file.src.reader;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.util.CheckpointedPosition;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.core.fs.FileStatus;
+import org.apache.flink.core.fs.FileSystem;
+import org.apache.flink.core.fs.Path;
+
+import java.io.IOException;
+
+import static org.apache.flink.util.Preconditions.checkArgument;
+import static org.apache.flink.util.Preconditions.checkNotNull;
+
+/**
+ * A reader format that reads individual records from a file via {@link Path}.
+ *
+ * <p>This interface teams up with its superinterface {@link StreamFormat} to build a two-level
+ * API: {@link StreamFormat} focuses on the abstract input stream, while {@link RecordFormat} pays
+ * attention to the concrete FileSystem, i.e. the {@link Path}. This format is for cases where the
+ * readers need access to the file directly or need to create a custom stream. For readers that
+ * can work directly on input streams, consider using the superinterface {@link StreamFormat}.
+ *
+ * <p>Please refer to the javadoc of {@link StreamFormat} for details.
+ *
+ * @param <T> The type of records created by this format reader.
+ */
+@PublicEvolving
+public interface RecordFormat<T> extends StreamFormat<T> {
+
+    /**
+     * Creates a new reader to read in this format. This method is called when a fresh reader is
+     * created for a split that was assigned from the enumerator. This method may also be called on
+     * recovery from a checkpoint, if the reader never stored an offset in the checkpoint (see
+     * {@link #restoreReader(Configuration, Path, long, long, long)} for details).
+     *
+     * <p>A default implementation is provided, so subclasses are not forced to implement it.
+     * Compared to {@link #createReader(Configuration, FSDataInputStream, long, long)}, this
+     * method puts the focus on the {@link Path}. The default implementation adapts the information
+     * given by the method arguments to an {@link FSDataInputStream} and calls {@link
+     * #createReader(Configuration, FSDataInputStream, long, long)}.
+     *
+     * <p>If the format is {@link #isSplittable() splittable}, then the {@code inputStream} is
+     * positioned to the beginning of the file split, otherwise it will be at position zero.
+     */
+    default StreamFormat.Reader<T> createReader(
+            Configuration config, Path filePath, long splitOffset, long splitLength)
+            throws IOException {
+
+        checkNotNull(filePath, "filePath");
+
+        final FileSystem fileSystem = filePath.getFileSystem();
+        final FileStatus fileStatus = fileSystem.getFileStatus(filePath);
+        final FSDataInputStream inputStream = fileSystem.open(filePath);
+
+        if (isSplittable()) {
+            inputStream.seek(splitOffset);
+        }

Review comment:
       Added it, and will revisit it if the redundancy issue is raised again.
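   For context, a rough sketch of how a RecordFormat-based format plugs into the new
   FileSource (the constructor argument `schema` and the input path are assumptions for
   illustration only):

       // Hypothetical wiring: FileSource.forRecordStreamFormat() accepts any
       // StreamFormat, so a RecordFormat implementation can be passed in directly.
       final FileSource<GenericRecord> source =
               FileSource.forRecordStreamFormat(
                               new AvroParquetRecordFormat(schema),
                               new Path("/path/to/data.parquet"))
                       .build();

       final DataStream<GenericRecord> stream =
               env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");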







[GitHub] [flink] JingGe commented on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-968676739


   @flinkbot run azure





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28341) 
   





[GitHub] [flink] AHeise commented on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
AHeise commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-997449037


   Merged into master as b4ca35041988cae9b23affe0441595a25506aaba..cdf3d483e191716ab40bd4185c7a674c7a648b6e.





[GitHub] [flink] flinkbot edited a comment on pull request #17501: [FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * 0507c16fcb133b7efb3ad5ac20ee163bd8cb46d6 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28280) 
   * b8f5219af5fd7fb7994d5ffa20eb8d23fa81693d Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=28341) 
   





[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737400499



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Hi @JingGe, thanks for your work! I must admit I have mixed feelings about this PR: I feel like it is very java-stream and single-split oriented. Like Fabian, I think that implementing `BulkFormat` would be better.
   
   @echauchot many thanks for your comments; I will try to answer them in each thread.
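   For readers following the thread, the `BulkFormat` alternative mentioned above has
   roughly this shape (abbreviated from the Flink sources of that era; shown for context
   only, not as a complete definition):

       // Abbreviated: BulkFormat is batch-oriented and split-aware, in contrast to the
       // record-at-a-time, stream-oriented StreamFormat/RecordFormat used in this PR.
       public interface BulkFormat<T, SplitT extends FileSourceSplit>
               extends Serializable, ResultTypeQueryable<T> {

           BulkFormat.Reader<T> createReader(Configuration config, SplitT split)
                   throws IOException;

           BulkFormat.Reader<T> restoreReader(Configuration config, SplitT split)
                   throws IOException;

           boolean isSplittable();
       }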







[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r737414033



##########
File path: flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##########
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.formats.parquet.avro;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.file.src.reader.RecordFormat;
+import org.apache.flink.core.fs.FSDataInputStream;
+import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.parquet.avro.AvroParquetReader;
+import org.apache.parquet.hadoop.ParquetReader;
+import org.apache.parquet.io.DelegatingSeekableInputStream;
+import org.apache.parquet.io.InputFile;
+import org.apache.parquet.io.SeekableInputStream;
+
+import javax.annotation.Nullable;
+
+import java.io.IOException;
+
+/** A {@link RecordFormat} that reads Avro {@link GenericRecord}s from Parquet files. */
+public class AvroParquetRecordFormat implements RecordFormat<GenericRecord> {

Review comment:
    > nit: please rename to `ParquetAvroRecordFormat` for consistency with the existing (Flink 1.13) `ParquetAvroInputFormat`

    This is the same question @fapaul asked. Here is a copy of my original answer: "I named it after the naming convention of the Apache Parquet library, e.g. `AvroParquetReader`. It sounds natural for 'reading Avro from Parquet'. I would suggest we rename `ParquetAvroWriters` to `AvroParquetWriters` instead." The library convention is shown in the sketch below.
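    For reference, this is roughly how the Parquet library's own `AvroParquetReader` is used, i.e. the naming precedent referred to above (a minimal sketch against recent parquet-avro versions; the class name and the printing are illustrative):

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

import java.io.IOException;

class AvroParquetNamingExample {

    /** Reads every GenericRecord from the given Parquet file and prints it. */
    static void readAll(String file) throws IOException {
        HadoopInputFile in = HadoopInputFile.fromPath(new Path(file), new Configuration());
        try (ParquetReader<GenericRecord> reader =
                AvroParquetReader.<GenericRecord>builder(in).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                System.out.println(record);
            }
        }
    }
}
```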










[GitHub] [flink] flinkbot edited a comment on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-944297212


   ## CI report:
   
   * b48104914a31c05ad902cb0c36aef52b2b4093a8 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25982) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] echauchot commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

Posted by GitBox <gi...@apache.org>.
echauchot commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-950676384


    > > R: @echauchot @JingGe thanks a lot for your work! If I may, I'd like to review this PR, as I was the author of the `ParquetAvroInputFormat` for the older source.
    > 
    > @echauchot Thanks for your interest and effort. It would be great if you would review this PR. Appreciate it. Please be aware of the current PR's "draft" status. There is a lot of information in the PR description that should give you the background for the review.

    Sure, I saw the comments about splits, data types, etc. But I feel uncomfortable about draft PRs because they usually cannot be merged as is. In the case of your PR, merging it without split support could not be done. So I guess the correct way to proceed is to use this PR as an environment for design discussions and add extra commits until the PR is ready for production. @JingGe @fapaul WDYT?
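    For context on what "no split support" means at the API level, here is a minimal sketch of how a format can declare itself non-splittable, assuming the Flink 1.14 `StreamFormat` API (the `NonSplittableStreamFormat` class and the `doCreateReader` hook are illustrative assumptions, not code from this PR):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.file.src.reader.StreamFormat;
import org.apache.flink.core.fs.FSDataInputStream;

import java.io.IOException;

import static org.apache.flink.util.Preconditions.checkArgument;

abstract class NonSplittableStreamFormat<T> implements StreamFormat<T> {

    @Override
    public boolean isSplittable() {
        return false; // each file is read as a single split, no mid-file cuts
    }

    @Override
    public StreamFormat.Reader<T> createReader(
            Configuration config, FSDataInputStream stream, long fileLen, long splitEnd)
            throws IOException {
        // Defensive check: with isSplittable() == false the split must span the whole file.
        checkArgument(splitEnd == fileLen, "%s does not support splitting", getClass().getName());
        return doCreateReader(config, stream);
    }

    /** Hook for the concrete format; restoreReader()/getProducedType() are left to subclasses. */
    protected abstract StreamFormat.Reader<T> doCreateReader(
            Configuration config, FSDataInputStream stream) throws IOException;
}
```

    Whether such a restriction is acceptable for a first merge is exactly the question raised above.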

