You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/10/20 07:54:33 UTC

[GitHub] [flink] JingGe commented on a change in pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

JingGe commented on a change in pull request #17501:
URL: https://github.com/apache/flink/pull/17501#discussion_r732506246



##########
File path: flink-formats/flink-avro/pom.xml
##########
@@ -26,7 +26,7 @@ under the License.
 		<groupId>org.apache.flink</groupId>
 		<artifactId>flink-formats</artifactId>
 		<version>1.15-SNAPSHOT</version>
-		<relativePath>..</relativePath>
+		<relativePath>../pom.xml</relativePath>

Review comment:
       > Thanks for providing this pull request but I have a few preliminary questions about the design.
   > 
   > Every time I read something about parquet formats I always think the format should be based on the `BulkFormat` interface. Why did you base your implementation on the StreamFormat?
   > 
   > As a second point, I'd like to see an IT case using the new format with the `FileSource`. Did you already test this?
   
   
   
   > preliminary
   
   Thanks for asking. Using StreamFormat will enable streaming process for parquet file source. Further more, the same implementation can be used in batch processing via adapter too, please refer to e.g. StreamFormatAdapter. Afaik, this is one of the good design coming with the new FileSource.
   
   Logic has been tested in the UT. For the second question about the IT case, it is a good question to discuss here, I am open for the decision. Question 1: Format works more like a factory, do we really need IT for a factory? Question 2: since BulkFormat is the only format used to create FileSource internally, we could consider building the IT for the BulkFormat with the FileSource instead of the StreamFormat.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org