Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/20 02:40:06 UTC

[GitHub] [iceberg] JingsongLi commented on issue #1215: FlinkParquetWriter should build writer with schema visitor

JingsongLi commented on issue #1215:
URL: https://github.com/apache/iceberg/issues/1215#issuecomment-660768444


   +1. We should handle these differences, and we can consider rewriting some `ParquetValueWriter`s.
   The same is true for the readers, and for Avro and ORC as well.
   
   This is not a small amount of work. I wonder whether we could implement the conversions only for Flink's internal data structures, and just provide a converter from Flink `Row` to `RowData`.
   
   Here is the Flink Table/SQL data structure mapping. The internal data structures are more efficient, while the external ones are more often used in UDFs.
   
   ```
    * +--------------------------------+-----------------------------------------+-------------------+
    * | SQL Data Types                 | Internal Data Structures (RowData)      |  External (Row)   |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | BOOLEAN                        | boolean                                 |  boolean          |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | CHAR / VARCHAR / STRING        | {@link StringData}                      |  String           |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | BINARY / VARBINARY / BYTES     | byte[]                                  |  byte[]           |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | DECIMAL                        | {@link DecimalData}                     |  BigDecimal       |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | TINYINT                        | byte                                    |  byte             |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | SMALLINT                       | short                                   |  short            |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | INT                            | int                                     |  int              |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | BIGINT                         | long                                    |  long             |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | FLOAT                          | float                                   |  float            |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | DOUBLE                         | double                                  |  double           |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | DATE                           | int (number of days since epoch)        |  LocalDate        |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | TIME                           | int (number of milliseconds of the day) |  LocalTime        |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | TIMESTAMP                      | {@link TimestampData}                   |  LocalDateTime    |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | TIMESTAMP WITH LOCAL TIME ZONE | {@link TimestampData}                   |  Instant          |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | INTERVAL YEAR TO MONTH         | int (number of months)                  |  Period           |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | INTERVAL DAY TO SECOND         | long (number of milliseconds)           |  Duration         |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | ROW / structured types         | {@link RowData}                         |  Row              |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | ARRAY                          | {@link ArrayData}                       |  T[]              |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | MAP / MULTISET                 | {@link MapData}                         |  Map              |
    * +--------------------------------+-----------------------------------------+-------------------+
    * | RAW                            | {@link RawValueData}                    |  T                |
    * +--------------------------------+-----------------------------------------+-------------------+
   ```
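
   To make the external-to-internal direction concrete, here is a minimal sketch of the value conversions a `Row` to `RowData` converter would need for the date, time and interval rows of the table. It uses only `java.time`, so it does not touch any Flink or Iceberg API. The class name `ExternalToInternal` and its method names are hypothetical, and ignoring the day part of a `Period` is my assumption, not something the table specifies.

   ```java
   import java.time.Duration;
   import java.time.LocalDate;
   import java.time.LocalTime;
   import java.time.Period;

   /** Hypothetical helper for illustration; not part of Flink or Iceberg. */
   final class ExternalToInternal {
     private ExternalToInternal() {}

     /** DATE: LocalDate to int (number of days since epoch). */
     static int toInternalDate(LocalDate date) {
       return Math.toIntExact(date.toEpochDay());
     }

     /** TIME: LocalTime to int (number of milliseconds of the day). */
     static int toInternalTime(LocalTime time) {
       // Sub-millisecond precision is truncated here.
       return Math.toIntExact(time.toNanoOfDay() / 1_000_000L);
     }

     /** Year-month interval: Period to int (number of months). */
     static int toInternalYearMonthInterval(Period period) {
       // Assumption: the day part of the Period is ignored.
       return Math.toIntExact(period.toTotalMonths());
     }

     /** Day-time interval: Duration to long (number of milliseconds). */
     static long toInternalDayTimeInterval(Duration duration) {
       return duration.toMillis();
     }

     public static void main(String[] args) {
       System.out.println(toInternalDate(LocalDate.of(2020, 7, 20)));
       System.out.println(toInternalTime(LocalTime.of(12, 30)));
       System.out.println(toInternalYearMonthInterval(Period.of(1, 2, 0)));
       System.out.println(toInternalDayTimeInterval(Duration.ofSeconds(2)));
     }
   }
   ```

   The remaining rows (`StringData`, `DecimalData`, `TimestampData`, and the nested `RowData`/`ArrayData`/`MapData`) would go through the corresponding Flink wrapper types instead, which is why a single converter at the boundary looks cheaper than maintaining writers for both representations.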

