Posted to dev@flink.apache.org by "Krzysztof Chmielewski (Jira)" <ji...@apache.org> on 2023/02/23 10:22:00 UTC

[jira] [Created] (FLINK-31197) Exception while writing Parquet files containing Arrays with complex types.

Krzysztof Chmielewski created FLINK-31197:
---------------------------------------------

             Summary: Exception while writing Parquet files containing Arrays with complex types.
                 Key: FLINK-31197
                 URL: https://issues.apache.org/jira/browse/FLINK-31197
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.16.1, 1.15.3, 1.15.2, 1.16.0, 1.15.1, 1.15.0, 1.17.0, 1.15.4, 1.16.2, 1.17.1, 1.15.5
            Reporter: Krzysztof Chmielewski
         Attachments: ParquetSinkArrayOfArraysIssue.java

After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible to write complex types with the File sink using the Parquet format.

However, it turns out that it is still impossible to write types such as:
Array<Array>
Array<Map>
Array<Row>

When trying to write a Parquet row containing such a type, the following exception is thrown:
{code:java}
Caused by: java.lang.RuntimeException: org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
	at org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
	at org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
	at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
	at org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
	at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
	at org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)

{code}


The exception is misleading and does not point to the real problem.
The reason these complex types still do not work is that during the development of https://issues.apache.org/jira/browse/FLINK-17782 the code paths for those types were left unimplemented: no UnsupportedOperationException, nothing, simply empty methods. In https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}

for MapWriter, ArrayWriter and RowWriter.

I see two problems here:
1. Writing those three types is still not possible.
2. Flink throws an exception that gives no hint about the real issue. It could throw an "Unsupported operation" exception for now (a sketch follows this list). Maybe this should be an item for a different ticket?
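For illustration only, here is a minimal sketch of the interim behaviour suggested in point 2. It reuses the write(ArrayData, int) signature quoted above from ParquetRowDataWriter; the message text is my own wording and this is a hedged proposal, not an actual patch:
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {
    // Hypothetical interim behaviour: fail fast with a clear message instead of
    // writing nothing and letting Parquet later complain "empty fields are illegal".
    throw new UnsupportedOperationException(
            "Writing nested ARRAY/MAP/ROW elements is not supported by the Parquet format yet.");
}
{code}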


The code to reproduce this issue is attached to the ticket. It tries to write a single row with one column of type Array<Array<int>> to a Parquet file; a rough sketch of that scenario is shown below.
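For reference, a minimal sketch approximating the attached reproducer (class name, output path and the row contents are illustrative assumptions; the attached ParquetSinkArrayOfArraysIssue.java may differ in details):
{code:java}
import java.util.Collections;

import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetWriterFactory;
import org.apache.flink.formats.parquet.row.ParquetRowDataBuilder;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.GenericArrayData;
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.ArrayType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.hadoop.conf.Configuration;

public class ParquetSinkArrayOfArraysRepro {

    public static void main(String[] args) throws Exception {
        // Schema with a single column of type ARRAY<ARRAY<INT>>.
        RowType rowType = RowType.of(new ArrayType(new ArrayType(new IntType())));

        // A single row holding the nested array [[1, 2], [3]].
        RowData row = GenericRowData.of(
                new GenericArrayData(new Object[] {
                        new GenericArrayData(new Integer[] {1, 2}),
                        new GenericArrayData(new Integer[] {3})
                }));

        ParquetWriterFactory<RowData> writerFactory =
                ParquetRowDataBuilder.createWriterFactory(rowType, new Configuration(), true);

        FileSink<RowData> sink =
                FileSink.forBulkFormat(new Path("/tmp/parquet-array-repro"), writerFactory)
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromCollection(Collections.singletonList(row), InternalTypeInfo.of(rowType))
                .sinkTo(sink);

        // Expected outcome: the job fails with ParquetEncodingException
        // "empty fields are illegal" because ArrayWriter#write(ArrayData, int)
        // is a no-op for nested arrays.
        env.execute("parquet-array-of-arrays-repro");
    }
}
{code}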


