Posted to dev@parquet.apache.org by "Vitalii Diravka (Jira)" <ji...@apache.org> on 2021/04/13 06:11:00 UTC

[jira] [Created] (PARQUET-2026) Allow empty row in parquet file

Vitalii Diravka created PARQUET-2026:
----------------------------------------

             Summary: Allow empty row in parquet file
                 Key: PARQUET-2026
                 URL: https://issues.apache.org/jira/browse/PARQUET-2026
             Project: Parquet
          Issue Type: Task
          Components: parquet-mr
    Affects Versions: 1.12.0
            Reporter: Vitalii Diravka
             Fix For: 1.13.0
         Attachments: Screenshot from 2021-04-13 08-52-56.png

Since PARQUET-1851, Parquet files that contain a schema (metadata) but 0 rows, i.e. empty files, can no longer be written.
As a result, Drill cannot store empty tables as Parquet files, for example:
{code:java}
CREATE TABLE dfs.tmp.%s AS SELECT * FROM cp.`employee.json` WHERE 1=0{code}
{code:java}
CREATE TABLE dfs.tmp.%s AS select * from dfs.`parquet/alltypes_required.parquet` where `col_int` = 0{code}
{code:java}
create table dfs.tmp.%s as select * from dfs.`parquet/empty/complex/empty_complex.parquet`{code}
So PARQUET-1851 breaks the following test cases:
{code:java}
TestUntypedNull.testParquetTableCreation
TestParquetWriterEmptyFiles.testComplexEmptyFileSchema
TestParquetWriterEmptyFiles.testWriteEmptyFile
TestParquetWriterEmptyFiles.testWriteEmptyFileWithSchema
TestParquetWriterEmptyFiles.testWriteEmptySchemaChange
TestMetastoreCommands.testAnalyzeEmptyRequiredParquetTable
TestMetastoreCommands.testSelectEmptyRequiredParquetTable
{code}
I suggest logging a warning when an empty Parquet file is created, or providing an alternative _endBlock_ for backward compatibility with other tools:
!Screenshot from 2021-04-13 08-52-56.png!
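
To make the two options concrete, here is a minimal sketch of what a lenient variant next to a strict _endBlock_ could look like. It is not the parquet-mr API: the class EmptyBlockWriterSketch, the method endBlockAllowingEmpty, the exception type and the messages are all hypothetical, and the sketch assumes that the strict path rejects a block with 0 rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for a Parquet file writer; NOT the parquet-mr API.
class EmptyBlockWriterSketch {
    private static final Logger LOG =
        Logger.getLogger(EmptyBlockWriterSketch.class.getName());

    // Row counts of the blocks (row groups) recorded so far.
    private final List<Long> blockRowCounts = new ArrayList<>();
    private long currentRecordCount = 0;

    // Begin a block that will hold the given number of records.
    void startBlock(long recordCount) {
        currentRecordCount = recordCount;
    }

    // Strict variant (assumed behaviour): a block with 0 rows is rejected.
    void endBlock() {
        if (currentRecordCount == 0) {
            throw new IllegalStateException("End block with zero records");
        }
        closeBlock();
    }

    // Lenient variant (hypothetical): a block with 0 rows is recorded,
    // and a warning is logged instead of failing.
    void endBlockAllowingEmpty() {
        if (currentRecordCount == 0) {
            LOG.warning("Ending a block with 0 rows; the file will hold schema metadata only");
        }
        closeBlock();
    }

    // Record the current block and reset the counter for the next one.
    private void closeBlock() {
        blockRowCounts.add(currentRecordCount);
        currentRecordCount = 0;
    }

    // Defensive copy so callers cannot modify the writer's state.
    List<Long> blockRowCounts() {
        return new ArrayList<>(blockRowCounts);
    }
}
```

With this shape, a tool such as Drill that needs empty tables would call the lenient method, while callers that treat an empty block as an error keep the strict one.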


