Posted to dev@parquet.apache.org by "Xinli Shang (Jira)" <ji...@apache.org> on 2021/09/20 19:40:00 UTC

[jira] [Created] (PARQUET-2093) Add rewriter version to Parquet footer

Xinli Shang created PARQUET-2093:
------------------------------------

             Summary: Add rewriter version to Parquet footer 
                 Key: PARQUET-2093
                 URL: https://issues.apache.org/jira/browse/PARQUET-2093
             Project: Parquet
          Issue Type: Improvement
    Affects Versions: 1.13.0
            Reporter: Xinli Shang
            Assignee: Xinli Shang


The Parquet footer records the writer's version in the 'created_by' field. As we introduce several rewriters, a new file may be only partially written by the rewriter. In this case, we also need to record the rewriter's version.

Some questions (about a common rewriter) that we need to answer before moving forward:

Where should the rewriter versions be stored? (A new dedicated field or key-value metadata? If key-value metadata, which key shall we use?)
Shall we also record what the rewriter has done? If so, how?
At what level shall we copy the original created_by field, and at what level shall we write the rewriter's version to that field instead? (What different levels are possible?)
Once this rewriter(s) field is introduced, any fix that depends on the writer version needs to check this field as well, not only created_by.
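
To make the key-value metadata option concrete, here is a minimal sketch of one possible answer to the questions above. It is not the parquet-mr API: the Footer class, the "rewriter.versions" key, the ";" separator, and the version strings are all hypothetical, chosen only for illustration. It assumes the original created_by is copied unchanged and each rewriter appends its own version under a separate key.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical key name; the issue leaves the key (or a dedicated field) undecided.
REWRITER_KEY = "rewriter.versions"


@dataclass
class Footer:
    """Toy stand-in for a Parquet footer; not the parquet-mr API."""
    created_by: str
    key_value_metadata: Dict[str, str] = field(default_factory=dict)


def record_rewrite(footer: Footer, rewriter_version: str) -> Footer:
    """Return a new footer that keeps created_by and appends the rewriter version."""
    metadata = dict(footer.key_value_metadata)
    previous = metadata.get(REWRITER_KEY)
    # Assumed encoding: versions joined with ";" in the order the rewrites happened.
    metadata[REWRITER_KEY] = (
        f"{previous};{rewriter_version}" if previous else rewriter_version
    )
    return Footer(created_by=footer.created_by, key_value_metadata=metadata)


def versions_to_check(footer: Footer) -> List[str]:
    """All versions a writer-version-dependent fix would need to inspect."""
    rewriters = footer.key_value_metadata.get(REWRITER_KEY, "")
    return [footer.created_by] + [v for v in rewriters.split(";") if v]


# Example with made-up version strings.
original = Footer(created_by="parquet-mr version 1.12.0")
rewritten = record_rewrite(original, "parquet-mr version 1.13.0")
print(versions_to_check(rewritten))
```

In this sketch, a version-dependent fix would iterate over versions_to_check() instead of reading created_by alone, which is the extra check described in the last point above.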



--
This message was sent by Atlassian Jira
(v8.3.4#803005)