Posted to dev@parquet.apache.org by "Victor (JIRA)" <ji...@apache.org> on 2019/04/12 07:47:00 UTC

[jira] [Commented] (PARQUET-65) Create a jackson integration module for pojo support

    [ https://issues.apache.org/jira/browse/PARQUET-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816053#comment-16816053 ] 

Victor commented on PARQUET-65:
-------------------------------

Is this still an open topic?

It would be great to be able to generate a Parquet schema from a pojo (in the same way as https://github.com/FasterXML/jackson-dataformats-binary/tree/master/avro#generating-avro-schema-from-pojo-definition) and then write it to a Parquet file, but without all the overhead of going through Avro (which implies serializing to bytes, then reading them back as an Avro generic record before using the AvroParquetWriter; cf. https://github.com/FasterXML/jackson-dataformats-binary/issues/9#issuecomment-325685012).

> Create a jackson integration module for pojo support
> ----------------------------------------------------
>
>                 Key: PARQUET-65
>                 URL: https://issues.apache.org/jira/browse/PARQUET-65
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Priority: Minor
>
> There's currently a PR for pojo support:
> https://github.com/apache/incubator-parquet-mr/pull/21
> And it occurred to me that one way we could do this without re-inventing the wheel is to use jackson. Jackson can essentially take a parse tree, whether the result of parsing XML, json, or anything else (for example there's a yaml plugin), and then there are 3 things jackson lets you do with that tree. You can visit the nodes in the tree (they call this streaming), you can map the tree onto the data structures built into java (essentially get a Map<Object, Object>), or you can map the tree onto a user-defined class. The latter lets you work with a well-typed class, and also lets you use jackson's annotations for controlling how the tree -> pojo mapping works (renaming fields and so on).
> We could leverage all of that by creating something that goes from parquet data to the jackson parse tree, and then leave the rest of the work to jackson. 
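
To make the three access modes from the description concrete, here is a minimal, library-free sketch in plain Java. It is only an illustration under stated assumptions: the names TreeModes, Node, User, visit, toJava and bind are hypothetical and are not Jackson's or Parquet's API. In Jackson itself the corresponding pieces would roughly be the streaming parser, untyped data binding to a Map, and ObjectMapper data binding to a class. A Parquet integration along the lines proposed would presumably produce such a tree (or an equivalent token stream) from Parquet records and leave the last two steps to Jackson.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parse tree; NOT Jackson's JsonNode API.
final class TreeModes {

    /** A node is either a leaf holding a value or an object with named children. */
    static final class Node {
        final Object value;                // used only by leaves
        final Map<String, Node> children;  // null for leaves

        private Node(Object value, Map<String, Node> children) {
            this.value = value;
            this.children = children;
        }
        static Node leaf(Object value) { return new Node(value, null); }
        static Node object() { return new Node(null, new LinkedHashMap<>()); }
        Node put(String name, Node child) { children.put(name, child); return this; }
        boolean isLeaf() { return children == null; }
    }

    /** Example target class, playing the role of the user-defined pojo. */
    static final class User {
        String name;
        Integer age;
    }

    // 1. "Streaming": walk the tree in order and record one event per leaf.
    static void visit(String path, Node node, List<String> events) {
        if (node.isLeaf()) {
            events.add(path + "=" + node.value);
            return;
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            String childPath = path.isEmpty() ? e.getKey() : path + "." + e.getKey();
            visit(childPath, e.getValue(), events);
        }
    }

    // 2. Untyped binding: map the tree onto built-in collections.
    static Object toJava(Node node) {
        if (node.isLeaf()) {
            return node.value;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            out.put(e.getKey(), toJava(e.getValue()));
        }
        return out;
    }

    // 3. Typed binding: copy a flat object node onto a class with matching field names.
    static <T> T bind(Node node, Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T target = ctor.newInstance();
            for (Map.Entry<String, Node> e : node.children.entrySet()) {
                Field field = type.getDeclaredField(e.getKey());
                field.setAccessible(true);
                field.set(target, e.getValue().value);
            }
            return target;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalArgumentException("cannot bind tree to " + type.getName(), ex);
        }
    }

    /** A made-up record standing in for one row read from a Parquet file. */
    static Node sample() {
        return Node.object()
                .put("name", Node.leaf("alice"))
                .put("age", Node.leaf(30));
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        visit("", sample(), events);
        System.out.println(events);            // one entry per leaf
        System.out.println(toJava(sample()));  // nested maps and plain values
        User user = bind(sample(), User.class);
        System.out.println(user.name + " " + user.age);
    }
}
```

The typed binding here only handles flat objects with exactly matching field names; renaming fields, nested types and collections are the kind of work the description suggests delegating to Jackson's annotations and data binding rather than re-implementing.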



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)