Posted to dev@gobblin.apache.org by "Tilak Patidar (JIRA)" <ji...@apache.org> on 2017/09/12 11:51:00 UTC

[jira] [Created] (GOBBLIN-249) Documenting an abstract JSON schema specification to convert records to different formats

Tilak Patidar created GOBBLIN-249:
-------------------------------------

             Summary: Documenting an abstract JSON schema specification to convert records to different formats
                 Key: GOBBLIN-249
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-249
             Project: Apache Gobblin
          Issue Type: Wish
          Components: gobblin-core
            Reporter: Tilak Patidar
            Assignee: Abhishek Tiwari


Various converters use the source.schema value to convert source records into their respective data formats, with support for both primitive and complex data types. We should write down a specification for defining a source.schema. The specification should cover:
* Converters and their use cases <Source, Target>.
* Converters and the data types each of them supports.
* List of data types and their properties.
* Examples of writing schemas, both simple and nested.
* List of configuration values used by converters.
* List of options available for defining the schema of a field (size, nullable, etc.).
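
To make the "simple and nested" point concrete, here is a minimal sketch of what a source.schema and a validator for it might look like. The keys columnName, dataType and isNullable follow the general shape used by Gobblin's existing JSON-based converters, but the "record"/"values" nesting, the KNOWN_TYPES set and the validate_fields helper are assumptions made for illustration, not part of any agreed specification.

```python
import json

# Hypothetical source.schema: two simple fields and one nested record.
SOURCE_SCHEMA = """
[
  {"columnName": "id", "dataType": {"type": "long"}, "isNullable": false},
  {"columnName": "name", "dataType": {"type": "string"}, "isNullable": true},
  {"columnName": "address", "isNullable": true,
   "dataType": {"type": "record", "values": [
     {"columnName": "city", "dataType": {"type": "string"}, "isNullable": true},
     {"columnName": "zip", "dataType": {"type": "int"}, "isNullable": true}
   ]}}
]
"""

# Assumed set of abstract types; the real specification would define this list.
KNOWN_TYPES = {"string", "int", "long", "double", "boolean", "record"}


def validate_fields(fields, prefix=""):
    """Return a list of human-readable problems found in a list of field definitions."""
    problems = []
    for field in fields:
        name = field.get("columnName")
        if not name:
            problems.append(prefix + "<unnamed>: missing columnName")
            continue
        path = prefix + name
        data_type = field.get("dataType", {})
        type_name = data_type.get("type")
        if type_name not in KNOWN_TYPES:
            problems.append(f"{path}: unknown type {type_name!r}")
        elif type_name == "record":
            # Nested records are validated recursively, with a dotted path prefix.
            problems.extend(validate_fields(data_type.get("values", []), path + "."))
        if not isinstance(field.get("isNullable", False), bool):
            problems.append(f"{path}: isNullable must be true or false")
    return problems


schema = json.loads(SOURCE_SCHEMA)
print(validate_fields(schema))  # an empty list means no problems were found
```

A written specification would let every converter share one validator of this kind instead of each converter interpreting source.schema on its own.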

This source.schema would act as an abstraction over the underlying schemas and data types of different formats such as Avro, Parquet, and ORC. The user would define the source.schema according to our specification and could then convert and write records to different data formats without worrying about the schema of the target format.
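
One possible way to picture this abstraction is a lookup table from abstract type names to the corresponding type in each target format. The sketch below is only an illustration: the abstract type names, the TYPE_MAP table and the target_type helper are hypothetical, while the target-side names are the standard primitive type names of Avro, Parquet and ORC.

```python
# Hypothetical mapping from abstract source.schema types to target-format types.
TYPE_MAP = {
    "boolean": {"avro": "boolean", "parquet": "boolean", "orc": "boolean"},
    "int": {"avro": "int", "parquet": "int32", "orc": "int"},
    "long": {"avro": "long", "parquet": "int64", "orc": "bigint"},
    "float": {"avro": "float", "parquet": "float", "orc": "float"},
    "double": {"avro": "double", "parquet": "double", "orc": "double"},
    # Parquet stores strings as binary annotated with the UTF8 logical type.
    "string": {"avro": "string", "parquet": "binary (UTF8)", "orc": "string"},
}


def target_type(abstract_type, target_format):
    """Look up the target-format type for an abstract type, failing loudly if unknown."""
    try:
        return TYPE_MAP[abstract_type][target_format]
    except KeyError:
        raise ValueError(
            f"no mapping for type {abstract_type!r} in format {target_format!r}"
        ) from None


for fmt in ("avro", "parquet", "orc"):
    print(fmt, target_type("long", fmt))
```

With a table like this, the user only ever writes "long" in source.schema, and each converter resolves the concrete type for its own format.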

Data type abstraction
For example, Parquet has no primitive MAP type, but a map can be represented with a repeated group. If the user defines a MAP in the source schema, we can do the necessary conversion and provide them with a MAP-like structure in Parquet. In this way, the user is freed from concerns about type conversion and the target schema. The converters could perhaps also be split into a separate module that acts as a conversion library for different data formats.
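
The MAP case can be sketched as follows. The function renders an abstract map field (string keys, primitive values) as Parquet message-schema text, using the repeated key_value group layout that Parquet's logical type conventions describe for maps. The helper names and the restriction to string keys are assumptions made for this example; a real converter would build schema objects through the Parquet library rather than strings.

```python
# Assumed abstract-to-Parquet primitive mapping for this sketch.
PARQUET_PRIMITIVES = {
    "boolean": "boolean",
    "int": "int32",
    "long": "int64",
    "double": "double",
    "string": "binary",
}


def parquet_primitive(repetition, abstract_type, name):
    """Render one primitive field line; strings get the UTF8 annotation."""
    physical = PARQUET_PRIMITIVES[abstract_type]
    annotation = " (UTF8)" if abstract_type == "string" else ""
    return f"{repetition} {physical} {name}{annotation};"


def parquet_map_field(name, value_type, nullable=True):
    """Render an abstract MAP<string, value_type> as a Parquet repeated group."""
    repetition = "optional" if nullable else "required"
    lines = [
        f"{repetition} group {name} (MAP) {{",
        "  repeated group key_value {",
        "    " + parquet_primitive("required", "string", "key"),
        "    " + parquet_primitive("optional", value_type, "value"),
        "  }",
        "}",
    ]
    return "\n".join(lines)


print(parquet_map_field("attributes", "int"))
```

The user would only declare the field as a map in source.schema; the repeated-group structure stays an implementation detail of the Parquet converter.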



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)