Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/06 22:35:53 UTC

[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6845: Add the complex data type transformer for complex type handling

Jackie-Jiang commented on pull request #6845:
URL: https://github.com/apache/incubator-pinot/pull/6845#issuecomment-833918349


   > > For the complex type column, I can see 4 different operations:
   > > 
   > > * SKIP (not included in the output)
   > > * FLATTEN (into key-value pairs, can generate multiple records for list)
   > > * RETAIN (as Map or List)
   > > * TO_JSON_STRING
   > > 
   > > We can have a default operation, then an override field list for each operation.
   > > We should probably separate the map config from the list config, because I can see a common case where users only want to flatten the map but not the list, so that the list can be stored as an MV column.
   > > Also, I think we should make the delimiter configurable. I remember there being some issues storing column names that contain a dot in the Presto connector (@xiangfu0 to confirm).
   > > So the overall config would look like:
   > > ```
   > > "complexTypeConfig": {
   > >   "map": {
   > >     "default": "FLATTEN",
   > >     "SKIP": ["colA"],
   > >     "TO_JSON_STRING": ["colB"],
   > >     "delimiter": "_"
   > >   },
   > >   "list": {
   > >     "default": "RETAIN",
   > >     "FLATTEN": ["colC_listField"]
   > >   }
   > > }
   > > ```
   > 
   > `SKIP (not included in the output)`
   > Is this needed? If the column (or all of its descendant columns) does not show up in the schema, doesn't that mean it is skipped?
   > 
   > `FLATTEN (into key-value pairs, can generate multiple records for list)`
   > Yes, this is the current default behavior for maps.
   > 
   > `RETAIN (as Map or List)`
   > Same here: this information should be conveyed by the columns in the schema.
   > 
   > `TO_JSON_STRING`
   > I was thinking about this, and I feel it can be done via `jsonFormat`, as your current jsonMeetupQuickStart example does. I'm not sure whether we should add this syntactic sugar to this config.
   
   Discussed offline: for now we only need `unnestFields` and `delimiter` in the config:
   ```
   "complexTypeConfig": {
     "delimiter": "_",
     "unnestFields": [...]
   }
   ```
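
To make the effect of these two settings concrete, here is a minimal Python sketch of what a transformer driven by `delimiter` and `unnestFields` might do. It is not the code from this PR: the function names `flatten_map`, `unnest`, and `transform` are hypothetical, the `"."` fallback delimiter is an assumption, and so is the choice to name unnest fields by their flattened column name. Maps are always flattened, lists not named in `unnestFields` are retained (for example, to be stored as an MV column), and each listed field yields one output record per list element.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def flatten_map(record: Record, delimiter: str) -> Record:
    """Flatten nested dicts into top-level keys joined by `delimiter`.

    Lists are left untouched so they can be retained as multi-value columns.
    """
    flat: Record = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_map(value, delimiter).items():
                flat[f"{key}{delimiter}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def unnest(record: Record, field: str) -> List[Record]:
    """Emit one record per element of the list stored under `field`.

    A record with a missing, empty, or non-list field is passed through as-is.
    """
    value = record.get(field)
    if not isinstance(value, list) or not value:
        return [record]
    rows: List[Record] = []
    for element in value:
        row = {k: v for k, v in record.items() if k != field}
        row[field] = element
        rows.append(row)
    return rows


def transform(record: Record, config: Record) -> List[Record]:
    """Apply a hypothetical complexTypeConfig to a single input record."""
    delimiter = config.get("delimiter", ".")  # assumed default, not from the PR
    rows = [flatten_map(record, delimiter)]
    for field in config.get("unnestFields", []):
        expanded: List[Record] = []
        for row in rows:
            # Flatten again: list elements may themselves be maps.
            expanded.extend(flatten_map(r, delimiter) for r in unnest(row, field))
        rows = expanded
    return rows


example = {
    "user": {"id": 1, "name": "a"},
    "tags": ["x", "y"],
    "events": [{"type": "click"}, {"type": "view"}],
}
for out in transform(example, {"delimiter": "_", "unnestFields": ["events"]}):
    print(out)
```

With this input the sketch produces two records. Both carry `user_id`, `user_name`, and the untouched `tags` list, and they differ only in `events_type` (`click` in the first, `view` in the second).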

