Posted to commits@druid.apache.org by "gianm (via GitHub)" <gi...@apache.org> on 2023/03/13 21:18:58 UTC

[GitHub] [druid] gianm commented on issue #13923: Proposal: Druid extension to read and ingest Iceberg data files

gianm commented on issue #13923:
URL: https://github.com/apache/druid/issues/13923#issuecomment-1466977105

   Very cool! I have heard a lot of people asking about Iceberg integration, so I think this kind of capability would be very interesting to the community.
   
   One question I have is whether it makes sense to implement this purely as an InputSource. For example:
   
   ```
   "ioConfig": {
     "type": "index_parallel",
     "inputSource": {
       "type": "iceberg",
       "tableName": "logs",
       "namespace": "webapp",
       "partitionColumn": "event_time",
       "intervals": ["2023-01-26T00:00:00.000Z/2023-02-18T00:00:00.000Z"]
     }
   }
   ```
   
   I'm imagining that the IcebergInputSource goes out and finds the data backing the table, then delegates (internally) to the appropriate InputSource and InputFormat. In your example it'd internally delegate to HdfsInputSource to compute splits. For formatting, it would return `false` from `needsFormat`, then internally it would use ParquetInputFormat on calls to `reader`.
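
   The delegation described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Source`, `Format`, `FileSource`, and `TableSource` are hypothetical, simplified stand-ins and not Druid's actual InputSource or InputFormat interfaces, and the file list is passed in directly rather than resolved from an Iceberg catalog.

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   // Hypothetical, simplified stand-ins; these are NOT Druid's real interfaces.
   public class DelegatingSourceSketch {

     interface Format {
       String name();
     }

     interface Source {
       boolean needsFormat();

       List<String> read(Format format);
     }

     // Stand-in for a file-based source (the role HdfsInputSource would play).
     static final class FileSource implements Source {
       private final List<String> paths;

       FileSource(List<String> paths) {
         this.paths = paths;
       }

       @Override
       public boolean needsFormat() {
         return true; // a plain file source cannot read without a format
       }

       @Override
       public List<String> read(Format format) {
         if (format == null) {
           throw new IllegalArgumentException("format required");
         }
         // Pretend to read: tag each path with the format used to parse it.
         return paths.stream()
             .map(p -> format.name() + ":" + p)
             .collect(Collectors.toList());
       }
     }

     // Stand-in for the proposed table-level source.
     static final class TableSource implements Source {
       // Assumption: a real implementation would resolve these from table metadata.
       private final List<String> dataFiles;

       TableSource(List<String> dataFiles) {
         this.dataFiles = dataFiles;
       }

       @Override
       public boolean needsFormat() {
         return false; // the caller does not supply a format
       }

       @Override
       public List<String> read(Format ignored) {
         // Delegate internally to the file source with a fixed format.
         Format parquet = () -> "parquet";
         return new FileSource(dataFiles).read(parquet);
       }
     }

     public static void main(String[] args) {
       Source source = new TableSource(List.of("hdfs://example/webapp/logs/part-0.parquet"));
       System.out.println(source.needsFormat());
       System.out.println(source.read(null));
     }
   }
   ```

   The point of the design is that the caller only ever sees one source type: picking the underlying file source and the format stays an internal detail of the table-level source.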
   
   As for MSQ integration, unless the implementation is doing something weird, it should work out of the box with `EXTERN`, which lets people use any InputSource and any InputFormat. We can add nicer SQL syntax for it in a future patch, but `EXTERN` is there as a fallback in the meantime.
   
   @maytasm was asking about this on Slack recently so might be interested as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

