Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/10 17:53:33 UTC

[GitHub] [druid] josephglanville opened a new pull request #9671: AvroOCFInputFormat

josephglanville opened a new pull request #9671: AvroOCFInputFormat
URL: https://github.com/apache/druid/pull/9671
 
 
   ### Description
   
   Implements support for reading files in [Avro Object Container Format](https://avro.apache.org/docs/current/spec.html#Object+Container+Files) with the new native indexing InputFormat interface.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [x] been tested in a test Druid cluster.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `AvroOCFInputFormat`
    * `AvroOCFReader`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] josephglanville edited a comment on issue #9671: Add support for Avro OCF using InputFormat

Posted by GitBox <gi...@apache.org>.
josephglanville edited a comment on issue #9671: Add support for Avro OCF using InputFormat
URL: https://github.com/apache/druid/pull/9671#issuecomment-612416882
 
 
   One thing I noticed is that the web console's format detection runs the CSV and TSV heuristics before the magic byte sequence checks. I think this ordering is flawed and should be changed: formats that contain uncompressed string data may inadvertently trip the heuristics, whereas magic byte matching should be quite reliable.
   
   Should I change that here or send a follow-up PR?
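
To illustrate the ordering suggested above, here is a minimal sketch (not the web console's actual code) that checks magic bytes before any delimiter heuristic. `FormatSniffer`, `detect`, and the returned labels are hypothetical names, and the delimiter checks are deliberately simplistic stand-ins for the real CSV/TSV heuristics. The only format-specific fact it relies on is that Avro Object Container Files begin with the bytes `O`, `b`, `j`, 0x01.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only: shows magic byte checks
// running before looser text heuristics.
final class FormatSniffer {
    // Avro Object Container Files start with the four bytes 'O', 'b', 'j', 1.
    private static final byte[] AVRO_OCF_MAGIC = {'O', 'b', 'j', 1};

    private FormatSniffer() {}

    /** Returns a format label for the sampled leading bytes of a file. */
    static String detect(byte[] sample) {
        // 1. Exact magic byte match first: cheap and unlikely to misfire.
        if (startsWith(sample, AVRO_OCF_MAGIC)) {
            return "avro_ocf";
        }
        // 2. Only then fall back to delimiter heuristics (simplified here:
        //    look for a tab or comma in the first line of the sample).
        String line = firstLine(sample);
        if (line.indexOf('\t') >= 0) {
            return "tsv";
        }
        if (line.indexOf(',') >= 0) {
            return "csv";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    private static String firstLine(byte[] sample) {
        // ISO-8859-1 maps every byte to exactly one char, so binary input is safe.
        String text = new String(sample, StandardCharsets.ISO_8859_1);
        int newline = text.indexOf('\n');
        return newline >= 0 ? text.substring(0, newline) : text;
    }
}
```

With this ordering, an Avro file whose embedded JSON schema contains commas is labelled `avro_ocf` rather than being mistaken for CSV; swapping steps 1 and 2 would reintroduce the misdetection described in the comment.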



[GitHub] [druid] josephglanville commented on issue #9671: AvroOCFInputFormat

Posted by GitBox <gi...@apache.org>.
josephglanville commented on issue #9671: AvroOCFInputFormat
URL: https://github.com/apache/druid/pull/9671#issuecomment-612354839
 
 
   There are two main things this PR could benefit from: documentation, obviously, but also the ability to specify a reader schema. Currently it always reads files with the writer schema, which may not be desired (especially when files written with multiple old schema versions need to be read in a single indexing job).
   
   I will attempt to add both in the coming days.



[GitHub] [druid] josephglanville commented on issue #9671: Add support for Avro OCF using InputFormat

Posted by GitBox <gi...@apache.org>.
josephglanville commented on issue #9671: Add support for Avro OCF using InputFormat
URL: https://github.com/apache/druid/pull/9671#issuecomment-612416882
 
 
   One thing I noticed is that the web console's format detection runs the CSV and TSV heuristics before the magic byte sequence checks. I think this approach is flawed and should be changed: formats that contain uncompressed string data may inadvertently trip the heuristics, whereas magic byte matching should be quite reliable.
   
   Should I change that here or send a follow-up PR?
