Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/19 17:18:04 UTC

[GitHub] [arrow] nevi-me commented on pull request #8968: ARROW-10979: [Rust] Basic Kafka Reader

nevi-me commented on pull request #8968:
URL: https://github.com/apache/arrow/pull/8968#issuecomment-748500486


   Hi @kflansburg, this is some great work. I've just gone through the code briefly.
   
   > I really like your idea of using Kafka as a transport layer for Arrow Flight messages.
   
   I'd be interested in seeing how we could go about implementing this.
   
   > I was planning to try to implement some sort of JSON parsing -> Arrow StructArray for the Kafka payload field, but parsing it as Arrow flight would be very cool as well.
   
   Our JSON reader already has the building blocks needed to trivially do this, and after #8938, you should be able to read all nested JSON types.
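   The JSON → StructArray idea can be made concrete without tying it to a particular arrow crate version. Below is a std-only sketch of the core step such a reader performs: it transposes row-oriented message payloads into one column per field and records missing fields as nulls. It assumes each Kafka payload has already been decoded into a map of field name to value. The `Payload` type and the `to_columns` function are hypothetical and are not part of arrow's JSON reader API. Values stay as strings for brevity, whereas a real reader also infers types.

   ```rust
   use std::collections::{BTreeMap, BTreeSet};

   /// Hypothetical: one Kafka message payload, already decoded into field -> value.
   type Payload = BTreeMap<String, String>;

   /// Transpose a batch of row-oriented payloads into one column per field.
   /// A `None` slot marks a message that lacked that field. This mirrors how a
   /// struct array keeps one child array per field plus per-slot validity.
   fn to_columns(batch: &[Payload]) -> BTreeMap<String, Vec<Option<String>>> {
       // Union of field names seen across the batch (a crude form of schema inference).
       let fields: BTreeSet<&String> = batch.iter().flat_map(|p| p.keys()).collect();
       let mut columns = BTreeMap::new();
       for field in fields {
           // One entry per message, in message order.
           let col: Vec<Option<String>> = batch.iter().map(|p| p.get(field).cloned()).collect();
           columns.insert(field.clone(), col);
       }
       columns
   }

   fn main() {
       let mut a = Payload::new();
       a.insert("user".to_string(), "alice".to_string());
       a.insert("clicks".to_string(), "3".to_string());
       let mut b = Payload::new();
       b.insert("user".to_string(), "bob".to_string()); // no "clicks" field

       let cols = to_columns(&[a, b]);
       for (name, values) in &cols {
           println!("{name}: {values:?}");
       }
   }
   ```

   With the two sample messages, the `clicks` column holds a value for the first message and a null for the second. The `user` column is fully populated.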
   
   I've played around with converting Avro messages from Kafka into Arrow data. This could also be an interesting fit for your streaming use case.
   
   ___
   
   There is a slight downside to having `arrow-kafka` live in this repository: `librdkafka` isn't trivial to install on Windows (I use it in WSL instead). From a development perspective, it might impose some load on developers (especially drive-by contributors).
   
   I'm a proponent of bundling crates into `arrow/rust` if they could benefit from us (i.e. the committers and regular contributors) making changes to keep them compiling. We sometimes make breaking changes to our interfaces, so being able to fix the affected crates in the same repository is very useful.
   
   With the above said, I think we should use this crate as an opportunity to have a bigger discussion about where additional modules should live. For example, I recently opened a draft RFC for `arrow-sql` (#8731). My main motivation for wanting to put it into `rust/arrow/arrow-sql` is that it could also benefit from the performance improvements we regularly make.
   
   We could try the `arrow-contrib` approach, where we maintain additional IO modules and other crates, as well as projects in languages other than Rust.
   This would be similar to projects like OpenTracing and OpenTelemetry, where separate tracing libraries are maintained within the same organisation but in different repos.
   This probably warrants a broader mailing list discussion, but I'd like to hear your and @andygrove's thoughts first.

