Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/24 13:26:39 UTC

[GitHub] [arrow-datafusion] yarenty commented on issue #1544: Streaming support for DataFusion

yarenty commented on issue #1544:
URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1165574649

   I am trying to integrate DataFusion with Kafka; the final goal is end-to-end streaming. But I started from a "different side": step 1 is to publish output to Kafka, so I copied some code and created a Kafka publisher: https://github.com/yarenty/arrow-datafusion/tree/master/datafusion/core/src/physical_plan/kafka
   
   Test case is here:
   https://github.com/yarenty/arrow-datafusion/blob/master/datafusion/core/tests/ordered_sql_to_kafka.rs
   
   The end result looks something like this:
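   ```rust
   use datafusion::error::Result;
   use datafusion::prelude::*;
   // `KafkaContext`, `KafkaConfig` and `publish_to_kafka` come from the fork
   // linked above (datafusion/core/src/physical_plan/kafka), not from upstream DataFusion.

   #[tokio::main]
   async fn main() -> Result<()> {
       // Register the CSV file as a table and run an ordered aggregate query.
       let ctx = SessionContext::new();
       ctx.register_csv("example", "tests/example.csv", CsvReadOptions::new()).await?;

       let df = ctx
           .sql("SELECT a, MIN(b) as bmin FROM example GROUP BY a ORDER BY a LIMIT 100")
           .await?;

       // Kafka context: target topic plus producer properties.
       let stream_ctx = KafkaContext::with_config(
           KafkaConfig::new("test_topic")
               .set("bootstrap.servers", "127.0.0.1:9092")
               .set("compression.codec", "snappy"),
       );

       // Publish the query output to Kafka.
       df.publish_to_kafka(stream_ctx).await?;

       Ok(())
   }
   ```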
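
To make the shape of the idea easier to discuss without reading the fork, here is a minimal, standard-library-only sketch of the two pieces the snippet above relies on: a chainable config and a "publish every row, in order" step. Everything in it is hypothetical: `SketchKafkaConfig`, `MessageSink`, `MemorySink` and `publish_rows` are illustration names, not the fork's types and not DataFusion or Kafka client APIs. Rows are plain `Vec<String>` rather than Arrow record batches, and the sink is an in-memory stand-in for a real producer.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a Kafka config: a topic plus key/value properties.
#[derive(Debug, Clone, PartialEq)]
struct SketchKafkaConfig {
    topic: String,
    properties: BTreeMap<String, String>,
}

impl SketchKafkaConfig {
    fn new(topic: &str) -> Self {
        Self { topic: topic.to_string(), properties: BTreeMap::new() }
    }

    /// Takes and returns `self` so that calls can be chained.
    fn set(mut self, key: &str, value: &str) -> Self {
        self.properties.insert(key.to_string(), value.to_string());
        self
    }

    fn topic(&self) -> &str {
        &self.topic
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.properties.get(key).map(String::as_str)
    }
}

/// Hypothetical sink abstraction; a real implementation would wrap a Kafka producer.
trait MessageSink {
    fn send(&mut self, topic: &str, payload: String) -> Result<(), String>;
}

/// In-memory sink used here instead of a real broker; records (topic, payload) pairs.
#[derive(Default)]
struct MemorySink {
    sent: Vec<(String, String)>,
}

impl MessageSink for MemorySink {
    fn send(&mut self, topic: &str, payload: String) -> Result<(), String> {
        self.sent.push((topic.to_string(), payload));
        Ok(())
    }
}

/// Sends each row as one comma-separated message, in input order.
/// Stops at the first failed send and returns the number of rows sent otherwise.
fn publish_rows<S: MessageSink>(
    config: &SketchKafkaConfig,
    rows: &[Vec<String>],
    sink: &mut S,
) -> Result<usize, String> {
    for row in rows {
        sink.send(config.topic(), row.join(","))?;
    }
    Ok(rows.len())
}

fn main() {
    let config = SketchKafkaConfig::new("test_topic")
        .set("bootstrap.servers", "127.0.0.1:9092")
        .set("compression.codec", "snappy");

    // Two made-up result rows standing in for the (a, bmin) query output.
    let rows = vec![
        vec!["1".to_string(), "10".to_string()],
        vec!["2".to_string(), "20".to_string()],
    ];

    let mut sink = MemorySink::default();
    let sent = publish_rows(&config, &rows, &mut sink).expect("in-memory sink never fails");
    println!(
        "published {sent} rows to {} (codec: {:?})",
        config.topic(),
        config.get("compression.codec")
    );
}
```

Keeping the sink behind a trait is one possible way to test the ordering of published messages without a running broker; whether that fits the fork's design is an open question.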
   
   I'm still not sure whether this is the correct way to do it, or whether I put the code in the proper places, but you learn something new every day.
   
   Is there any other place where I can share code and check ideas?
   
   

