Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/27 07:59:39 UTC

[GitHub] [druid] JulianJaffePinterest edited a comment on pull request #10920: Spark Direct Readers and Writers for Druid.

JulianJaffePinterest edited a comment on pull request #10920:
URL: https://github.com/apache/druid/pull/10920#issuecomment-849417236


   I've had a number of conversations about these connectors and how to use them. The common pain points are partitioning and user-friendliness, so I've added three new partitioners, ergonomic ways to use them, and a semi-typed way to configure the readers and writers. The improved ergonomics do come at the cost of introducing Scala implicits into the project, which I had tried to avoid to ease comprehension for other developers, but I think the tradeoff here is worth it. See the [extension documentation](https://github.com/JulianJaffePinterest/druid/blob/spark_druid_connector/docs/development/extensions-core/spark.md) for more details.
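
   For context on the implicits: importing `org.apache.druid.spark.DruidDataFrameReader` enriches Spark's `DataFrameReader` with the fluent setters used below. A minimal sketch of the pattern (the method bodies, option keys, and format name here are invented for illustration, not the PR's code):

   ```scala
   import org.apache.spark.sql.{DataFrame, DataFrameReader}

   object IllustrativeImplicits {
     // Importing the connector's DruidDataFrameReader brings an implicit class
     // along these lines into scope, adding fluent Druid setters to Spark's
     // DataFrameReader without requiring a wrapper type at call sites.
     implicit class DruidDataFrameReader(private val reader: DataFrameReader) extends AnyVal {
       // each setter records an option on the underlying reader ...
       def brokerHost(host: String): DataFrameReader = reader.option("brokerHost", host)
       def brokerPort(port: Int): DataFrameReader = reader.option("brokerPort", port.toString)

       // ... and druid() dispatches the configured read to the connector's source
       def druid(): DataFrame = reader.format("druid").load()
     }
   }
   ```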
   
   Example usages:
   
   Configuring the reader:
   ```scala
   import org.apache.druid.spark.DruidDataFrameReader
   
   sparkSession
     .read
     // Druid broker connection, used to determine the datasource's schema
     .brokerHost("localhost")
     .brokerPort(8082)
     // Druid metadata database connection, used to locate the segments to read
     .metadataDbType("mysql")
     .metadataUri("jdbc:mysql://druid.metadata.server:3306/druid")
     .metadataUser("druid")
     .metadataPassword("diurd")
     // the Druid datasource to read from
     .dataSource("dataSource")
     .druid()
   ```
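
   `.druid()` returns an ordinary `DataFrame`, so the result composes with the rest of the Spark API. For example, assuming `df` holds the result of the read above (`__time` is Druid's primary timestamp column):

   ```scala
   import org.apache.spark.sql.functions.col

   df.printSchema()                  // inspect the schema inferred from Druid
   df.select(col("__time")).show(5)  // ordinary DataFrame operations apply
   ```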
   
   Configuring the writer:
   ```scala
   import org.apache.druid.spark.DruidDataFrameWriter
   import org.apache.spark.sql.SaveMode
   
   // configure local-disk deep storage (LocalDeepStorageConfig is provided by the connector)
   val deepStorageConfig = new LocalDeepStorageConfig().storageDirectory("/mnt/druid/druid-segments/")
   
   df
     .write
     // Druid metadata database connection, used to publish the new segments
     .metadataDbType("mysql")
     .metadataUri("jdbc:mysql://druid.metadata.server:3306/druid")
     .metadataUser("druid")
     .metadataPassword("diurd")
     // the version to assign to the segments written
     .version(1)
     // the deep storage to write segment files to
     .deepStorage(deepStorageConfig)
     .mode(SaveMode.Overwrite)
     // the Druid datasource to write to
     .dataSource("dataSource")
     .druid()
   ```
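
   Note that because the writer produces segments directly rather than going through the indexing service, both the deep storage location and the metadata database need to be writable from the Spark cluster. `LocalDeepStorageConfig` targets local-disk deep storage; the extension docs cover the configs for the other deep storage implementations.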
   
   Using the new partitioners and the ergonomic approach to passing a partition map to the writer:
   ```scala
   import org.apache.druid.spark.DruidDataFrame
   import org.apache.druid.spark.DruidDataFrameWriter
   import org.apache.spark.sql.SaveMode
   
   val deepStorageConfig = new LocalDeepStorageConfig().storageDirectory("/mnt/druid/druid-segments/")
   
   // Illustrative placeholder values for the range partitioner's arguments (see
   // the extension docs for the exact parameter types and semantics):
   val tsCol = "__time"            // the timestamp column to bucket rows by
   val tsFormat = "auto"           // the format of the timestamp column
   val granularityString = "DAY"   // the time bucketing (segment) granularity
   val rowsPerPartition = 5000000  // the target number of rows per partition
   val partitionCol = "dim1"       // the column to range-partition on within a time bucket
   
   df
     .rangePartitionerAndWrite(tsCol, tsFormat, granularityString, rowsPerPartition, partitionCol)
     .metadataDbType("mysql")
     .metadataUri("jdbc:mysql://druid.metadata.server:3306/druid")
     .metadataUser("druid")
     .metadataPassword("diurd")
     .version(1)
     .deepStorage(deepStorageConfig)
     .mode(SaveMode.Overwrite)
     .dataSource("dataSource")
     .druid()
   ```
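
   Roughly, `rangePartitionerAndWrite` repartitions `df` before writing: rows are bucketed by time according to `granularityString` and then range-partitioned on `partitionCol`, targeting about `rowsPerPartition` rows per output partition, and the resulting partition map is passed along to the writer automatically instead of being constructed and supplied by hand.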

