Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 20:17:14 UTC

[GitHub] [beam] kennknowles opened a new issue, #18737: SolrIO: Expose commitWithin to the Solr write

kennknowles opened a new issue, #18737:
URL: https://github.com/apache/beam/issues/18737

   A good improvement to SolrIO would be to allow the caller to provide a `commitWithin` parameter.  Currently the batch is passed to the underlying `solrClient` without one, so commits default to the configured server behavior.
   
   The justification for exposing this is that the collection in the target Solr server might be configured in a way that is not suitable for this Beam job.  E.g. a server tuned to accept real-time updates with fast flush times from a streaming Beam job (job 1), while a second Beam job (job 2) is doing a nightly bulk load.
   
   This is related to BEAM-3849, BEAM-3848 and BEAM-3820, and they should be considered together. I understand that the policy of Beam is not to expose parameters for tuning.  For the IOs, which exist to interface with external systems, I recommend this policy be reconsidered.  The IO modules typically wrap clients of the target systems (`CloudSolrClient` in this case), and those clients all have tunable parameters for good reason. My recommendation would be to keep `SolrIO.write()` providing sensible defaults but expose an additional builder, e.g. `SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()`.
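
A minimal sketch of the builder shape proposed above, using plain Java with no Beam or SolrJ dependency. Everything here is hypothetical: `SolrWriteConfig`, `writeBuilder()`, the accessors, and the default batch size are illustrative assumptions, not the existing SolrIO API. An empty `commitWithinMs` stands for "leave commit behavior to the server", which is the behavior described in this issue.

```java
import java.util.OptionalInt;

/** Hypothetical holder for write settings; NOT part of the current SolrIO API. */
final class SolrWriteConfig {
  // Assumed default, chosen for illustration only.
  static final int DEFAULT_BATCH_SIZE = 1000;

  private final int batchSize;
  private final OptionalInt commitWithinMs;

  private SolrWriteConfig(int batchSize, OptionalInt commitWithinMs) {
    this.batchSize = batchSize;
    this.commitWithinMs = commitWithinMs;
  }

  /** Entry point mirroring the `writeBuilder()` name suggested in the issue. */
  static Builder writeBuilder() {
    return new Builder();
  }

  int batchSize() {
    return batchSize;
  }

  /** Empty means no commitWithin is sent, so the server configuration applies. */
  OptionalInt commitWithinMs() {
    return commitWithinMs;
  }

  static final class Builder {
    private int batchSize = DEFAULT_BATCH_SIZE;
    private OptionalInt commitWithinMs = OptionalInt.empty();

    Builder withBatchSize(int batchSize) {
      if (batchSize <= 0) {
        throw new IllegalArgumentException("batchSize must be positive, got " + batchSize);
      }
      this.batchSize = batchSize;
      return this;
    }

    Builder withCommitWithinMs(int commitWithinMs) {
      if (commitWithinMs <= 0) {
        throw new IllegalArgumentException(
            "commitWithinMs must be positive, got " + commitWithinMs);
      }
      this.commitWithinMs = OptionalInt.of(commitWithinMs);
      return this;
    }

    SolrWriteConfig build() {
      return new SolrWriteConfig(batchSize, commitWithinMs);
    }
  }
}
```

With this shape, `SolrWriteConfig.writeBuilder().build()` keeps the defaults, while the nightly bulk-load job could call `withCommitWithinMs(300000).withBatchSize(9000)`. A write transform could then forward the value, when present, to a SolrJ call that accepts a commit-within interval, such as `SolrClient.add(docs, commitWithinMs)`; how that would be wired into SolrIO is left open here.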
   
   Please feel free to assign to me if of interest and I'll provide a PR.
   
   Imported from Jira [BEAM-3862](https://issues.apache.org/jira/browse/BEAM-3862). Original Jira may contain additional context.
   Reported by: timrobertson100.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org