Posted to commits@beam.apache.org by "Tim Robertson (JIRA)" <ji...@apache.org> on 2018/03/16 09:40:00 UTC

[jira] [Updated] (BEAM-3862) SolrIO: Expose commitWithin to the Solr write

     [ https://issues.apache.org/jira/browse/BEAM-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Robertson updated BEAM-3862:
--------------------------------
    Description: 
A good improvement to SolrIO would be to allow the caller to provide a {{commitWithin}} parameter.  Currently the batch is passed to the underlying {{solrClient}} without one, so commits fall back to the behavior configured on the server.

The justification for exposing this is that the collection in the target Solr server might be configured in a way that is not suitable for this Beam job.  E.g. a server tuned to accept real-time updates with fast flush times from streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.

This is related to BEAM-3849, BEAM-3848 and BEAM-3820, and they should be considered together. I understand that the policy of Beam is not to expose parameters for tuning.  For the IOs, which interface with external systems, I recommend this policy be reconsidered.  The IO modules typically wrap clients for the target systems ({{CloudSolrClient}} in this case), and those clients all have tunable parameters for good reason. My recommendation would be to keep {{SolrIO.write()}} providing sensible defaults but expose an additional builder, e.g. {{SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()}}.

Please feel free to assign to me if of interest and I'll provide a PR.

  was:
A good improvement to SolrIO would be to allow the caller to provide a `commitWithin` parameter.  Currently the batch is passed to the underlying `solrClient` which results in defaulting to the configured server behavior.

The justification for exposing this is that the collection in the target Solr server might be configured in a way that is not suitable for this Beam job.  E.g. a server tuned to accept real-time updates with fast flush times from streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.

This is related to (BEAM-3849, BEAM-3848, BEAM-3820) and should be considered together. I understand that the policy of Beam is not to expose parameters for tuning.  When it comes to the IOs which are for interfacing with external systems I recommend this policy be reconsidered.  The IO modules typically wrap clients to target systems (`CloudSolrClient` in this case) which all have tunable parameters for good reason. My recommendation would be to keep `SolrIO.write()` providing sensible defaults but expose an additional builder e.g.`SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()`.

Please feel free to assign to me if of interest and I'll provide a PR.


> SolrIO: Expose commitWithin to the Solr write
> ---------------------------------------------
>
>                 Key: BEAM-3862
>                 URL: https://issues.apache.org/jira/browse/BEAM-3862
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-solr
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Tim Robertson
>            Assignee: Ismaël Mejía
>            Priority: Trivial
>
> A good improvement to SolrIO would be to allow the caller to provide a {{commitWithin}} parameter.  Currently the batch is passed to the underlying {{solrClient}} without one, so commits fall back to the behavior configured on the server.
> The justification for exposing this is that the collection in the target Solr server might be configured in a way that is not suitable for this Beam job.  E.g. a server tuned to accept real-time updates with fast flush times from streaming Beam job 1, while Beam job 2 is doing a nightly bulk load.
> This is related to BEAM-3849, BEAM-3848 and BEAM-3820, and they should be considered together. I understand that the policy of Beam is not to expose parameters for tuning.  For the IOs, which interface with external systems, I recommend this policy be reconsidered.  The IO modules typically wrap clients for the target systems ({{CloudSolrClient}} in this case), and those clients all have tunable parameters for good reason. My recommendation would be to keep {{SolrIO.write()}} providing sensible defaults but expose an additional builder, e.g. {{SolrIO.writeBuilder().withCommitWithinMs(300000).withBatchSize(9000).build()}}.
> Please feel free to assign to me if of interest and I'll provide a PR.
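
The builder proposed in the description could look roughly like the sketch below. It is illustrative only and is not the SolrIO API: the class name SolrWriteOptions, the default batch size of 1000, and the validation rules are all assumptions made for this example. Only the method names withCommitWithinMs and withBatchSize come from the issue text. An empty commitWithin stands for "leave it to the server configuration", which matches the current behavior described above.

```java
import java.util.OptionalInt;

/** Hypothetical options holder for a Solr write; not part of Beam. */
final class SolrWriteOptions {
    // Assumed default, chosen for illustration only.
    static final int DEFAULT_BATCH_SIZE = 1000;

    private final int batchSize;
    private final OptionalInt commitWithinMs;

    private SolrWriteOptions(int batchSize, OptionalInt commitWithinMs) {
        this.batchSize = batchSize;
        this.commitWithinMs = commitWithinMs;
    }

    /** Sensible defaults: no commitWithin, so the server configuration decides. */
    static SolrWriteOptions defaults() {
        return new Builder().build();
    }

    static Builder builder() {
        return new Builder();
    }

    int batchSize() {
        return batchSize;
    }

    /** Empty means "do not send commitWithin; use the server's configured behavior". */
    OptionalInt commitWithinMs() {
        return commitWithinMs;
    }

    static final class Builder {
        private int batchSize = DEFAULT_BATCH_SIZE;
        private OptionalInt commitWithinMs = OptionalInt.empty();

        Builder withBatchSize(int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.batchSize = batchSize;
            return this;
        }

        Builder withCommitWithinMs(int commitWithinMs) {
            if (commitWithinMs < 0) {
                throw new IllegalArgumentException("commitWithinMs must not be negative");
            }
            this.commitWithinMs = OptionalInt.of(commitWithinMs);
            return this;
        }

        SolrWriteOptions build() {
            return new SolrWriteOptions(batchSize, commitWithinMs);
        }
    }
}
```

Under these assumptions, the nightly bulk load from the description would use SolrWriteOptions.builder().withCommitWithinMs(300000).withBatchSize(9000).build(), while the streaming job would keep SolrWriteOptions.defaults() and rely on the server's fast flush settings.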


