Posted to commits@beam.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2018/04/27 12:46:00 UTC
[jira] [Comment Edited] (BEAM-3947) Add support for Solr 6.x/7.x

    [ https://issues.apache.org/jira/browse/BEAM-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456285#comment-16456285 ] 

Cao Manh Dat edited comment on BEAM-3947 at 4/27/18 12:45 PM:
--------------------------------------------------------------

After taking a look at the current state, I think we must discuss the goal of this issue.

If we just want the pipeline to be able to read from Solr, then the current code is fine: it can read/write data from/to Solr 5.x, Solr 6.x and Solr 7.x. This is because all the uses of SolrJ (the client for a Solr cluster) in Beam right now are very basic, e.g.:
 * Parsing data from Zookeeper to know where the Solr nodes live
 * Calling some HTTP APIs that have not changed since Solr 5.x

It seems that we should focus on using the new features of Solr 6.x and Solr 7.x (we may or may not need to update SolrJ):
 * Support the "/export" handler. It will make SolrIO significantly faster, since all the documents are streamed in one response and the cost of retrieving a document's fields is much lower than with the current approach (column-oriented vs row-oriented).
 * BoundedSolrSource.split can split the source into arbitrary smaller parts.
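
For illustration only, here is a minimal sketch of what an "/export" request could look like at the HTTP level. It is not the SolrIO implementation: the class name, method name, base URL, collection name and field names are all hypothetical. It assumes the documented behaviour of the handler, namely that "/export" requires explicit sort and fl parameters and that the fields involved have docValues enabled.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Beam or SolrJ.
final class ExportUrlSketch {

    // Builds a request URL for the "/export" handler of one collection.
    // baseUrl is assumed to be the Solr root, e.g. http://host:8983/solr
    static String exportUrl(String baseUrl, String collection,
                            String query, String sort, String fields) {
        // "/export" streams the whole sorted result set in one response,
        // so both sort and fl have to be given explicitly.
        return baseUrl + "/" + collection + "/export"
            + "?q=" + encode(query)
            + "&sort=" + encode(sort)
            + "&fl=" + encode(fields);
    }

    // URL-encodes a single parameter value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder values; replace with a real host, collection and fields.
        System.out.println(exportUrl(
            "http://localhost:8983/solr", "beam", "*:*", "id asc", "id,title"));
    }
}
```

A reader built on this idea would issue one such request per split and parse the streamed response, instead of paging through results. Whether that is done through SolrJ or a plain HTTP client is left open here.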

 

 


was (Author: caomanhdat):
After taking a look at the current state, I think we must discuss the goal of this issue.

If we just want the pipeline to be able to read from Solr, then the current code is fine: it can read data from Solr 5.x, Solr 6.x and Solr 7.x. This is because all the uses of SolrJ (the client for a Solr cluster) in Beam right now are very basic, e.g.:
 * Parsing data from Zookeeper to know where the Solr nodes live
 * Calling some HTTP APIs that have not changed since Solr 5.x

It seems that we should focus on using the new features of Solr 6.x and Solr 7.x (we may or may not need to update SolrJ):
 * Support the "/export" handler. It will make SolrIO significantly faster, since all the documents are streamed in one response and the cost of retrieving a document's fields is much lower than with the current approach (column-oriented vs row-oriented).
 * BoundedSolrSource.split can split the source into arbitrary smaller parts.

 

 

> Add support for Solr 6.x/7.x
> ----------------------------
>
>                 Key: BEAM-3947
>                 URL: https://issues.apache.org/jira/browse/BEAM-3947
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-solr
>            Reporter: Ismaël Mejía
>            Assignee: Cao Manh Dat
>            Priority: Minor
>
> The initial PR on Solr was based on Solr 6.x; however, at that time we supported Java 7, so I insisted on moving it to Solr 5.x (which is Java 7 compatible). This issue is to add support for multiple versions of Solr, ideally in a single module.
> Notice that I was able to recover the original code for Solr 6.x created by [~caomanhdat] here (there are some differences in the way the split was calculated and maybe some other minor things):
> [https://github.com/iemejia/beam/blob/recover-solrio/sdks/java/io/solr/pom.xml]
> This issue does not cover support for Solr 7, but if it is possible to add it as part of it, it would be great.


