Posted to dev@flume.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2013/04/25 11:22:17 UTC

[jira] [Commented] (FLUME-1687) ApacheSolrSink

    [ https://issues.apache.org/jira/browse/FLUME-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641593#comment-13641593 ] 

Mike Percy commented on FLUME-1687:
-----------------------------------

Israel, cool patch! I have some high-level feedback and some nitpicky feedback.

High level:
* Can we abstract the SolrEventSerializer concept a bit more broadly to be a SolrIndexer? The idea is that people may want to do more than simply map one event to one document, or may want to use a SolrServer implementation other than ConcurrentUpdateSolrServer. To support more complex indexing use cases in the future, one way to do it could be adding an interface like:

{noformat}
public interface SolrIndexer extends Configurable {
  public void configure(Context ctx);
  public void init();
  public void load(Event event) throws IOException, SolrServerException;
  public void beginSolrTransaction() throws IOException, SolrServerException;
  public void commitSolrTransaction() throws IOException, SolrServerException;
  public void rollbackSolrTransaction() throws IOException, SolrServerException;
  public void shutdown();
}
{noformat}

So stuff like docs.add(eventSerializer.prepareInputDocument(event)) would be abstracted into indexer.load(event), and solrServer.add(docs) + solrServer.commit() would be abstracted into indexer.commitSolrTransaction().
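To make the mapping concrete, here is a rough sketch of how a sink's batch loop might drive such an indexer. It is only an illustration under stated assumptions: SketchEvent, SketchIndexer, InMemoryIndexer and SinkSketch are hypothetical stand-ins, not Flume or SolrJ classes, and the interface is a cut-down version of the one above (configure/init/shutdown and SolrServerException are left out so the example compiles with no dependencies).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Flume event, reduced to a string body.
final class SketchEvent {
  final String body;
  SketchEvent(String body) { this.body = body; }
}

// Cut-down version of the proposed interface (assumption: lifecycle methods
// and SolrServerException are omitted to keep the sketch dependency-free).
interface SketchIndexer {
  void beginSolrTransaction() throws IOException;
  void load(SketchEvent event) throws IOException;
  void commitSolrTransaction() throws IOException;
  void rollbackSolrTransaction() throws IOException;
}

// In-memory indexer: buffers one "document" per event, publishes on commit.
final class InMemoryIndexer implements SketchIndexer {
  private final List<String> pending = new ArrayList<>();
  final List<String> committed = new ArrayList<>();

  @Override public void beginSolrTransaction() { pending.clear(); }

  @Override public void load(SketchEvent event) throws IOException {
    // Simulated failure so the rollback path can be exercised.
    if (event.body.isEmpty()) throw new IOException("empty event body");
    pending.add("doc:" + event.body);
  }

  @Override public void commitSolrTransaction() {
    committed.addAll(pending);
    pending.clear();
  }

  @Override public void rollbackSolrTransaction() { pending.clear(); }
}

final class SinkSketch {
  // Loads a batch through the indexer; returns true if the batch was
  // committed and false if it was rolled back.
  static boolean processBatch(SketchIndexer indexer, List<SketchEvent> batch)
      throws IOException {
    indexer.beginSolrTransaction();
    try {
      for (SketchEvent event : batch) {
        indexer.load(event);             // replaces docs.add(prepareInputDocument(event))
      }
      indexer.commitSolrTransaction();   // replaces solrServer.add(docs) + solrServer.commit()
      return true;
    } catch (IOException e) {
      indexer.rollbackSolrTransaction(); // discard the partially loaded batch
      return false;
    }
  }
}
```

In the real sink this loop would presumably sit inside the Flume channel transaction, so that a Solr rollback also rolls back the channel take; that wiring is left out here.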

Thoughts?

Aside from this suggestion, could you also do the following?
# Attach a .patch file that compiles instead of a jar
# Ensure indentation is consistent and kept to 2 spaces
# How about some unit tests?

Regards,
Mike
                
> ApacheSolrSink
> --------------
>
>                 Key: FLUME-1687
>                 URL: https://issues.apache.org/jira/browse/FLUME-1687
>             Project: Flume
>          Issue Type: New Feature
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0, v1.4.0
>            Reporter: wolfgang hoschek
>            Assignee: Israel Ekpo
>         Attachments: flume-new-feature-dependencies.zip, flume-new-features-1.3.1.jar, flume-new-features-1.3.1-sources.jar
>
>
> Some use cases need near real time full text indexing of data through Flume into Solr, where a Flume sink can write directly to a Solr search server. This is a scalable way to provide low latency querying and data acquisition. It complements (rather than replaces) use cases based on MapReduce batch analysis of HDFS data.
> Apache Solr has a client API that uses REST to add documents to a Solr server, which in turn is based on Lucene. A Solr Sink can extract documents from Flume events and forward them to Solr.
