Posted to dev@lucene.apache.org by "Jason Gerlowski (JIRA)" <ji...@apache.org> on 2015/12/29 04:01:49 UTC

[jira] [Updated] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

     [ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gerlowski updated SOLR-7535:
----------------------------------
    Attachment: SOLR-7535.patch

After a bit of thought and a holiday break, I've got my first attempt at this ready for some feedback.

h5. Notes about this Patch
1.) No tests yet.  It does work (I tried it out manually), but it's getting late here, and I wanted to post this in case someone has time to look it over and give me feedback before I pick it back up tomorrow evening.  I am planning to add tests to {{StreamExpressionTest}} and {{StreamExpressionToExpessionTest}}.
2.) I didn't make any attempt to restrict the {{TupleStream}} implementations that {{UpdateStream}} can wrap, mainly because I haven't gotten around to it yet.  But also because, IMO, there are use cases where a user wouldn't need a {{SelectStream}} (for example, if they're doing field filtering in their initial Solr query/search() expression).  Happy to change this in a subsequent patch; I just wanted to see what people thought.
3.) I kept my original tuple-to-input-doc mapping intact.  It's limited but, as Joel mentioned, it will probably do the job for a first pass.
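
To make the idea of a tuple-to-input-doc mapping concrete, here is a minimal sketch of one simple form it could take: every tuple field is copied into the document under the same name.  This is an illustration under stated assumptions, not the mapping in the attached patch.  Plain maps stand in for Solr's {{Tuple}} and {{SolrInputDocument}} so the example runs without Solr on the classpath, and the names {{TupleToDocSketch}} and {{toInputDoc}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Maps stand in for Solr's Tuple and SolrInputDocument.
class TupleToDocSketch {

    // Copies every tuple field into a new "document" under the same field name.
    // List values (multi-valued fields) are copied into a fresh list so the
    // document does not share mutable state with the tuple.
    static Map<String, Object> toInputDoc(Map<String, Object> tupleFields) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : tupleFields.entrySet()) {
            Object value = field.getValue();
            if (value instanceof List) {
                doc.put(field.getKey(), new ArrayList<Object>((List<?>) value));
            } else {
                doc.put(field.getKey(), value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", "doc-1");
        tuple.put("tags", Arrays.asList("a", "b"));
        System.out.println(toInputDoc(tuple));
    }
}
```

A one-to-one copy like this does no renaming or type conversion, which is the kind of limitation that seems acceptable for a first pass.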

h5. Questions about Surrounding Code
These aren't necessarily related to this JIRA/patch, but working on this patch made me think of a few questions that I couldn't figure out answers to on my own.

1.) Many of the {{TupleStream}} implementations require a collection to be explicitly stated as the first argument (e.g. {{search(gettingstarted...)}}).  However, the collection name is already specified in the URL path (e.g. {{localhost:7574/solr/gettingstarted/stream?...}}).  Are these values ever allowed to be different?
2.) Many of the Stream Expressions are specified using a syntax that mixes named parameters (rows, sort, zkHost, etc.) and unnamed parameters ('collection' is probably the most common).  Are there any guidelines/logic around which parameters are named and which are unnamed?  If I'm creating a new {{TupleStream}} type (as we are here), are there any guidelines on what the expression interface should look like?
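
To illustrate the named/unnamed distinction in the second question, here is a rough sketch that splits the arguments of a function-style expression into unnamed (positional) and named ({{name=value}}) groups.  It is not Solr's own expression parser; the class {{ExpressionArgsSketch}} and the sample expression in {{main}} are made up for illustration, and escaped quotes are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Solr's own expression parser.
class ExpressionArgsSketch {
    final List<String> unnamed = new ArrayList<>();
    final Map<String, String> named = new LinkedHashMap<>();

    static ExpressionArgsSketch parse(String expression) {
        int open = expression.indexOf('(');
        int close = expression.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a function-style expression: " + expression);
        }
        String body = expression.substring(open + 1, close);
        List<Integer> commas = topLevelPositions(body, ',');
        commas.add(body.length()); // sentinel so the last argument is emitted too
        ExpressionArgsSketch result = new ExpressionArgsSketch();
        int start = 0;
        for (int end : commas) {
            String arg = body.substring(start, end).trim();
            start = end + 1;
            if (arg.isEmpty()) {
                continue;
            }
            List<Integer> equalsSigns = topLevelPositions(arg, '=');
            if (equalsSigns.isEmpty()) {
                result.unnamed.add(arg); // e.g. a collection name or a nested expression
            } else {
                int eq = equalsSigns.get(0);
                result.named.put(arg.substring(0, eq).trim(),
                                 stripQuotes(arg.substring(eq + 1).trim()));
            }
        }
        return result;
    }

    // Positions of 'target' that sit outside double quotes and outside nested parentheses.
    static List<Integer> topLevelPositions(String text, char target) {
        List<Integer> positions = new ArrayList<>();
        boolean inQuotes = false;
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')') {
                depth--;
            } else if (!inQuotes && depth == 0 && c == target) {
                positions.add(i);
            }
        }
        return positions;
    }

    static String stripQuotes(String value) {
        boolean quoted = value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"");
        return quoted ? value.substring(1, value.length() - 1) : value;
    }

    public static void main(String[] args) {
        ExpressionArgsSketch parsed =
            parse("search(gettingstarted, q=\"*:*\", fl=\"id,name\", sort=\"id asc\")");
        System.out.println("unnamed: " + parsed.unnamed);
        System.out.println("named:   " + parsed.named);
    }
}
```

Because only a top-level {{=}} marks an argument as named, a nested expression passed as an argument stays in the unnamed group even when it contains named parameters of its own.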


Thanks in advance if anyone can help clarify some of those things for me.  I should be back online soon to revise this further.

> Add UpdateStream to Streaming API and Streaming Expression
> ----------------------------------------------------------
>
>                 Key: SOLR-7535
>                 URL: https://issues.apache.org/jira/browse/SOLR-7535
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, SolrJ
>            Reporter: Joel Bernstein
>            Priority: Minor
>         Attachments: SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and streaming expressions. The UpdateStream will wrap a TupleStream and send the Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different SolrCloud collections, merge and transform the streams, and send the transformed data to another SolrCloud collection.
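
The wrapping behaviour described in the issue summary can be pictured roughly as follows.  This is only a sketch under assumptions: an {{Iterator}} of maps stands in for the wrapped {{TupleStream}}, a {{Consumer}} stands in for whatever client sends documents to the destination collection, and the class name {{UpdateStreamSketch}} and its {{sendAll}} method are hypothetical rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of a stream that wraps another stream and forwards
// every tuple it reads to an indexing sink.
class UpdateStreamSketch {
    private final Iterator<Map<String, Object>> source;   // stand-in for the wrapped TupleStream
    private final Consumer<Map<String, Object>> indexer;  // stand-in for the destination collection

    UpdateStreamSketch(Iterator<Map<String, Object>> source,
                       Consumer<Map<String, Object>> indexer) {
        this.source = source;
        this.indexer = indexer;
    }

    // Reads the wrapped stream to exhaustion, sending each tuple to the indexer.
    // Returns the number of tuples sent.
    int sendAll() {
        int sent = 0;
        while (source.hasNext()) {
            indexer.accept(source.next());
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = Arrays.asList(
            Collections.<String, Object>singletonMap("id", "1"),
            Collections.<String, Object>singletonMap("id", "2"));
        List<Map<String, Object>> indexed = new ArrayList<>();
        int sent = new UpdateStreamSketch(tuples.iterator(), indexed::add).sendAll();
        System.out.println(sent + " tuples sent: " + indexed);
    }
}
```

Since the wrapper only depends on the source being readable tuple by tuple, the same shape works whether the source is a single search or a merged and transformed combination of several streams.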



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
