Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2016/06/21 16:41:58 UTC
[jira] [Updated] (SOLR-9240) Add the partitionKeys parameter to the topic() Streaming Expression

     [ https://issues.apache.org/jira/browse/SOLR-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-9240:
---------------------------------
    Description: 
Currently, the topic() function does not accept a partitionKeys parameter as the search() function does. This means the topic() function cannot be wrapped by the parallel() function to run across worker nodes.

It would be useful to support parallelizing the topic() function because it would provide a general-purpose parallelized approach for processing batches of data as they enter the index.

For example, this would allow a classify() function to be wrapped around a topic() function to classify documents in parallel across worker nodes.

Sample syntax:

{code}
parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
{code}

The example above would send a daemon to the worker nodes, where it would classify all new documents returned by the topic() function. The update() function would send the output of classify() to a SolrCloud collection for indexing.

The partitionKeys parameter would ensure that each worker receives its own partition of the results returned by the topic() function. This allows the classify() function to be run in parallel.
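
To illustrate the partitioning idea described above, here is a minimal Python sketch of how hashing a key field can assign each document to exactly one worker. It is not Solr code: the partition() helper, the sample documents, and the use of zlib.crc32 as the hash are all assumptions made for illustration, and Solr's own hashing may differ.

```python
import zlib


def partition(docs, key, num_workers):
    """Assign each document to exactly one worker by hashing its key field.

    Hypothetical helper for illustration only; it does not reproduce
    Solr's partitioning implementation.
    """
    buckets = [[] for _ in range(num_workers)]
    for doc in docs:
        # crc32 is a stand-in hash chosen because it is deterministic
        # across runs; the real hash function is an open assumption.
        worker_id = zlib.crc32(str(doc[key]).encode("utf-8")) % num_workers
        buckets[worker_id].append(doc)
    return buckets


# Ten sample documents split across three hypothetical workers.
docs = [{"id": f"doc{i}"} for i in range(10)]
buckets = partition(docs, "id", 3)

for worker_id, bucket in enumerate(buckets):
    print(worker_id, [doc["id"] for doc in bucket])
```

Because the assignment depends only on the key value and the number of workers, every document lands on exactly one worker and the same document always lands on the same worker, which is the property that would let each worker run classify() on its own share of the topic() results.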






  was:
Currently the topic() function doesn't accept a partitionKeys parameter like the search() function does. This means the topic() function can't be wrapped by the parallel() function to run across worker nodes.

It would be useful to support parallelizing the topic function because it would provide a general purpose parallelized approach for processing batches of data as they enter the index.

For example this would allow a classify() function to be wrapped around a topic() function to classify documents in parallel across worker nodes. 

Sample syntax:

{code}
parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
{code}

The example above would send a daemon out to worker nodes that would classify all new documents returned by the topic() function. The update function would send the output of classify() to a SolrCloud collection for indexing.







> Add the partitionKeys parameter to the topic() Streaming Expression
> -------------------------------------------------------------------
>
>                 Key: SOLR-9240
>                 URL: https://issues.apache.org/jira/browse/SOLR-9240
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Joel Bernstein
>
> Currently, the topic() function does not accept a partitionKeys parameter as the search() function does. This means the topic() function cannot be wrapped by the parallel() function to run across worker nodes.
> It would be useful to support parallelizing the topic() function because it would provide a general-purpose parallelized approach for processing batches of data as they enter the index.
> For example, this would allow a classify() function to be wrapped around a topic() function to classify documents in parallel across worker nodes.
> Sample syntax:
> {code}
> parallel(daemon(update(classify(topic(..., partitionKeys="id")))))
> {code}
> The example above would send a daemon to the worker nodes, where it would classify all new documents returned by the topic() function. The update() function would send the output of classify() to a SolrCloud collection for indexing.
> The partitionKeys parameter would ensure that each worker receives its own partition of the results returned by the topic() function. This allows the classify() function to be run in parallel.


