Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2018/05/27 20:22:00 UTC

[jira] [Commented] (SOLR-12408) Introduce parallelShards() in Streaming Expressions

    [ https://issues.apache.org/jira/browse/SOLR-12408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492146#comment-16492146 ] 

Joel Bernstein commented on SOLR-12408:
---------------------------------------

I believe the JSON Facet API has a streaming mode for facets that does not work in distributed mode. The idea was to have Streaming Expressions perform the merge of the streaming facets. Would that scenario address the problem you are describing?

Or is there another high cardinality faceting use case that you have in mind?
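
To make the merge idea concrete, here is a rough sketch in Python rather than Solr code. It assumes each shard emits its facet buckets as (value, count) pairs already sorted by value; that ordering, the tuple shape, and the name merge_facet_streams are assumptions for illustration only. Under that assumption the merging node only needs to hold one pending bucket per shard instead of the full facet result.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_facet_streams(*shard_streams):
    """Lazily merge per-shard facet streams, summing counts per bucket.

    Each input is an iterable of (bucket_value, count) pairs that is
    assumed to be sorted by bucket_value (hypothetical shard output).
    """
    # k-way merge keeps only one pending item per shard in memory.
    merged = heapq.merge(*shard_streams, key=itemgetter(0))
    # Equal bucket values are now adjacent, so they can be summed on the fly.
    for bucket, group in groupby(merged, key=itemgetter(0)):
        yield bucket, sum(count for _, count in group)


# Made-up per-shard outputs, sorted by bucket value.
shard1 = [("a", 2), ("c", 1)]
shard2 = [("a", 1), ("b", 4)]
merged_buckets = list(merge_facet_streams(shard1, shard2))
```

Because the merge is lazy, buckets can be passed downstream as soon as every shard has moved past them.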

> Introduce parallelShards() in Streaming Expressions
> ---------------------------------------------------
>
>                 Key: SOLR-12408
>                 URL: https://issues.apache.org/jira/browse/SOLR-12408
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: streaming expressions
>            Reporter: Mikhail Khludnev
>            Priority: Major
>
> {{parallel()}} uses hash filter partitioning, which doesn't work in some edge cases with high-cardinality facets, since they kill the coordinator in the merge phase.
> I propose introducing {{parallelShards()}}, which will accept a collection and spawn per-shard substreams (I'm not sure whether to use {{distrib=false}} or {{shards=foo}}). So far, it's not clear whether {{workerCollection}} is useful for it at all.
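
The difference between the two partitioning schemes in the description can be sketched with a toy model in Python. This is not Solr code: zlib.crc32 stands in for whatever hash function Solr actually applies, the document fields and function names are hypothetical, and since parallelShards() is only a proposal, shard_partition reflects just one possible reading of it.

```python
import zlib
from collections import defaultdict


def hash_partition(docs, key, num_workers):
    """Toy model of hash partitioning: route each doc to a worker by key hash.

    crc32 is a stand-in hash, chosen only because it is deterministic.
    """
    workers = defaultdict(list)
    for doc in docs:
        worker_id = zlib.crc32(str(doc[key]).encode("utf-8")) % num_workers
        workers[worker_id].append(doc)
    return dict(workers)


def shard_partition(docs):
    """Toy model of the proposal: one substream per shard holding the doc."""
    substreams = defaultdict(list)
    for doc in docs:
        substreams[doc["shard"]].append(doc)
    return dict(substreams)


# Made-up documents; "shard" and "category" are hypothetical fields.
docs = [
    {"id": 1, "shard": "shard1", "category": "a"},
    {"id": 2, "shard": "shard1", "category": "b"},
    {"id": 3, "shard": "shard2", "category": "a"},
    {"id": 4, "shard": "shard2", "category": "c"},
]

by_worker = hash_partition(docs, "category", 2)
by_shard = shard_partition(docs)
```

In the hash scheme every worker may receive documents from any shard, but all documents sharing a key land on the same worker. In the per-shard scheme each substream stays local to one shard, so the same key can show up in several substreams and still has to be combined afterwards.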


