Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/09/18 20:20:00 UTC

[jira] [Commented] (SOLR-14470) Add streaming expressions to /export handler

    [ https://issues.apache.org/jira/browse/SOLR-14470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198512#comment-17198512 ] 

ASF subversion and git services commented on SOLR-14470:
--------------------------------------------------------

Commit 1160216bfba491218fe45a644f9fda8b557c5b91 in lucene-solr's branch refs/heads/reference_impl_dev from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1160216 ]

SOLR-14470: Fix precommit


> Add streaming expressions to /export handler
> --------------------------------------------
>
>                 Key: SOLR-14470
>                 URL: https://issues.apache.org/jira/browse/SOLR-14470
>             Project: Solr
>          Issue Type: Improvement
>          Components: Export Writer, streaming expressions
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>             Fix For: 8.6
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Many streaming scenarios would greatly benefit from the ability to perform partial rollups (or other transformations) as early as possible, in order to minimize the amount of data that has to be sent from shards to the aggregating node.
> This can be implemented as a subset of streaming expressions that process the data directly inside each local {{ExportHandler}} and output only the records from the resulting stream.
> Conceptually it would be similar to the way a Hadoop {{Combiner}} works. As with {{Combiner}}, because the input data is processed in batches, there would be no guarantee that only one record per unique sort value would be emitted; in fact, in most cases multiple partial aggregations would be emitted. Still, in many scenarios this would reduce the amount of data to be sent by several orders of magnitude.
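
The combiner-style behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use Solr's ExportHandler or any streaming expression API; the class PartialRollupSketch, the methods partialRollup and finalRollup, the batch size, and the sample data are all hypothetical names and values chosen for illustration. It only shows why batch-wise aggregation on a shard may emit several partial records for the same key, and why the aggregating node still has to merge them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is not Solr code and not the /export API.
public class PartialRollupSketch {

    // Shard side: sum values per key inside each batch and emit the partial
    // sums at every batch boundary, as a combiner would.
    public static List<Map.Entry<String, Long>> partialRollup(
            List<Map.Entry<String, Long>> sortedDocs, int batchSize) {
        List<Map.Entry<String, Long>> partials = new ArrayList<>();
        for (int start = 0; start < sortedDocs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, sortedDocs.size());
            Map<String, Long> sums = new LinkedHashMap<>();
            for (Map.Entry<String, Long> doc : sortedDocs.subList(start, end)) {
                sums.merge(doc.getKey(), doc.getValue(), Long::sum);
            }
            // A key that spans two batches produces two partial records.
            for (Map.Entry<String, Long> e : sums.entrySet()) {
                partials.add(Map.entry(e.getKey(), e.getValue()));
            }
        }
        return partials;
    }

    // Aggregating node: merge the partial sums received from the shards.
    public static Map<String, Long> finalRollup(List<Map.Entry<String, Long>> partials) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Long> p : partials) {
            totals.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Five documents sorted by key, processed in batches of two (assumed values).
        List<Map.Entry<String, Long>> docs = List.of(
                Map.entry("a", 1L), Map.entry("a", 2L), Map.entry("a", 3L),
                Map.entry("b", 4L), Map.entry("b", 5L));
        List<Map.Entry<String, Long>> partials = partialRollup(docs, 2);
        System.out.println("partials sent from shard: " + partials);
        System.out.println("totals on aggregating node: " + finalRollup(partials));
    }
}
```

With this sample input the shard sends four partial records instead of five documents, and key "a" appears twice because it straddles a batch boundary. With larger batches and fewer distinct keys the reduction grows accordingly, which is the effect the description is after.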



--
This message was sent by Atlassian Jira
(v8.3.4#803005)