Posted to dev@lucene.apache.org by "Joel Bernstein (JIRA)" <ji...@apache.org> on 2016/12/29 15:40:58 UTC

[jira] [Issue Comment Deleted] (SOLR-9905) Add NullStream to isolate the performance of the ExportWriter

     [ https://issues.apache.org/jira/browse/SOLR-9905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-9905:
---------------------------------
    Comment: was deleted

(was: The NullStream is a utility function for testing the raw performance of the ExportWriter. It is useful for diagnosing bottlenecks in streaming MapReduce operations: the NullStream lets developers measure the performance of the shuffle (sorting, partitioning, exporting) in isolation from the reduce operation (rollup, join, group, etc.).

The NullStream simply iterates its internal stream and discards the tuples. It returns a single Tuple from each worker, containing the number of Tuples that worker processed. The idea is to iterate the stream with no additional overhead so that the performance of the underlying stream can be isolated.

Sample syntax:
{code}
parallel(collection2, workers=7, sort="count desc", 
      null(search(collection1, 
                   q=*:*, 
                   fl="id", 
                   sort="id desc", 
                   qt="/export", 
                   wt="javabin", 
                   partitionKeys=id)))
{code}

In the example above, the NullStream is sent to 7 workers. Each worker iterates the search() expression while the NullStream discards the tuples, so the raw performance of search() can be measured.)
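A minimal Java sketch of the drain-and-count idea described above, assuming tuples can be modeled as `Map<String, Object>` and the inner stream as a plain `Iterator`. This is not Solr's `NullStream` class or its TupleStream API; the class name `NullStreamSketch`, the `drain` method, and the `count` field name are hypothetical (the field name is only chosen to match the `sort="count desc"` in the sample).

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the NullStream idea, not Solr's implementation.
// Tuples are modeled as Map<String, Object>; the "count" key is an assumption.
class NullStreamSketch {

    // Reads every tuple from the inner stream and throws it away, so the only
    // per-tuple work is the read itself plus a counter increment.
    static Map<String, Object> drain(Iterator<Map<String, Object>> inner) {
        long count = 0;
        while (inner.hasNext()) {
            inner.next(); // the tuple is intentionally ignored
            count++;
        }
        // A single summary tuple carrying only the number of tuples seen.
        return Map.of("count", count);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("id", "1"), Map.of("id", "2"), Map.of("id", "3"));

        // Time the drain so the cost of reading the inner stream is visible on its own.
        long start = System.nanoTime();
        Map<String, Object> result = drain(docs.iterator());
        long elapsedNanos = System.nanoTime() - start;

        System.out.println(result); // {count=3}
        System.out.println("elapsed ns: " + elapsedNanos);
    }
}
```

Because nothing is retained or transformed, any time measured around `drain` is attributable to producing the inner stream, which is the point of isolating the shuffle from the reduce step.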

> Add NullStream to isolate the performance of the ExportWriter
> -------------------------------------------------------------
>
>                 Key: SOLR-9905
>                 URL: https://issues.apache.org/jira/browse/SOLR-9905
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Joel Bernstein
>
> The NullStream is a utility function for testing the raw performance of the ExportWriter. It is useful for diagnosing bottlenecks in streaming MapReduce operations: the NullStream lets developers measure the performance of the shuffle (sorting, partitioning, exporting) in isolation from the reduce operation (rollup, join, group, etc.).
> The NullStream simply iterates its internal stream and discards the tuples. It returns a single Tuple from each worker, containing the number of Tuples that worker processed. The idea is to iterate the stream with no additional overhead so that the performance of the underlying stream can be isolated.
> Sample syntax:
> {code}
> parallel(collection2, workers=7, sort="count desc", 
>       null(search(collection1, 
>                    q=*:*, 
>                    fl="id", 
>                    sort="id desc", 
>                    qt="/export", 
>                    wt="javabin", 
>                    partitionKeys=id)))
> {code}
> In the example above, the NullStream is sent to 7 workers. Each worker iterates the search() expression while the NullStream discards the tuples, so the raw performance of search() can be measured.
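To illustrate the fan-out in the sample expression, here is a hypothetical Java simulation of `parallel(..., workers=7, sort="count desc", null(search(..., partitionKeys=id)))`: each worker drains only the ids assigned to its partition and returns one count tuple, and the per-worker tuples are merged in descending count order. The hash partitioning via `String.hashCode()` and the names `ParallelNullSketch`, `WorkerCount`, `drainPartition`, and `run` are assumptions for illustration; Solr's actual partitioning and merge logic are not reproduced here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical simulation of parallel(..., sort="count desc", null(search(..., partitionKeys=id))).
// The hash partitioning below is a stand-in, not Solr's partitioning function.
class ParallelNullSketch {

    // One summary tuple per worker: the worker index and how many tuples it drained.
    record WorkerCount(int worker, long count) {}

    // Assign each id to exactly one worker, as a stand-in for partitionKeys=id.
    static int partitionOf(String id, int workers) {
        return Math.floorMod(id.hashCode(), workers);
    }

    // A worker counts the tuples in its own partition and keeps none of them.
    static long drainPartition(List<String> ids, int worker, int workers) {
        long count = 0;
        for (String id : ids) {
            if (partitionOf(id, workers) == worker) {
                count++; // tuple is counted, never stored
            }
        }
        return count;
    }

    // Fan out to every worker concurrently, then merge the per-worker tuples by count descending.
    static List<WorkerCount> run(List<String> ids, int workers) {
        return IntStream.range(0, workers)
                .parallel()
                .mapToObj(w -> new WorkerCount(w, drainPartition(ids, w, workers)))
                .sorted(Comparator.comparingLong(WorkerCount::count).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> ids = IntStream.range(0, 10_000)
                .mapToObj(Integer::toString)
                .collect(Collectors.toList());
        List<WorkerCount> results = run(ids, 7);
        results.forEach(System.out::println);
        long total = results.stream().mapToLong(WorkerCount::count).sum();
        System.out.println("total=" + total); // total=10000
    }
}
```

Since each id maps to exactly one partition, the per-worker counts always sum to the input size; the spread between the largest and smallest counts gives a rough view of how evenly the shuffle distributed the work.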



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
