Posted to commits@lucene.apache.org by jb...@apache.org on 2019/04/08 15:47:32 UTC

[lucene-solr] branch SOLR-13105-visual updated: SOLR-13105: First commit

This is an automated email from the ASF dual-hosted git repository.

jbernste pushed a commit to branch SOLR-13105-visual
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/SOLR-13105-visual by this push:
     new dc13c03  SOLR-13105: First commit
dc13c03 is described below

commit dc13c03a87f1c002342dc3bf69722b1c65f81e46
Author: Joel Bernstein <jb...@apache.org>
AuthorDate: Mon Apr 8 11:47:13 2019 -0400

    SOLR-13105: First commit
---
 solr/solr-ref-guide/src/streaming-expressions.adoc | 89 ++++++++--------------
 1 file changed, 33 insertions(+), 56 deletions(-)

diff --git a/solr/solr-ref-guide/src/streaming-expressions.adoc b/solr/solr-ref-guide/src/streaming-expressions.adoc
index c1c0404..a1a21d0 100644
--- a/solr/solr-ref-guide/src/streaming-expressions.adoc
+++ b/solr/solr-ref-guide/src/streaming-expressions.adoc
@@ -1,5 +1,5 @@
 = Streaming Expressions
-:page-children: stream-source-reference, stream-decorator-reference, stream-evaluator-reference, math-expressions, graph-traversal
+:page-children: visualization, stream-source-reference, stream-decorator-reference, stream-evaluator-reference, math-expressions, graph-traversal
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
@@ -17,39 +17,26 @@
 // specific language governing permissions and limitations
 // under the License.
 
-Streaming Expressions provide a simple yet powerful stream processing language for Solr Cloud.
+Streaming Expressions expose the capabilities of Solr Cloud as composable functions. Many of the existing capabilities of the search
+engine, such as searching and faceting, are available as functions, and many new capabilities have been added to search in different
+ways and to transform, analyze and visualize the results.
 
-Streaming expressions are a suite of functions that can be combined to perform many different parallel computing tasks. These functions are the basis for the <<parallel-sql-interface.adoc#parallel-sql-interface,Parallel SQL Interface>>.
+At a high level there are four main capabilities that will be explored in the documentation:
 
-There is a growing library of functions that can be combined to implement:
+* *Searching*, sampling and aggregating results from Solr.
 
-* Request/response stream processing
-* Batch stream processing
-* Fast interactive MapReduce
-* Aggregations (Both pushed down faceted and shuffling MapReduce)
-* Parallel relational algebra (distributed joins, intersections, unions, complements)
-* Publish/subscribe messaging
-* Distributed graph traversal
-* Machine learning and parallel iterative model training
-* Anomaly detection
-* Recommendation systems
-* Retrieve and rank services
-* Text classification and feature extraction
-* Streaming NLP
-* Statistical Programming
+* *Transforming* result sets after they are retrieved from Solr.
 
-Streams from outside systems can be joined with streams originating from Solr and users can add their own stream functions by following Solr's {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/io/stream/package-summary.html[Java streaming API].
+* *Analyzing* and modeling result sets using probability, statistics and machine learning libraries.
+
+* *Visualizing* result sets, aggregations and statistical models of the data.
 
-[IMPORTANT]
-====
-Both streaming expressions and the streaming API are considered experimental, and the APIs are subject to change.
-====
 
 == Stream Language Basics
 
 Streaming Expressions are comprised of streaming functions which work with a Solr collection. They emit a stream of tuples (key/value Maps).
 
-Many of the provided streaming functions are designed to work with entire result sets rather than the top N results like normal search. This is supported by the <<exporting-result-sets.adoc#exporting-result-sets,/export handler>>.
+Some of the provided streaming functions are designed to work with entire result sets rather than the top N results like normal search. This is supported by the <<exporting-result-sets.adoc#exporting-result-sets,/export handler>>.
 
 Some streaming functions act as stream sources to originate the stream flow. Other streaming functions act as stream decorators to wrap other stream functions and perform operations on the stream of tuples. Many streams functions can be parallelized across a worker collection. This can be particularly powerful for relational algebra functions.
 
@@ -64,8 +51,7 @@ The `/stream` request handler takes one parameter, `expr`, which is used to spec
 curl --data-urlencode 'expr=search(enron_emails,
                                    q="from:1800flowers*",
                                    fl="from, to",
-                                   sort="from asc",
-                                   qt="/export")' http://localhost:8983/solr/enron_emails/stream
+                                   sort="from asc")' http://localhost:8983/solr/enron_emails/stream
 ----
 
 Details of the parameters for each function are included below.
@@ -95,49 +81,40 @@ For the above example the `/stream` handler responded with the following JSON re
 
 Note the last tuple in the above example stream is `{"EOF":true,"RESPONSE_TIME":33}`. The `EOF` indicates the end of the stream. To process the JSON response, you'll need to use a streaming JSON implementation because streaming expressions are designed to return the entire result set which may have millions of records. In your JSON client you'll need to iterate each doc (tuple) and check for the EOF tuple to determine the end of stream.
 
-The {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/io/package-summary.html[`org.apache.solr.client.solrj.io`] package provides Java classes that compile streaming expressions into streaming API objects. These classes can be used to execute streaming expressions from inside a Java application. For example:
 
-[source,java]
-----
-    StreamFactory streamFactory = new DefaultStreamFactory().withCollectionZkHost("collection1", zkServer.getZkAddress());
-    InjectionDefense defense = new InjectionDefense("parallel(collection1, group(search(collection1, q=\"*:*\", fl=\"id,a_s,a_i,a_f\", sort=\"a_s asc,a_f asc\", partitionKeys=\"a_s\"), by=\"a_s asc\"), workers=\"2\", zkHost=\"?$?\", sort=\"a_s asc\")");
-    defense.addParameter(zkhost);
-    ParallelStream pstream = (ParallelStream)streamFactory.constructStream(defense.safeExpressionString());
-----
+== Elements of the Language
 
-Note that InjectionDefense need only be used if the string being inserted could contain user supplied data. See the
-javadoc for `InjectionDefense` for usage details and SOLR-12891 for an example of the potential risks.
-Also note that for security reasons normal parameter substitution no longer applies to the expr parameter
-unless the jvm has been started with `-DStreamingExpressionMacros=true` (usually via `solr.in.sh`)
+=== Stream Sources
 
-=== Data Requirements
+Stream sources originate streams. There is a rich set of searching, sampling and aggregation stream sources to choose from.
 
-Because streaming expressions relies on the `/export` handler, many of the field and field type requirements to use `/export` are also requirements for `/stream`, particularly for `sort` and `fl` parameters. Please see the section <<exporting-result-sets.adoc#exporting-result-sets,Exporting Result Sets>> for details.
+A full reference to all available source expressions is available in <<stream-source-reference.adoc#stream-source-reference,Stream Source Reference>>.
 
-== Types of Streaming Expressions
+=== Stream Decorators
 
-=== About Stream Sources
+Stream decorators wrap stream sources and other stream decorators to transform a stream.
 
-Stream sources originate streams. The most commonly used one of these is `search`, which does a query.
+A full reference to all available decorator expressions is available in <<stream-decorator-reference.adoc#stream-decorator-reference,Stream Decorator Reference>>.
 
-A full reference to all available source expressions is available in <<stream-source-reference.adoc#stream-source-reference,Stream Source Reference>>.
+=== Math Expressions
 
-=== About Stream Decorators
-Stream decorators wrap other stream functions or perform operations on a stream.
+Math Expressions are a vector and matrix math library that can be combined with Streaming Expressions to perform analysis and build mathematical models
+of the result sets. From a language standpoint Math Expressions are a sub-language of Streaming Expressions that don't return streams of tuples. Instead
+they operate on and return numbers, vectors, matrices and mathematical models. The documentation will show how to combine Streaming Expressions and Math
+Expressions.
 
-A full reference to all available decorator expressions is available in <<stream-decorator-reference.adoc#stream-decorator-reference,Stream Decorator Reference>>.
+The Math Expressions user guide is available in <<>>
 
-=== About Stream Evaluators
+From a language standpoint Math Expressions are referred to as Stream Evaluators.
 
-Stream Evaluators can be used to evaluate (calculate) new values based on other values in a tuple. That newly evaluated value can be put into the tuple (as part of a `select(...)` clause), used to filter streams (as part of a `having(...)` clause), and for other things. Evaluators can contain field names, raw values, or other evaluators, giving you the ability to create complex evaluation logic, including conditional if/then choices.
+A full reference to all available evaluator expressions is available in <<stream-evaluator-reference.adoc#stream-evaluator-reference,Stream Evaluator Reference>>.
 
-In cases where you want to use raw values as part of an evaluation you will need to consider the order of how evaluators are parsed.
+=== Visualization
 
-1.  If the parameter can be parsed into a valid number, then it is considered a number. For example, `add(3,4.5)`
-2.  If the parameter can be parsed into a valid boolean, then it is considered a boolean. For example, `eq(true,false)`
-3.  If the parameter can be parsed into a valid evaluator, then it is considered an evaluator. For example, `eq(add(10,4),add(7,7))`
-4.  The parameter is considered a field name, even if it quoted. For example, `eq(fieldA,"fieldB")`
 
-If you wish to use a raw string as part of an evaluation, you will want to consider using the `raw(string)` evaluator. This will always return the raw value, no matter what is entered.
+Visualization of both Streaming Expressions and Math Expressions is done using Apache Zeppelin and the Zeppelin-Solr Interpreter.
+
+Visualizing Streaming Expressions and setting up Apache Zeppelin are documented in <<>>
+
+The Math Expressions user guide has in-depth coverage of visualization techniques.
 
-A full reference to all available evaluator expressions is available in <<stream-evaluator-reference.adoc#stream-evaluator-reference,Stream Evaluator Reference>>.