Posted to commits@lucene.apache.org by ct...@apache.org on 2021/01/13 22:39:29 UTC

[lucene-solr] branch jira/solr-13105-toMerge updated: Last (almost) round of copy edit review

This is an automated email from the ASF dual-hosted git repository.

ctargett pushed a commit to branch jira/solr-13105-toMerge
in repository https://gitbox.apache.org/repos/asf/lucene-solr.git


The following commit(s) were added to refs/heads/jira/solr-13105-toMerge by this push:
     new 55e259b  Last (almost) round of copy edit review
55e259b is described below

commit 55e259bf00d4389f77b531c2cc5e4733d9853cbd
Author: Cassandra Targett <ct...@apache.org>
AuthorDate: Wed Jan 13 16:38:39 2021 -0600

    Last (almost) round of copy edit review
---
 .../solr-ref-guide/src/computational-geometry.adoc |  32 ++--
 solr/solr-ref-guide/src/curve-fitting.adoc         |  20 +-
 solr/solr-ref-guide/src/dsp.adoc                   |  23 +--
 solr/solr-ref-guide/src/loading.adoc               |   2 +-
 solr/solr-ref-guide/src/logs.adoc                  | 204 ++++++++++-----------
 solr/solr-ref-guide/src/machine-learning.adoc      | 189 +++++++++----------
 solr/solr-ref-guide/src/math-expressions.adoc      |   2 +-
 solr/solr-ref-guide/src/scalar-math.adoc           |   2 +-
 solr/solr-ref-guide/src/statistics.adoc            |  30 +--
 solr/solr-ref-guide/src/transform.adoc             |   2 +-
 10 files changed, 235 insertions(+), 271 deletions(-)

diff --git a/solr/solr-ref-guide/src/computational-geometry.adoc b/solr/solr-ref-guide/src/computational-geometry.adoc
index 4853a99..50d7ac6 100644
--- a/solr/solr-ref-guide/src/computational-geometry.adoc
+++ b/solr/solr-ref-guide/src/computational-geometry.adoc
@@ -32,15 +32,15 @@ set of 2D points. Border visualizations can be useful for understanding where da
 in relation to the border.
 
 In the examples below the `convexHull` function is used
-to visualize a boarder for a set of latitude and longitude points of rat sightings in the nyc311
-complaints database. An investigation of the boarder around the rat sightings can be done
+to visualize a border for a set of latitude and longitude points of rat sightings in the NYC311
+complaints database. An investigation of the border around the rat sightings can be done
 to better understand how rats may be entering or exiting the specific region.
 
 ==== Scatter Plot
 
 Before visualizing the convex hull it's often useful to visualize the 2D points as a scatter plot.
 
-In this example the `random` function draws a sample of records from the nyc311 (complaints database) collection where
+In this example the `random` function draws a sample of records from the NYC311 (complaints database) collection where
 the complaint description matches "rat sighting" and the zip code is 11238. The latitude and longitude fields
 are then vectorized and plotted as a scatter plot with longitude on the x-axis and latitude on the
 y-axis.
@@ -51,13 +51,13 @@ Notice from the scatter plot that many of the points appear to lie near the bord
 
 ==== Convex Hull Plot
 
-The `convexHull` function cam be used to visualize the boarder. The example uses the same points
-drawn from the nyc311 database. But instead of plotting the points directly the latitude and
+The `convexHull` function can be used to visualize the border. The example uses the same points
+drawn from the NYC311 database. But instead of plotting the points directly the latitude and
 longitude points are added as rows to a matrix. The matrix is then transposed with the `transpose`
 function so that each row of the matrix contains a single latitude and longitude point.
 
-The `convexHull` function is then used calculate the convex hull for the matrix of points. The
-convex hull is set a variable called *hull*
+The `convexHull` function is then used to calculate the convex hull for the matrix of points.
+The convex hull is set to a variable called `hull`.
 
 Once the convex hull has been created the `getVertices` function can be used to
 retrieve the matrix of points that comprise the convex border around the scatter plot.
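A minimal sketch of the expression chain described above might look like the following. The query string and the `lat_d`/`lon_d` field names are placeholders rather than the actual fields in the nyc311 data set, and the column indexes assume each row of the vertices matrix holds a latitude/longitude pair:

[source,text]
----
let(a=random(nyc311, q="rat sighting", rows="5000", fl="lat_d, lon_d"),
    lat=col(a, lat_d),
    lon=col(a, lon_d),
    m=transpose(matrix(lat, lon)),
    hull=convexHull(m),
    vertices=getVertices(hull),
    zplot(x=colAt(vertices, 1), y=colAt(vertices, 0)))
----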
@@ -73,10 +73,10 @@ convex hull.
 ==== Projecting and Clustering
 
 Once a convex hull has been calculated the `projectToBorder` function can then be used to project
-points to the nearest point on the boarder. In the example below the `projectToBorder` function
+points to the nearest point on the border. In the example below the `projectToBorder` function
 is used to project the original scatter plot points to the nearest border.
 
-The `projectToBorder` function returns a matrix of lat, lon points for the border projections. In
+The `projectToBorder` function returns a matrix of lat/lon points for the border projections. In
 the example the matrix of border points is then clustered into 7 clusters using kmeans clustering.
 The `zplot` function is then used to plot the clustered border points.
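A rough sketch of that flow, again with a placeholder query and hypothetical field names, might be:

[source,text]
----
let(a=random(nyc311, q="rat sighting", rows="5000", fl="lat_d, lon_d"),
    m=transpose(matrix(col(a, lat_d), col(a, lon_d))),
    hull=convexHull(m),
    borderPoints=projectToBorder(hull, m),
    clusters=kmeans(borderPoints, 7),
    zplot(clusters=clusters))
----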
 
@@ -84,14 +84,14 @@ image::images/math-expressions/convex1.png[]
 
 Notice in the visualization it's easy to see which spots along the border have the highest
 density of points. In the case of the rat sightings this information is useful in understanding
-which boarder points are closest for the rats to enter or exit from.
+which border points are closest for the rats to enter or exit from.
 
 ==== Plotting the Centroids
 
-Once the boarder points have been clustered its very easy to extract the centroids of the clusters
+Once the border points have been clustered it's very easy to extract the centroids of the clusters
 and plot them on a map. The example below extracts the centroids from the clusters using the
-`getCentroids` function. `getCentroids` returns the matrix of lat, lon points which represent
-the centroids of border clusters. The `colAt` function can then be used to extract the lat, lon
+`getCentroids` function. `getCentroids` returns the matrix of lat/lon points which represent
+the centroids of border clusters. The `colAt` function can then be used to extract the lat/lon
 vectors so they can be plotted on a map using `zplot`.
 
 image::images/math-expressions/convex2.png[]
@@ -110,11 +110,11 @@ In the example below an enclosing disk is calculated for a randomly generated se
 
 Then the following functions are called on the enclosing disk:
 
--`getCenter`: Returns the 2D point that is the center of the disk.
+* `getCenter`: Returns the 2D point that is the center of the disk.
 
--`getRadius`: Returns the radius of the disk.
+* `getRadius`: Returns the radius of the disk.
 
--`getSupportPoints`: Returns the support points of the disk.
+* `getSupportPoints`: Returns the support points of the disk.
 
 [source,text]
 ----
diff --git a/solr/solr-ref-guide/src/curve-fitting.adoc b/solr/solr-ref-guide/src/curve-fitting.adoc
index b5613b0..966e888 100644
--- a/solr/solr-ref-guide/src/curve-fitting.adoc
+++ b/solr/solr-ref-guide/src/curve-fitting.adoc
@@ -23,7 +23,7 @@ These functions support constructing a curve through bivariate non-linear data.
 The `polyfit` function is a general purpose curve fitter used to model
 the non-linear relationship between two random variables.
 
-The `polyfit` function is passed *x* and *y* axes and fits a smooth curve to the data.
+The `polyfit` function is passed x- and y-axes and fits a smooth curve to the data.
 If only a single array is provided it is treated as the y-axis and a sequence is generated
 for the x-axis. A third parameter can be added that specifies the degree of the polynomial. If the degree is
 not provided a 3 degree polynomial is used by default. The higher
@@ -38,9 +38,9 @@ are the predicted curve.
 
 image::images/math-expressions/polyfit.png[]
 
-In the example above a random sample containing two fields, *filesize_d*
-and *response_d*, is drawn from the *logs* collection.
-The two fields are vectorized and set to the variables *x* and *y*.
+In the example above a random sample containing two fields, `filesize_d`
+and `response_d`, is drawn from the `logs` collection.
+The two fields are vectorized and set to the variables `x` and `y`.
 
 Then the `polyfit` function is used to fit a non-linear model to the data using a 5 degree
 polynomial. The `polyfit` function returns a model that is then directly plotted
@@ -50,7 +50,6 @@ The fitted model can also be used
 by the `predict` function in the same manner as linear regression. The example below
 uses the fitted model to predict a response time for a file size of 42000.
 
-
 image::images/math-expressions/polyfit-predict.png[]
 
 If an array of predictor values is provided an array of predictions will be returned.
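As a rough sketch, the fit-and-predict flow described above might be expressed as follows, assuming the `logs` collection and the `filesize_d` and `response_d` fields used earlier:

[source,text]
----
let(a=random(logs, q="*:*", rows="500", fl="filesize_d, response_d"),
    x=col(a, filesize_d),
    y=col(a, response_d),
    model=polyfit(x, y, 5),
    predict(model, 42000))
----

Passing an array instead, for example `predict(model, array(20000, 42000, 80000))`, would return a vector of predictions.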
@@ -73,17 +72,16 @@ image::images/math-expressions/polyfit-resid.png[]
 == Gaussian Curve Fitting
 
 The `gaussfit` function fits a smooth curve through a Gaussian peak. The `gaussfit`
-function takes an *x* and y-axis and fits a smooth gaussian curve to the data. If
+function takes an x- and y-axis and fits a smooth Gaussian curve to the data. If
 only one vector of numbers is passed, `gaussfit` will treat it as the y-axis
-and will generate a sequence for the *x* access.
+and will generate a sequence for the x-axis.
 
 One of the interesting use cases for `gaussfit` is to visualize how well a regression
 model's residuals fit a normal distribution.
 
-One of the characteristics of a well
-fit regression model is that its residuals will ideally fit a normal distribution. We can
-test this by building a histogram of the residuals and then fitting a gaussian curve to the
-curve of the histogram.
+One of the characteristics of a well-fit regression model is that its residuals will ideally fit a normal distribution.
+We can
+test this by building a histogram of the residuals and then fitting a Gaussian curve to the histogram.
 
 In the example below the residuals from a `polyfit` regression are modeled with the
 `hist` function to return a histogram with 32 bins. The `hist` function returns
diff --git a/solr/solr-ref-guide/src/dsp.adoc b/solr/solr-ref-guide/src/dsp.adoc
index 22d59aa..50923e8 100644
--- a/solr/solr-ref-guide/src/dsp.adoc
+++ b/solr/solr-ref-guide/src/dsp.adoc
@@ -28,7 +28,7 @@ The dot products are collected in a third vector which is the convolution of the
 
 === Moving Average Function
 
-Before looking at an example of convolution its useful to review the `movingAvg` function. The moving average
+Before looking at an example of convolution it's useful to review the `movingAvg` function. The moving average
 function computes a moving average by sliding a window across a vector and computing
 the average of the window at each shift. If that sounds similar to convolution, that's because the `movingAvg`
 function involves a sliding window approach similar to convolution.
@@ -36,7 +36,7 @@ function involves a sliding window approach similar to convolution.
 Below is an example of a moving average with a window size of 5. Notice that the original vector has 13 elements
 but the result of the moving average has only 9 elements. This is because the `movingAvg` function
 only begins generating results when it has a full window. The `ltrim` function is used to trim the
-first four elements from the original *y* array to line up with the moving average.
+first four elements from the original `y` array to line up with the moving average.
 
 image::images/math-expressions/conv1.png[]
 
@@ -70,7 +70,7 @@ The formula for computing a simple moving average using convolution is to make t
 size and make the values of the filter all the same and sum to 1. A moving average with a window size of 4
 can be computed by changing the filter to a length of 4 with each value being .25.
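For example, a window-size-4 moving average computed with `conv` might be sketched as follows; the `y` vector here is just illustrative data:

[source,text]
----
let(y=array(1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1),
    filter=array(.25, .25, .25, .25),
    conv(y, filter))
----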
 
-*Changing the Weights*
+==== Changing the Weights
 
 The filter, which is sometimes called the *kernel*, can be viewed as a vector of weights. In the initial
 example all values in the filter have the same weight (.2). The weights in the filter can be changed to
@@ -125,8 +125,7 @@ image::images/math-expressions/delay.png[]
 
 The `oscillate` function generates a periodic oscillating signal which can be used to model and study sine waves.
 
-The `oscillate` function takes three parameters: *amplitude*, *angular frequency*
-and *phase* and returns a vector containing the y-axis points of a sine wave.
+The `oscillate` function takes three parameters: `amplitude`, `angular frequency`, and `phase` and returns a vector containing the y-axis points of a sine wave.
 
 The y-axis points were generated from an x-axis sequence of 0-127.
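As a sketch, an `oscillate` call with arbitrary amplitude, angular frequency, and phase values might look like this:

[source,text]
----
let(wave=oscillate(1, 0.28, 1.57),
    zplot(y=wave))
----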
 
@@ -136,13 +135,11 @@ Below is an example of the `oscillate` function called with an amplitude of
 
 image::images/math-expressions/sinewave.png[]
 
-=== Sine Wave Interpolation, Extrapolation
+=== Sine Wave Interpolation & Extrapolation
 
 The `oscillate` function returns a function which can be used by the `predict` function to interpolate or extrapolate a sine wave.
-The example below extrapolates the sine wave to an x-axis sequence of 0-256.
-
 
-The extrapolated sine wave is plotted below:
+The example below extrapolates the sine wave to an x-axis sequence of 0-256.
 
 image::images/math-expressions/sinewave256.png[]
 
@@ -216,11 +213,11 @@ image::images/math-expressions/hidden-signal-autocorrelation.png[]
 
 == Discrete Fourier Transform
 
-The convolution based functions described above are operating on signals in the time domain. In the time
-domain the X axis is time and the Y axis is the quantity of some value at a specific point in time.
+The convolution-based functions described above are operating on signals in the time domain. In the time
+domain the x-axis is time and the y-axis is the quantity of some value at a specific point in time.
 
 The discrete Fourier Transform translates a time domain signal into the frequency domain.
-In the frequency domain the X axis is frequency, and Y axis is the accumulated power at a specific frequency.
+In the frequency domain the x-axis is frequency, and y-axis is the accumulated power at a specific frequency.
 
 The basic principle is that every time domain signal is composed of one or more signals (sine waves)
 at different frequencies. The discrete Fourier transform decomposes a time domain signal into its component
@@ -247,7 +244,7 @@ The `rowAt` function can be used to access the rows so they can be processed as
 In the first example the `fft` function is called on the sine wave used in the autocorrelation example.
 
 The result of the `fft` function is a matrix. The `rowAt` function is used to return the first row of
-the matrix which is a vector containing the real values of the fft response.
+the matrix which is a vector containing the real values of the `fft` response.
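A minimal sketch of that step, reusing the sine wave from above and assuming zero-based row indexing, might be:

[source,text]
----
let(wave=oscillate(1, 0.28, 1.57),
    f=fft(wave),
    real=rowAt(f, 0),
    zplot(y=real))
----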
 
 The plot of the real values of the `fft` response is shown below. Notice there are two
 peaks on opposite sides of the plot. The plot is actually showing a mirrored response. The right side
diff --git a/solr/solr-ref-guide/src/loading.adoc b/solr/solr-ref-guide/src/loading.adoc
index 6b10b4b..3017eba 100644
--- a/solr/solr-ref-guide/src/loading.adoc
+++ b/solr/solr-ref-guide/src/loading.adoc
@@ -535,7 +535,7 @@ When this expression is sent to the `/stream` handler it responds with:
 }
 ----
 
-The example below shows the `cartesianProduct` function expanding the analyzed terms in the *term_s* field into
+The example below shows the `cartesianProduct` function expanding the analyzed terms in the `term_s` field into
 their own documents. Notice that the other fields from the document are maintained with each term. This allows each term
 to be indexed in a separate document so the relationships between terms and the other fields can be explored through
 graph expressions or aggregations.
diff --git a/solr/solr-ref-guide/src/logs.adoc b/solr/solr-ref-guide/src/logs.adoc
index 581a32e..af5ec6b 100644
--- a/solr/solr-ref-guide/src/logs.adoc
+++ b/solr/solr-ref-guide/src/logs.adoc
@@ -18,34 +18,33 @@
 
 This section of the user guide provides an introduction to Solr log analytics.
 
-NOTE: This is an appendix of the <<math-expressions.adoc#streaming-Expressions-and-math-expressions,Visual Guide to Streaming Expressions and Math Expressions>>. All the functions described below are convered in detail in the guide.
+NOTE: This is an appendix of the <<math-expressions.adoc#streaming-Expressions-and-math-expressions,Visual Guide to Streaming Expressions and Math Expressions>>. All the functions described below are covered in detail in the guide.
 See the <<math-start.adoc#math-start,Getting Started>> chapter to learn how to get started with visualizations and Apache Zeppelin.
 
 == Loading
 
-The out-of-the-box Solr log format can be loaded into a Solr index using the *postlogs* command line tool
-located in the *bin* directory of the Solr distribution.
+The out-of-the-box Solr log format can be loaded into a Solr index using the `bin/postlogs` command line tool
+located in the `bin/` directory of the Solr distribution.
 
 NOTE: If working from the source distribution the
-distribution must first be built before postlogs can be run.
+distribution must first be built before `postlogs` can be run.
 
-The *postlogs* script is designed to be run from the root directory of the Solr distribution.
+The `postlogs` script is designed to be run from the root directory of the Solr distribution.
 
-The *postlogs* script takes two parameters:
+The `postlogs` script takes two parameters:
 
-* Base URL (with collection): Example http://localhost:8983/solr/logs
+* Solr base URL (with collection): `http://localhost:8983/solr/logs`
 * File path to root of the logs directory: All files found under this directory (including sub-directories) will be indexed.
 If the path points to a single log file only that log file will be loaded.
 
-Below is a sample execution of the *postlogs* tool:
+Below is a sample execution of the `postlogs` tool:
 
 [source,text]
 ----
 ./bin/postlogs http://localhost:8983/solr/logs /var/logs/solrlogs
 ----
 
-The example above will index all the log files under /var/logs/solrlogs to the *logs* collection
-found at the base url http://localhost:8983/solr.
+The example above will index all the log files under `/var/logs/solrlogs` to the `logs` collection found at the base URL `http://localhost:8983/solr`.
 
 == Exploring
 
@@ -56,12 +55,12 @@ covered in the logs, what shards and cores are in those collections and the type
 performed on those collections.
 
 Even with familiar Solr installations exploration is still extremely
-important while trouble shooting because it will often turn up surprises such as unknown errors or
+important while troubleshooting because it will often turn up surprises such as unknown errors or
 unexpected admin or indexing operations.
 
 === Sampling
 
-The first step in exploration is to take a random sample from the *logs* collection
+The first step in exploration is to take a random sample from the `logs` collection
 with the `random` function.
 
 In the example below the `random` function is run with one
@@ -70,7 +69,7 @@ parameter which is the name of the collection to sample.
 image::images/math-expressions/logs-sample.png[]
 
 The sample contains 500 random records with their full field list. By looking
-at this sample we can quickly learn about the *fields* available in the *logs* collection.
+at this sample we can quickly learn about the *fields* available in the `logs` collection.
 
 === Time Period
 
@@ -127,14 +126,14 @@ Then a burst of log activity occurs from minute 27 to minute 52.
 
 This is then followed by a large spike of log activity.
 
-The example below breaks this down further by adding a query on the *type_s* field to only
+The example below breaks this down further by adding a query on the `type_s` field to only
 visualize *query* activity in the log.
 
 
 image::images/math-expressions/logs-time-series2.png[]
 
 Notice the query activity accounts for more than half of the burst of log records between
-minute 27 and minute 52. But the query activity does not account for the large spike in
+21:27 and 21:52. But the query activity does not account for the large spike in
 log activity that follows.
 
 We can account for that spike by changing the search to include only *update*, *commit*,
@@ -163,14 +162,14 @@ for specific types of query records.
 
 === Top Level Queries
 
-To find all the top level queries in the logs, add a query to limit results to log records with *distrib_s:true* as follows:
+To find all the top level queries in the logs, add a query to limit results to log records with `distrib_s:true` as follows:
 
 image::images/math-expressions/query-top-level.png[]
 
 
 === Shard Level Queries
 
-To find all the shard level queries that are not IDs queries, adjust the query to limit results to logs with *distrib_s:false AND ids_s:false*
+To find all the shard level queries that are not IDs queries, adjust the query to limit results to logs with `distrib_s:false AND ids_s:false`
 as follows:
 
 image::images/math-expressions/query-shard-level.png[]
@@ -178,7 +177,7 @@ image::images/math-expressions/query-shard-level.png[]
 
 === ID Queries
 
-To find all the *ids* queries, adjust the query to limit results to logs with *distrib_s:false AND ids_s:true*
+To find all the *ids* queries, adjust the query to limit results to logs with `distrib_s:false AND ids_s:true`
 as follows:
 
 image::images/math-expressions/query-ids.png[]
@@ -186,32 +185,31 @@ image::images/math-expressions/query-ids.png[]
 
 == Query Performance
 
-One of the important tasks of Solr log analytics is understanding how well a Solr cluster
-is performing.
+One of the important tasks of Solr log analytics is understanding how well a Solr cluster is performing.
 
-The *qtime_i* field contains the query time (QTime) in millis
-from the log records. There are number of powerful visualizations
- and statistical approaches for analyzing query performance.
+The `qtime_i` field contains the query time (QTime) in milliseconds
+from the log records.
+There are a number of powerful visualizations and statistical approaches for analyzing query performance.
 
 
 === QTime Scatter Plot
 
-Scatter plots can be used to visualize random samples of the *qtime_i*
-field. The example below demonstrates a scatter plot of 500 random samples
-from the *ptest1* collection of log records.
+Scatter plots can be used to visualize random samples of the `qtime_i`
+field.
+The example below demonstrates a scatter plot of 500 random samples
+from the `ptest1` collection of log records.
 
-In this example, *qtime_i* is plotted on the *y-axis* and the *x-axis* is simply a sequence
-to spread the query times out across the plot.
+In this example, `qtime_i` is plotted on the y-axis and the x-axis is simply a sequence to spread the query times out across the plot.
 
-NOTE: The *x* field is included in the field list. The `random` function automatically
-generates a sequence for the x-axis when x is included in the field list.
+NOTE: The `x` field is included in the field list.
+The `random` function automatically generates a sequence for the x-axis when `x` is included in the field list.
 
 image::images/math-expressions/qtime-scatter.png[]
 
 From this scatter plot we can tell a number of important things about the query times:
 
 * The sample query times range from a low of 122 to a high of 643.
-* The mean appears to be just above 400 millis.
+* The mean appears to be just above 400 ms.
 * The query times tend to cluster closer to the mean and become less frequent as they move away
 from the mean.
 
@@ -219,21 +217,18 @@ from the mean.
 === Highest QTime Scatter Plot
 
 It's often useful to be able to visualize the highest query times recorded in the log data.
-This can be done by using the `search` function and sorting on *qtime_i desc*.
+This can be done by using the `search` function and sorting on `qtime_i desc`.
 
-In the example below the `search` function returns the highest 500 query times from the *ptest1*
-collection and sets the results to the variable *a*. Then the `col` function is used to extract
-the `qtime_i` column from the result set into a vector, which is set to variable *y*.
+In the example below the `search` function returns the highest 500 query times from the `ptest1` collection and sets the results to the variable `a`.
+Then the `col` function is used to extract the `qtime_i` column from the result set into a vector, which is set to variable `y`.
 
-Then the `zplot` function is used plot the query times on the *y-axis* of the scatter plot.
+Then the `zplot` function is used to plot the query times on the y-axis of the scatter plot.
 
-NOTE: The `rev` function is used to reverse the query times vector so the visualization
-displays from lowest to highest query times.
+NOTE: The `rev` function is used to reverse the query times vector so the visualization displays from lowest to highest query times.
 
 image::images/math-expressions/qtime-highest-scatter.png[]
 
-From this plot we can see that the 500 highest query times start at 510
-millis and slowly move higher, until the last 10 spike upwards, culminating at the highest query time of 2529 millis.
+From this plot we can see that the 500 highest query times start at 510ms and slowly move higher, until the last 10 spike upwards, culminating at the highest query time of 2529ms.
 
 
 === QTime Distribution
@@ -241,145 +236,136 @@ millis and slowly move higher, until the last 10 spike upwards, culminating at t
 In this example a visualization is created which shows the
 distribution of query times rounded to the nearest second.
 
-The example below starts by taking a random sample of 10000 log records with a *type_s* of *query*.
-The results of the `random` function are assigned to the variable *a*.
+The example below starts by taking a random sample of 10000 log records with a `type_s` of `query`.
+The results of the `random` function are assigned to the variable `a`.
 
-The `col` function is then used extract the *qtime_i* field from the results. The vector
-of query times is set to variable *b*.
+The `col` function is then used to extract the `qtime_i` field from the results.
+The vector of query times is set to variable `b`.
 
 The `scalarDivide` function is then used to divide all elements of the query time vector by 1000.
-This converts the query times from milli-seconds to seconds. The result is set to variable
-*c*.
+This converts the query times from milliseconds to seconds.
+The result is set to variable `c`.
 
 The `round` function then rounds all elements of the query times vector to the nearest second.
-This means all query times less than 500 millis will round to 0.
+This means all query times less than 500ms will round to 0.
 
 The `freqTable` function is then applied to the vector of query times rounded to
 the nearest second.
 
 The resulting frequency table is shown in the visualization below.
-The *x-axis* is the number of seconds. The *y-axis* is the number of query times
-that rounded to each second.
+The x-axis is the number of seconds.
+The y-axis is the number of query times that rounded to each second.
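A sketch of the full expression chain described above, assuming the `logs` collection, might look like:

[source,text]
----
let(a=random(logs, q="type_s:query", rows="10000", fl="qtime_i"),
    b=col(a, qtime_i),
    c=scalarDivide(1000, b),
    d=round(c),
    freqTable(d))
----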
 
 image::images/math-expressions/qtime-dist.png[]
 
-Notice that roughly 93 percent of the query times rounded to 0, meaning they were under
-500 millis. About 6 percent round to 1 and the rest rounded to either 2 or 3 seconds.
+Notice that roughly 93 percent of the query times rounded to 0, meaning they were under 500ms.
+About 6 percent rounded to 1 and the rest rounded to either 2 or 3 seconds.
 
 
 === QTime Percentiles Plot
 
-A percentile plot is another powerful tool for understanding the distribution of query times
-in the logs. The example below demonstrates how to create and interpret percentile plots.
+A percentile plot is another powerful tool for understanding the distribution of query times in the logs.
+The example below demonstrates how to create and interpret percentile plots.
 
-In this example an `array` of percentiles is created and set to variable *p*.
+In this example an `array` of percentiles is created and set to variable `p`.
 
-Then a random sample of 10000 log records is drawn and set to variable *a*. The `col` function
-is then used to extract the *qtime_i* field from the sample results and this vector is set to
-variable *b*.
+Then a random sample of 10000 log records is drawn and set to variable `a`.
+The `col` function is then used to extract the `qtime_i` field from the sample results and this vector is set to variable `b`.
 
-The `percentile` function is then used to calculate the value at each percentile for the vector
-of query times. The array of percentiles set to variable *p* tells the `percentile` function
+The `percentile` function is then used to calculate the value at each percentile for the vector of query times.
+The array of percentiles set to variable `p` tells the `percentile` function
 which percentiles to calculate.
 
-Then the `zplot` function is used to plot the *percentiles* on the *x-axis* and
-the *query time* at each percentile on the *y-axis*.
+Then the `zplot` function is used to plot the *percentiles* on the x-axis and
+the *query time* at each percentile on the y-axis.
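A sketch of that expression, with an illustrative set of percentiles, might look like:

[source,text]
----
let(p=array(1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99),
    a=random(logs, q="type_s:query", rows="10000", fl="qtime_i"),
    b=col(a, qtime_i),
    qtimes=percentile(b, p),
    zplot(x=p, y=qtimes))
----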
 
 image::images/math-expressions/query-qq.png[]
 
-From the plot we can see that the 80th percentile has a query time of 464. This means that 80% percent of queries
-are below 464 millis.
-
+From the plot we can see that the 80th percentile has a query time of 464ms.
+This means that 80% of queries are below 464ms.
 
 === QTime Time Series
 
 A time series aggregation can also be run to visualize how QTime changes over time.
 
-The example below shows a time series, area chart that visualizes *average query time* at
-15 second intervals for a 3 minute section of a log.
+The example below shows a time series area chart that visualizes *average query time* at 15 second intervals for a 3 minute section of a log.
 
 image::images/math-expressions/qtime-series.png[]
 
 
-== Performance Trouble Shooting
+== Performance Troubleshooting
 
-If query analysis determines that queries are not performing as expected then log analysis can also be
-used to trouble shoot the cause of the slowness. The section below demonstrates several approaches for
-locating the source of query slowness.
+If query analysis determines that queries are not performing as expected then log analysis can also be used to troubleshoot the cause of the slowness.
+The section below demonstrates several approaches for locating the source of query slowness.
 
 === Slow Nodes
 
-In a distributed search the final search performance is only as fast as the slowest
-responding shard in the cluster. Therefore one slow node can be responsible for slow
-overall search time.
+In a distributed search the final search performance is only as fast as the slowest responding shard in the cluster.
+Therefore one slow node can be responsible for slow overall search time.
 
-The fields *core_s*, *replica_s* and *shard_s* are available in the log records.
+The fields `core_s`, `replica_s` and `shard_s` are available in the log records.
 These fields allow average query time to be calculated by *core*, *replica* or *shard*.
 
-The *core_s* field is particularly useful as its the most granular element and
+The `core_s` field is particularly useful as it's the most granular element and
 the naming convention often includes the collection, shard and replica information.
 
-The example below uses the `facet` function to calculate *avg(qtime_i)* by core.
+The example below uses the `facet` function to calculate `avg(qtime_i)` by core.
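A sketch of such a `facet` expression might look like the following; the query and row count are illustrative:

[source,text]
----
facet(logs,
      q="type_s:query",
      buckets="core_s",
      bucketSorts="avg(qtime_i) desc",
      rows="25",
      avg(qtime_i))
----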
 
 image::images/math-expressions/slow-nodes.png[]
 
-Notice in the results that the *core_s* field contains information about the
-*collection*, *shard*, and *replica*. The example also shows that qtime seems to be
-significantly higher for certain cores in the same collection. This should trigger a
-deeper investigation as to why those cores might be performing slower.
+Notice in the results that the `core_s` field contains information about the
+*collection*, *shard*, and *replica*.
+The example also shows that qtime seems to be significantly higher for certain cores in the same collection.
+This should trigger a deeper investigation as to why those cores might be performing slower.
 
 === Slow Queries
 
-If query analysis shows that most queries are performing well but there are outlier
-queries that are slow,
-one reason for this may be that specific queries are slow.
+If query analysis shows that most queries are performing well but there are slow outliers, one reason may be that specific queries are inherently slow.
 
-The `q_s` and `q_t` fields both hold the value of the *q* parameter in the Solr parameters. The `q_s`
-field is a string field and the `q_t` field has been tokenized.
+The `q_s` and `q_t` fields both hold the value of the *q* parameter from Solr requests.
+The `q_s` field is a string field and the `q_t` field has been tokenized.
 
-The `search` function can be used to return the top N slowest queries in the logs by sorting
-the results by *qtime_i desc*. the example
+The `search` function can be used to return the top N slowest queries in the logs by sorting the results by `qtime_i desc`. The example
 below demonstrates this:
 
 image::images/math-expressions/slow-queries.png[]
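A sketch of the search described above might look like this; the field list and row count are illustrative:

[source,text]
----
search(logs,
       q="type_s:query",
       fl="id, q_s, qtime_i",
       sort="qtime_i desc",
       rows="100")
----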
 
-Once the queries have been retrieved they can be inspected and tried individually to determine if the
-query is consistently slow. If the query is shown to be slow a plan to improve the query performance
+Once the queries have been retrieved they can be inspected and tried individually to determine if the query is consistently slow.
+If the query is shown to be slow a plan to improve the query performance
 can be devised.
 
 === Commits
 
-Commits and activities that cause commits, such as full index replications, can result in
-slower query performance. Time series visualization can help to determine if commits are
+Commits and activities that cause commits, such as full index replications, can result in slower query performance.
+Time series visualization can help to determine if commits are
 related to degraded performance.
 
-The first step is to visualize the query performance issue. The time series below
-limits the log results to records that are type *query* and computes the *max(qtime_i)*  at ten minute intervals. The plot shows the day, hour and minute
-on the *x-axis* and *max(qtime_i)*  in millis on the *y-axis*. Notice there are some
-extreme spikes in max qtime_i that need to be understood.
+The first step is to visualize the query performance issue.
+The time series below limits the log results to records that are type `query` and computes the `max(qtime_i)` at ten minute intervals.
+The plot shows the day, hour and minute on the x-axis and `max(qtime_i)` in milliseconds on the y-axis.
+Notice there are some extreme spikes in max `qtime_i` that need to be understood.
 
 image::images/math-expressions/query-spike.png[]
 
 
 The next step is to generate a time series that counts commits across the same time intervals.
-The time series below uses the same *start*, *end* and *gap* as the initial time series. But
-this time series is computed for records that have a log type of *commit*. The count for the
-commits is calculated and plotted on *y-axis*.
+The time series below uses the same `start`, `end` and `gap` as the initial time series.
+But this time series is computed for records that have a type of `commit`.
+The count for the commits is calculated and plotted on the y-axis.
 
-Notice that there are spikes in commit activity that appear near the spikes in max qtime_i.
+Notice that there are spikes in commit activity that appear near the spikes in max `qtime_i`.
 
 image::images/math-expressions/commit-series.png[]
 
 The final step is to overlay the two time series in the same plot.
 
 This is done by performing both time series and setting the results to variables, in this case
-*a* and *b*.
+`a` and `b`.
 
-Then the *date_dt* and *max(qtime_)* fields are extracted as vectors from the first time series and set to variables using the
-`col` function. And the count(*) field is extracted from the second time series.
+Then the `date_dt` and `max(qtime_i)` fields are extracted as vectors from the first time series and set to variables using the `col` function.
+And the `count(*)` field is extracted from the second time series.
 
-The `zplot` function is then used to plot the time stamp vector on the *x-axis* and the max qtimes and
-commit count vectors on *y-axis*.
+The `zplot` function is then used to plot the timestamp vector on the x-axis and the max qtime and commit count vectors on the y-axis.
 
 NOTE: The `minMaxScale` function is used to scale both vectors
 between 0 and 1 so they can be visually compared on the same plot.
@@ -387,20 +373,18 @@ between 0 and 1 so they can be visually compared on the same plot.
 image::images/math-expressions/overlay-series.png[]
 
 Notice in this plot that the commit count seems to be closely related to spikes
-in max qtime_i.
+in max `qtime_i`.
 
 == Errors
 
-The log index will contain any error records found in the logs. Error records will have a
-*type_s* field of *error*.
+The log index will contain any error records found in the logs. Error records will have a `type_s` field value of `error`.
 
 The example below searches for error records:
 
 image::images/math-expressions/search-error.png[]
 
 
-If the error is followed by a stack trace the stack trace will be present in the searchable field
-*stack_t*. The example below shows a search on the stack_t field and the stack trace presented in the
-result.
+If the error is followed by a stack trace the stack trace will be present in the searchable field `stack_t`.
+The example below shows a search on the `stack_t` field and the stack trace presented in the result.
 
 image::images/math-expressions/stack.png[]
diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc
index 876491e..1c9bc88 100644
--- a/solr/solr-ref-guide/src/machine-learning.adoc
+++ b/solr/solr-ref-guide/src/machine-learning.adoc
@@ -64,7 +64,7 @@ When this expression is sent to the `/stream` handler it responds with:
 }
 ----
 
-Below the distance is calculated using *Manhattan* distance.
+Below the distance is calculated using Manhattan distance.
 
 [source,text]
 ----
@@ -105,13 +105,12 @@ of the matrix.
 The example below demonstrates the power of distance matrices combined with 2 dimensional faceting.
 
 In this example the `facet2D` function is used to generate a two dimensional facet aggregation
-over the fields *complaint_type_s* and *zip_s* from the *nyc311* complaints database.
+over the fields `complaint_type_s` and `zip_s` from the `nyc311` complaints database.
 The *top 20* complaint types and the *top 25* zip codes for each complaint type are aggregated.
-The result is a stream of tuples each containing the fields *complaint_type_s*, *zip_s* and
-the count for the pair.
+The result is a stream of tuples each containing the fields `complaint_type_s`, `zip_s` and the count for the pair.
 
-The `pivot` function is then used to pivot the fields into a *matrix* with the *zip_s*
-field as the *rows* and the *complaint_type_s* field as the *columns*. The `count(*)` field populates
+The `pivot` function is then used to pivot the fields into a *matrix* with the `zip_s`
+field as the *rows* and the `complaint_type_s` field as the *columns*. The `count(*)` field populates
 the values in the cells of the matrix.
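A sketch of the aggregation and pivot, with an illustrative query, might look like the following; the resulting matrix can then be passed to the `distance` function described next:

[source,text]
----
let(a=facet2D(nyc311, q="*:*", x="complaint_type_s", y="zip_s", dimensions="20,25", count(*)),
    pivot(a, "zip_s", "complaint_type_s", "count(*)"))
----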
 
 The `distance` function is then used to compute the distance matrix for the columns
@@ -155,24 +154,24 @@ result set. The goal of the example is to find zip codes in the nyc311 complaint
 database that have similar complaint types to the zip code 10280.
 
 The first step in the example is to use the `facet2D` function to perform a two
-dimensional aggregation over the *zip_s* and *complaint_type_s* fields. In the example
+dimensional aggregation over the `zip_s` and `complaint_type_s` fields. In the example
 the top 119 zip codes and top 5 complaint types for each zip code are calculated
 for the borough of Manhattan. The result is a list of tuples each containing
-the *zip_s*, *complaint_type_s* and the *count* for the combination.
-
-The list of tuples is then *pivoted* into a matrix with the `pivot` function. The
-`pivot` function in this example returns a matrix with rows of zip codes
-and columns of complaint types. The `count(*)` field from the tuples
-populates the cells of the matrix. This matrix will be used as the secondary
-search matrix.
-
-The next step is to locate the vector for the 10280 zip code. This is done in
-three steps in the example. The first step is to retrieve the row labels from
-the matrix with the `getRowLabels` function. The row labels in this case are zip codes which were populated
-by the `pivot` function. Then the `indexOf` function is used
-to find the *index* of the "10280" zip code in the list of row labels. The `rowAt`
-function is then used to return the vector at that *index* from the matrix. This vector
-is the *search vector*.
+the `zip_s`, `complaint_type_s` and the `count(*)` for the combination.
+
+The list of tuples is then *pivoted* into a matrix with the `pivot` function.
+The `pivot` function in this example returns a matrix with rows of zip codes
+and columns of complaint types.
+The `count(*)` field from the tuples populates the cells of the matrix.
+This matrix will be used as the secondary search matrix.
+
+The next step is to locate the vector for the 10280 zip code.
+This is done in three steps in the example.
+The first step is to retrieve the row labels from the matrix with the `getRowLabels` function.
+The row labels in this case are zip codes which were populated by the `pivot` function.
+Then the `indexOf` function is used to find the *index* of the "10280" zip code in the list of row labels.
+The `rowAt` function is then used to return the vector at that *index* from the matrix.
+This vector is the *search vector*.
 
 Now that we have a matrix and search vector we can use the `knn` function to perform the search.
 In the example the `knn` function searches the matrix with the search vector with a K of 5, using
@@ -210,18 +209,18 @@ The `knnRegress` function is used to perform nearest neighbor regression.
 
 The example below shows the *regression plot* for KNN regression applied to a 2D scatter plot.
 
-In this example the `random` function is used to draw 500 random samples from the *logs* collection
-containing two fields *filesize_d* and *eresponse_d*. The sample is then vectorized with the
-*filesize_d* field stored in a vector assigned to variable *x* and the *eresponse_d* vector stored in
-variable *y*. The `knnRegress` function is then applied with 20 as the nearest neighbor parameter,
+In this example the `random` function is used to draw 500 random samples from the `logs` collection
+containing two fields `filesize_d` and `eresponse_d`. The sample is then vectorized with the
+`filesize_d` field stored in a vector assigned to variable `x` and the `eresponse_d` vector stored in
+variable `y`. The `knnRegress` function is then applied with `20` as the nearest neighbor parameter,
 which returns a KNN function which can be used to predict values.
-The `predict` function is then called on the KNN function to predict values for the original *x* vector.
-Finally `zplot` is used to plot the original *x* and *y* vectors along with the predictions.
+The `predict` function is then called on the KNN function to predict values for the original `x` vector.
+Finally `zplot` is used to plot the original `x` and `y` vectors along with the predictions.
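A sketch of that flow might look like the following; the original vectors and the resulting predictions can then be passed to `zplot` as described:

[source,text]
----
let(a=random(logs, q="*:*", rows="500", fl="filesize_d, eresponse_d"),
    x=col(a, filesize_d),
    y=col(a, eresponse_d),
    model=knnRegress(x, y, 20),
    predict(model, x))
----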
 
 image::images/math-expressions/knnRegress.png[]
 
-Notice that the regression plot shows a non-linear relations ship between the *filesize_d*
-field and the *eresponse_d* field. Also note that KNN regression
+Notice that the regression plot shows a non-linear relationship between the `filesize_d`
+field and the `eresponse_d` field. Also note that KNN regression
 plots a non-linear curve through the scatter plot. The larger the size
 of K (nearest neighbors), the smoother the line.
 
@@ -241,14 +240,14 @@ from 3 to 8.
 KNN regression can be used to predict wine quality for vectors containing
 the predictor values.
 
-In the example a search is performed on the *redwine* collection to
+In the example a search is performed on the `redwine` collection to
 return all the rows in the database of observations. Then the quality field and
 predictor fields are read into vectors and set to variables.
 
 The predictor variables are added as rows to a matrix which is
 transposed so each row in the matrix contains one observation with the 9
-predictor values. This is our observation matrix which is assigned to the variable
-*obs*.
+predictor values.
+This is our observation matrix which is assigned to the variable `obs`.
 
 Then the `knnRegress` function regresses the observations with quality outcomes.
 The value for K is set to 5 in the example, so the average quality of the 5
@@ -258,7 +257,7 @@ The `predict` function is then used to generate a vector of predictions
 for the entire observation set. These predictions will be used to determine
 how well the KNN regression performed over the observation data.
 
-The error or *residuals* for the regression are then calculated by
+The error, or *residuals*, for the regression are then calculated by
 subtracting the *predicted* quality from the *observed* quality.
 The `ebeSubtract` function is used to perform the element-by-element
 subtraction between the two vectors.
@@ -298,37 +297,34 @@ we can see the probability of getting prediction errors between -1 and 1 is quit
 
 *Additional KNN Regression Parameters*
 
-The `knnRegression` function has three additional parameters that make it suitable for many
-different regression scenarios.
-
-1) Any of the distance measures can be used for the regression simply by adding the function
-to the call. This allows for regression analysis over sparse vectors (cosine), dense vectors and
-geo-spatial lat/lon vectors (haversineMeters).
+The `knnRegression` function has three additional parameters that make it suitable for many different regression scenarios.
 
+. Any of the distance measures can be used for the regression simply by adding the function to the call.
+This allows for regression analysis over sparse vectors (`cosine`), dense vectors and geo-spatial lat/lon vectors (`haversineMeters`).
++
 Sample syntax:
-
++
 [source,text]
 ----
 r=knnRegress(obs, quality, 5, cosine()),
 ----
 
-2) The `robust` named parameter can be used to perform a regression analysis that is robust
-to outliers in the outcomes. When the `robust` named parameter is used the median outcome
-of the K nearest neighbors is used rather than the average.
-
+. The `robust` named parameter can be used to perform a regression analysis that is robust to outliers in the outcomes.
+When the `robust` parameter is used the median outcome of the k-nearest neighbors is used rather than the average.
++
 Sample syntax:
-
++
 [source,text]
 ----
 r=knnRegress(obs, quality, 5, robust="true"),
 ----
 
-3) The `scale` named parameter can be used to scale the columns of the observations and search vectors
+. The `scale` named parameter can be used to scale the columns of the observations and search vectors
 at prediction time. This can improve the performance of the KNN regression when the feature columns
 are at different scales causing the distance calculations to place too much weight on the larger columns.
-
++
 Sample syntax:
-
++
 [source,text]
 ----
 r=knnRegress(obs, quality, 5, scale="true"),
@@ -338,21 +334,14 @@ r=knnRegress(obs, quality, 5, scale="true"),
 
 The `knnSearch` function returns the k-nearest neighbors
 for a document based on text similarity.
-Under the covers the `knnSearch` function
-uses Solr's More Like This query parser plugin. This capability uses the search
-engines query, term statistics, scoring and ranking capability to perform a fast,
-nearest neighbor search for similar documents over large distributed indexes.
-
-The results of this
-search can be used directly or provide *candidates* for machine learning operations such
-as a secondary knn vector search.
-
-The example below shows the `knnSearch` function on a movie reviews data set. The
-search returns the 50 documents most similar to a specific document id (*83e9b5b0...*) based on
-the similarity of the *review_t* field. The *mindf* and *maxdf* specify the min and max
-document frequency of the terms used to perform the search. These parameters can make the
-query faster by eliminating high frequency terms and also improves accuracy by
-removing noise terms from the search.
+Under the covers the `knnSearch` function uses Solr's <<other-parsers.adoc#more-like-this-query-parser,More Like This>> query parser plugin.
+This capability uses the search engine's query, term statistics, scoring, and ranking capability to perform a fast, nearest neighbor search for similar documents over large distributed indexes.
+
+The results of this search can be used directly or provide *candidates* for machine learning operations such as a secondary KNN vector search.
+
+The example below shows the `knnSearch` function on a movie reviews data set. The search returns the 50 documents most similar to a specific document ID (`83e9b5b0...`) based on the similarity of the `review_t` field.
+The `mindf` and `maxdf` specify the minimum and maximum document frequency of the terms used to perform the search.
+These parameters can make the query faster by eliminating high frequency terms and also improve accuracy by removing noise terms from the search.
 
 image::images/math-expressions/knnSearch.png[]
 
@@ -362,44 +351,41 @@ to read in a table.
 
 == DBSCAN
 
-DBSCAN clustering is a powerful density based clustering algorithm which is particularly well
-suited for geospatial clustering. DBSCAN uses two parameters to filter result sets to
-clusters of specific density:
+DBSCAN clustering is a powerful density-based clustering algorithm which is particularly well suited for geospatial clustering.
+DBSCAN uses two parameters to filter result sets to clusters of specific density:
 
-* eps (Epsilon): Defines the distance between points to be considered as neighbors
+* `eps` (Epsilon): Defines the distance between points to be considered as neighbors
 
-* min points: The minimum number of points needed in a cluster for it to be returned.
+* `min points`: The minimum number of points needed in a cluster for it to be returned.
 
 
 === 2D Cluster Visualization
 
-The `zplot` function has direct support for plotting 2D clusters by using the *clusters* named parameter.
+The `zplot` function has direct support for plotting 2D clusters by using the `clusters` named parameter.
 
 The example below uses DBSCAN clustering and cluster visualization to find
 the *hot spots* on a map for rat sightings in the NYC 311 complaints database.
 
-In this example the `random` function draws a sample of records from the nyc311 collection where
+In this example the `random` function draws a sample of records from the `nyc311` collection where
 the complaint description matches "rat sighting" and latitude is populated in the record.
-The latitude and longitude fields
-are then vectorized and added as rows to a matrix. The matrix is transposed so each row contains a single latitude, longitude
-point. The `dbscan` function is then used to cluster the latitude and longitude points. Notice that the
-`dbscan` function in the example has four parameters.
+The latitude and longitude fields are then vectorized and added as rows to a matrix.
+The matrix is transposed so each row contains a single latitude, longitude
+point.
+The `dbscan` function is then used to cluster the latitude and longitude points.
+Notice that the `dbscan` function in the example has four parameters.
 
-* obs : The observation matrix of lat/lon points
+* `obs`: The observation matrix of lat/lon points.
 
-* eps : The distance between points to be considered a cluster. 100 meters in the example.
+* `eps`: The distance between points to be considered a cluster. 100 meters in the example.
 
-* min points: The min points in a cluster for the cluster to be returned by the function. 5 in the example.
+* `min points`: The minimum points in a cluster for the cluster to be returned by the function. `5` in the example.
 
-* distance measure: An optional distance measure used to determine the
-distance between points. The default is Euclidean distance. The example uses *haversineMeters*
-which returns the distance in meters which is much more meaningful for geospatial use
-cases.
+* `distance measure`: An optional distance measure used to determine the
+distance between points. The default is Euclidean distance.
+The example uses `haversineMeters`, which returns the distance in meters and is much more meaningful for geospatial use cases.
 
-Finally, the `zplot` function
-is used to visualize the clusters on a map with Zeppelin-Solr.
-The map below has been zoomed to a specific area of Brooklyn with a
-high density of rat sightings.
+Finally, the `zplot` function is used to visualize the clusters on a map with Zeppelin-Solr.
+The map below has been zoomed to a specific area of Brooklyn with a high density of rat sightings.
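A sketch of the expression, with a placeholder query and hypothetical `lat_d`/`lon_d` field names, might be:

[source,text]
----
let(a=random(nyc311, q="rat sighting", rows="5000", fl="lat_d, lon_d"),
    m=transpose(matrix(col(a, lat_d), col(a, lon_d))),
    clusters=dbscan(m, 100, 5, haversineMeters()),
    zplot(clusters=clusters))
----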
 
 image::images/math-expressions/dbscan1.png[]
 
@@ -431,12 +417,11 @@ We'll see that sampling itself is a powerful noise reduction tool which helps vi
 This is because there is a higher probability that samples will be drawn from higher density clusters and a lower
 probability that samples will be drawn from lower density clusters.
 
-In this example the `random` function draws a sample of 1500 records from the nyc311 (complaints database) collection where
+In this example the `random` function draws a sample of 1500 records from the `nyc311` (complaints database) collection where
 the complaint description matches "rat sighting" and latitude is populated in the record. The latitude and longitude fields
 are then vectorized and added as rows to a matrix. The matrix is transposed so each row contains a single latitude, longitude
 point. The `kmeans` function is then used to cluster the latitude and longitude points into 21 clusters.
-Finally, the `zplot` function
-is used to visualize the clusters as a scatter plot.
+Finally, the `zplot` function is used to visualize the clusters as a scatter plot.
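A sketch of the expression, again with a placeholder query and hypothetical latitude/longitude field names, might be:

[source,text]
----
let(a=random(nyc311, q="rat sighting", rows="1500", fl="lat_d, lon_d"),
    m=transpose(matrix(col(a, lat_d), col(a, lon_d))),
    clusters=kmeans(m, 21),
    zplot(clusters=clusters))
----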
 
 image::images/math-expressions/2DCluster1.png[]
 
@@ -453,11 +438,11 @@ surrounded by less dense but still high activity clusters.
 
 === Plotting the Centroids
 
-The centroids of each cluster can then be plotted on a *map* to visualize the center of the
+The centroids of each cluster can then be plotted on a map to visualize the center of the
 clusters. In the example below the centroids are extracted from the clusters using the `getCentroids`
 function, which returns a matrix of the centroids.
 
-The centroids matrix contains 2D lan/lon points. The `colAt` function can then be used
+The centroids matrix contains 2D lat/lon points. The `colAt` function can then be used
 to extract the latitude and longitude columns by index from the matrix so they can be
 plotted with `zplot`. A map visualization is used below to display the centroids.
 
@@ -481,15 +466,15 @@ NOTE: The example below works with TF-IDF _term vectors_.
 The section <<term-vectors.adoc#term-vectors,Text Analysis and Term Vectors>> offers
 a full explanation of this feature.
 
-In the example the `search` function returns documents where the *review_t* field matches the phrase "star wars".
+In the example the `search` function returns documents where the `review_t` field matches the phrase "star wars".
 The `select` function is run over the result set and applies the `analyze` function
-which uses the Lucene/Solr analyzer attached to the schema field *text_bigrams* to re-analyze the *review_t*
-field. This analyzer returns bigrams which are then annotated to documents in a field called *terms*.
+which uses the Lucene/Solr analyzer attached to the schema field `text_bigrams` to re-analyze the `review_t`
+field. This analyzer returns bigrams which are then annotated to documents in a field called `terms`.
 
-The `termVectors` function then creates TD-IDF term vectors from the bigrams stored in the *terms* field.
+The `termVectors` function then creates TF-IDF term vectors from the bigrams stored in the `terms` field.
 The `kmeans` function is then used to cluster the bigram term vectors into 5 clusters.
-Finally the top 5 features are extracted from the centroids and returned. Notice
-that the features are all bigram phrases with semantic significance.
+Finally the top 5 features are extracted from the centroids and returned.
+Notice that the features are all bigram phrases with semantic significance.
 
 [source,text]
 ----
@@ -638,7 +623,7 @@ This expression returns the following response:
 
 The `fuzzyKmeans` function is a soft clustering algorithm which
 allows vectors to be assigned to more than one cluster. The `fuzziness` parameter
-is a value between 1 and 2 that determines how fuzzy to make the cluster assignment.
+is a value between `1` and `2` that determines how fuzzy to make the cluster assignment.
 
 After the clustering has been performed the `getMembershipMatrix` function can be called
 on the clustering result to return a matrix describing the probabilities
@@ -646,7 +631,7 @@ of cluster membership for each vector.
 This matrix can be used to understand relationships between clusters.
 
 In the example below `fuzzyKmeans` is used to cluster the movie reviews matching the phrase "star wars".
-But instead of looking at the clusters or centroids the `getMembershipMatrix` is used to return the
+But instead of looking at the clusters or centroids, the `getMembershipMatrix` is used to return the
 membership probabilities for each document. The membership matrix is composed of a row for each
 vector that was clustered. There is a column in the matrix for each cluster.
 The values in the matrix contain the probability that a specific vector belongs to a specific cluster.
@@ -654,9 +639,9 @@ The values in the matrix contain the probability that a specific vector belongs
 In the example the `distance` function is then used to create a *distance matrix* from the columns of the
 membership matrix. The distance matrix is then visualized with the `zplot` function as a heat map.
 
-In the example cluster1 and cluster5 have the shortest distance between the clusters.
+In the example `cluster1` and `cluster5` have the shortest distance between them.
 Further analysis of the features in both clusters can be performed to understand
-the relationship between cluster1 and cluster5.
+the relationship between `cluster1` and `cluster5`.
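 
 The expression below is a rough sketch of those steps. The `reviews` collection name, the `termVectors`
 tuning parameters, and the `heat` named parameter passed to `zplot` are illustrative assumptions.
 
 [source,text]
 ----
 let(a=select(search(reviews, q="review_t:\"star wars\"", fl="id, review_t", rows="500", sort="id asc"),
              id,
              analyze(review_t, text_bigrams) as terms),
     vectors=termVectors(a, minTermLength=4, minDocFreq=.03, maxDocFreq=.25),
     clusters=fuzzyKmeans(vectors, 5, fuzziness=1.25),
     m=getMembershipMatrix(clusters),
     d=distance(m),
     zplot(heat=d))
 ----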
 
 image::images/math-expressions/fuzzyk.png[]
 
@@ -673,7 +658,7 @@ When operating on a matrix the rows of the matrix are scaled.
 === Min/Max Scaling
 
 The `minMaxScale` function scales a vector or matrix between a minimum and maximum value.
-By default it will scale between 0 and 1 if min/max values are not provided.
+By default it will scale between `0` and `1` if min/max values are not provided.
 
 Below is a plot of a sine wave, with an amplitude of 1, before and
 after it has been scaled between -5 and 5.
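 
 As a small illustration of the scaling behavior (using a simple array rather than a sine wave), the sketch below
 scales the same vector with the default range and with an explicit range of -5 to 5.
 
 [source,text]
 ----
 let(a=array(0, 2.5, 5, 7.5, 10),
     b=minMaxScale(a),
     c=minMaxScale(a, -5, 5),
     tuple(defaultScale=b, customScale=c))
 ----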
diff --git a/solr/solr-ref-guide/src/math-expressions.adoc b/solr/solr-ref-guide/src/math-expressions.adoc
index 9ab7e56..af79bc7 100644
--- a/solr/solr-ref-guide/src/math-expressions.adoc
+++ b/solr/solr-ref-guide/src/math-expressions.adoc
@@ -1,5 +1,5 @@
 = Streaming Expressions and Math Expressions
-:page-children: visualization, math-start, loading, search-sample, transform, scalar-math, vector-math, variables, matrix-math, term-vectors, statistics, probability-distributions, simulations, time-series, regression, numerical-analysis, curve-fitting, dsp, machine-learning, computational-geometry, logs
+:page-children: visualization, math-start, loading, search-sample, transform, scalar-math, vector-math, variables, matrix-math, term-vectors, probability-distributions, statistics, simulations, time-series, regression, numerical-analysis, curve-fitting, dsp, machine-learning, computational-geometry, logs
 :page-show-toc: false
 
 // Licensed to the Apache Software Foundation (ASF) under one
diff --git a/solr/solr-ref-guide/src/scalar-math.adoc b/solr/solr-ref-guide/src/scalar-math.adoc
index 4db6a38..696aa00 100644
--- a/solr/solr-ref-guide/src/scalar-math.adoc
+++ b/solr/solr-ref-guide/src/scalar-math.adoc
@@ -98,7 +98,7 @@ expression. The `select` function is selecting the `response_d` field
 and computing a new field called `new_response` using the `mult` math
 expression.
 
-The first parameter of the `mult` expression is the *response_d* field.
+The first parameter of the `mult` expression is the `response_d` field.
 The second parameter is the scalar value 10. This multiplies the value
 of the `response_d` field in each tuple by 10.
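 
 A minimal sketch of that expression is shown below; the collection, query, and sort are placeholders.
 
 [source,text]
 ----
 select(search(logs, q="*:*", fl="response_d", rows="10", sort="response_d desc"),
        response_d,
        mult(response_d, 10) as new_response)
 ----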
 
diff --git a/solr/solr-ref-guide/src/statistics.adoc b/solr/solr-ref-guide/src/statistics.adoc
index b212724..d391e4b 100644
--- a/solr/solr-ref-guide/src/statistics.adoc
+++ b/solr/solr-ref-guide/src/statistics.adoc
@@ -26,7 +26,7 @@ numeric array. The `describe` function returns a single *tuple* with name/value
 pairs containing the descriptive statistics.
 
 Below is a simple example that selects a random sample of documents from the *logs* collection,
-vectorizes the *response_d* field in the result set and uses the `describe` function to
+vectorizes the `response_d` field in the result set and uses the `describe` function to
 return descriptive statistics about the vector.
 
 [source,text]
@@ -89,7 +89,7 @@ The `hist` function creates a histogram designed for usage with continuous data.
 
 In the example below a histogram is used to visualize a random sample of
 response times from the logs collection. The example retrieves the
-random sample with the `random` function and creates a vector from the *response_d* field
+random sample with the `random` function and creates a vector from the `response_d` field
 in the result set. Then the `hist` function is applied to the vector
 to return a histogram with 22 bins. The `hist` function returns a
 list of tuples with summary statistics for each bin.
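 
 A minimal sketch of that expression is shown below; the query and sample size are placeholders.
 
 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="10000", fl="response_d"),
     b=col(a, response_d),
     hist(b, 22))
 ----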
@@ -199,10 +199,10 @@ of rounded *differences* in daily opening stock prices for the stock ticker *amz
 This example is interesting because it shows a multi-step process to arrive
 at the result. The first step is to *search* for records in the *stocks*
 collection with a ticker of *amzn*. Notice that the result set is sorted by
-date ascending and it returns the *open_d* field which is the opening price for
+date ascending and it returns the `open_d` field which is the opening price for
 the day.
 
-The *open_d* field is then vectorized and set to variable *b*, which now contains
+The `open_d` field is then vectorized and set to variable *b*, which now contains
 a vector of opening prices ordered by date ascending.
 
 The `diff` function is then used to calculate the *first difference* for the
@@ -279,8 +279,8 @@ rounded to integers. The most frequently occurring value is 0 with 1494 occurren
 == Percentiles
 
 The `percentile` function returns the estimated value for a specific percentile in
-a sample set. The example below returns a random sample containing the *response_d* field
-from the logs collection. The *response_d* field is vectorized and the 20th percentile
+a sample set. The example below returns a random sample containing the `response_d` field
+from the logs collection. The `response_d` field is vectorized and the 20th percentile
 is calculated for the vector:
 
 [source,text]
@@ -311,7 +311,7 @@ When this expression is sent to the `/stream` handler it responds with:
 
 The `percentile` function can also compute an array of percentile values.
 The example below is computing the 20th, 40th, 60th and 80th percentiles for a random sample
-of the *response_d* field:
+of the `response_d` field:
 
 [source,text]
 ----
@@ -356,8 +356,8 @@ In this example the distribution of daily stock price changes for two stock tick
 *amzn*, are visualized with a quantile plot.
 
 The example first creates an array of values representing the percentiles that will be calculated and sets this array
-to variable *p*. Then random samples of the *change_d* field are drawn for the tickers *amzn* and *goog*. The *change_d* field
-represents the change in stock price for one day. Then the *change_d* field is vectorized for both samples and placed
+to variable *p*. Then random samples of the `change_d` field are drawn for the tickers *amzn* and *goog*. The `change_d` field
+represents the change in stock price for one day. Then the `change_d` field is vectorized for both samples and placed
 in the variables *amzn* and *goog*. The `percentile` function is then used to calculate the percentiles for both vectors. Notice that
 the variable *p* is used to specify the list of percentiles that are calculated.
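 
 The sketch below outlines those steps. The `ticker_s` field name, the specific percentile values, and the named
 series parameters passed to `zplot` are illustrative assumptions.
 
 [source,text]
 ----
 let(p=array(5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95),
     a=random(stocks, q="ticker_s:amzn", rows="5000", fl="change_d"),
     g=random(stocks, q="ticker_s:goog", rows="5000", fl="change_d"),
     amzn=percentile(col(a, change_d), p),
     goog=percentile(col(g, change_d), p),
     zplot(x=p, amzn=amzn, goog=goog))
 ----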
 
@@ -395,7 +395,7 @@ Three correlation types are supported:
 The type of correlation is specified by adding the *type* named parameter in the
 function call.
 
-In the example below a random sample containing two fields, *filesize_d* and *response_d*, is drawn from
+In the example below a random sample containing two fields, `filesize_d` and `response_d`, is drawn from
 the logs collection using the `random` function. The fields are vectorized into the
 variables *x* and *y* and then *Spearman's* correlation for
 the two vectors is calculated using the `corr` function.
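 
 A minimal sketch of that expression is shown below; the query is a placeholder and the `type=spearmans` value
 assumes Spearman's correlation is specified that way.
 
 [source,text]
 ----
 let(a=random(logs, q="*:*", rows="5000", fl="filesize_d, response_d"),
     x=col(a, filesize_d),
     y=col(a, response_d),
     corr(x, y, type=spearmans))
 ----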
@@ -414,13 +414,13 @@ of the matrix.
 The example below demonstrates the power of correlation matrices combined with 2 dimensional faceting.
 
 In this example the `facet2D` function is used to generate a two dimensional facet aggregation
-over the fields *complaint_type_s* and *zip_s* from the *nyc311* complaints database.
+over the fields `complaint_type_s` and `zip_s` from the `nyc311` complaints database.
 The *top 20* complaint types and the *top 25* zip codes for each complaint type are aggregated.
-The result is a stream of tuples each containing the fields *complaint_type_s*, *zip_s* and
+The result is a stream of tuples each containing the fields `complaint_type_s`, `zip_s` and
 the count for the pair.
 
-The `pivot` function is then used to pivot the fields into a *matrix* with the *zip_s*
-field as the *rows* and the *complaint_type_s* field as the *columns*. The `count(*)` field populates
+The `pivot` function is then used to pivot the fields into a *matrix* with the `zip_s`
+field as the *rows* and the `complaint_type_s` field as the *columns*. The `count(*)` field populates
 the values in the cells of the matrix.
 
 The `corr` function is then used to correlate the *columns* of the matrix. This produces a correlation matrix
@@ -449,7 +449,7 @@ Covariance is an unscaled measure of correlation.
 
 The `cov` function calculates the covariance of two vectors of data.
 
-In the example below a random sample containing two fields, *filesize_d* and *response_d*, is drawn from
+In the example below a random sample containing two fields, `filesize_d` and `response_d`, is drawn from
 the logs collection using the `random` function. The fields are vectorized into the
 variables *x* and *y* and then the covariance for
 the two vectors is calculated using the `cov` function.
diff --git a/solr/solr-ref-guide/src/transform.adoc b/solr/solr-ref-guide/src/transform.adoc
index 0b9d2df..fe2bc30 100644
--- a/solr/solr-ref-guide/src/transform.adoc
+++ b/solr/solr-ref-guide/src/transform.adoc
@@ -98,7 +98,7 @@ to test if a field in the record matches a specific
 regular expression. This allows for sophisticated regex matching over search results.
 
 The example below uses the `matches` function to return all records where
-the *complaint_type_s* field ends with *Commercial*.
+the `complaint_type_s` field ends with *Commercial*.
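 
 One plausible composition is sketched below, assuming the `matches` evaluator is wrapped in a `having` function
 over a search of the `nyc311` collection; the query, field list, and sort are placeholders.
 
 [source,text]
 ----
 having(search(nyc311, q="*:*", fl="complaint_type_s, zip_s", rows="500", sort="zip_s asc"),
        matches(complaint_type_s, ".*Commercial"))
 ----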
 
 image::images/math-expressions/search-matches.png[]