Posted to commits@flagon.apache.org by po...@apache.org on 2019/04/27 18:18:43 UTC
[incubator-flagon] branch FLAGON-344 updated: [FLAGON-379] Finished Scaling guide (just need to add image references), made minor related changes to other docs referenced in scaling guide

This is an automated email from the ASF dual-hosted git repository.

poorejc pushed a commit to branch FLAGON-344
in repository https://gitbox.apache.org/repos/asf/incubator-flagon.git


The following commit(s) were added to refs/heads/FLAGON-344 by this push:
     new 74a7ce4  [FLAGON-379] Finished Scaling guide (just need to add image references), made minor related changes to other docs referenced in scaling guide
74a7ce4 is described below

commit 74a7ce4685a9c785dd8f44111fd825a7a9b3bc43
Author: poorejc <po...@apache.org>
AuthorDate: Sat Apr 27 14:18:13 2019 -0400

    [FLAGON-379] Finished Scaling guide (just need to add image references), made minor related changes to other docs referenced in scaling guide
---
 site/_docs/stack/index.md     |   2 +-
 site/_docs/stack/scaling.md   | 167 ++++++++++++++++++++++++++++++++++++++++--
 site/_docs/useralejs/index.md |   2 +-
 3 files changed, 161 insertions(+), 10 deletions(-)

diff --git a/site/_docs/stack/index.md b/site/_docs/stack/index.md
index ffc943f..c9a4918 100644
--- a/site/_docs/stack/index.md
+++ b/site/_docs/stack/index.md
@@ -27,7 +27,7 @@ Then, **build UserALE.js**.
   $ npm run build
   ```
 
-The build process produced a minified version of UserALE.js and a Web Extension package, giving you two options depending on your needs. 
+The build process produces a minified version of UserALE.js and a Web Extension package, giving you two options depending on your needs. You can skip the build process if you just want to explore UserALE.js; just use the minified script found in our [repo](https://github.com/apache/incubator-flagon-useralejs/tree/master/build).
 
 **Option 1: Include Apache UserALE.js in your project:**
 
diff --git a/site/_docs/stack/scaling.md b/site/_docs/stack/scaling.md
index 719cad9..e7405bd 100644
--- a/site/_docs/stack/scaling.md
+++ b/site/_docs/stack/scaling.md
@@ -28,7 +28,7 @@ It's important to note that the burden of scale isn't placed wholly on your Apac
 
 When thinking about how to scale your Apache Flagon Elastic stack, it's important to note that Elasticsearch isn't a database, it's a datastore, which stores documents. Elastic is built on top of [Lucene](http://lucene.apache.org/). That means that Apache Flagon "logs" aren't logs once they're indexed in Elastic, they become searchable documents. That's a huge strength (and why we chose Elastic), but it also means that assumptions about resource consumptions based purely on records and fi [...]
 
-1. Given that documents are the atomic unit of storage in Elastic, *document generation rate* is the most important consideration to scaling. Default [Apache UserALE.js parameters]({{ '/docs/useralejs' | prepend: site.baseurl }}) produce a lot of [data]({{ '/docs/useralejs/dataschema' | prepend: site.baseurl }}), even from single users in Apache UserALE.js. In fact, we used say that "drinking from the fire-hose" didn't quite do our data-rate justice--we used to say that opening up UserAL [...]
+1. Given that documents are the atomic unit of storage in Elastic, *document generation rate* is the most important consideration to scaling. Default [Apache UserALE.js parameters]({{ '/docs/useralejs' | prepend: site.baseurl }}) produce a lot of [data]({{ '/docs/useralejs/dataschema' | prepend: site.baseurl }}), even from single users in Apache UserALE.js. In fact, we used to say that "drinking from the fire-hose" didn't quite do our data-rate justice--we used to say that opening up UserAL [...]
 
     ![alt text][logBreakdown]
 
@@ -51,7 +51,7 @@ To generate log data, we'll be using our [UserALE.js Example](https://github.com
     
     Important: as noted in the instructions, you'll need to have collected some log data to establish the index.
 
-1. **Once you've started up the ELK stack, take a look at the `userale` index stats.**  
+1. **Once you've started up Elasticsearch, Logstash, Kibana, and Metricbeat, and confirmed that you can see logs in Kibana, take a look at the `userale` index stats.**  
     ```shell
     #Index Stats using Elastic's _stats API
     $ curl localhost:9200/index_name/_stats?pretty=true
@@ -74,18 +74,169 @@ To generate log data, we'll be using our [UserALE.js Example](https://github.com
               "deleted" : 0
             },
             "store" : {
-              "size_in_bytes" : 241212 #this is size of the index in bytes (we've collect .24 MB).
+              "size_in_bytes" : 241212 #this is the size of the index in bytes (.24 MB).
     ...
     ```
+    Let's call this value a baseline for your benchmarking. 
+    
     As you continue your benchmarking, the `userale` index "size_in_bytes" will be one of your key metrics.
-1. 
+    
+1. **Next, let's see how much data UserALE.js produces with parameters set to default on your page or app.** We'll call this: "drinking from the flame-thrower". 
 
+    Drop a UserALE.js script tag into your project (see [instructions]({{ '/docs/useralejs' | prepend: site.baseurl }})).
+    
+    Here is an example of the script tag we're using in the UserALE.js Example page for this test:
+    ```
+    <script
+      src="file:/// ... /UserALEtest/userale-1.1.0.min.js"
+      data-url="http://localhost:8100/"
+      data-user="example-user"
+      data-version="1.1.1"
+      data-tool="Apache UserALE.js Example"
+    ></script>
+    ```
+    To get a conservative upper-bound, generate as many behaviors in your page/app as you can (go nuts with mouse-overs, clicks and scrolls). 
+    
+    We did just that for 5 minutes solid. Here's what our `userale` index looks like now (#note annotations):
+   
+    ```shell
+     "indices" : {
+       "userale" : {
+         "uuid" : "0h0Wxe2cSwqMALs4QCJ8Tw",
+         "primaries" : {
+           "docs" : {
+             "count" : 3282, #new userale document count
+             "deleted" : 0
+           },
+           "store" : {
+             "size_in_bytes" : 820978 #new size of the index (.82 MB)
+    ```
+    **Some simple math gives +1,998 documents (~2000) and +579,766 bytes (~.58 MB) generated with UserALE.js by one user in 5 mins**. Continuing our ultra-conservative benchmarking, let's assume this data generation rate over an 8-hour period each day, which gives **~56 MB per day**. At 20 working days in an average month, that gives **~1.1 GB per month**. 
+    
+    Again, **this is ultra-conservative**; no one uses pages or applications this way (even games). That's a lot of data for a single-user, but these are valuable, worst-case-scenario upper-bounds to start with. 
+    
+    If you're a scientist or researcher, these figures might be fine, especially if you want to pore over this data. If you're just interested in how people are moving through your content, this is probably overkill. The biggest culprit: take a look at our `Apache Flagon Page Usage Dashboard` to see.
+    
+    ![alt text][mouseOverBench1]
+    
+    Mouseovers accounted for a lot of the data we just produced. Raw mouseover data is written to log very frequently and generates a lot of documents in Elastic. 
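
The extrapolation in this step can be reproduced with a few lines of Python. This is only a sketch: the helper names `snapshot` and `extrapolate` are made up for illustration, the dictionary is a trimmed stand-in for a real `_stats` response, and 1 MB is taken as 1,000,000 bytes to match the rounding used in this guide. The 8-hour day and 20-day month are the guide's own assumptions.

```python
from typing import Tuple


def snapshot(stats: dict, index: str = "userale") -> Tuple[int, int]:
    """Return (document count, store size in bytes) for one index."""
    primaries = stats["indices"][index]["primaries"]
    return primaries["docs"]["count"], primaries["store"]["size_in_bytes"]


def extrapolate(bytes_per_window: int, window_minutes: float,
                hours_per_day: float = 8, days_per_month: int = 20) -> Tuple[float, float]:
    """Scale the growth seen in one benchmark window to (MB per day, MB per month)."""
    windows_per_day = hours_per_day * 60 / window_minutes
    mb_per_day = bytes_per_window * windows_per_day / 1_000_000
    return mb_per_day, mb_per_day * days_per_month


# Trimmed _stats response holding the values reported after the 5-minute session.
after = {"indices": {"userale": {"primaries": {
    "docs": {"count": 3282, "deleted": 0},
    "store": {"size_in_bytes": 820978}}}}}

baseline_bytes = 241212  # size_in_bytes from the baseline reading
docs_after, bytes_after = snapshot(after)
growth = bytes_after - baseline_bytes
per_day, per_month = extrapolate(growth, window_minutes=5)
print(growth, round(per_day, 1), round(per_month))
```

With the two `size_in_bytes` readings above, this works out to roughly 55.7 MB per day and about 1.1 GB per working month for one user, in line with the figures quoted in this step.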
 
-[Elastic's Stats API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html)
-[Overview of Elastic's APIs](https://www.datadoghq.com/blog/collect-elasticsearch-metrics/#index-stats-api)
+1. **Now, let's take the simplest approach to scaling back UserALE.js and see how this changes your data generation rate**.
+    
+    What we're doing is using the UserALE.js `configs` that can be passed through the script tag to effectively "downsample" certain channels (e.g., mouseovers) that generate a lot of documents quickly.
+    
+    Here's what our script tag looks like now: 
+    ```
+    <script
+      src="file:/// ... /UserALEtest/userale-1.1.0.min.js"
+      data-url="http://localhost:8100/"
+      data-user="example-user"
+      data-version="1.1.1"
+      data-resolution=1000 #We've increased the delay between collection of "high-resolution" events (e.g., mouseover, scrolls) by 100%.
+      data-tool="Apache UserALE.js Example"
+    ></script>
+    ```
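
To see why a longer delay thins out high-frequency channels, here is a toy model in Python. It is not UserALE.js's actual implementation; the `downsample` function and the event stream are hypothetical, and the 500 ms starting value is only what a 100% increase to 1000 implies.

```python
def downsample(timestamps_ms, resolution_ms):
    """Keep an event only if at least resolution_ms has passed since the last kept event."""
    kept = []
    last = None
    for t in timestamps_ms:
        if last is None or t - last >= resolution_ms:
            kept.append(t)
            last = t
    return kept


# Hypothetical stream: one mouseover every 100 ms for 10 seconds (100 events).
stream = list(range(0, 10_000, 100))
kept_500 = downsample(stream, 500)
kept_1000 = downsample(stream, 1000)
print(len(kept_500), len(kept_1000))  # prints "20 10"
```

In this simplified model, doubling the delay halves the number of events kept from a busy channel. Real sessions mix in events that are not throttled, so the overall drop in documents will be smaller.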
+    Next, replicate your benchmarking, behaving in a similar way for the same amount of time.
+    
+    Here's what our `userale` index looks like now after another 5 minutes of vigorous behavior. 
+        
+    ```shell
+     "indices" : {
+       "userale" : {
+         "uuid" : "0h0Wxe2cSwqMALs4QCJ8Tw",
+         "primaries" : {
+           "docs" : {
+             "count" : 4800, #new userale document count
+             "deleted" : 0
+           },
+           "store" : {
+             "size_in_bytes" : 1115978 #new size of the index (1.1 MB)  
+    ```
+    **Maths give us +1518 documents (~500 fewer than our first benchmark) and +295,000 bytes (~.30 MB; ~49% less growth in the store size than previous) generated by one user in 5 minutes.** That would be **28.8 MB per day** or **~576 MB per working month**, per user. We've cut down our data generation considerably by modifying one parameter in our script tag.
+
+    Why? Look at the new proportion of mouseover events... we shaved those down by ~50%. As that event stream has the highest data generation, net-net we generated about 25% fewer documents with the same amount and duration of behavior:
+    
+    ![alt text][mouseOverBench2] 
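
The comparison between the two benchmark runs can be checked the same way. This sketch uses only the deltas reported in this guide; the `percent_reduction` helper is a name invented for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Growth per 5-minute session, taken from the readings in this guide.
run1_docs = 1998                 # first benchmark, default parameters
run2_docs = 4800 - 3282          # second benchmark, data-resolution=1000
run1_bytes = 820978 - 241212     # store growth in the first benchmark
run2_bytes = 295_000             # store growth reported for the second benchmark

docs_cut = percent_reduction(run1_docs, run2_docs)
bytes_cut = percent_reduction(run1_bytes, run2_bytes)
print(round(docs_cut, 1), round(bytes_cut, 1))
```

On these numbers the second run produced about 24% fewer documents and about 49% less store growth than the first. Keep in mind that store size can also shift for reasons unrelated to your changes, so treat a single pair of readings as indicative rather than exact.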
+    
+1.  **Still too much data? Below are some other things that you can do to curb the growth of your `userale` index** as you continue benchmarking with your page/app.
+
+    * If you don't need all the different types of events UserALE.js gathers by default, you can easily curate the list of events you collect by modifying [UserALE.js source](https://github.com/apache/incubator-flagon-useralejs/tree/master/src) (`attachHandlers.js`), then building a custom UserALE.js script.
+    * If you don't need both raw logs (specific records of events) and interval logs (aggregates of specific events over a time interval), you can easily drop intervals by modifying [UserALE.js source](https://github.com/apache/incubator-flagon-useralejs/tree/master/src) (`attachHandlers.js` or `packageLogs.js`), then building a custom UserALE.js script.
+    * You can reduce the number of meta-data fields written to each UserALE.js log, by modifying [UserALE.js source](https://github.com/apache/incubator-flagon-useralejs/tree/master/src) (`packageLogs.js`), then building a custom UserALE.js script.
+    * If you want different levels of logging granularity (more or less data) for different pages within your site/app, you can build different versions of UserALE.js to service different pages. Just name each version differently and point to the one you want with script-tags, page-by-page.
+    * You can use our [API]({{ '/docs/useralejs/API' | prepend: site.baseurl }}) for surgical precision in how specific elements (targets) on your page generate data.
+    
+### Other Tools to Support Benchmarking for Scaling
 
+1. In our benchmarking guide, we primarily used [Elastic's `Stats` API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html). This provides very verbose output, which is pretty useful for understanding all of your indexes, indexing behavior, and how data is being distributed across nodes. You can use other [Elastic APIs](https://www.datadoghq.com/blog/collect-elasticsearch-metrics/#index-stats-api) for different views of what's going on inside your Apache F [...]
+
+    http://localhost:9200/_cat/indices?format=json&bytes=b&pretty
+    
+    Output is very simple and index sizes stack on top of one another:
+    ```shell
+    [
+      {
+        "health" : "green",
+        "status" : "open",
+        "index" : ".kibana_1",
+        "uuid" : "FnI_6AQYQEWp2mIxSlM8HQ",
+        "pri" : "1",
+        "rep" : "0",
+        "docs.count" : "184",
+        "docs.deleted" : "13",
+        "store.size" : "366457",
+        "pri.store.size" : "366457"
+      },
+      {
+        "health" : "yellow",
+        "status" : "open",
+        "index" : "metricbeat-6.6.2-2019.04.22",
+        "uuid" : "pDYNmzsxTFu9Z0Tc1_GdLw",
+        "pri" : "5",
+        "rep" : "1",
+        "docs.count" : "4687",
+        "docs.deleted" : "0",
+        "store.size" : "6309920",
+        "pri.store.size" : "6309920"
+      },
+      {
+        "health" : "green",
+        "status" : "open",
+        "index" : "userale",
+        "uuid" : "0h0Wxe2cSwqMALs4QCJ8Tw",
+        "pri" : "1",
+        "rep" : "0",
+        "docs.count" : "14018",
+        "docs.deleted" : "0",
+        "store.size" : "2235963",
+        "pri.store.size" : "2235963"
+      },
+      {
+        "health" : "yellow",
+        "status" : "open",
+        "index" : "metricbeat-6.6.2-2019.04.27",
+        "uuid" : "wTyUBXvNRMOwpR4lDF9BNA",
+        "pri" : "5",
+        "rep" : "1",
+        "docs.count" : "107124",
+        "docs.deleted" : "0",
+        "store.size" : "63263644",
+        "pri.store.size" : "63263644"
+      }
+    ]
+    ```
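
If you want to rank indices by size rather than read the JSON by eye, a short Python sketch like the one below will do it. The `sizes_by_index` helper is hypothetical, and the input is the output above trimmed to three fields. Note that `_cat` returns numbers as strings, so they are converted before sorting.

```python
import json

# The _cat/indices output from above, trimmed to the fields used here.
cat_output = json.loads("""[
  {"index": ".kibana_1", "docs.count": "184", "store.size": "366457"},
  {"index": "metricbeat-6.6.2-2019.04.22", "docs.count": "4687", "store.size": "6309920"},
  {"index": "userale", "docs.count": "14018", "store.size": "2235963"},
  {"index": "metricbeat-6.6.2-2019.04.27", "docs.count": "107124", "store.size": "63263644"}
]""")


def sizes_by_index(rows):
    """Return (index name, store size in bytes) pairs, largest first."""
    pairs = [(row["index"], int(row["store.size"])) for row in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = sizes_by_index(cat_output)
total = sum(size for _, size in ranked)
for name, size in ranked:
    # Show each index's size in MB and its share of the total store.
    print(f"{name:32} {size / 1_000_000:8.2f} MB  {size / total:6.1%}")
```

Run against the sample above, the two Metricbeat indices come out well ahead of `userale`, which is a useful reminder that behavioral logs are not the only thing filling your disk.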
+1. There are a lot of other things going on inside your Apache Flagon Elastic stack. Note above that depending on what you deploy (e.g., metricbeat) you may have additional indices that need monitoring. For one, Metricbeat indices grow fast and you'll want to benchmark that too. The Kibana index will grow as well, but not as much; it depends on how many additional objects you build (e.g., visualizations, indices, dashboards). Both the `STATS` and `CAT` APIs are very useful for seein [...]
+
+1. There are also a lot of things going on inside your server or Docker containers, depending on how you deploy your Apache Flagon Elastic Stack. This is why we've configured a metricbeat service to run with Flagon. In the same way that you can benchmark your data generation rate while you generate behavioral logs in your page/app, you can also look at how your Apache Flagon stack is utilizing disk and compute resources. When you spin up our single-node Apache Flagon container, make sure [...]
+
+    ![alt text][metricBeat]
+    
+### Wonky Things that Can and Will Happen as You Benchmark
 
-[http://localhost:9200/userale/_stats?pretty=true](http://localhost:9200/userale/_stats?pretty=true)
+1. You just finished a benchmarking session after modifying UserALE.js to produce less data. What you find is that your new store size is either dramatically bigger than your last benchmark or smaller (which should be impossible). What happened is a thing called [merging](https://www.elastic.co/guide/en/elasticsearch/guide/current/merge-process.html). It is a good thing, but it can be a confusing thing while you're benchmarking against your store size. Essentially, each Elastic shard is  [...]
 
+### Summary 
+Benchmarking and adjusting your data-rate so that you can scale how you want to is made very easy in Apache Flagon. We combine easily deployed and modified capabilities with the power of Elastic's APIs and visualization capabilities. Again, our single-node container is not a scaling solution. It's an ingredient, a potent one, that not only serves as the building block for your cluster, but also a benchmarking tool so that your Apache Flagon cluster meets your needs for capability, scale  [...]
+             
+Subscribe to our [dev list](mailto:dev-subscribe@flagon.incubator.apache.org) and join the conversation!
 
-Subscribe to our [dev list](dev-subscribe@flagon.incubator.apache.org) and join the conversation!
\ No newline at end of file
diff --git a/site/_docs/useralejs/index.md b/site/_docs/useralejs/index.md
index 64e6afd..85dbfbc 100644
--- a/site/_docs/useralejs/index.md
+++ b/site/_docs/useralejs/index.md
@@ -10,7 +10,7 @@ priority: 0
 *Note:* Work on UserALE.js' documentation is ongoing.  Contributions are welcome. To get involved, see our [Contributing]({{ '/docs/contributing' | prepend: site.baseurl }}) guide.  
 ### Include UserALE.js in your project
 
-To start logging with Apache UserALE.js, you can include our script directly in your project. More details can be found in our [README](https://github.com/apache/incubator-flagon-useralejs/blob/master/README.md).
+To start logging with Apache UserALE.js, you can include our script directly in your project. More details can be found in our [README](https://github.com/apache/incubator-flagon-useralejs/blob/master/README.md). You can use our [build process]({{ '/docs/useralejs/build' | prepend: site.baseurl }}) to create our script, or just use a sample in our [repos](https://github.com/apache/incubator-flagon-useralejs/tree/master/build).
 
 To include UserALE.js in a specific project, you'll need to deploy a version of our minified UserALE.js script in an accessible location (e.g., webserver), and simply include this script tag on the page: