Posted to commits@druid.apache.org by gi...@apache.org on 2019/06/13 15:17:16 UTC

[incubator-druid-website] 46/49: remove druid io mentions (#11)

This is an automated email from the ASF dual-hosted git repository.

gian pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-druid-website.git

commit 30fbb1ef85b153c66789efa88f8b05e700295cbb
Author: Vadim Ogievetsky <va...@gmail.com>
AuthorDate: Wed Jun 12 21:52:41 2019 -0700

    remove druid io mentions (#11)
---
 blog/2011/05/20/druid-part-deux.html               |   22 +-
 ...-right-cardinality-estimation-for-big-data.html |   10 +-
 blog/2013/08/30/loading-data.html                  |    2 +-
 blog/2013/11/04/querying-your-data.html            |   12 +-
 blog/2014/02/03/rdruid-and-twitterstream.html      |   48 +-
 ...oglog-optimizations-for-real-world-systems.html |    2 +-
 blog/2014/03/12/batch-ingestion.html               |   18 +-
 blog/2014/04/15/intro-to-pydruid.html              |    4 +-
 ...ff-on-the-rise-of-the-real-time-data-stack.html |    2 +-
 blog/2015/11/03/seeking-new-committers.html        |    2 +-
 blog/2016/06/28/druid-0-9-1.html                   |    2 +-
 blog/2016/12/01/druid-0-9-2.html                   |    2 +-
 blog/2017/04/18/druid-0-10-0.html                  |    2 +-
 blog/2017/08/22/druid-0-10-1.html                  |    2 +-
 blog/2017/12/04/druid-0-11-0.html                  |    2 +-
 blog/2018/03/08/druid-0-12-0.html                  |    2 +-
 blog/2018/06/08/druid-0-12-1.html                  |    2 +-
 blog/index.html                                    |    6 +-
 community/index.html                               |    6 +-
 .../ingestion/hadoop-vs-native-batch.html          |    4 +-
 .../ingestion/hadoop-vs-native-batch.html          |    4 +-
 .../ingestion/hadoop-vs-native-batch.html          |    4 +-
 docs/latest/ingestion/hadoop-vs-native-batch.html  |    4 +-
 downloads.html                                     |    2 +-
 feed/index.xml                                     | 2023 --------------------
 robots.txt                                         |    2 +-
 technology.html                                    |   10 +-
 27 files changed, 87 insertions(+), 2114 deletions(-)

diff --git a/blog/2011/05/20/druid-part-deux.html b/blog/2011/05/20/druid-part-deux.html
index 912b6b3..020a780 100644
--- a/blog/2011/05/20/druid-part-deux.html
+++ b/blog/2011/05/20/druid-part-deux.html
@@ -137,22 +137,22 @@
         <h1>Druid, Part Deux: Three Principles for Fast, Distributed OLAP</h1>
         <p class="text-muted">by <span class="author text-uppercase">Eric Tschetter</span> · May 20, 2011</p>
 
-        <p>In a <a href="http://druid.io/blog/2011/04/30/introducing-druid.html">previous blog
+        <p>In a <a href="/blog/2011/04/30/introducing-druid.html">previous blog
 post</a> we introduced the
 distributed indexing and query processing infrastructure we call Druid. In that
 post, we characterized the performance and scaling challenges that motivated us
 to build this system in the first place. Here, we discuss three design
 principles underpinning its architecture.</p>
 
-<p><strong>1. Partial Aggregates + In-Memory + Indexes =&gt; Fast Queries</strong> </p>
+<p><strong>1. Partial Aggregates + In-Memory + Indexes =&gt; Fast Queries</strong></p>
 
 <p>We work with two representations of our data: <em>alpha</em> represents the raw,
 unaggregated event logs, while <em>beta</em> is its partially aggregated derivative.
 This <em>beta</em> is the basis against which all further queries are evaluated:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA  1800  25  15.70 
-2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA  2912  42  29.18 
-2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK   1953  17  17.31 
-2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK   3194  170 34.01 
+<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>2011-01-01T01:00:00Z  ultratrimfast.com  google.com  Male    USA  1800  25  15.70
+2011-01-01T01:00:00Z  bieberfever.com    google.com  Male    USA  2912  42  29.18
+2011-01-01T02:00:00Z  ultratrimfast.com  google.com  Male    UK   1953  17  17.31
+2011-01-01T02:00:00Z  bieberfever.com    google.com  Male    UK   3194  170 34.01
 </code></pre></div>
 <p>This is the most compact representation that preserves the finest grain of data,
 while enabling on-the-fly computation of all O(2^n) possible dimensional
@@ -168,7 +168,7 @@ calculation (using AND &amp; OR operations) of rows matching a search query. The
 inverted index enables us to scan a limited subset of rows to compute final
 query results – and these scans are themselves distributed, as we discuss next.</p>
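
(A minimal Python sketch of the alpha-to-beta roll-up this principle rests on: truncate each raw event to the hour and partially aggregate per dimension combination. The column names are read off the sample rows above and are assumptions.)

    from collections import defaultdict

    def rollup(events):
        # partially aggregate raw "alpha" events into the hourly "beta" form
        beta = defaultdict(lambda: [0, 0, 0.0])  # impressions, clicks, revenue
        for e in events:
            hour = e["timestamp"][:13] + ":00:00Z"  # truncate ISO 8601 timestamp to the hour
            key = (hour, e["publisher"], e["advertiser"], e["gender"], e["country"])
            beta[key][0] += e["impressions"]
            beta[key][1] += e["clicks"]
            beta[key][2] += e["revenue"]
        return beta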
 
-<p><strong>2. Distributed Data + Parallelizable Queries =&gt; Horizontal Scalability</strong> </p>
+<p><strong>2. Distributed Data + Parallelizable Queries =&gt; Horizontal Scalability</strong></p>
 
 <p>Druid’s performance depends on having memory — lots of it. We achieve the requisite
 memory scale by dynamically distributing data across a cluster of nodes. As the
@@ -200,13 +200,13 @@ dramatically improves performance.</p>
 <ul>
 <li>Segments are read-only, so they can simultaneously serve multiple servers. If
 we have a hotspot in a particular index, we can replicate that index to
-multiple servers and load balance across them.<br></li>
+multiple servers and load balance across them.</li>
 <li>We can provide tiered classes of service for our data, with servers occupying
-different points in the “query latency vs. data size” spectrum </li>
+different points in the “query latency vs. data size” spectrum</li>
 <li>Our clusters can span data center boundaries</li>
 </ul>
 
-<p><strong>3. Real-Time Analytics: Immutable Past, Append-Only Future</strong> </p>
+<p><strong>3. Real-Time Analytics: Immutable Past, Append-Only Future</strong></p>
 
 <p>Our system for real-time analytics is centered, naturally, on time. Because past events
 happen once and never change, they need not be re-writable. We need only be
@@ -225,7 +225,7 @@ for addressing segments, rather than loading them all into memory. This
 provides access to long-range data while maintaining the high-performance that
 our customers expect for near-term data.</p>
 
-<h2 id="summary">Summary##</h2>
+<h2 id="summary">Summary</h2>
 
 <p>Druid’s power resides in providing users fast, arbitrarily deep
 exploration of large-scale transaction data. Queries over billions of rows,
diff --git a/blog/2012/05/04/fast-cheap-and-98-right-cardinality-estimation-for-big-data.html b/blog/2012/05/04/fast-cheap-and-98-right-cardinality-estimation-for-big-data.html
index 7357552..f1c87ce 100644
--- a/blog/2012/05/04/fast-cheap-and-98-right-cardinality-estimation-for-big-data.html
+++ b/blog/2012/05/04/fast-cheap-and-98-right-cardinality-estimation-for-big-data.html
@@ -183,10 +183,10 @@ extract a probability distribution for the likelihood of a specific
 phenomenon.  The phenomenon we care about is the maximum index of a 1 bit. 
 Specifically, we expect the following to be true:</p>
 
-<p>50% of hashed values will look like this: 1xxxxxxx…x<br>
-25% of hashed values will look like this: 01xxxxxx…x<br>
-12.5% of hashed values will look like this: 001xxxxxxxx…x<br>
-6.25% of hashed values will look like this: 0001xxxxxxxx…x<br>
+<p>50% of hashed values will look like this: 1xxxxxxx…x
+25% of hashed values will look like this: 01xxxxxx…x
+12.5% of hashed values will look like this: 001xxxxxxxx…x
+6.25% of hashed values will look like this: 0001xxxxxxxx…x
 …</p>
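
(A simplified Python sketch of this idea: hash each value, record the position of its first 1 bit, and keep the maximum per register. MD5 stands in here for the Murmur 128 hash mentioned below, the register count is illustrative, and a full HyperLogLog estimate additionally applies a bias-corrected harmonic mean over the registers.)

    import hashlib

    NUM_BUCKETS = 2 ** 11  # illustrative register count
    registers = [0] * NUM_BUCKETS

    def first_one_index(h, width):
        # 1-based position of the first 1 bit in a width-bit integer
        bits = format(h, "0{}b".format(width))
        idx = bits.find("1")
        return width if idx == -1 else idx + 1

    def add(value):
        h = int(hashlib.md5(value.encode()).hexdigest(), 16)  # 128-bit hash
        bucket = h & (NUM_BUCKETS - 1)  # low 11 bits pick the register
        registers[bucket] = max(registers[bucket], first_one_index(h >> 11, 117))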
 
 <p>So, naively speaking, we expect that if we were to hash 8 unique things, one of
@@ -227,7 +227,7 @@ ability to compute cardinalities.  We wanted to be able to take advantage of
 the space savings and row reduction of summarization while still being able to
 compute cardinalities:  this is where HyperLogLog comes in.</p>
 
-<p>In <a href="http://druid.io/">Druid</a>, our summarization process applies the hash
+<p>In <a href="/">Druid</a>, our summarization process applies the hash
 function (<a href="http://sites.google.com/site/murmurhash/">Murmur 128</a>) and computes
 the intermediate HyperLogLog format (i.e. the list of buckets of
 <code>max(index of 1)</code>) and stores that in a column.  Thus, for every row in our
diff --git a/blog/2013/08/30/loading-data.html b/blog/2013/08/30/loading-data.html
index 0b987eb..57a4518 100644
--- a/blog/2013/08/30/loading-data.html
+++ b/blog/2013/08/30/loading-data.html
@@ -140,7 +140,7 @@
 
 <p>Druid is a rockin&#39; exploratory analytical data store capable of offering interactive query of big data in realtime - as data is ingested. Druid drives 10&#39;s of billions of events per day for the <a href="http://www.metamarkets.com">Metamarkets</a> platform, and Metamarkets is committed to building Druid in open source.</p>
 
-<p>To learn more check out the first blog in this series <a href="http://druid.io/blog/2013/08/06/twitter-tutorial.html">Understanding Druid Via Twitter Data</a></p>
+<p>To learn more check out the first blog in this series <a href="/blog/2013/08/06/twitter-tutorial.html">Understanding Druid Via Twitter Data</a></p>
 
 <p>Checkout Druid at XLDB on Sept 9th <a href="https://conf-slac.stanford.edu/xldb-2013/tutorials#amC">XLDB</a></p>
 
diff --git a/blog/2013/11/04/querying-your-data.html b/blog/2013/11/04/querying-your-data.html
index 8ebe3da..a1d2ce9 100644
--- a/blog/2013/11/04/querying-your-data.html
+++ b/blog/2013/11/04/querying-your-data.html
@@ -181,7 +181,7 @@ com.metamx.druid.http.ComputeMain
 </code></pre></div>
 <h1 id="querying-your-data">Querying Your Data</h1>
 
-<p>Now that we have a complete cluster setup on localhost, we need to load data. To do so, refer to <a href="http://druid.io/blog/2013/08/30/loading-data.html">Loading Your Data</a>. Having done that, its time to query our data!</p>
+<p>Now that we have a complete cluster setup on localhost, we need to load data. To do so, refer to <a href="/blog/2013/08/30/loading-data.html">Loading Your Data</a>. Having done that, its time to query our data!</p>
 
 <h2 id="querying-different-nodes">Querying Different Nodes</h2>
 
@@ -258,7 +258,7 @@ com.metamx.druid.http.ComputeMain
 
 <h2 id="querying-against-the-realtime-spec">Querying Against the realtime.spec</h2>
 
-<p>How are we to know what queries we can run? Although <a href="http://druid.io/docs/latest/Querying.html">Querying</a> is a helpful index, to get a handle on querying our data we need to look at our Realtime node&#39;s realtime.spec file:</p>
+<p>How are we to know what queries we can run? Although <a href="/docs/latest/Querying.html">Querying</a> is a helpful index, to get a handle on querying our data we need to look at our Realtime node&#39;s realtime.spec file:</p>
 <div class="highlight"><pre><code class="language-json" data-lang="json"><span></span><span class="p">[{</span>
   <span class="nt">&quot;schema&quot;</span> <span class="p">:</span> <span class="p">{</span> <span class="nt">&quot;dataSource&quot;</span><span class="p">:</span><span class="s2">&quot;druidtest&quot;</span><span class="p">,</span>
                <span class="nt">&quot;aggregators&quot;</span><span class="p">:[</span> <span class="p">{</span><span class="nt">&quot;type&quot;</span><span class="p">:</span><span class="s2">&quot;count&quot;</span><span class="p">,</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span><span class="s2">&quot;impressions&quot;</span><span class="p">},</span>
@@ -295,7 +295,7 @@ com.metamx.druid.http.ComputeMain
 
 <h3 id="aggregations">aggregations</h3>
 
-<p>Note the <a href="http://druid.io/docs/latest/Aggregations.html">aggregations</a> in our query:</p>
+<p>Note the <a href="/docs/latest/Aggregations.html">aggregations</a> in our query:</p>
 <div class="highlight"><pre><code class="language-json" data-lang="json"><span></span>    <span class="s2">&quot;aggregations&quot;</span><span class="err">:</span> <span class="p">[</span>
         <span class="p">{</span><span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;count&quot;</span><span class="p">,</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;rows&quot;</span><span class="p">},</span>
         <span class="p">{</span><span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;longSum&quot;</span><span class="p">,</span> <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;imps&quot;</span><span class="p">,</span> <span class="nt">&quot;fieldName&quot;</span><span class="p">:</span> <span class="s2">&quot;impressions&quot;</span><span class="p">},</span>
@@ -308,7 +308,7 @@ com.metamx.druid.http.ComputeMain
 </code></pre></div>
 <h3 id="dimensions">dimensions</h3>
 
-<p>Lets look back at our actual records (from <a href="http://druid.io/blog/2013/08/30/loading-data.html">Loading Your Data</a>:</p>
+<p>Lets look back at our actual records (from <a href="/blog/2013/08/30/loading-data.html">Loading Your Data</a>:</p>
 <div class="highlight"><pre><code class="language-json" data-lang="json"><span></span><span class="p">{</span><span class="nt">&quot;utcdt&quot;</span><span class="p">:</span> <span class="s2">&quot;2010-01-01T01:01:01&quot;</span><span class="p">,</span> <span class="nt">&quot;wp&quot;</span><span class="p">:</span> <span class="mi">1000</span><span class="p">,</span> <span class="nt">&quot;gender&quot;</span><span class="p">:</span> <span class="s2">&quot;male&quot;</span><span class=" [...]
 <span class="p">{</span><span class="nt">&quot;utcdt&quot;</span><span class="p">:</span> <span class="s2">&quot;2010-01-01T01:01:02&quot;</span><span class="p">,</span> <span class="nt">&quot;wp&quot;</span><span class="p">:</span> <span class="mi">2000</span><span class="p">,</span> <span class="nt">&quot;gender&quot;</span><span class="p">:</span> <span class="s2">&quot;female&quot;</span><span class="p">,</span> <span class="nt">&quot;age&quot;</span><span class="p">:</span> <span cl [...]
 <span class="p">{</span><span class="nt">&quot;utcdt&quot;</span><span class="p">:</span> <span class="s2">&quot;2010-01-01T01:01:03&quot;</span><span class="p">,</span> <span class="nt">&quot;wp&quot;</span><span class="p">:</span> <span class="mi">3000</span><span class="p">,</span> <span class="nt">&quot;gender&quot;</span><span class="p">:</span> <span class="s2">&quot;male&quot;</span><span class="p">,</span> <span class="nt">&quot;age&quot;</span><span class="p">:</span> <span clas [...]
@@ -408,11 +408,11 @@ com.metamx.druid.http.ComputeMain
   <span class="p">}</span>
 <span class="p">}</span> <span class="p">]</span>
 </code></pre></div>
-<p>Check out <a href="http://druid.io/docs/latest/Filters.html">Filters</a> for more.</p>
+<p>Check out <a href="/docs/latest/Filters.html">Filters</a> for more.</p>
 
 <h2 id="learn-more">Learn More</h2>
 
-<p>Finally, you can learn more about querying at <a href="http://druid.io/docs/latest/Querying.html">Querying</a>!</p>
+<p>Finally, you can learn more about querying at <a href="/docs/latest/Querying.html">Querying</a>!</p>
 
       </div>
     </div>
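
(A sketch of how the aggregations and filters covered above combine into one native query against the "druidtest" datasource from the realtime.spec, expressed as a Python dict; the dimension, filter value, and interval are illustrative.)

    query = {
        "queryType": "groupBy",
        "dataSource": "druidtest",
        "granularity": "all",
        "dimensions": ["age"],
        "filter": {"type": "selector", "dimension": "gender", "value": "male"},
        "aggregations": [
            {"type": "count", "name": "rows"},
            {"type": "longSum", "name": "imps", "fieldName": "impressions"},
        ],
        "intervals": ["2010-01-01T00:00/2011-01-01T00:00"],
    }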
diff --git a/blog/2014/02/03/rdruid-and-twitterstream.html b/blog/2014/02/03/rdruid-and-twitterstream.html
index 7945dce..c66cde3 100644
--- a/blog/2014/02/03/rdruid-and-twitterstream.html
+++ b/blog/2014/02/03/rdruid-and-twitterstream.html
@@ -134,7 +134,7 @@
         <h1>RDruid and Twitterstream</h1>
         <p class="text-muted">by <span class="author text-uppercase">Igal Levy</span> · February  3, 2014</p>
 
-        <p>What if you could combine a statistical analysis language with the power of an analytics database for instant insights into realtime data? You&#39;d be able to draw conclusions from analyzing data streams at the speed of now. That&#39;s what combining the prowess of a <a href="http://druid.io">Druid database</a> with the power of <a href="http://www.r-project.org">R</a> can do.</p>
+        <p>What if you could combine a statistical analysis language with the power of an analytics database for instant insights into realtime data? You&#39;d be able to draw conclusions from analyzing data streams at the speed of now. That&#39;s what combining the prowess of a <a href="">Druid database</a> with the power of <a href="http://www.r-project.org">R</a> can do.</p>
 
 <p>In this blog, we&#39;ll look at how to bring streamed realtime data into R using nothing more than a laptop, an Internet connection, and open-source applications. And we&#39;ll do it with <em>only one</em> Druid node.</p>
 
@@ -149,7 +149,7 @@ We also recommend using <a href="http://www.rstudio.com/">RStudio</a> as the R I
 
 <h2 id="set-up-the-twitterstream">Set Up the Twitterstream</h2>
 
-<p>First, register with the Twitter API. Log in at the <a href="https://dev.twitter.com/apps/new">Twitter developer&#39;s site</a> (you can use your normal Twitter credentials) and fill out the form for creating an application; use any website and callback URL to complete the form. </p>
+<p>First, register with the Twitter API. Log in at the <a href="https://dev.twitter.com/apps/new">Twitter developer&#39;s site</a> (you can use your normal Twitter credentials) and fill out the form for creating an application; use any website and callback URL to complete the form.</p>
 
 <p>Make note of the API credentials that are then generated. Later you&#39;ll need to enter them when prompted by the Twitter-example startup script, or save them in a <code>twitter4j.properties</code> file (nicer if you ever restart the server). If using a properties file, save it under <code>$DRUID_HOME/examples/twitter</code>. The file should contains the following (using your real keys):</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>oauth.consumerKey=&lt;yourTwitterConsumerKey&gt;
@@ -162,7 +162,7 @@ oauth.accessTokenSecret=&lt;yourTwitterAccessTokenSecret&gt;
 <p>From the Druid home directory, start the Druid Realtime node:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>$DRUID_HOME/run_example_server.sh
 </code></pre></div>
-<p>When prompted, you&#39;ll choose the &quot;twitter&quot; example. If you&#39;re using the properties file, the server should start right up. Otherwise, you&#39;ll have to answer the prompts with the credentials you obtained from Twitter. </p>
+<p>When prompted, you&#39;ll choose the &quot;twitter&quot; example. If you&#39;re using the properties file, the server should start right up. Otherwise, you&#39;ll have to answer the prompts with the credentials you obtained from Twitter.</p>
 
 <p>After the Realtime node starts successfully, you should see &quot;Connected_to_Twitter&quot; printed, as well as messages similar to the following:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>2014-01-13 19:35:59,646 INFO [chief-twitterstream] druid.examples.twitter.TwitterSpritzerFirehoseFactory - nextRow() has returned 1,000 InputRows
@@ -187,7 +187,7 @@ library(ggplot2)
 </code></pre></div>
 <h2 id="querying-the-realtime-node">Querying the Realtime Node</h2>
 
-<p><a href="http://druid.io/docs/latest/Tutorial:-All-About-Queries.html">Druid queries</a> are in the format of JSON objects, but in R they&#39;ll have a different format. Let&#39;s look at this with a simple query that will give the time range of the Twitter data currently in our Druid node:</p>
+<p><a href="/docs/latest/Tutorial:-All-About-Queries.html">Druid queries</a> are in the format of JSON objects, but in R they&#39;ll have a different format. Let&#39;s look at this with a simple query that will give the time range of the Twitter data currently in our Druid node:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>&gt; druid.query.timeBoundary(druid, dataSource=&quot;twitterstream&quot;, intervals=interval(ymd(20140101), ymd(20141231)), verbose=&quot;true&quot;)
 </code></pre></div>
 <p>Let&#39;s break this query down:</p>
@@ -231,21 +231,21 @@ Content-Length: 151
 &lt; Transfer-Encoding: chunked
 * Server Jetty(8.1.11.v20130520) is not blacklisted
 &lt; Server: Jetty(8.1.11.v20130520)
-&lt; 
+&lt;
 * Connection #2 to host localhost left intact
-                  minTime                   maxTime 
-&quot;2014-01-25 00:52:00 UTC&quot; &quot;2014-01-25 01:35:00 UTC&quot; 
+                  minTime                   maxTime
+&quot;2014-01-25 00:52:00 UTC&quot; &quot;2014-01-25 01:35:00 UTC&quot;
 </code></pre></div>
 <p>At the very end comes the response to our query, a minTime and maxTime, the boundaries to our data set.</p>
 
 <h3 id="more-complex-queries">More Complex Queries</h3>
 
 <p>Now lets look at some real Twitter data. Say we are interested in the number of tweets per language during that time period. We need to do an aggregation via a groupBy query (see RDruid help in RStudio):</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>druid.query.groupBy(druid, dataSource=&quot;twitterstream&quot;, 
-                    interval(ymd(&quot;2014-01-01&quot;), ymd(&quot;2015-01-01&quot;)), 
-                    granularity=granularity(&quot;P1D&quot;), 
-                    aggregations = (tweets = sum(metric(&quot;tweets&quot;))), 
-                    dimensions = &quot;lang&quot;, 
+<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>druid.query.groupBy(druid, dataSource=&quot;twitterstream&quot;,
+                    interval(ymd(&quot;2014-01-01&quot;), ymd(&quot;2015-01-01&quot;)),
+                    granularity=granularity(&quot;P1D&quot;),
+                    aggregations = (tweets = sum(metric(&quot;tweets&quot;))),
+                    dimensions = &quot;lang&quot;,
                     verbose=&quot;true&quot;)
 </code></pre></div>
 <p>We see some new arguments in this query:</p>
@@ -306,7 +306,7 @@ Content-Length: 489
 &lt; Transfer-Encoding: chunked
 * Server Jetty(8.1.11.v20130520) is not blacklisted
 &lt; Server: Jetty(8.1.11.v20130520)
-&lt; 
+&lt;
 * Connection #3 to host localhost left intact
     timestamp tweets  lang
 1  2014-01-25   6476    ar
@@ -359,12 +359,12 @@ Content-Length: 489
 <div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>ggplot(major_tweet_langs, aes(x=lang, y=tweets)) + geom_bar(stat=&quot;identity&quot;)
 </code></pre></div>
 <p>You can refine this query with more aggregations and post aggregations (math within the results). For example, to find out how many rows in Druid the data for each of those languages takes, use:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>druid.query.groupBy(druid, dataSource=&quot;twitterstream&quot;, 
-                    interval(ymd(&quot;2014-01-01&quot;), ymd(&quot;2015-01-01&quot;)), 
-                    granularity=granularity(&quot;all&quot;), 
-                    aggregations = list(rows = druid.count(), 
-                                        tweets = sum(metric(&quot;tweets&quot;))), 
-                    dimensions = &quot;lang&quot;, 
+<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>druid.query.groupBy(druid, dataSource=&quot;twitterstream&quot;,
+                    interval(ymd(&quot;2014-01-01&quot;), ymd(&quot;2015-01-01&quot;)),
+                    granularity=granularity(&quot;all&quot;),
+                    aggregations = list(rows = druid.count(),
+                                        tweets = sum(metric(&quot;tweets&quot;))),
+                    dimensions = &quot;lang&quot;,
                     verbose=&quot;true&quot;)
 </code></pre></div>
 <h2 id="metrics-and-dimensions">Metrics and Dimensions</h2>
@@ -380,10 +380,10 @@ Content-Length: 489
 </ul>
 
 <p>Some interesting analyses on current events could be done using these dimensions and metrics. For example, you could filter on specific hashtags for events that happen to be spiking at the time:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>druid.query.groupBy(druid, dataSource=&quot;twitterstream&quot;, 
-                interval(ymd(&quot;2014-01-01&quot;), ymd(&quot;2015-01-01&quot;)), 
-                granularity=granularity(&quot;P1D&quot;), 
-                aggregations = (tweets = sum(metric(&quot;tweets&quot;))), 
+<div class="highlight"><pre><code class="language-text" data-lang="text"><span></span>druid.query.groupBy(druid, dataSource=&quot;twitterstream&quot;,
+                interval(ymd(&quot;2014-01-01&quot;), ymd(&quot;2015-01-01&quot;)),
+                granularity=granularity(&quot;P1D&quot;),
+                aggregations = (tweets = sum(metric(&quot;tweets&quot;))),
                 filter =
                     dimension(&quot;first_hashtag&quot;) %~% &quot;academyawards&quot; |
                     dimension(&quot;first_hashtag&quot;) %~% &quot;oscars&quot;,
@@ -391,7 +391,7 @@ Content-Length: 489
 </code></pre></div>
 <p>See the <a href="https://github.com/metamx/RDruid/wiki/Examples">RDruid wiki</a> for more examples.</p>
 
-<p>The point to remember is that this data is being streamed into Druid and brought into R via RDruid in realtime. For example, with an R script the data could be continuously queried, updated, and analyzed. </p>
+<p>The point to remember is that this data is being streamed into Druid and brought into R via RDruid in realtime. For example, with an R script the data could be continuously queried, updated, and analyzed.</p>
 
       </div>
     </div>
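
(As the post notes, Druid queries are JSON objects under the hood. A sketch of the native form of the groupBy call above, submitted from Python, assuming the example Realtime node answers queries at localhost:8083/druid/v2/.)

    import json
    import requests

    query = {
        "queryType": "groupBy",
        "dataSource": "twitterstream",
        "granularity": {"type": "period", "period": "P1D"},
        "dimensions": ["lang"],
        "aggregations": [{"type": "longSum", "name": "tweets", "fieldName": "tweets"}],
        "intervals": ["2014-01-01/2015-01-01"],
    }
    resp = requests.post("http://localhost:8083/druid/v2/", json=query)
    print(json.dumps(resp.json(), indent=2))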
diff --git a/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html b/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
index 4af9635..61ee637 100644
--- a/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
+++ b/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
@@ -252,7 +252,7 @@ millions of HLL objects with minimal measurable loss in accuracy.</p>
 <p>We first discussed the concept of representing HLL buckets in either a sparse
 or dense format in our <a href="http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data">first blog post</a>. Since that time,
 Google has also written a <a href="http://research.google.com/pubs/pub40671.html">great paper</a> on the matter. Data undergoes
-a <a href="http://druid.io/blog/2011/05/20/druid-part-deux.html">summarization process</a> when it is ingested in Druid. It is
+a <a href="/blog/2011/05/20/druid-part-deux.html">summarization process</a> when it is ingested in Druid. It is
 unnecessarily expensive to store raw event data and instead, Druid rolls
 ingested data up to some time granularity.</p>
 
diff --git a/blog/2014/03/12/batch-ingestion.html b/blog/2014/03/12/batch-ingestion.html
index 57664af..73268f9 100644
--- a/blog/2014/03/12/batch-ingestion.html
+++ b/blog/2014/03/12/batch-ingestion.html
@@ -148,7 +148,7 @@
 
 <ul>
 <li>The R package <code>waterData</code> from USGS. This package allows us to retrieve and analyze hydrologic data from USGS. We can then export that data from within the R environment, then set up Druid to ingest it.</li>
-<li>The R package <code>RDruid</code> which we&#39;ve <a href="http://druid.io/blog/2014/02/03/rdruid-and-twitterstream.html">blogged about before</a>. This package allows us to query Druid from within the R environment.</li>
+<li>The R package <code>RDruid</code> which we&#39;ve <a href="/blog/2014/02/03/rdruid-and-twitterstream.html">blogged about before</a>. This package allows us to query Druid from within the R environment.</li>
 </ul>
 
 <h2 id="extracting-the-streamflow-data">Extracting the Streamflow Data</h2>
@@ -166,7 +166,7 @@
 <li>staid, or site identification number, which is entered as a string due to the fact that some IDs have leading 0s. This value was obtained from the interactive map discussed above.</li>
 <li>code, which specifies the type of sensor data we&#39;re interested in (if available). Our chosen code specifies measurement of discharge, in cubic feet per second. You can learn about codes at the <a href="http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes">USGS Water Resources site</a>.</li>
 <li>stat, which specifies the type of statistic we&#39;re looking for&mdash;in this case, the mean daily flow (mean is the default statistic). The USGS provides <a href="http://help.waterdata.usgs.gov/codes-and-parameters">a page summarizing various types of codes and parameters</a>.</li>
-<li>start and end dates. </li>
+<li>start and end dates.</li>
 </ul>
 
 <p>The information on the specific site and sensor should provide information on the type of data available and the start-end dates for the full historical record.</p>
@@ -202,7 +202,7 @@
 <div class="highlight"><pre><code class="language-r" data-lang="r"><span></span>write.table<span class="p">(</span>napa_flow_subset<span class="p">,</span> file<span class="o">=</span><span class="s">&quot;~/napa-flow.tsv&quot;</span><span class="p">,</span> sep<span class="o">=</span><span class="s">&quot;\t&quot;</span><span class="p">,</span> col.names <span class="o">=</span> <span class="bp">F</span><span class="p">,</span> row.names <span class="o">=</span> <span class="bp">F</span [...]
 </code></pre></div>
 <p>And here&#39;s our file:</p>
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ head ~/napa-flow.tsv 
+<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ head ~/napa-flow.tsv
 <span class="s2">&quot;11458000&quot;</span>  <span class="m">90</span>  <span class="m">1963</span>-01-01
 <span class="s2">&quot;11458000&quot;</span>  <span class="m">87</span>  <span class="m">1963</span>-01-02
 <span class="s2">&quot;11458000&quot;</span>  <span class="m">85</span>  <span class="m">1963</span>-01-03
@@ -220,7 +220,7 @@
 
 <h3 id="configure-the-indexing-task">Configure the Indexing Task</h3>
 
-<p>Druid has an indexing service that can load data. Since there&#39;s a relatively small amount of data to ingest, we&#39;re going to use the <a href="http://druid.io/docs/latest/Batch-ingestion.html">basic Druid indexing service</a> to ingest it. (Another option to ingest data uses a Hadoop cluster and is set up in a similar way, but that is more than we need for this job.) We must create a task (in JSON format) that specifies the work the indexing service will do:</p>
+<p>Druid has an indexing service that can load data. Since there&#39;s a relatively small amount of data to ingest, we&#39;re going to use the <a href="/docs/latest/Batch-ingestion.html">basic Druid indexing service</a> to ingest it. (Another option to ingest data uses a Hadoop cluster and is set up in a similar way, but that is more than we need for this job.) We must create a task (in JSON format) that specifies the work the indexing service will do:</p>
 <div class="highlight"><pre><code class="language-json" data-lang="json"><span></span><span class="p">{</span>
   <span class="nt">&quot;type&quot;</span> <span class="p">:</span> <span class="s2">&quot;index&quot;</span><span class="p">,</span>
   <span class="nt">&quot;dataSource&quot;</span> <span class="p">:</span> <span class="s2">&quot;usgs&quot;</span><span class="p">,</span>
@@ -257,9 +257,9 @@
 <p>The taks is saved to a file, <code>usgs_index_task.json</code>. Note a few things about this task:</p>
 
 <ul>
-<li><p>granularitySpec sets <a href="http://druid.io/docs/latest/Concepts-and-Terminology.html">segment</a> granularity to MONTH, rather than using the default DAY, even though each row of our data is a daily reading. We do this to avoid having Druid create a segment per row of data. That&#39;s a lot of extra work (note the interval is &quot;1963-01-01/2013-12-31&quot;), and we simply don&#39;t need that much granularity to make sense of this data for a broad view. Setting the granularit [...]
+<li><p>granularitySpec sets <a href="/docs/latest/Concepts-and-Terminology.html">segment</a> granularity to MONTH, rather than using the default DAY, even though each row of our data is a daily reading. We do this to avoid having Druid create a segment per row of data. That&#39;s a lot of extra work (note the interval is &quot;1963-01-01/2013-12-31&quot;), and we simply don&#39;t need that much granularity to make sense of this data for a broad view. Setting the granularity to MONTH caus [...]
 
-<p>A different granularity setting for the data itself (<a href="http://druid.io/docs/latest/Tasks.html">indexGranularity</a>) controls how the data is rolled up before it is chunked into segments. This granularity, which defaults to &quot;MINUTE&quot;, won&#39;t affect our data, which consists of daily values.</p></li>
+<p>A different granularity setting for the data itself (<a href="/docs/latest/Tasks.html">indexGranularity</a>) controls how the data is rolled up before it is chunked into segments. This granularity, which defaults to &quot;MINUTE&quot;, won&#39;t affect our data, which consists of daily values.</p></li>
 <li><p>We specify aggregators that Druid will use as <em>metrics</em> to summarize the data. &quot;count&quot; is a built-in metric that counts the raw number of rows on ingestion, and the Druid rows (after rollups) after processing. We&#39;ve added a metric to summarize &quot;val&quot; from our water data.</p></li>
 <li><p>The firehose section specifies out data source, which in this case is a file. If our data existed in multiple files, we could have set &quot;filter&quot; to &quot;*.tsv&quot;.</p></li>
 <li><p>We have to specify the timestamp column so Druid knows.</p></li>
@@ -268,7 +268,7 @@
 
 <h2 id="start-a-druid-cluster-and-post-the-task">Start a Druid Cluster and Post the Task</h2>
 
-<p>Before submitting this task, we must start a small Druid cluster consisting of the indexing service, a Coordinator node, and a Historical node. Instructions on how to set up and start a Druid cluster are in the <a href="http://druid.io/docs/latest/Tutorial:-Loading-Your-Data-Part-1.html">Druid documentation</a>.</p>
+<p>Before submitting this task, we must start a small Druid cluster consisting of the indexing service, a Coordinator node, and a Historical node. Instructions on how to set up and start a Druid cluster are in the <a href="/docs/latest/Tutorial:-Loading-Your-Data-Part-1.html">Druid documentation</a>.</p>
 
 <p>Once the cluster is ready, the task is submitted to the indexer&#39;s REST service (showing the relative path to the task file):</p>
 <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ curl -X <span class="s1">&#39;POST&#39;</span> -H <span class="s1">&#39;Content-Type:application/json&#39;</span> -d @examples/usgs/usgs_index_task.json localhost:8087/druid/indexer/v1/task
@@ -285,7 +285,7 @@
 
 <p>We can also verify the data by querying Druid. Here&#39;s a simple time-boundary query:</p>
 <div class="highlight"><pre><code class="language-json" data-lang="json"><span></span><span class="p">{</span>
-    <span class="nt">&quot;queryType&quot;</span><span class="p">:</span> <span class="s2">&quot;timeBoundary&quot;</span><span class="p">,</span> 
+    <span class="nt">&quot;queryType&quot;</span><span class="p">:</span> <span class="s2">&quot;timeBoundary&quot;</span><span class="p">,</span>
     <span class="nt">&quot;dataSource&quot;</span><span class="p">:</span> <span class="s2">&quot;usgs&quot;</span>
 <span class="p">}</span>
 </code></pre></div>
@@ -301,7 +301,7 @@
   <span class="p">}</span>
 <span class="p">}</span> <span class="p">]</span>
 </code></pre></div>
-<p>You can learn about submitting more complex queries in the <a href="http://druid.io/docs/latest/Tutorial:-All-About-Queries.html">Druid documentation</a>.</p>
+<p>You can learn about submitting more complex queries in the <a href="/docs/latest/Tutorial:-All-About-Queries.html">Druid documentation</a>.</p>
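
(A Python sketch of the two steps above: submit the task to the indexing service at localhost:8087 as in the curl command, then verify the load with the timeBoundary query. The Broker query URL is an assumption; adjust it to your cluster.)

    import json
    import requests

    with open("examples/usgs/usgs_index_task.json") as f:
        task = json.load(f)
    r = requests.post("http://localhost:8087/druid/indexer/v1/task", json=task)
    print(r.json())  # the indexing service responds with the task id

    check = {"queryType": "timeBoundary", "dataSource": "usgs"}
    print(requests.post("http://localhost:8082/druid/v2/", json=check).json())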
 
 <h2 id="what-to-try-next-something-more-akin-to-a-production-system">What to Try Next: Something More Akin to a Production System</h2>
 
diff --git a/blog/2014/04/15/intro-to-pydruid.html b/blog/2014/04/15/intro-to-pydruid.html
index 6962cef..abf8db8 100644
--- a/blog/2014/04/15/intro-to-pydruid.html
+++ b/blog/2014/04/15/intro-to-pydruid.html
@@ -134,7 +134,7 @@
         <h1>Introduction to pydruid</h1>
         <p class="text-muted">by <span class="author text-uppercase">Igal Levy</span> · April 15, 2014</p>
 
-        <p>We&#39;ve already written about pairing <a href="http://druid.io/blog/2014/02/03/rdruid-and-twitterstream.html">R with RDruid</a>, but Python has powerful and free open-source analysis tools too. Collectively, these are often referred to as the <a href="http://www.scipy.org/stackspec.html">SciPy Stack</a>. To pair SciPy&#39;s analytic power with the advantages of querying time-series data in Druid, we created the pydruid connector. This allows Python users to query Druid&mdash [...]
+        <p>We&#39;ve already written about pairing <a href="/blog/2014/02/03/rdruid-and-twitterstream.html">R with RDruid</a>, but Python has powerful and free open-source analysis tools too. Collectively, these are often referred to as the <a href="http://www.scipy.org/stackspec.html">SciPy Stack</a>. To pair SciPy&#39;s analytic power with the advantages of querying time-series data in Druid, we created the pydruid connector. This allows Python users to query Druid&mdash;and export the [...]
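
(A minimal pydruid sketch of the flow the post walks through, assuming the example Realtime node on localhost:8083; the datasource and metric names follow the twitterstream example.)

    from pydruid.client import PyDruid
    from pydruid.utils.aggregators import doublesum

    client = PyDruid("http://localhost:8083", "druid/v2/")
    client.timeseries(
        datasource="twitterstream",
        granularity="minute",
        intervals="2014-04-15/pt1h",
        aggregations={"tweet_count": doublesum("tweets")},
    )
    df = client.export_pandas()  # hand the result to pandas / the SciPy stack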
 
 <h2 id="getting-started">Getting Started</h2>
 
@@ -150,7 +150,7 @@
 
 <h2 id="run-the-druid-wikipedia-example">Run the Druid Wikipedia Example</h2>
 
-<p><a href="http://druid.io/downloads.html">Download Druid</a> and unpack Druid. If you are not familiar with Druid, see this <a href="http://druid.io/docs/latest/Tutorial:-A-First-Look-at-Druid.html">introductory tutorial</a>.</p>
+<p><a href="/downloads.html">Download Druid</a> and unpack Druid. If you are not familiar with Druid, see this <a href="/docs/latest/Tutorial:-A-First-Look-at-Druid.html">introductory tutorial</a>.</p>
 
 <p>From the Druid home directory, start the Druid Realtime node:</p>
 <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="nv">$DRUID_HOME</span>/run_example_server.sh
diff --git a/blog/2014/05/07/open-source-leaders-sound-off-on-the-rise-of-the-real-time-data-stack.html b/blog/2014/05/07/open-source-leaders-sound-off-on-the-rise-of-the-real-time-data-stack.html
index 5f04acd..cd2f08c 100644
--- a/blog/2014/05/07/open-source-leaders-sound-off-on-the-rise-of-the-real-time-data-stack.html
+++ b/blog/2014/05/07/open-source-leaders-sound-off-on-the-rise-of-the-real-time-data-stack.html
@@ -146,7 +146,7 @@ organized a panel that same night to continue the conversation.</p>
 
 <p>The discussion featured key contributors to several open source technologies:
 Andy Feng (<a href="http://storm.incubator.apache.org/">Storm</a>), Eric Tschetter
-(<a href="http://druid.io/">Druid</a>), Jun Rao (<a href="http://kafka.apache.org/">Kafka</a>), and
+(<a href="/">Druid</a>), Jun Rao (<a href="http://kafka.apache.org/">Kafka</a>), and
 Matei Zaharia (<a href="http://spark.apache.org/">Spark</a>). It was moderated by
 VentureBeat Staff Writer Jordan Novet and hosted by Zack Bogue of the <a href="http://www.foundersden.com/">Founders
 Den</a> and <a href="http://dcvc.com/">Data Collective</a>.</p>
diff --git a/blog/2015/11/03/seeking-new-committers.html b/blog/2015/11/03/seeking-new-committers.html
index a51d044..6fa7aeb 100644
--- a/blog/2015/11/03/seeking-new-committers.html
+++ b/blog/2015/11/03/seeking-new-committers.html
@@ -150,7 +150,7 @@ committers from diverse organizations. If you are a Druid user who is
 passionate about druid and wants to get involved more, then please send your
 pull requests to improve documentation, bug fixes, tests and proposed/accepted
 features. Also, feel free to let us know about your interest by contacting
-<a href="http://druid.io/community/">existing committers</a> or post in the <a href="https://groups.google.com/forum/#!forum/druid-development">development
+<a href="/community/">existing committers</a> or post in the <a href="https://groups.google.com/forum/#!forum/druid-development">development
 list</a>.</p>
 
 <p>To get started developing on Druid, we’ve created and tagged a set of <a href="https://github.com/apache/incubator-druid/labels/Difficulty%20-%20Easy">beginner
diff --git a/blog/2016/06/28/druid-0-9-1.html b/blog/2016/06/28/druid-0-9-1.html
index cbe09ff..dbb8a9d 100644
--- a/blog/2016/06/28/druid-0-9-1.html
+++ b/blog/2016/06/28/druid-0-9-1.html
@@ -141,7 +141,7 @@ over the previous 0.9.0 release, from over 30 contributors. Major new features i
 experimental Kafka indexing service to support exactly-once consumption from Apache Kafka, support
 for cluster-wide query-time lookups (QTL), and an improved segment balancing algorithm.</p>
 
-<p>You can download the release here: <a href="http://druid.io/downloads.html">http://druid.io/downloads.html</a></p>
+<p>You can download the release here: <a href="/downloads.html">/downloads.html</a></p>
 
 <p>The full release notes are here: <a href="https://github.com/apache/incubator-druid/releases/druid-0.9.1.1">https://github.com/apache/incubator-druid/releases/druid-0.9.1.1</a></p>
 
diff --git a/blog/2016/12/01/druid-0-9-2.html b/blog/2016/12/01/druid-0-9-2.html
index dc9f513..ed15196 100644
--- a/blog/2016/12/01/druid-0-9-2.html
+++ b/blog/2016/12/01/druid-0-9-2.html
@@ -143,7 +143,7 @@ performance improvements for HyperUnique and DataSketches, a query cache impleme
 Caffeine, a new lookup extension exposing fine grained caching strategies, support for reading ORC
 files, and new aggregators for variance and standard deviation.</p>
 
-<p>You can download the release here: <a href="/downloads.html">http://druid.io/downloads.html</a></p>
+<p>You can download the release here: <a href="/downloads.html">/downloads.html</a></p>
 
 <p>The full release notes are here:
 <a href="https://github.com/apache/incubator-druid/releases/druid-0.9.2">https://github.com/apache/incubator-druid/releases/druid-0.9.2</a></p>
diff --git a/blog/2017/04/18/druid-0-10-0.html b/blog/2017/04/18/druid-0-10-0.html
index 747f5d0..bf09afe 100644
--- a/blog/2017/04/18/druid-0-10-0.html
+++ b/blog/2017/04/18/druid-0-10-0.html
@@ -143,7 +143,7 @@ support, a revamp of the &quot;index&quot; task, a new &quot;like&quot; filter,
 ability to run the coordinator and overlord as a single service, better
 performing defaults, and eight new extensions.</p>
 
-<p>You can download the release here: <a href="/downloads.html">http://druid.io/downloads.html</a></p>
+<p>You can download the release here: <a href="/downloads.html">/downloads.html</a></p>
 
 <p>The full release notes are here:
 <a href="https://github.com/apache/incubator-druid/releases/druid-0.10.0">https://github.com/apache/incubator-druid/releases/druid-0.10.0</a></p>
diff --git a/blog/2017/08/22/druid-0-10-1.html b/blog/2017/08/22/druid-0-10-1.html
index ae0aabe..bbef287 100644
--- a/blog/2017/08/22/druid-0-10-1.html
+++ b/blog/2017/08/22/druid-0-10-1.html
@@ -152,7 +152,7 @@
 <li>Various improvements to Druid SQL</li>
 </ul>
 
-<p>You can download the release here: <a href="/downloads.html">http://druid.io/downloads.html</a></p>
+<p>You can download the release here: <a href="/downloads.html">/downloads.html</a></p>
 
 <p>The full release notes are here:
 <a href="https://github.com/apache/incubator-druid/releases/druid-0.10.1">https://github.com/apache/incubator-druid/releases/druid-0.10.1</a></p>
diff --git a/blog/2017/12/04/druid-0-11-0.html b/blog/2017/12/04/druid-0-11-0.html
index a044e02..2493170 100644
--- a/blog/2017/12/04/druid-0-11-0.html
+++ b/blog/2017/12/04/druid-0-11-0.html
@@ -151,7 +151,7 @@
 <li>Various improvements to Druid SQL</li>
 </ul>
 
-<p>You can download the release here: <a href="/downloads.html">http://druid.io/downloads.html</a></p>
+<p>You can download the release here: <a href="/downloads.html">/downloads.html</a></p>
 
 <p>The full release notes are here:
 <a href="https://github.com/apache/incubator-druid/releases/druid-0.11.0">https://github.com/apache/incubator-druid/releases/druid-0.11.0</a></p>
diff --git a/blog/2018/03/08/druid-0-12-0.html b/blog/2018/03/08/druid-0-12-0.html
index 6f39bfc..aff6a6f 100644
--- a/blog/2018/03/08/druid-0-12-0.html
+++ b/blog/2018/03/08/druid-0-12-0.html
@@ -153,7 +153,7 @@
 <li>Various improvements to Druid SQL</li>
 </ul>
 
-<p>You can download the release here: <a href="/downloads.html">http://druid.io/downloads.html</a></p>
+<p>You can download the release here: <a href="/downloads.html">/downloads.html</a></p>
 
 <p>The full release notes are here:
 <a href="https://github.com/apache/incubator-druid/releases/druid-0.12.0">https://github.com/apache/incubator-druid/releases/druid-0.12.0</a></p>
diff --git a/blog/2018/06/08/druid-0-12-1.html b/blog/2018/06/08/druid-0-12-1.html
index 4015a3b..cf2fea6 100644
--- a/blog/2018/06/08/druid-0-12-1.html
+++ b/blog/2018/06/08/druid-0-12-1.html
@@ -150,7 +150,7 @@
 <li>Fix a bug of different segments of the same segment id in Kafka indexing</li>
 </ul>
 
-<p>You can download the release here: <a href="/downloads.html">http://druid.io/downloads.html</a></p>
+<p>You can download the release here: <a href="/downloads.html">/downloads.html</a></p>
 
 <p>The full release notes are here:
 <a href="https://github.com/apache/incubator-druid/releases/druid-0.12.1">https://github.com/apache/incubator-druid/releases/druid-0.12.1</a></p>
diff --git a/blog/index.html b/blog/index.html
index 32100f9..936ecea 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -254,7 +254,7 @@ organized a panel that same night to continue the conversation.</p>
       <div class="blog-listing">
         <h2><a href="/blog/2014/04/15/intro-to-pydruid.html">Introduction to pydruid</a></h2>
         <p class="author text-uppercase text-muted">Igal Levy · April 15, 2014</p>
-        <p><p>We&#39;ve already written about pairing <a href="http://druid.io/blog/2014/02/03/rdruid-and-twitterstream.html">R with RDruid</a>, but Python has powerful and free open-source analysis tools too. Collectively, these are often referred to as the <a href="http://www.scipy.org/stackspec.html">SciPy Stack</a>. To pair SciPy&#39;s analytic power with the advantages of querying time-series data in Druid, we created the pydruid connector. This allows Python users to query Druid&md [...]
+        <p><p>We&#39;ve already written about pairing <a href="/blog/2014/02/03/rdruid-and-twitterstream.html">R with RDruid</a>, but Python has powerful and free open-source analysis tools too. Collectively, these are often referred to as the <a href="http://www.scipy.org/stackspec.html">SciPy Stack</a>. To pair SciPy&#39;s analytic power with the advantages of querying time-series data in Druid, we created the pydruid connector. This allows Python users to query Druid&mdash;and export  [...]
 </p>
         <a class="btn btn-default btn-xs" href="/blog/2014/04/15/intro-to-pydruid.html">Read More</a>
       </div>
@@ -295,7 +295,7 @@ as unique users and device IDs with maximum performance and accuracy.</p>
       <div class="blog-listing">
         <h2><a href="/blog/2014/02/03/rdruid-and-twitterstream.html">RDruid and Twitterstream</a></h2>
         <p class="author text-uppercase text-muted">Igal Levy · February  3, 2014</p>
-        <p><p>What if you could combine a statistical analysis language with the power of an analytics database for instant insights into realtime data? You&#39;d be able to draw conclusions from analyzing data streams at the speed of now. That&#39;s what combining the prowess of a <a href="http://druid.io">Druid database</a> with the power of <a href="http://www.r-project.org">R</a> can do.</p>
+        <p><p>What if you could combine a statistical analysis language with the power of an analytics database for instant insights into realtime data? You&#39;d be able to draw conclusions from analyzing data streams at the speed of now. That&#39;s what combining the prowess of a <a href="">Druid database</a> with the power of <a href="http://www.r-project.org">R</a> can do.</p>
 </p>
         <a class="btn btn-default btn-xs" href="/blog/2014/02/03/rdruid-and-twitterstream.html">Read More</a>
       </div>
@@ -475,7 +475,7 @@ engagement and growth, via metrics such as “daily active users.”</p>
       <div class="blog-listing">
         <h2><a href="/blog/2011/05/20/druid-part-deux.html">Druid, Part Deux: Three Principles for Fast, Distributed OLAP</a></h2>
         <p class="author text-uppercase text-muted">Eric Tschetter · May 20, 2011</p>
-        <p><p>In a <a href="http://druid.io/blog/2011/04/30/introducing-druid.html">previous blog
+        <p><p>In a <a href="/blog/2011/04/30/introducing-druid.html">previous blog
 post</a> we introduced the
 distributed indexing and query processing infrastructure we call Druid. In that
 post, we characterized the performance and scaling challenges that motivated us
diff --git a/community/index.html b/community/index.html
index db2ea42..ec32ddb 100644
--- a/community/index.html
+++ b/community/index.html
@@ -129,14 +129,10 @@
 
 <p>Most discussion about Druid happens over email and GitHub.</p>
 
-<p>The Druid community is in the process of migrating to Apache by way of the Apache Incubator. As we proceed
-along this path, our site will move from http://druid.io/ to https://druid.apache.org/, and our mailing lists
-and Git repositories will be migrated as well.</p>
-
 <ul>
 <li><strong>User mailing list</strong> <a href="https://groups.google.com/forum/#!forum/druid-user">druid-user@googlegroups.com</a> for general discussion</li>
 <li><strong>Development mailing list</strong> <a href="https://lists.apache.org/list.html?dev@druid.apache.org">dev@druid.apache.org</a> for discussion about project development</li>
-<li><strong>GitHub</strong> <a href="https://github.com/apache/druid">druid-io/druid</a> issues and pull requests (watch to subscribe)</li>
+<li><strong>GitHub</strong> <a href="https://github.com/apache/druid">apache/druid</a> issues and pull requests (watch to subscribe)</li>
 <li><strong>Meetups</strong> <a href="https://www.meetup.com/topics/apache-druid/">Druid meetups</a> for different meetup groups around the world.</li>
 <li><strong>IRC</strong> <code>#druid-dev</code> on irc.freenode.net</li>
 </ul>
diff --git a/docs/0.14.0-incubating/ingestion/hadoop-vs-native-batch.html b/docs/0.14.0-incubating/ingestion/hadoop-vs-native-batch.html
index 2b96402..6792f80 100644
--- a/docs/0.14.0-incubating/ingestion/hadoop-vs-native-batch.html
+++ b/docs/0.14.0-incubating/ingestion/hadoop-vs-native-batch.html
@@ -182,14 +182,14 @@ ingestion method.</p>
 <td>No dependency</td>
 </tr>
 <tr>
-<td>Supported <a href="http://druid.io/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
+<td>Supported <a href="/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
 <td>Perfect rollup</td>
 <td>Best-effort rollup</td>
 <td>Both perfect and best-effort rollup</td>
 </tr>
 <tr>
 <td>Supported partitioning methods</td>
-<td><a href="http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
+<td><a href="/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
 <td>N/A</td>
 <td>Hash-based partitioning (when <code>forceGuaranteedRollup</code> = true)</td>
 </tr>
diff --git a/docs/0.14.1-incubating/ingestion/hadoop-vs-native-batch.html b/docs/0.14.1-incubating/ingestion/hadoop-vs-native-batch.html
index b406c2b..6fcf388 100644
--- a/docs/0.14.1-incubating/ingestion/hadoop-vs-native-batch.html
+++ b/docs/0.14.1-incubating/ingestion/hadoop-vs-native-batch.html
@@ -182,14 +182,14 @@ ingestion method.</p>
 <td>No dependency</td>
 </tr>
 <tr>
-<td>Supported <a href="http://druid.io/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
+<td>Supported <a href="/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
 <td>Perfect rollup</td>
 <td>Best-effort rollup</td>
 <td>Both perfect and best-effort rollup</td>
 </tr>
 <tr>
 <td>Supported partitioning methods</td>
-<td><a href="http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
+<td><a href="/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
 <td>N/A</td>
 <td>Hash-based partitioning (when <code>forceGuaranteedRollup</code> = true)</td>
 </tr>
diff --git a/docs/0.14.2-incubating/ingestion/hadoop-vs-native-batch.html b/docs/0.14.2-incubating/ingestion/hadoop-vs-native-batch.html
index 4784aa9..9aae132 100644
--- a/docs/0.14.2-incubating/ingestion/hadoop-vs-native-batch.html
+++ b/docs/0.14.2-incubating/ingestion/hadoop-vs-native-batch.html
@@ -180,14 +180,14 @@ ingestion method.</p>
 <td>No dependency</td>
 </tr>
 <tr>
-<td>Supported <a href="http://druid.io/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
+<td>Supported <a href="/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
 <td>Perfect rollup</td>
 <td>Best-effort rollup</td>
 <td>Both perfect and best-effort rollup</td>
 </tr>
 <tr>
 <td>Supported partitioning methods</td>
-<td><a href="http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
+<td><a href="/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
 <td>N/A</td>
 <td>Hash-based partitioning (when <code>forceGuaranteedRollup</code> = true)</td>
 </tr>
diff --git a/docs/latest/ingestion/hadoop-vs-native-batch.html b/docs/latest/ingestion/hadoop-vs-native-batch.html
index d535db9..861554f 100644
--- a/docs/latest/ingestion/hadoop-vs-native-batch.html
+++ b/docs/latest/ingestion/hadoop-vs-native-batch.html
@@ -180,14 +180,14 @@ ingestion method.</p>
 <td>No dependency</td>
 </tr>
 <tr>
-<td>Supported <a href="http://druid.io/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
+<td>Supported <a href="/docs/latest/ingestion/index.html#roll-up-modes">rollup modes</a></td>
 <td>Perfect rollup</td>
 <td>Best-effort rollup</td>
 <td>Both perfect and best-effort rollup</td>
 </tr>
 <tr>
 <td>Supported partitioning methods</td>
-<td><a href="http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
+<td><a href="/docs/latest/ingestion/hadoop.html#partitioning-specification">Both Hash-based and range partitioning</a></td>
 <td>N/A</td>
 <td>Hash-based partitioning (when <code>forceGuaranteedRollup</code> = true)</td>
 </tr>
diff --git a/downloads.html b/downloads.html
index ae27b9d..119349e 100644
--- a/downloads.html
+++ b/downloads.html
@@ -8,7 +8,7 @@
 <meta name="author" content="Apache Software Foundation">
 
 <title>Druid | Download</title>
-<link rel="canonical" href="http://druid.io/downloads.html" />
+<link rel="canonical" href="http://apache.druid.com/downloads.html" />
 <link rel="alternate" type="application/atom+xml" href="/feed">
 <link rel="shortcut icon" href="/img/favicon.png">
 
diff --git a/feed/index.xml b/feed/index.xml
deleted file mode 100644
index dfa6877..0000000
--- a/feed/index.xml
+++ /dev/null
@@ -1,2023 +0,0 @@
-<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom">
-	
-  <title type="text" xml:lang="en">Druid</title>
-  <subtitle>Real²time Exploratory Analytics on Large Datasets</subtitle>
-  <link type="application/atom+xml" href="http://druid.io/feed/" rel="self"/>
-  <link type="text/html" href="http://druid.io/" rel="alternate"/>
-	<updated>2019-06-12T12:31:12-07:00</updated>
-        <id>http://druid.io/</id>
-	
-	
-	<entry>
-		<title>Druid 0.12.1 release</title>
-		<link href="http://druid.io/blog/2018/06/08/druid-0-12-1.html"/>
-		<updated>2018-06-08T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2018/06/08/druid-0-12-1</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.12.1&lt;/a&gt;!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.12.1&lt;/a&gt;!&lt;/p&gt;
-
-&lt;p&gt;Druid 0.12.1 contains stability improvements and bug fixes from 10 contributors.&lt;/p&gt;
-
-&lt;p&gt;Major improvements include:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;Large performance improvements for coordinator&amp;#39;s loadstatus API&lt;/li&gt;
-&lt;li&gt;More memory limiting for HttpPostEmitter&lt;/li&gt;
-&lt;li&gt;Fix several issues of Kerberos Authentication&lt;/li&gt;
-&lt;li&gt;Fix SQLMetadataSegmentManager to allow successive start and stop&lt;/li&gt;
-&lt;li&gt;Fix default interval handling in SegmentMetadataQuery&lt;/li&gt;
-&lt;li&gt;Support HTTP OPTIONS request&lt;/li&gt;
-&lt;li&gt;Fix a bug of different segments of the same segment id in Kafka indexing&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;You can download the release here: &lt;a href=&quot;/downloads.html&quot;&gt;http://druid.io/downloads.html&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;The full release notes are here:
-&lt;a href=&quot;https://github.com/apache/incubator-druid/releases/druid-0.12.1&quot;&gt;https://github.com/apache/incubator-druid/releases/druid-0.12.1&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;Thanks to everyone who contributed to this release!&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Druid 0.12.0 release</title>
-		<link href="http://druid.io/blog/2018/03/08/druid-0-12-0.html"/>
-		<updated>2018-03-08T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2018/03/08/druid-0-12-0</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.12.0&lt;/a&gt;!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.12.0&lt;/a&gt;!&lt;/p&gt;
-
-&lt;p&gt;Druid 0.12.0 contains over a hundred performance improvements, stability improvements, and bug fixes from almost 40 contributors. This release adds major improvements to the Kafka indexing service.&lt;/p&gt;
-
-&lt;p&gt;Major new features include:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;Kafka indexing service incremental publishing&lt;/li&gt;
-&lt;li&gt;Prioritized task locking&lt;/li&gt;
-&lt;li&gt;Improved automatic segment management&lt;/li&gt;
-&lt;li&gt;Test stats post-aggregators&lt;/li&gt;
-&lt;li&gt;Numeric quantiles sketch aggregator&lt;/li&gt;
-&lt;li&gt;Basic auth extension&lt;/li&gt;
-&lt;li&gt;Query request queuing improvements&lt;/li&gt;
-&lt;li&gt;Parse batch support&lt;/li&gt;
-&lt;li&gt;Various performance improvements&lt;/li&gt;
-&lt;li&gt;Various improvements to Druid SQL&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;You can download the release here: &lt;a href=&quot;/downloads.html&quot;&gt;http://druid.io/downloads.html&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;The full release notes are here:
-&lt;a href=&quot;https://github.com/apache/incubator-druid/releases/druid-0.12.0&quot;&gt;https://github.com/apache/incubator-druid/releases/druid-0.12.0&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;Thanks to everyone who contributed to this release!&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Druid 0.11.0 release</title>
-		<link href="http://druid.io/blog/2017/12/04/druid-0-11-0.html"/>
-		<updated>2017-12-04T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2017/12/04/druid-0-11-0</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.11.0&lt;/a&gt;!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.11.0&lt;/a&gt;!&lt;/p&gt;
-
-&lt;p&gt;Druid 0.11.0 contains over a hundred performance improvements, stability improvements, and bug fixes from almost 40 contributors. This release adds two major security features, TLS support and extension points for authentication and authorization.&lt;/p&gt;
-
-&lt;p&gt;Major new features include:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;TLS (a.k.a. SSL) support&lt;/li&gt;
-&lt;li&gt;Extension points for authentication and authorization&lt;/li&gt;
-&lt;li&gt;Double columns support&lt;/li&gt;
-&lt;li&gt;cachingCost Balancer Strategy&lt;/li&gt;
-&lt;li&gt;jq expression support in JSON parser&lt;/li&gt;
-&lt;li&gt;Redis cache extension&lt;/li&gt;
-&lt;li&gt;GroupBy performance improvements&lt;/li&gt;
-&lt;li&gt;Various improvements to Druid SQL&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;You can download the release here: &lt;a href=&quot;/downloads.html&quot;&gt;http://druid.io/downloads.html&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;The full release notes are here:
-&lt;a href=&quot;https://github.com/apache/incubator-druid/releases/druid-0.11.0&quot;&gt;https://github.com/apache/incubator-druid/releases/druid-0.11.0&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;Thanks to everyone who contributed to this release!&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Druid 0.10.1 release</title>
-		<link href="http://druid.io/blog/2017/08/22/druid-0-10-1.html"/>
-		<updated>2017-08-22T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2017/08/22/druid-0-10-1</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.10.1&lt;/a&gt;!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.10.1&lt;/a&gt;!&lt;/p&gt;
-
-&lt;p&gt;Druid 0.10.1 contains hundreds of performance improvements, stability improvements, and bug fixes from over 40 contributors. Major new features include:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;Large performance improvements and additional query metrics for TopN queries&lt;/li&gt;
-&lt;li&gt;The ability to push down limit clauses for GroupBy queries&lt;/li&gt;
-&lt;li&gt;More accurate query timeout handling&lt;/li&gt;
-&lt;li&gt;Hadoop indexing support for the Amazon S3A filesystem&lt;/li&gt;
-&lt;li&gt;Support for ingesting Protobuf data&lt;/li&gt;
-&lt;li&gt;A new Firehose that can read input via HTTP&lt;/li&gt;
-&lt;li&gt;Improved disk space management when indexing from cloud stores&lt;/li&gt;
-&lt;li&gt;Various improvements to coordinator lookups management&lt;/li&gt;
-&lt;li&gt;A new Kafka metrics emitter&lt;/li&gt;
-&lt;li&gt;A new dimension comparison filter&lt;/li&gt;
-&lt;li&gt;Various improvements to Druid SQL&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;You can download the release here: &lt;a href=&quot;/downloads.html&quot;&gt;http://druid.io/downloads.html&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;The full release notes are here:
-&lt;a href=&quot;https://github.com/apache/incubator-druid/releases/druid-0.10.1&quot;&gt;https://github.com/apache/incubator-druid/releases/druid-0.10.1&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;Thanks to everyone who contributed to this release!&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Druid 0.10.0 release</title>
-		<link href="http://druid.io/blog/2017/04/18/druid-0-10-0.html"/>
-		<updated>2017-04-18T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2017/04/18/druid-0-10-0</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.10.0&lt;/a&gt;!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.10.0&lt;/a&gt;!&lt;/p&gt;
-
-&lt;p&gt;Druid 0.10.0 contains hundreds of performance improvements, stability
-improvements, and bug fixes from over 40 contributors. Major new features
-include a built-in SQL layer, numeric dimensions, Kerberos authentication
-support, a revamp of the &amp;quot;index&amp;quot; task, a new &amp;quot;like&amp;quot; filter, large columns,
-ability to run the coordinator and overlord as a single service, better
-performing defaults, and eight new extensions.&lt;/p&gt;
-
-&lt;p&gt;You can download the release here: &lt;a href=&quot;/downloads.html&quot;&gt;http://druid.io/downloads.html&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;The full release notes are here:
-&lt;a href=&quot;https://github.com/apache/incubator-druid/releases/druid-0.10.0&quot;&gt;https://github.com/apache/incubator-druid/releases/druid-0.10.0&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;Thanks to everyone who contributed to this release!&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Druid 0.9.2 release</title>
-		<link href="http://druid.io/blog/2016/12/01/druid-0-9-2.html"/>
-		<updated>2016-12-01T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2016/12/01/druid-0-9-2</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.9.2&lt;/a&gt;!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.9.2&lt;/a&gt;!&lt;/p&gt;
-
-&lt;p&gt;Druid 0.9.2 contains hundreds of performance improvements, stability improvements, and bug fixes
-from over 30 contributors. Major new features include ability to disable rollup at ingestion time, a
-new groupBy engine, ability to filter on longs, new encoding options for long-typed columns,
-performance improvements for HyperUnique and DataSketches, a query cache implementation based on
-Caffeine, a new lookup extension exposing fine grained caching strategies, support for reading ORC
-files, and new aggregators for variance and standard deviation.&lt;/p&gt;
-
-&lt;p&gt;You can download the release here: &lt;a href=&quot;/downloads.html&quot;&gt;http://druid.io/downloads.html&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;The full release notes are here:
-&lt;a href=&quot;https://github.com/apache/incubator-druid/releases/druid-0.9.2&quot;&gt;https://github.com/apache/incubator-druid/releases/druid-0.9.2&lt;/a&gt;&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Druid 0.9.1.1 release</title>
-		<link href="http://druid.io/blog/2016/06/28/druid-0-9-1.html"/>
-		<updated>2016-06-28T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2016/06/28/druid-0-9-1</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.9.1.1&lt;/a&gt;!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;re excited to announce the general availability of our latest release, &lt;a href=&quot;/downloads.html&quot;&gt;Druid 0.9.1.1&lt;/a&gt;!&lt;/p&gt;
-
-&lt;p&gt;Druid 0.9.1.1 contains hundreds of performance improvements, stability improvements, and bug fixes
-over the previous 0.9.0 release, from over 30 contributors. Major new features include an
-experimental Kafka indexing service to support exactly-once consumption from Apache Kafka, support
-for cluster-wide query-time lookups (QTL), and an improved segment balancing algorithm.&lt;/p&gt;
-
-&lt;p&gt;You can download the release here: &lt;a href=&quot;http://druid.io/downloads.html&quot;&gt;http://druid.io/downloads.html&lt;/a&gt;&lt;/p&gt;
-
-&lt;p&gt;The full release notes are here: &lt;a href=&quot;https://github.com/apache/incubator-druid/releases/druid-0.9.1.1&quot;&gt;https://github.com/apache/incubator-druid/releases/druid-0.9.1.1&lt;/a&gt;&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Announcing New Committers</title>
-		<link href="http://druid.io/blog/2016/01/06/announcing-new-committers.html"/>
-		<updated>2016-01-06T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2016/01/06/announcing-new-committers</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;Happy New Year everyone! We’re excited to announce that we’ve added 8 new
-committers to Druid. These committers have been making sustained contributions
-to the project, and we look forward to working with them to continue to develop
-the project in 2016.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;Happy New Year everyone! We’re excited to announce that we’ve added 8 new
-committers to Druid. These committers have been making sustained contributions
-to the project, and we look forward to working with them to continue to develop
-the project in 2016.&lt;/p&gt;
-
-&lt;p&gt;Please welcome:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;Bingkun Guo - Yahoo&lt;/li&gt;
-&lt;li&gt;David Lim - Imply&lt;/li&gt;
-&lt;li&gt;Jonathan Wei - Imply&lt;/li&gt;
-&lt;li&gt;Lijin Bin - Alibaba&lt;/li&gt;
-&lt;li&gt;Mohamed Slim Bouguerra - Yahoo&lt;/li&gt;
-&lt;li&gt;Navis Ryu - SKTelecom&lt;/li&gt;
-&lt;li&gt;Parag Jain - Yahoo&lt;/li&gt;
-&lt;li&gt;Robin Sahner - Yahoo&lt;/li&gt;
-&lt;/ul&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Seeking New Committers</title>
-		<link href="http://druid.io/blog/2015/11/03/seeking-new-committers.html"/>
-		<updated>2015-11-03T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2015/11/03/seeking-new-committers</id>
-                <author><name>Druid PMC</name></author>
-                <summary type="html">&lt;p&gt;We are excited to announce that we have formalized the governance of Druid to
-be a community led project! Druid has been informally community led for some
-time, with committers from various organizations regularly adding new features,
-improving performance, and making things easier to use. Project committers vote
-on proposals, review/write pull requests, provide community support, and help
-guide the technical direction of the project. You can find more information on
-the project’s goals and governance on our recently updated &lt;a href=&quot;http://druid.io/community/&quot;&gt;Druid webpage&lt;/a&gt;. Druid depends upon its vibrant community of users
-for feedback on features and documentation, and for their very helpful bug
-reports.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We are excited to announce that we have formalized the governance of Druid to
-be a community led project! Druid has been informally community led for some
-time, with committers from various organizations regularly adding new features,
-improving performance, and making things easier to use. Project committers vote
-on proposals, review/write pull requests, provide community support, and help
-guide the technical direction of the project. You can find more information on
-the project’s goals and governance on our recently updated &lt;a href=&quot;http://druid.io/community/&quot;&gt;Druid webpage&lt;/a&gt;. Druid depends upon its vibrant community of users
-for feedback on features and documentation, and for their very helpful bug
-reports.&lt;/p&gt;
-
-&lt;p&gt;To ensure that the best and unbiased interests of the project are always
-represented, and to help Druid grow, we would like to have an even bigger pool of
-committers from diverse organizations. If you are a Druid user who is
-passionate about Druid and wants to get more involved, please send
-pull requests that improve documentation, fix bugs, add tests, and implement
-proposed/accepted features. Also, feel free to let us know about your interest by contacting
-&lt;a href=&quot;http://druid.io/community/&quot;&gt;existing committers&lt;/a&gt; or posting in the &lt;a href=&quot;https://groups.google.com/forum/#!forum/druid-development&quot;&gt;development
-list&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;To get started developing on Druid, we’ve created and tagged a set of &lt;a href=&quot;https://github.com/apache/incubator-druid/labels/Difficulty%20-%20Easy&quot;&gt;beginner
-friendly
-issues&lt;/a&gt; on
-Github. Please use the &lt;a href=&quot;https://groups.google.com/forum/#!forum/druid-development&quot;&gt;development
-list&lt;/a&gt; to discuss the
-best ways to get started on any particular issue. Of course, feel free to
-create your own issues and submit proposals for anything you’d like to see in
-Druid.&lt;/p&gt;
-
-&lt;p&gt;We look forward to adding new committers on Druid and working together to make
-the project great!&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Towards a Community Led Druid</title>
-		<link href="http://druid.io/blog/2015/02/20/towards-a-community-led-druid.html"/>
-		<updated>2015-02-20T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2015/02/20/towards-a-community-led-druid</id>
-                <author><name>Fangjin Yang, Xavier Léauté, and Eric Tschetter</name></author>
-                <summary type="html">&lt;p&gt;We are very happy to announce that Druid has changed its license to Apache 2.0.
-We believe this is a change the community will welcome. As engineers, we love
-to see the things we make get used and attempt to provide value to the broader
-open source world that we have benefitted from for so long. By switching to the
-Apache license, we believe this change will better promote the growth of the Druid
-community.  We hope to send a clear message that we
-are all equal participants in the Druid community, a sentiment that is very
-important to us.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We are very happy to announce that Druid has changed its license to Apache 2.0.
-We believe this is a change the community will welcome. As engineers, we love
-to see the things we make get used and attempt to provide value to the broader
-open source world that we have benefitted from for so long. By switching to the
-Apache license, we believe this change will better promote the growth of the Druid
-community.  We hope to send a clear message that we
-are all equal participants in the Druid community, a sentiment that is very
-important to us.&lt;/p&gt;
-
-&lt;p&gt;In addition to the license change, we are going to work towards a community led
-governance model for Druid. We hope to establish a community led committee and
-add committers from different organizations as soon as possible. If you are
-interested in more actively contributing to Druid, please let us know! We
-strongly believe that Druid should be as open as possible and we hope that this
-change will enable many different organizations to help guide the roadmap and
-direction of the project. &lt;/p&gt;
-
-&lt;p&gt;Finally, we’d like to take this chance to thank the entire Druid community.
-Whether you ask questions, file bugs, start technical discussions, submit
-feedback on proposals or issues, contribute code or docs, or talk about Druid,
-you are an integral part of this project.  As much as we might like to think
-that the code makes the project, the people are always most important and we
-could not be where we are today without all of you.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Five Tips for a F’ing Great Logo</title>
-		<link href="http://druid.io/blog/2014/07/23/five-tips-for-a-f-ing-great-logo.html"/>
-		<updated>2014-07-23T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2014/07/23/five-tips-for-a-f-ing-great-logo</id>
-                <author><name>David Hertog &amp; Fangjin Yang</name></author>
-                <summary type="html">&lt;p&gt;Everyone wants a great logo, but it’s notoriously difficult work—prone to
-miscommunications, heated debates and countless revisions. Still, after three
-years we couldn’t put it off any longer. Druid needed a visual identity, so we
-partnered with the talented folks at &lt;a href=&quot;http://focuslabllc.com/&quot;&gt;Focus Lab&lt;/a&gt; for
-help.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;Everyone wants a great logo, but it’s notoriously difficult work—prone to
-miscommunications, heated debates and countless revisions. Still, after three
-years we couldn’t put it off any longer. Druid needed a visual identity, so we
-partnered with the talented folks at &lt;a href=&quot;http://focuslabllc.com/&quot;&gt;Focus Lab&lt;/a&gt; for
-help.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/2014-07-23-logo/image00.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;Our old logo (left) was...lacking. Much better now, right? &lt;/p&gt;
-
-&lt;p&gt;Despite our fears, we cranked this out with Focus in a speedy three week
-sprint. Not only was the process drama-free, it was actually fun. The goal of
-this post is to give you some insight into how we did it and to share a few
-things that helped us support them in doing great work.&lt;/p&gt;
-
-&lt;h2 id=&quot;1-we-started-on-the-same-page&quot;&gt;1. We started on the same page&lt;/h2&gt;
-
-&lt;p&gt;Before the kickoff, Focus asked us to fill out &lt;a href=&quot;https://docs.google.com/a/metamarkets.com/document/d/1AqEGLWeqTsuFiykOPDbwpptz9R40IIFPjn01OuG6e98/edit#heading=h.ffuked8l0n9w&quot;&gt;this
-questionnaire&lt;/a&gt;
-about our brand, our mission, and what we’re looking for in a logo. This
-exercise forced the Druid team into alignment about a variety of things,
-including our
-preferred style. We figured this out by asking each team member to pick a few
-logos they like and talk about why. Though we didn’t agree on everything, we
-did find common ground: logos that were simple, modern, and polished. Focus
-Lab took this and created a mood board (below), which served as a directional
-guide for the project’s aesthetic.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/2014-07-23-logo/image03.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-
-&lt;h2 id=&quot;2-we-picked-a-central-theme-and-stuck-with-it&quot;&gt;2. We picked a central theme and stuck with it&lt;/h2&gt;
-
-&lt;p&gt;Logos (like companies) fail when they try to do too many things at once.  The
-best logos succinctly convey one theme. Apple: Knowledge, Google: Playfulness,
-Ferrari: Power. Druid is fast, scalable, built for analytics, and open source.
-All of these themes are important, but we agreed that speed was the &lt;em&gt;most&lt;/em&gt;
-important, so we asked Focus to prioritize it in their designs.  From there,
-they got to work brainstorming:&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/2014-07-23-logo/image02.jpg&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-
-&lt;h2 id=&quot;3-we-sought-internal-consensus-at-major-milestones&quot;&gt;3. We sought internal consensus at major milestones&lt;/h2&gt;
-
-&lt;p&gt;At the end of week one, Focus brought us two very different concepts. We all
-preferred the one below, but before saying so we polled a few folks internally.
-Doing this at key milestones was critical because it helped to give everyone a
-sense of ownership in the project and it ensured we weren’t walking down the
-wrong road. What could be worse than going through this whole thing only to
-learn that a major stakeholder hates what you’ve come up with?&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/2014-07-23-logo/image04.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-
-&lt;h2 id=&quot;4-we-structured-our-feedback-around-objectives-not-solutions&quot;&gt;4. We structured our feedback around objectives, not solutions&lt;/h2&gt;
-
-&lt;p&gt;Thankfully, everyone liked the above concept. It evoked a speeding bullet
-displacing air, and as an added bonus, it resembled a capital “D” for Druid.
-That said, it didn’t feel fast enough, and we were concerned that the graphic
-mark was too wide to fit inside a square avatar, such as a Twitter profile
-photo. Our first instinct was to start suggesting design tweaks: “What if you
-reduce the thickness of the lines?,” “Would it look better if you removed a
-line?,” “What about tilting it at an angle?” Giving designers this kind of
-feedback is tempting but counterproductive, so we focused on articulating the
-problems, and let them handle the solutions.&lt;/p&gt;
-
-&lt;p&gt;Over three weeks of refining these ideas (and experimenting with a few out of
-    the box ones), we arrived at the final, Round 3 version pictured below.
-There’s no doubt that it’s the fastest, and best-looking of the bunch. &lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/2014-07-23-logo/image05.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-
-&lt;h2 id=&quot;5-we-treated-our-designers-as-an-extension-of-our-team&quot;&gt;5. We treated our designers as an extension of our team&lt;/h2&gt;
-
-&lt;p&gt;There are lots of good designers out there, but great ones are rare. The
-difference between the two isn’t just design talent (though that’s important),
-it&amp;#39;s in knowing how to guide a client through the process. That said, the
-world’s best guide can’t help you find something if you don’t know what you’re
-looking for. We learned that coming to the table prepared, and then
-collaborating with Focus as if they were members of our team made the
-difference between success and meh.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Open Source Leaders Sound Off on the Rise of the Real-Time Data Stack</title>
-		<link href="http://druid.io/blog/2014/05/07/open-source-leaders-sound-off-on-the-rise-of-the-real-time-data-stack.html"/>
-		<updated>2014-05-07T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2014/05/07/open-source-leaders-sound-off-on-the-rise-of-the-real-time-data-stack</id>
-                <author><name>Fangjin Yang &amp; Gian Merlino</name></author>
-                <summary type="html">&lt;p&gt;In February we were honored to speak at the O’Reilly Strata conference about
-building a robust, flexible, and completely open source data analytics stack.
-If you couldn’t make it, you can watch the &lt;a href=&quot;https://www.youtube.com/watch?v=kJMYVpnW_AQ&quot;&gt;video
-here&lt;/a&gt;. Preparing for our talk got
-us thinking about all the brilliant folks working on similar problems, so we
-organized a panel that same night to continue the conversation.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;In February we were honored to speak at the O’Reilly Strata conference about
-building a robust, flexible, and completely open source data analytics stack.
-If you couldn’t make it, you can watch the &lt;a href=&quot;https://www.youtube.com/watch?v=kJMYVpnW_AQ&quot;&gt;video
-here&lt;/a&gt;. Preparing for our talk got
-us thinking about all the brilliant folks working on similar problems, so we
-organized a panel that same night to continue the conversation.&lt;/p&gt;
-
-&lt;p&gt;The discussion featured key contributors to several open source technologies:
-Andy Feng (&lt;a href=&quot;http://storm.incubator.apache.org/&quot;&gt;Storm&lt;/a&gt;), Eric Tschetter
-(&lt;a href=&quot;http://druid.io/&quot;&gt;Druid&lt;/a&gt;), Jun Rao (&lt;a href=&quot;http://kafka.apache.org/&quot;&gt;Kafka&lt;/a&gt;), and
-Matei Zaharia (&lt;a href=&quot;http://spark.apache.org/&quot;&gt;Spark&lt;/a&gt;). It was moderated by
-VentureBeat Staff Writer Jordan Novet and hosted by Zack Bogue of the &lt;a href=&quot;http://www.foundersden.com/&quot;&gt;Founders
-Den&lt;/a&gt; and &lt;a href=&quot;http://dcvc.com/&quot;&gt;Data Collective&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/img/oss-panel.png&quot; alt=&quot;Panelists discuss their projects&quot; title=&quot;OSS Panel&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;&lt;em&gt;From left to right: Jordan Novet, Andy Feng, Zack Bogue, Eric Tschetter, Jun Rao, Matei Zaharia. [Photo credit: Xavier Léauté]&lt;/em&gt;&lt;/p&gt;
-
-&lt;p&gt;To a packed house, Andy emphasized the importance of building a strong
-community around open source projects, while Eric addressed big data use cases
-and the challenges inherent in working with open source technologies. Jun
-shared his thoughts on the potential for a future generic data analytics stack
-and Matei spoke about the advantages of building a company using Spark and the
-benefits of “riding the Hadoop wave.” Watch the video and check out the slides.&lt;/p&gt;
-
-&lt;p&gt;Thanks to Zack, Jordan, all the panelists, and everyone who attended for
-sharing their knowledge with the community. We look forward to seeing you at
-the next one!&lt;/p&gt;
-
-&lt;p&gt;In the meantime, you can catch the Druid team on the road this summer. We’re
-speaking at a handful of conferences including
-&lt;a href=&quot;http://www.gluecon.com/2014/speakers/&quot;&gt;Gluecon&lt;/a&gt; on May 22 in Denver, CO, &lt;a href=&quot;http://www.sigmod2014.org/program_sigmod.shtml#ind2&quot;&gt;ACM
-SIGMOD&lt;/a&gt; in Snowbird, UT on
-June 24, and
-&lt;a href=&quot;http://www.oscon.com/oscon2014/public/schedule/detail/34076&quot;&gt;Oscon&lt;/a&gt; in
-Portland, OR on July 23.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Introduction to pydruid</title>
-		<link href="http://druid.io/blog/2014/04/15/intro-to-pydruid.html"/>
-		<updated>2014-04-15T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2014/04/15/intro-to-pydruid</id>
-                <author><name>Igal Levy</name></author>
-                <summary type="html">&lt;p&gt;We&amp;#39;ve already written about pairing &lt;a href=&quot;http://druid.io/blog/2014/02/03/rdruid-and-twitterstream.html&quot;&gt;R with RDruid&lt;/a&gt;, but Python has powerful and free open-source analysis tools too. Collectively, these are often referred to as the &lt;a href=&quot;http://www.scipy.org/stackspec.html&quot;&gt;SciPy Stack&lt;/a&gt;. To pair SciPy&amp;#39;s analytic power with the advantages of querying time-series data in [...]
-</summary>
-		<content type="html">&lt;p&gt;We&amp;#39;ve already written about pairing &lt;a href=&quot;http://druid.io/blog/2014/02/03/rdruid-and-twitterstream.html&quot;&gt;R with RDruid&lt;/a&gt;, but Python has powerful and free open-source analysis tools too. Collectively, these are often referred to as the &lt;a href=&quot;http://www.scipy.org/stackspec.html&quot;&gt;SciPy Stack&lt;/a&gt;. To pair SciPy&amp;#39;s analytic power with the advantages of querying time-series data in Druid, we cre [...]
-
-&lt;h2 id=&quot;getting-started&quot;&gt;Getting Started&lt;/h2&gt;
-
-&lt;p&gt;pydruid should run with Python 2.x, and is known to run with Python 2.7.5.&lt;/p&gt;
-
-&lt;p&gt;Install pydruid in the same way as you&amp;#39;d install any other Python module on your system. The simplest way is:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;pip install pydruid
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;You should also install Pandas to execute the simple examples below:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;pip install pandas
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;When you import pydruid into your example, it will try to load Pandas as well.&lt;/p&gt;
-
-&lt;h2 id=&quot;run-the-druid-wikipedia-example&quot;&gt;Run the Druid Wikipedia Example&lt;/h2&gt;
-
-&lt;p&gt;&lt;a href=&quot;http://druid.io/downloads.html&quot;&gt;Download Druid&lt;/a&gt; and unpack Druid. If you are not familiar with Druid, see this &lt;a href=&quot;http://druid.io/docs/latest/Tutorial:-A-First-Look-at-Druid.html&quot;&gt;introductory tutorial&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;From the Druid home directory, start the Druid Realtime node:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$DRUID_HOME&lt;/span&gt;/run_example_server.sh
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;When prompted, choose the &amp;quot;wikipedia&amp;quot; example. After the Druid realtime node is done starting up, messages should appear that start with the following:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;2014-04-03 18:01:32,852 INFO [wikipedia-incremental-persist] ...
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;These messages confirm that the realtime node is ingesting data from the Wikipedia edit stream, and that data can be queried.&lt;/p&gt;
-
-&lt;h2 id=&quot;write-execute-and-submit-a-pydruid-query&quot;&gt;Write, Execute, and Submit a pydruid Query&lt;/h2&gt;
-
-&lt;p&gt;Let&amp;#39;s say we want to see the top few languages for Wikipedia articles, in terms of number of edits. This is the query we could post directly to Druid:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;queryType&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;topN&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;dataSource&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;wikipedia&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;dimension&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;language&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;threshold&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;metric&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;granularity&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;all&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;filter&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;selector&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;dimension&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;namespace&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;value&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;article&amp;quot;&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregations&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
-    &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-      &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;longSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-      &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-      &lt;span class=&quot;nt&quot;&gt;&amp;quot;fieldName&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;
-    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;intervals&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;2013-06-01T00:00/2020-01-01T00&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;The results should appear similar to the following:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2014-04-03T17:59:00.000Z&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;result&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;language&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;en&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4726&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;language&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;fr&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1273&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;language&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;de&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;857&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;language&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;ja&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;176&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Due to limitations in the way the wikipedia example is set up, you may see a limited number of results appear.&lt;/p&gt;
-
-&lt;p&gt;Here&amp;#39;s that same query in Python:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pydruid.client&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PyDruid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;http://localhost:8083&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;druid/v2/&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;top_langs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;topn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;datasource&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;wikipedia&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;granularity&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;all&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;intervals&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2013-06-01T00:00/2020-01-01T00&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;dimension&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;language&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nb&quot;&gt;filter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Dimension&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;namespace&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;article&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;aggregations&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;longsum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)},&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;metric&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;threshold&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;top_langs&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Do this if you want to see the raw JSON&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Let&amp;#39;s break this query down:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;query &amp;ndash; The &lt;code&gt;query&lt;/code&gt; object is instantiated with the location of the Druid realtime node. &lt;code&gt;query&lt;/code&gt; exposes various querying methods, including &lt;code&gt;topn&lt;/code&gt;.&lt;/li&gt;
-&lt;li&gt;datasource &amp;ndash; This identifies the datasource. If Druid were ingesting from more than one datasource, this ID would identify the one we want.&lt;/li&gt;
-&lt;li&gt;granularity &amp;ndash; The rollup granularity, which could be set to a specific value such as &lt;code&gt;minute&lt;/code&gt; or &lt;code&gt;hour&lt;/code&gt;. We want to see the sum count across the entire interval, and so we choose &lt;code&gt;all&lt;/code&gt;.&lt;/li&gt;
-&lt;li&gt;intervals &amp;ndash; The interval of time we&amp;#39;re interested in. The value given is extended beyond our actual endpoints to make sure we cover all of the data.&lt;/li&gt;
-&lt;li&gt;dimension &amp;ndash; The dimension we&amp;#39;re interested in, which happens to be language. Language is an attribute of the &lt;a href=&quot;http://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds&quot;&gt;Wikipedia recent-changes feed&amp;#39;s metadata&lt;/a&gt;.&lt;/li&gt;
-&lt;li&gt;filter &amp;ndash; Filters are used to specify a selector. In this case, we&amp;#39;re selecting pages that have a namespace dimension with the value &lt;code&gt;article&lt;/code&gt; (therefore excluding edits to Wikipedia pages that aren&amp;#39;t articles).&lt;/li&gt;
-&lt;li&gt;aggregations &amp;ndash; We&amp;#39;re interested in obtaining the total count of edited pages, per the language dimension, and we map it to a type of aggregation available in pydruid (longsum). We also rename this &lt;code&gt;count&lt;/code&gt; metric to &lt;code&gt;edit_count&lt;/code&gt;.&lt;/li&gt;
-&lt;li&gt;metric &amp;ndash; Names the metric to sort on.&lt;/li&gt;
-&lt;li&gt;threshold &amp;ndash; Sets the maximum number of aggregated results to return.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;See the &lt;a href=&quot;https://pythonhosted.org/pydruid/&quot;&gt;pydruid documentation&lt;/a&gt; for more information about queries.&lt;/p&gt;
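-
-&lt;p&gt;The &lt;code&gt;query&lt;/code&gt; object exposes other query types as well. As a minimal sketch (reusing the example server above and pydruid&amp;#39;s &lt;code&gt;timeseries&lt;/code&gt; method; the variable names here are illustrative), hourly edit counts for article pages could be fetched like this:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;from pydruid.client import *
-
-# Sketch: bucket edit counts by hour, reusing the datasource and
-# filter from the topN query above.
-query = PyDruid(&amp;#39;http://localhost:8083&amp;#39;, &amp;#39;druid/v2/&amp;#39;)
-
-edits_by_hour = query.timeseries(
-    datasource = &amp;quot;wikipedia&amp;quot;,
-    granularity = &amp;quot;hour&amp;quot;,
-    intervals = &amp;quot;2013-06-01T00:00/2020-01-01T00&amp;quot;,
-    filter = Dimension(&amp;quot;namespace&amp;quot;) == &amp;quot;article&amp;quot;,
-    aggregations = {&amp;quot;edit_count&amp;quot;: longsum(&amp;quot;count&amp;quot;)}
-)
-
-print edits_by_hour  # raw JSON, one entry per hour
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;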
-
-&lt;h2 id=&quot;bringing-the-data-into-pandas&quot;&gt;Bringing the Data Into Pandas&lt;/h2&gt;
-
-&lt;p&gt;Now that Druid is returning data, we&amp;#39;ll pass that data to a Pandas dataframe, which allows us to analyze and visualize it:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pydruid.client&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;
-
-&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pylab&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Need to have matplotlib installed&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PyDruid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;http://localhost:8083&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;druid/v2/&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;top_langs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;topn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;datasource&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;wikipedia&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;granularity&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;all&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;intervals&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2013-06-01T00:00/2020-01-01T00&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;dimension&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;language&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nb&quot;&gt;filter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Dimension&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;namespace&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;article&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;aggregations&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;longsum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)},&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;metric&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;edit_count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;threshold&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;top_langs&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# Do this if you want to see the raw JSON&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;export_pandas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# Client will import Pandas, no need to do so separately.&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;drop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;timestamp&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/sp [...]
-
-&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class= [...]
-
-&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;language&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&g [...]
-
-&lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Printing the results gives:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;   edit_count language
-1         834       en
-2         256       de
-3         185       fr
-4          38       ja
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;The bar graph will look something like this:&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/img/wiki-edit-lang-plot.png&quot; alt=&quot;Bar graph showing Wikipedia edits by language&quot; title=&quot;Wikipedia Edits by Language&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;If you were to repeat the query, you should see larger numbers under edit_count, since the Druid realtime node is continuing to ingest data from Wikipedia.&lt;/p&gt;
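-
-&lt;p&gt;Because the results now live in an ordinary dataframe, further analysis is a one-liner away. For instance (a small, hypothetical follow-up using plain Pandas), each language&amp;#39;s share of the counted edits:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Sketch: normalize edit_count into a per-language share of the total.
-df[&amp;#39;share&amp;#39;] = df[&amp;#39;edit_count&amp;#39;] / df[&amp;#39;edit_count&amp;#39;].sum()
-
-print df[[&amp;#39;language&amp;#39;, &amp;#39;share&amp;#39;]]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;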
-
-&lt;h2 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h2&gt;
-
-&lt;p&gt;In this blog, we showed how you can run ad-hoc queries against a data set that is being streamed into Druid. And while this is only a small example of pydruid and the power of Python, it serves as an effective introductory demonstration of the benefits of pairing Druid&amp;#39;s ability to make data available in real-time with SciPy&amp;#39;s powerful analytics tools.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Benchmarking Druid</title>
-		<link href="http://druid.io/blog/2014/03/17/benchmarking-druid.html"/>
-		<updated>2014-03-17T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2014/03/17/benchmarking-druid</id>
-                <author><name>Xavier Léauté</name></author>
-                <summary type="html">&lt;p&gt;We often get asked how fast Druid is. Despite having published some benchmark
-numbers in &lt;a href=&quot;/blog/2012/01/19/scaling-the-druid-data-store.html&quot;&gt;previous blog posts&lt;/a&gt;, as well as in our &lt;a href=&quot;https://speakerdeck.com/druidio/&quot;&gt;talks&lt;/a&gt;,
-until now, we have not actually published any data to back those claims up in a
-reproducible way. This post intends to address this and make it easier for
-anyone to evaluate Druid and compare it to other systems out there.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;We often get asked how fast Druid is. Despite having published some benchmark
-numbers in &lt;a href=&quot;/blog/2012/01/19/scaling-the-druid-data-store.html&quot;&gt;previous blog posts&lt;/a&gt;, as well as in our &lt;a href=&quot;https://speakerdeck.com/druidio/&quot;&gt;talks&lt;/a&gt;,
-until now, we have not actually published any data to back those claims up in a
-reproducible way. This post intends to address this and make it easier for
-anyone to evaluate Druid and compare it to other systems out there.&lt;/p&gt;
-
-&lt;p&gt;Hopefully this blog post will help people get an idea of where Druid stands in
-terms of query performance, how it performs under different scenarios, and what
-the limiting factors may be under different configurations.&lt;/p&gt;
-
-&lt;p&gt;The objective of our benchmark is to showcase how Druid performs on the types
-of workload it was designed for. We chose to benchmark Druid against MySQL
-mainly because of its popularity, and to provide a point of comparison with
-a storage engine that most users will be familiar with.&lt;/p&gt;
-
-&lt;p&gt;All the code to run the benchmarks as well as the &lt;a href=&quot;https://github.com/druid-io/druid-benchmark/tree/master/results&quot;&gt;raw result data&lt;/a&gt; are
-available on GitHub in the &lt;a href=&quot;https://github.com/apache/incubator-druid-benchmark&quot;&gt;druid-benchmark&lt;/a&gt; repository.&lt;/p&gt;
-
-&lt;blockquote&gt;
-&lt;p&gt;We would like to encourage our readers to run the benchmarks themselves and
-share results for different data stores and hardware setups, as well as any
-other optimizations that may prove valuable to the rest of the community or
-make this benchmark more representative.&lt;/p&gt;
-&lt;/blockquote&gt;
-
-&lt;h2 id=&quot;the-data&quot;&gt;The Data&lt;/h2&gt;
-
-&lt;p&gt;Our objective is to make this benchmark reproducible, so we want a data set
-that is readily available or that could easily be re-generated. The &lt;a href=&quot;http://www.tpc.org/tpch/&quot;&gt;TPC-H
-benchmark&lt;/a&gt; is commonly used to assess database performance, and the generated
-data set can be of any size, which makes it attractive to understand how Druid
-performs at various scales.&lt;/p&gt;
-
-&lt;p&gt;The majority of the data consists of time-based event records, which are
-relatively simple to map to the Druid data model and also suits the type of
-workload Druid was designed for.&lt;/p&gt;
-
-&lt;p&gt;The events in question span several years of daily data and include a varied set
-of dimensions and metrics, including both very high cardinality and low
-cardinality dimensions. For instance, the &lt;code&gt;l_partkey&lt;/code&gt; column has 20,272,236
-unique values, and &lt;code&gt;l_commitdate&lt;/code&gt; has 2466 distinct dates in the 100GB data
-set.&lt;/p&gt;
-
-&lt;p&gt;The 1GB and 100GB data sets represent a total of 6,001,215 rows and 600,037,902
-rows respectively.&lt;/p&gt;
-
-&lt;h2 id=&quot;the-queries&quot;&gt;The Queries&lt;/h2&gt;
-
-&lt;p&gt;Since Druid was built to solve a specific type of problem, we chose a set of
-benchmarks typical of Druid&amp;#39;s workload that covers the majority of queries we
-observe in production. Why not use the TPC-H benchmark queries, you may ask?
-Most of those queries do not directly apply to Druid, and we would have to
-largely modify the queries or the data to fit the Druid model.&lt;/p&gt;
-
-&lt;p&gt;We put together three sets of queries:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt; Simple &lt;code&gt;select count(*)&lt;/code&gt; queries over very large time ranges, covering
-almost all the data set&lt;/li&gt;
-&lt;li&gt; Aggregate queries with one or several metrics, spanning the entire set of
-rows, as well as subsets on both time ranges and filtered dimension values.&lt;/li&gt;
-&lt;li&gt; Top-N queries on both high and low cardinality dimensions, with various
-numbers of aggregations and filters.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;The SQL equivalent of the Druid queries is shown below. Druid queries are
-generated directly in our &lt;a href=&quot;https://github.com/apache/incubator-druid-benchmark/blob/master/benchmark-druid.R&quot;&gt;benchmarking script&lt;/a&gt; using &lt;a href=&quot;https://github.com/metamx/RDruid/&quot;&gt;RDruid&lt;/a&gt;.&lt;/p&gt;
-
-&lt;script src=&quot;https://gist.github.com/xvrl/9552286.js?file=queries.sql&quot;&gt;&lt;/script&gt;
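-
-&lt;p&gt;For a feel of the native query language, here is a minimal sketch of the
-simplest of these queries (the &lt;code&gt;select count(*)&lt;/code&gt; over the full time
-range) expressed as a Druid timeseries query and posted with curl. The broker
-address, datasource name, and interval here are illustrative; the actual queries
-are the ones generated by the benchmarking script.&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;curl -X POST &amp;#39;http://broker:8080/druid/v2/?pretty&amp;#39; \
-  -H &amp;#39;content-type: application/json&amp;#39; -d &amp;#39;{
-    &amp;quot;queryType&amp;quot;: &amp;quot;timeseries&amp;quot;,
-    &amp;quot;dataSource&amp;quot;: &amp;quot;lineitem&amp;quot;,
-    &amp;quot;granularity&amp;quot;: &amp;quot;all&amp;quot;,
-    &amp;quot;intervals&amp;quot;: [&amp;quot;1992-01-01/1999-01-01&amp;quot;],
-    &amp;quot;aggregations&amp;quot;: [{&amp;quot;type&amp;quot;: &amp;quot;count&amp;quot;, &amp;quot;name&amp;quot;: &amp;quot;count&amp;quot;}]
-  }&amp;#39;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;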
-
-&lt;h2 id=&quot;generating-the-data&quot;&gt;Generating the data&lt;/h2&gt;
-
-&lt;p&gt;We used the &lt;a href=&quot;http://www.tpc.org/tpch/spec/tpch_2_16_1.zip&quot;&gt;TPC-H tools&lt;/a&gt; to generate two datasets, with
-target sizes of 1GB and 100GB respectively. The actual sizes of the resulting
-tables are 725MB and 74GB respectively, since we only generate the largest table
-in the benchmark data set. The 100GB dataset is split into 1GB chunks to make
-it easier to process.&lt;/p&gt;
-
-&lt;p&gt;All the generated data is available for download directly, so anyone can reproduce
-our results, without having to go through the trouble of compiling and running
-the TPC-H tools.&lt;/p&gt;
-
-&lt;script src=&quot;https://gist.github.com/xvrl/9552286.js?file=download-data.sh&quot;&gt;&lt;/script&gt;
-
-&lt;blockquote&gt;
-&lt;p&gt;If you would like to generate the datasets from scratch, or try out different
-sizes, download and compile the &lt;a href=&quot;http://www.tpc.org/tpch/spec/tpch_2_16_1.zip&quot;&gt;TPC-H tools&lt;/a&gt;, and use the &lt;code&gt;dbgen&lt;/code&gt; tool to
-generate the &lt;code&gt;lineitem&lt;/code&gt; table.&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;&lt;span&gt;&lt;/span&gt;./dbgen -TL -s1         &lt;span class=&quot;c1&quot;&gt;# 1GB&lt;/span&gt;
-./dbgen -TL -s100 -C100 &lt;span class=&quot;c1&quot;&gt;# 100GB / 100 chunks&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;There is also a &lt;a href=&quot;https://github.com/apache/incubator-druid-benchmark/blob/master/generate-data.sh&quot;&gt;&lt;code&gt;generate-data.sh&lt;/code&gt;&lt;/a&gt; script in our repository to help write
-compressed data directly when generating large data sets.&lt;/p&gt;
-&lt;/blockquote&gt;
-
-&lt;h2 id=&quot;setup&quot;&gt;Setup&lt;/h2&gt;
-
-&lt;h3 id=&quot;druid-cluster&quot;&gt;Druid Cluster&lt;/h3&gt;
-
-&lt;p&gt;We are running the benchmark against Druid 0.6.62, which includes bug fixes for
-some date parsing and case-sensitivity issues that cropped up while loading the
-benchmark data.&lt;/p&gt;
-
-&lt;p&gt;Druid Compute nodes are running Amazon EC2 &lt;code&gt;m3.2xlarge&lt;/code&gt; instances (8 cores, Intel Xeon
-E5-2670 v2 @ 2.50GHz with 160GB SSD and 30GB of RAM) and broker nodes use
-&lt;code&gt;c3.2xlarge&lt;/code&gt; nodes (8 cores, Intel Xeon E5-2680 v2 @ 2.80GHz and 15GB of RAM).
-Both run a standard Ubuntu 12.04.4 LTS and a 3.11 kernel.&lt;/p&gt;
-
-&lt;p&gt;For the 1GB data set, we run queries directly against a single compute node,
-since the broker is unnecessary in that type of setup. For the 100GB data
-set we first run against a single compute node and then scale out the
-cluster to 6 compute nodes and issue queries against one broker node.&lt;/p&gt;
-
-&lt;p&gt;Compute nodes are configured with 8 processing threads (one per core), as
-well as the default 1GB compute buffer, and 6GB of JVM heap. That leaves about
-15GB of memory for memory mapping segment data, if we allow about 1GB for the
-operating system and other overhead.&lt;/p&gt;
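-
-&lt;p&gt;In &lt;code&gt;runtime.properties&lt;/code&gt; terms, the relevant compute node settings
-look roughly like the sketch below; see the configuration files linked at the
-end of this section for the actual parameters used.&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;# compute node (sketch)
-druid.processing.numThreads=8
-druid.processing.buffer.sizeBytes=1073741824  # 1GB compute buffer
-# plus a 6GB JVM heap, i.e. -Xmx6g
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;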
-
-&lt;p&gt;Broker nodes are configured with 12GB of JVM heap, and query chunking has been
-disabled. This ensures queries do not get split up into sequential queries
-and always run fully parallelized.&lt;/p&gt;
-
-&lt;blockquote&gt;
-&lt;p&gt;Note: Interval chunking is turned on by default to prevent long interval
-queries from taking up all compute resources at once. By default the maximum
-interval that a single chunk can span is set to 1 month, which works well
-for most production data sets Druid is being used for.&lt;/p&gt;
-
-&lt;p&gt;Interval chunking can be disabled by setting &lt;code&gt;druid.query.chunkPeriod&lt;/code&gt; and
-&lt;code&gt;druid.query.topN.chunkPeriod&lt;/code&gt; to a very large value compared to the time
-range of the data (in this case we used &lt;code&gt;P10Y&lt;/code&gt;).&lt;/p&gt;
-&lt;/blockquote&gt;
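-
-&lt;p&gt;Concretely, disabling chunking for this benchmark comes down to two lines in
-the broker configuration:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;druid.query.chunkPeriod=P10Y
-druid.query.topN.chunkPeriod=P10Y
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;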
-
-&lt;p&gt;Besides those settings, no other particular performance optimizations have been
-made, and segment replication has been turned off in the datasource &lt;a href=&quot;/docs/latest/Rule-Configuration.html&quot;&gt;load
-rules&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;Complete &lt;a href=&quot;https://github.com/druid-io/druid-benchmark/tree/master/config&quot;&gt;Druid and JVM configuration parameters&lt;/a&gt; are published in our
-&lt;a href=&quot;https://github.com/apache/incubator-druid-benchmark&quot;&gt;repository&lt;/a&gt;.&lt;/p&gt;
-
-&lt;h3 id=&quot;mysql&quot;&gt;MySQL&lt;/h3&gt;
-
-&lt;p&gt;Our MySQL setup is an Amazon RDS instance (version 5.6.13) running on the same
-instance type as Druid compute nodes (&lt;code&gt;m3.2xlarge&lt;/code&gt;) using the MyISAM engine.&lt;/p&gt;
-
-&lt;p&gt;We used the default Amazon settings, although we experimented with enabling
-memory mapping (&lt;code&gt;myisam_use_mmap&lt;/code&gt;). However, this appeared to degrade
-performance significantly, so our results are with memory mapping turned off.&lt;/p&gt;
-
-&lt;blockquote&gt;
-&lt;p&gt;Note: We also ran some test against the InnoDB engine, but it appeared to be
-quite a bit slower when compared to MyISAM. This was the case for all the
-benchmark queries except for the &lt;code&gt;count_star_interval&lt;/code&gt; query, even when
-setting &lt;code&gt;innodb_buffer_pool_size&lt;/code&gt; to very large values.&lt;/p&gt;
-&lt;/blockquote&gt;
-
-&lt;h2 id=&quot;loading-the-data&quot;&gt;Loading the data&lt;/h2&gt;
-
-&lt;h3 id=&quot;druid&quot;&gt;Druid&lt;/h3&gt;
-
-&lt;p&gt;We use the &lt;a href=&quot;/docs/latest/Indexing-Service.html&quot;&gt;Druid indexing service&lt;/a&gt; configured to use
-an Amazon EMR Hadoop cluster to load the data and create the necessary Druid
-segments. The data is being loaded off of S3, so you will have to adjust the
-input paths in the &lt;a href=&quot;https://github.com/apache/incubator-druid-benchmark/blob/master/lineitem.task.json&quot;&gt;task descriptor files&lt;/a&gt; to point to your
-own hadoop input path, as well as provide your own hadoop coordinates artifact.&lt;/p&gt;
-
-&lt;script src=&quot;https://gist.github.com/xvrl/9552286.js?file=load-druid.sh&quot;&gt;&lt;/script&gt;
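-
-&lt;p&gt;Once adjusted, the task descriptor is posted to the indexing service like
-any other indexing task, along the lines of the following (host and file name
-illustrative):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;curl -X POST -H &amp;#39;Content-Type: application/json&amp;#39; \
-  -d @lineitem.task.json http://overlord:8087/druid/indexer/v1/task
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;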
-
-&lt;p&gt;For the larger data set we configure the &lt;a href=&quot;http://druid.io/docs/latest/Tasks.html&quot;&gt;hadoop index task&lt;/a&gt; to
-create monthly segments, each of which is sharded into partitions of at most
-5,000,000 rows if necessary.  We chose those settings in order to achieve
-similar segment sizes for both data sets, thus giving us roughly constant
-segment scan time, which gives us good scaling properties and makes comparison
-easier.&lt;/p&gt;
-
-&lt;p&gt;The resulting Druid segments consist of:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt; a single 589MB segment for the 1GB data set,&lt;/li&gt;
-&lt;li&gt; 161 segments totaling 83.4GB (average segment size of 530MB) for the 100GB
-data set.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;In our case the indexing service took about 25 minutes per segment for both
-datasets.  The additional sharding step for the larger data set only adds a few
-minutes, so with the right amount of Hadoop resources, loading could take as
-little as half an hour.&lt;/p&gt;
-
-&lt;h3 id=&quot;mysql&quot;&gt;MySQL&lt;/h3&gt;
-
-&lt;p&gt;Loading the data into MySQL is fairly simple using the client&amp;#39;s local file
-option.  Assuming the &lt;code&gt;tpch&lt;/code&gt; database already exists on the server, the
-following command creates the necessary table and indices, loads the data, and
-optimizes the table. Keep in mind you will first need to uncompress the data
-files prior to loading them.&lt;/p&gt;
-
-&lt;script src=&quot;https://gist.github.com/xvrl/9552286.js?file=load-mysql.sh&quot;&gt;&lt;/script&gt;
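-
-&lt;p&gt;Stripped of the scripting around it, the load boils down to something like
-the following sketch (option, table, and file names are illustrative; the
-actual statements are in the script above):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;gunzip lineitem.tbl.*.gz
-mysql --local-infile=1 -u root tpch -e \
-  &amp;quot;LOAD DATA LOCAL INFILE &amp;#39;lineitem.tbl.1&amp;#39; INTO TABLE lineitem FIELDS TERMINATED BY &amp;#39;|&amp;#39;;&amp;quot;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;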
-
-&lt;p&gt;Loading the data itself is relatively fast, but it may take several hours to
-create the necessary indices and optimize the table on the larger data set.
-In our case it took several attempts to complete the indexing and table
-optimization steps.&lt;/p&gt;
-
-&lt;h2 id=&quot;running-the-benchmarks&quot;&gt;Running the Benchmarks&lt;/h2&gt;
-
-&lt;h3 id=&quot;druid&quot;&gt;Druid&lt;/h3&gt;
-
-&lt;p&gt;Running the Druid benchmark requires &lt;a href=&quot;http://cran.rstudio.com/&quot;&gt;R&lt;/a&gt;, as well as a couple of packages,
-including &lt;a href=&quot;https://github.com/metamx/RDruid/&quot;&gt;&lt;code&gt;RDruid&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;microbenchmark&lt;/code&gt;, as well as &lt;code&gt;ggplot2&lt;/code&gt; if you would
-like to generate the plots.&lt;/p&gt;
-
-&lt;script src=&quot;https://gist.github.com/xvrl/9552286.js?file=benchmark-druid.sh&quot;&gt;&lt;/script&gt;
-
-&lt;h3 id=&quot;mysql&quot;&gt;MySQL&lt;/h3&gt;
-
-&lt;p&gt;The SQL queries for the benchmark are stored in the &lt;code&gt;queries-mysql.sql&lt;/code&gt; file, and we provide a convenient script to run all or part of the benchmark.&lt;/p&gt;
-
-&lt;script src=&quot;https://gist.github.com/xvrl/9552286.js?file=benchmark-mysql.sh&quot;&gt;&lt;/script&gt;
-
-&lt;h2 id=&quot;benchmark-results&quot;&gt;Benchmark Results&lt;/h2&gt;
-
-&lt;h3 id=&quot;1gb-data-set&quot;&gt;1GB data set&lt;/h3&gt;
-
-&lt;p&gt;In a single node configuration, with a single segment, Druid will only use a
-single processing thread, so neither MySQL nor Druid benefit from more than one
-core in this case.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/druid-benchmark-1gb-median.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;We see that Druid performs almost all the queries in less than one second and
-is anywhere between 2x and 15x faster than vanilla MySQL. On &lt;code&gt;select count(*)&lt;/code&gt;
-queries it achieves scan rates of 53,539,211 rows/second, and 36,246,533
-rows/second for aggregate &lt;code&gt;select sum(float)&lt;/code&gt; queries.&lt;/p&gt;
-
-&lt;p&gt;Since Druid uses column-oriented storage, it performs better on queries using
-fewer columns, and as more columns become part of the query it is expected to
-lose some of its advantage compared to row-oriented storage engines.&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;# 1GB data set
-# median query times (seconds) over 100 runs
-
-                query     druid mysql
-  count_star_interval 0.1112114 1.770
-              sum_all 0.6074772 2.400
-       sum_all_filter 0.5058156 2.415
-         sum_all_year 0.6100049 3.440
-            sum_price 0.1655666 2.070
-   top_100_commitdate 0.4150540 3.880
-        top_100_parts 0.5897905 3.850
-top_100_parts_details 1.3785018 4.540
- top_100_parts_filter 0.7332013 3.550
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h3 id=&quot;100gb-data-set-on-a-single-node&quot;&gt;100GB data set on a single node&lt;/h3&gt;
-
-&lt;p&gt;For the much larger data set, on a single node, Druid can take advantage of all
-8 cores, parallelizing workload across multiple segments at once. Since only
-about 15GB of RAM is available for segment data, not all of it can be paged into
-memory at once, especially when querying multiple columns at a time. Segments
-will get paged in and out by the operating system, and having SSD storage in this
-case significantly helps to reduce the amount of time spent paging data in.&lt;/p&gt;
-
-&lt;p&gt;Druid really shines at this scale, even on a single node. The comparison may be
-a little unfair, since MySQL can only take advantage of a single core per
-query, but even considering that, Druid is still between 45x and 145x faster.&lt;/p&gt;
-
-&lt;p&gt;Compared to the 1GB data set, a simple &lt;code&gt;select sum(float)&lt;/code&gt; query takes only 26
-times as long, even though we have 100 times the number of rows.&lt;/p&gt;
-
-&lt;p&gt;The most expensive type of queries for Druid are Top-N queries over very high
-cardinality dimensions (exceeding 20 million for the &lt;code&gt;top_x_parts&lt;/code&gt; queries).
-Those types of queries may require multiple passes over the data in case
-compute buffers are not large enough to hold all the results.&lt;/p&gt;
-
-&lt;p&gt;MySQL essentially becomes unusable for interactive queries at this scale. Most
-queries take at least 10 minutes to complete, while more expensive ones can
-take several hours.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/druid-benchmark-100gb-median.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;# 100GB data set (single node)
-# median query times (seconds) - 20 runs Druid, 3-5 runs MySQL
-
-                query          druid    mysql
-  count_star_interval       2.632399   177.95
-              sum_all      14.503592   663.93
-       sum_all_filter      10.202358   590.96
-         sum_all_year      14.481295   673.97
-            sum_price       4.240469   624.21
-   top_100_commitdate       7.402270   706.64
-        top_100_parts     113.565130  9961.12
-top_100_parts_details     181.108950 12173.46
- top_100_parts_filter      57.717676  5516.37
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;scaling-up-druid&quot;&gt;Scaling up Druid&lt;/h2&gt;
-
-&lt;p&gt;Druid makes it very straightforward to scale the cluster and take advantage of
-additional nodes. Simply firing up more compute nodes will trigger the
-coordinator to redistribute the data among the additional nodes, and within a
-few minutes, the workload will be distributed, without requiring any downtime.&lt;/p&gt;
-
-&lt;p&gt;We see that Druid scales almost linearly for queries that involve mainly column
-scans, with queries performing 5x to 6x faster than on a single node.&lt;/p&gt;
-
-&lt;p&gt;For Top-N queries the speedup is less, between 4x and 5x, which is expected,
-since a fair amount of work has to be done at the broker level to merge
-results for those types of queries on high-cardinality dimensions.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/assets/druid-benchmark-scaling.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;# median query times (seconds) over 20 runs
-
-                query druid (1 node) druid (6 nodes)
-  count_star_interval       2.632399       0.4061503
-              sum_all      14.503592       2.2583412
-       sum_all_filter      10.202358       1.9062494
-         sum_all_year      14.481295       2.2554939
-            sum_price       4.240469       0.6515721
-   top_100_commitdate       7.402270       1.4426543
-        top_100_parts     113.565130      22.9146193
-top_100_parts_details     181.108950      32.9310563
- top_100_parts_filter      57.717676      14.3942355
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;conclusions-and-future-work&quot;&gt;Conclusions and future work&lt;/h2&gt;
-
-&lt;p&gt;In publishing a reproducible benchmark, as well as our data and methodology, we
-hope we gave more tangible evidence of Druid&amp;#39;s performance characteristics, as
-well as a reference comparison with a more familiar database. We hope the
-community will contribute benchmarks for other data stores in the future.&lt;/p&gt;
-
-&lt;p&gt;Unsurprisingly, a conventional data store such as MySQL quickly breaks down at
-the scale of data that is increasingly becoming the norm these days.  Druid was
-designed to solve a specific set of problems where other generic solutions stop
-working.&lt;/p&gt;
-
-&lt;p&gt;We have shown that Druid performs well whether in single or multi-node
-configurations and is able to take full advantage of modern hardware with many
-cores and large amounts of memory. Its ability to quickly scale horizontally
-allows it to adapt to various workloads, with query performance scaling almost
-linearly for typical production workloads.&lt;/p&gt;
-
-&lt;p&gt;That being said, Druid still requires a fair amount of knowledge to choose
-optimal configuration settings and pick good segment size/sharding properties.
-We are planning to write a blog post dedicated to performance tuning where we
-will address those questions more directly.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Batch-Loading Sensor Data into Druid</title>
-		<link href="http://druid.io/blog/2014/03/12/batch-ingestion.html"/>
-		<updated>2014-03-12T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2014/03/12/batch-ingestion</id>
-                <author><name>Igal Levy</name></author>
-                <summary type="html">&lt;p&gt;Sensors are everywhere these days, and that means sensor data is big data. Ingesting and analyzing sensor data at speed is an interesting problem, especially when scale is desired. In this post, we&amp;#39;ll access some real-world sensor data, and show how Druid can be used to store that data and make it available for immediate querying.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;Sensors are everywhere these days, and that means sensor data is big data. Ingesting and analyzing sensor data at speed is an interesting problem, especially when scale is desired. In this post, we&amp;#39;ll access some real-world sensor data, and show how Druid can be used to store that data and make it available for immediate querying.&lt;/p&gt;
-
-&lt;h2 id=&quot;finding-sensor-data&quot;&gt;Finding Sensor Data&lt;/h2&gt;
-
-&lt;p&gt;The United States Geological Survey (USGS) has millions of sensors for all types of physical and natural phenomena, many of which concern water. If you live anywhere where water is a concern, which is pretty much everywhere (considering that both too little or too much H&lt;sub&gt;2&lt;/sub&gt;O can be an issue), this is interesting data. You can learn about USGS sensors in a variety of ways, one of which is an &lt;a href=&quot;http://maps.waterdata.usgs.gov/mapper/index.html&qu [...]
-
-&lt;p&gt;We used this map to get the sensor info for the Napa River in Napa County, California.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/img/map-usgs-napa.png&quot; alt=&quot;USGS map showing Napa River sensor location and information&quot; title=&quot;USGS Napa River Sensor Information&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;We decided to first import the data into &lt;a href=&quot;http://www.r-project.org/&quot;&gt;R (the statistical programming language)&lt;/a&gt; for two reasons:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;The R package &lt;code&gt;waterData&lt;/code&gt; from USGS. This package allows us to retrieve and analyze hydrologic data from USGS. We can then export that data from within the R environment, then set up Druid to ingest it.&lt;/li&gt;
-&lt;li&gt;The R package &lt;code&gt;RDruid&lt;/code&gt; which we&amp;#39;ve &lt;a href=&quot;http://druid.io/blog/2014/02/03/rdruid-and-twitterstream.html&quot;&gt;blogged about before&lt;/a&gt;. This package allows us to query Druid from within the R environment.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;h2 id=&quot;extracting-the-streamflow-data&quot;&gt;Extracting the Streamflow Data&lt;/h2&gt;
-
-&lt;p&gt;In R, load the waterData package, then run &lt;code&gt;importDVs()&lt;/code&gt;:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; install.packages&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;waterData&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;kc&quot;&gt;...&lt;/span&gt;
-&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;library&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;waterData&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;kc&quot;&gt;...&lt;/span&gt;
-&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; napa_flow &lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt; importDVs&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; code&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;00060&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; stat&lt;span class=&quot;o&quot [...]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;The last line uses the function &lt;code&gt;waterData.importDVs()&lt;/code&gt; to get sensor (or &amp;quot;streamgage&amp;quot;) data directly from the USGS datasource. This function has the following arguments:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;staid, or site identification number, which is entered as a string because some IDs have leading 0s. This value was obtained from the interactive map discussed above.&lt;/li&gt;
-&lt;li&gt;code, which specifies the type of sensor data we&amp;#39;re interested in (if available). Our chosen code specifies measurement of discharge, in cubic feet per second. You can learn about codes at the &lt;a href=&quot;http://nwis.waterdata.usgs.gov/usa/nwis/pmcodes&quot;&gt;USGS Water Resources site&lt;/a&gt;.&lt;/li&gt;
-&lt;li&gt;stat, which specifies the type of statistic we&amp;#39;re looking for&amp;mdash;in this case, the mean daily flow (mean is the default statistic). The USGS provides &lt;a href=&quot;http://help.waterdata.usgs.gov/codes-and-parameters&quot;&gt;a page summarizing various types of codes and parameters&lt;/a&gt;.&lt;/li&gt;
-&lt;li&gt;start and end dates. &lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;The information on the specific site and sensor should provide information on the type of data available and the start-end dates for the full historical record.&lt;/p&gt;
-
-&lt;p&gt;You can now analyze and visualize the streamflow data. For example, we ran:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; myWater.plot &lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt; plotParam&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;napa_flow&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kp&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;myWater.plot&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;to get:&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/img/napa_streamflow_plot.png&quot; alt=&quot;Napa River streamflow historical data&quot; title=&quot;Napa River streamflow historical data&quot; &gt;&lt;/p&gt;
-
-&lt;p&gt;Reflected in the flow of the Napa River, you can see the severe drought California experienced in the late 1970s, the very wet years that followed, a less severe drought beginning in the late 1980s, and the beginning of the current drought.&lt;/p&gt;
-
-&lt;h2 id=&quot;transforming-the-data-for-druid&quot;&gt;Transforming the Data for Druid&lt;/h2&gt;
-
-&lt;p&gt;We first want to have a look at the content of the data frame:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kp&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;napa_flow&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-     staid val      dates qualcode
-&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;11458000&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;90&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1963-01-01&lt;/span&gt;        A
-&lt;span class=&quot;m&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;11458000&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;87&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1963-01-02&lt;/span&gt;        A
-&lt;span class=&quot;m&quot;&gt;3&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;11458000&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;85&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1963-01-03&lt;/span&gt;        A
-&lt;span class=&quot;m&quot;&gt;4&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;11458000&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;80&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1963-01-04&lt;/span&gt;        A
-&lt;span class=&quot;m&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;11458000&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;76&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1963-01-05&lt;/span&gt;        A
-&lt;span class=&quot;m&quot;&gt;6&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;11458000&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;75&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;1963-01-06&lt;/span&gt;        A
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;We don&amp;#39;t have any use for the qualcode (the &lt;a href=&quot;http://help.waterdata.usgs.gov/codes-and-parameters/daily-value-qualification-code-dv_rmk_cd&quot;&gt;Daily Value Qualification Code&lt;/a&gt;) column:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; napa_flow_subset &lt;span class=&quot;o&quot;&gt;&amp;lt;-&lt;/span&gt; napa_flow&lt;span class=&quot;p&quot;&gt;[,&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt [...]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;It may look like we also don&amp;#39;t need the staid column, either, since it&amp;#39;s all the same sensor ID. However, we&amp;#39;ll keep it because at some later time we may want to load similar data from other sensors.&lt;/p&gt;
-
-&lt;p&gt;Now we can export the data to a file, removing the header and row names:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-r&quot; data-lang=&quot;r&quot;&gt;&lt;span&gt;&lt;/span&gt;write.table&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;napa_flow_subset&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; file&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;~/napa-flow.tsv&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; sep&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;sp [...]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;And here&amp;#39;s our file:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ head ~/napa-flow.tsv 
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;90&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-01
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;87&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-02
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;85&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-03
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;80&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-04
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;76&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-05
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;75&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-06
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;73&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-07
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;71&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-08
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;65&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-09
-&lt;span class=&quot;s2&quot;&gt;&amp;quot;11458000&amp;quot;&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;59&lt;/span&gt;  &lt;span class=&quot;m&quot;&gt;1963&lt;/span&gt;-01-10
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;loading-the-data-into-druid&quot;&gt;Loading the Data into Druid&lt;/h2&gt;
-
-&lt;p&gt;Loading the data into Druid involves setting up Druid&amp;#39;s indexing service to ingest the data into the Druid cluster, where specialized nodes will manage it.&lt;/p&gt;
-
-&lt;h3 id=&quot;configure-the-indexing-task&quot;&gt;Configure the Indexing Task&lt;/h3&gt;
-
-&lt;p&gt;Druid has an indexing service that can load data. Since there&amp;#39;s a relatively small amount of data to ingest, we&amp;#39;re going to use the &lt;a href=&quot;http://druid.io/docs/latest/Batch-ingestion.html&quot;&gt;basic Druid indexing service&lt;/a&gt; to ingest it. (Another option to ingest data uses a Hadoop cluster and is set up in a similar way, but that is more than we need for this job.) We must create a task (in JSON format) that specifies the work the indexing s [...]
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;index&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;dataSource&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;usgs&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;granularitySpec&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;uniform&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;gran&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;MONTH&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;intervals&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;1963-01-01/2013-12-31&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregators&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[{&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;
-    &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;doubleSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;avgFlowCuFtsec&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-     &lt;span class=&quot;nt&quot;&gt;&amp;quot;fieldName&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;val&amp;quot;&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;}],&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;firehose&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;local&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;baseDir&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;examples/usgs/&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;filter&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;napa-flow-subset.tsv&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;parser&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-      &lt;span class=&quot;nt&quot;&gt;&amp;quot;timestampSpec&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-        &lt;span class=&quot;nt&quot;&gt;&amp;quot;column&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;dates&amp;quot;&lt;/span&gt;
-      &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-      &lt;span class=&quot;nt&quot;&gt;&amp;quot;data&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-        &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;tsv&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-        &lt;span class=&quot;nt&quot;&gt;&amp;quot;columns&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;staid&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;val&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;dates&amp;quot;&lt;/span&gt;&lt;span class [...]
-        &lt;span class=&quot;nt&quot;&gt;&amp;quot;dimensions&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;staid&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;val&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-      &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;The task is saved to a file, &lt;code&gt;usgs_index_task.json&lt;/code&gt;. Note a few things about this task:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;&lt;p&gt;granularitySpec sets &lt;a href=&quot;http://druid.io/docs/latest/Concepts-and-Terminology.html&quot;&gt;segment&lt;/a&gt; granularity to MONTH, rather than using the default DAY, even though each row of our data is a daily reading. We do this to avoid having Druid create a segment per row of data. That&amp;#39;s a lot of extra work (note the interval is &amp;quot;1963-01-01/2013-12-31&amp;quot;), and we simply don&amp;#39;t need that much granularity to make sense of  [...]
-
-&lt;p&gt;A different granularity setting for the data itself (&lt;a href=&quot;http://druid.io/docs/latest/Tasks.html&quot;&gt;indexGranularity&lt;/a&gt;) controls how the data is rolled up before it is chunked into segments. This granularity, which defaults to &amp;quot;MINUTE&amp;quot;, won&amp;#39;t affect our data, which consists of daily values.&lt;/p&gt;&lt;/li&gt;
-&lt;li&gt;&lt;p&gt;We specify aggregators that Druid will use as &lt;em&gt;metrics&lt;/em&gt; to summarize the data. &amp;quot;count&amp;quot; is a built-in metric that counts both the raw number of rows at ingestion and the number of Druid rows after rollup. We&amp;#39;ve added a metric to summarize &amp;quot;val&amp;quot; from our water data.&lt;/p&gt;&lt;/li&gt;
-&lt;li&gt;&lt;p&gt;The firehose section specifies our data source, which in this case is a file. If our data existed in multiple files, we could have set &amp;quot;filter&amp;quot; to &amp;quot;*.tsv&amp;quot;.&lt;/p&gt;&lt;/li&gt;
-&lt;li&gt;&lt;p&gt;We have to specify the timestamp column so Druid knows which column holds the event time.&lt;/p&gt;&lt;/li&gt;
-&lt;li&gt;&lt;p&gt;We also specify the format of the data (&amp;quot;tsv&amp;quot;), what the columns are, and which to treat as dimensions. Dimensions are the values that describe our data.&lt;/p&gt;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;h2 id=&quot;start-a-druid-cluster-and-post-the-task&quot;&gt;Start a Druid Cluster and Post the Task&lt;/h2&gt;
-
-&lt;p&gt;Before submitting this task, we must start a small Druid cluster consisting of the indexing service, a Coordinator node, and a Historical node. Instructions on how to set up and start a Druid cluster are in the &lt;a href=&quot;http://druid.io/docs/latest/Tutorial:-Loading-Your-Data-Part-1.html&quot;&gt;Druid documentation&lt;/a&gt;.&lt;/p&gt;
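-
-&lt;p&gt;For reference, each node type is started with the main class shipped in the
-Druid distribution; roughly like the sketch below, based on the 0.6-era
-tutorials (memory flags and full config paths elided; see the linked
-documentation for the exact commands):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;java -classpath &amp;quot;config/coordinator:lib/*&amp;quot; io.druid.cli.Main server coordinator
-java -classpath &amp;quot;config/historical:lib/*&amp;quot;  io.druid.cli.Main server historical
-java -classpath &amp;quot;config/overlord:lib/*&amp;quot;    io.druid.cli.Main server overlord
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;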
-
-&lt;p&gt;Once the cluster is ready, the task is submitted to the indexer&amp;#39;s REST service (showing the relative path to the task file):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ curl -X &lt;span class=&quot;s1&quot;&gt;&amp;#39;POST&amp;#39;&lt;/span&gt; -H &lt;span class=&quot;s1&quot;&gt;&amp;#39;Content-Type:application/json&amp;#39;&lt;/span&gt; -d @examples/usgs/usgs_index_task.json localhost:8087/druid/indexer/v1/task
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;verify-success&quot;&gt;Verify Success&lt;/h2&gt;
-
-&lt;p&gt;If the task is accepted, a message similar to the following should appear almost immediately:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;{&amp;quot;task&amp;quot;:&amp;quot;index_usgs_2014-03-06T22:12:38.803Z&amp;quot;}
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;The indexing service (or &amp;quot;overlord&amp;quot;) should log a message similar to the following:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;2014-03-06 22:13:14,495 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskQueue - Task SUCCESS: IndexTask{id=index_usgs_2014-03-06T22:12:38.803Z, type=index, dataSource=usgs} (30974 run duration)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;This shows that the data is in Druid. You&amp;#39;ll see messages in the other nodes&amp;#39; logs concerning the &amp;quot;usgs&amp;quot; data. We can further verify this by going to the overlord&amp;#39;s console (http://&amp;lt;host&amp;gt;:8087/console.html) to view information about the task, and the Coordinator&amp;#39;s console (http://&amp;lt;host&amp;gt;:8082) to view metadata about the individual segments.&lt;/p&gt;
-
-&lt;p&gt;We can also verify the data by querying Druid. Here&amp;#39;s a simple time-boundary query:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;queryType&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;timeBoundary&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;dataSource&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;usgs&amp;quot;&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Saved in a file called &lt;code&gt;tb-query.body&lt;/code&gt;, it can then be submitted to Druid:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;$ curl -X POST &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://localhost:8081/druid/v2/?pretty&amp;#39;&lt;/span&gt; -H &lt;span class=&quot;s1&quot;&gt;&amp;#39;content-type: application/json&amp;#39;&lt;/span&gt; -d @examples/usgs/tb-query.body
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;The response should be:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;1963-01-01T00:00:00.000Z&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;result&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;minTime&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;1963-01-01T00:00:00.000Z&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;maxTime&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2013-12-31T00:00:00.000Z&amp;quot;&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;You can learn about submitting more complex queries in the &lt;a href=&quot;http://druid.io/docs/latest/Tutorial:-All-About-Queries.html&quot;&gt;Druid documentation&lt;/a&gt;.&lt;/p&gt;
-
-&lt;h2 id=&quot;what-to-try-next-something-more-akin-to-a-production-system&quot;&gt;What to Try Next: Something More Akin to a Production System&lt;/h2&gt;
-
-&lt;p&gt;For the purposes of demonstration, we&amp;#39;ve cobbled together a simple system for manually fetching, mutating, loading, analyzing, storing, and then querying (for yet more analysis) data. But this would hardly be anyone&amp;#39;s idea of a production system.&lt;/p&gt;
-
-&lt;p&gt;The USGS has REST-friendly services for accessing various realtime and historical data, including &lt;a href=&quot;http://waterservices.usgs.gov/rest/IV-Service.html&quot;&gt;water data&lt;/a&gt;. We could conceivably set up a data ingestion stack that fetches that data, feeds it to a messaging queue (e.g., Apache Kafka) and then processes it and moves it on to Druid via a specialized framework (e.g., Apache Storm). Then we could query the system to generate both realtime statuses [...]
-</content>
-	</entry>
-	
-	<entry>
-		<title>How We Scaled HyperLogLog: Three Real-World Optimizations</title>
-		<link href="http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html"/>
-		<updated>2014-02-18T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems</id>
-                <author><name>NELSON RAY AND FANGJIN YANG</name></author>
-                <summary type="html">&lt;p&gt;At Metamarkets, we specialize in converting mountains of programmatic ad data
-into real-time, explorable views. Because these datasets are so large and
-complex, we’re always looking for ways to maximize the speed and efficiency of
-how we deliver them to our clients.  In this post, we’re going to continue our
-discussion of some of the techniques we use to calculate critical metrics such
-as unique users and device IDs with maximum performance and accuracy.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;At Metamarkets, we specialize in converting mountains of programmatic ad data
-into real-time, explorable views. Because these datasets are so large and
-complex, we’re always looking for ways to maximize the speed and efficiency of
-how we deliver them to our clients.  In this post, we’re going to continue our
-discussion of some of the techniques we use to calculate critical metrics such
-as unique users and device IDs with maximum performance and accuracy.&lt;/p&gt;
-
-&lt;p&gt;Approximation algorithms are rapidly gaining traction as the preferred way to
-determine the unique number of elements in high cardinality sets. In the space
-of cardinality estimation algorithms, HyperLogLog has quickly emerged as the
-de-facto standard. Widely discussed by &lt;a href=&quot;http://research.google.com/pubs/pub40671.html&quot;&gt;technology companies&lt;/a&gt; and
-&lt;a href=&quot;http://highscalability.com/blog/2012/4/5/big-data-counting-how-to-count-a-billion-distinct-objects-us.html&quot;&gt;popular blogs&lt;/a&gt;, HyperLogLog trades
-accuracy in data and query results for massive reductions in data storage and
-vastly improved &lt;a href=&quot;http://strataconf.com/stratany2013/public/schedule/detail/30045&quot;&gt;system performance&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;In our &lt;a href=&quot;http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data&quot;&gt;previous&lt;/a&gt; investigation of HyperLogLog, we briefly
-discussed our motivations for using approximate algorithms and how we leveraged
-HyperLogLog in &lt;a href=&quot;http://druid.io/&quot;&gt;Druid&lt;/a&gt;, Metamarkets’ open source, distributed data
-store.  Since implementing and deploying HyperLogLog last year, we’ve made
-several optimizations to further improve performance and reduce storage cost.
-This blog post will share some of those optimizations. This blog post assumes
-that you are already familiar with how HyperLogLog works. If you are not
-familiar with the algorithm, there are plenty of resources &lt;a href=&quot;http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf&quot;&gt;online&lt;/a&gt;.&lt;/p&gt;
-
-&lt;h2 id=&quot;compacting-registers&quot;&gt;Compacting Registers&lt;/h2&gt;
-
-&lt;p&gt;In our initial implementation of HLL, we allocated 8 bits of memory for each
-register. Recall that each value stored in a register indicates the position of
-the first ‘1’ bit of a hashed input. Given that 2^255 ~== 10^76, a single 8 bit
-register could approximate (not well, though) a cardinality close to the number
-of atoms in the entire &lt;a href=&quot;http://www.universetoday.com/36302/atoms-in-the-universe/&quot;&gt;observable universe&lt;/a&gt;. Martin
-Traverso et al. of &lt;a href=&quot;https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920&quot;&gt;Facebook’s Presto&lt;/a&gt; realized that this was a bit
-wasteful and proposed an optimization, exploiting the fact that the registers
-increment in near lockstep.&lt;/p&gt;
-
-&lt;p&gt;Each register is initialized with value 0, so with 0 uniques,
-there is no change in any of the registers. Let’s say we have 8 registers. Then
-with 8 * 2^10 uniques, each register will have values ~ 10. Of course, there
-will be some variance, which can be calculated exactly if one were so inclined,
-given that the distribution in each register is an independent maximum of
-&lt;a href=&quot;http://en.wikipedia.org/wiki/Negative_binomial_distribution&quot;&gt;Negative Binomial&lt;/a&gt; (1, .5) draws.&lt;/p&gt;
-
-&lt;p&gt;With 4 bit registers, each register can only approximate up to 2^15 = 32,768
-uniques. In fact, the reality is worse because the higher numbers cannot be
-represented and are lost, impacting accuracy. Even with 2,048 registers, we
-can’t do much better than ~60M, which is one or two orders of magnitude lower
-than what we need.&lt;/p&gt;
-
-&lt;p&gt;Since the register values tend to increase together, the FB folks decided to
-introduce an offset counter and only store positive differences from it in the
-registers. That is, if we have register values of 8, 7, and 9, this corresponds
-to having an offset of 7 and using register difference values of 1, 0, and 2.
-Given the smallish spread that we expect to see, we typically won’t observe a
-difference of more than 15 among register values. So we feel comfortable using
-2,048 4-bit registers with an 8-bit offset, for 1025 bytes of storage versus 2048
-bytes with no offset and 8-bit registers.&lt;/p&gt;
-
-&lt;p&gt;In fact, others have commented on the concentrated distribution of the register
-values as well. In her &lt;a href=&quot;http://algo.inria.fr/durand/Articles/these.ps&quot;&gt;thesis&lt;/a&gt;, Marianne Durand suggested using
-a variable bit prefix encoding. Researchers at &lt;a href=&quot;http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40671.pdf&quot;&gt;Google&lt;/a&gt; have had
-success with difference encodings and variable length encodings.&lt;/p&gt;
-
-&lt;h3 id=&quot;problem&quot;&gt;Problem&lt;/h3&gt;
-
-&lt;p&gt;This optimization has served us well, with no appreciable loss in accuracy when
-streaming many uniques into a single HLL object, because the offset increments
-when all the registers get hit. Similarly, we can combine many HLL objects of
-moderate size together and watch the offsets increase. However, a curious
-phenomenon occurs when we try to combine many “small” HLL objects together.&lt;/p&gt;
-
-&lt;p&gt;Suppose each HLL object stores a single unique value. Then its offset will be
-0, one register will have a value between 1 and 15, and the remaining registers
-will be 0. No matter how many of these we combine together, our aggregate HLL
-object will never be able to exceed a value of 15 in each register with a 0
-offset, which is equivalent to an offset of 15 with 0’s in each register. Using
-2,048 registers, this means we won’t be able to produce estimates greater than
-~ .7 * 2048^2 * 1 / (2048 / 2^15) ~ 47M. (&lt;a href=&quot;http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf&quot;&gt;&lt;em&gt;Flajolet, et al. 2007&lt;/em&gt;&lt;/a&gt;)&lt;/p&gt;
-
-&lt;p&gt;Not good, because this means our estimates are capped at 10^7 instead of 10^80,
-irrespective of the number of true uniques. And this isn’t just some
-pathological edge case. Its untimely appearance in production a while ago was
-no fun to fix.&lt;/p&gt;
-
-&lt;h3 id=&quot;floating-max&quot;&gt;Floating Max&lt;/h3&gt;
-
-&lt;p&gt;The root problem in the above scenario is that the high values (&amp;gt; 15) are
-being clipped, with no hope of making it into a “small” HLL object, since the
-offset is 0. Although they are rare, many cumulative misses can have a
-noticeably large effect. Our solution involves storing one additional pair, a
-“floating max” bucket with higher resolution. Previously, a value of 20 in
-bucket 94 would be clipped to 15. Now, we store (20, 94) as the floating max,
-requiring at most an additional 2 bytes, bringing our total up to 1027 bytes.
-With enough small HLL objects so that each position is covered by a floating
-max, the combined HLL object can exceed the previous limit of 15 in each
-position. It also turns out that just one floating max is sufficient to largely
-fix the problem.&lt;/p&gt;
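-
-&lt;p&gt;A sketch of the idea (not the actual implementation): values that the
-4-bit, offset-encoded registers would clip are remembered in a single
-full-resolution (value, bucket) pair.&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;class OffsetHll {
-  final byte[] registers = new byte[2048]; // 4-bit values in practice; bytes here for clarity
-  int offset = 0;
-  int maxValue = 0, maxBucket = -1;        // the floating max pair, ~2 extra bytes
-
-  // Record the position of the first 1-bit of a hashed input landing in this bucket.
-  void add(int bucket, int value) {
-    int clipped = Math.min(Math.max(value - offset, 0), 15);
-    registers[bucket] = (byte) Math.max(registers[bucket], clipped);
-    if (value - offset &amp;gt; 15 &amp;amp;&amp;amp; value &amp;gt; maxValue) {
-      maxValue = value;  // kept at full resolution instead of being clipped to 15
-      maxBucket = bucket;
-    }
-  }
-}
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;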
-
-&lt;p&gt;Let’s take a look at one measure of the accuracy of our approximations. We
-simulate 1,000 runs of streaming 1B uniques into an HLL object and look at the
-proportion of cases in which we observed clipping with the offset approximation
-(black) and the addition of the floating max (red). So for 1e9 uniques, the max
-reduced clipping from 95%+ to ~15%. That is, in 85% of cases, the much smaller
-HLL objects with the floating max agreed with HLL versus less than 5% without
-the floating max.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;http://metamarkets.com/wp-content/uploads/2014/02/FJblogpost-600x560.png&quot; alt=&quot;Clipping on Cardinality&quot; title=&quot;Clipping on Cardinality&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;For the cost of only 2 bytes, the floating max register allowed us to union
-millions of HLL objects with minimal measurable loss in accuracy.&lt;/p&gt;
-
-&lt;h2 id=&quot;sparse-and-dense-storage&quot;&gt;Sparse and Dense Storage&lt;/h2&gt;
-
-&lt;p&gt;We first discussed the concept of representing HLL buckets in either a sparse
-or dense format in our &lt;a href=&quot;http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data&quot;&gt;first blog post&lt;/a&gt;. Since that time,
-Google has also written a &lt;a href=&quot;http://research.google.com/pubs/pub40671.html&quot;&gt;great paper&lt;/a&gt; on the matter. Data undergoes
-a &lt;a href=&quot;http://druid.io/blog/2011/05/20/druid-part-deux.html&quot;&gt;summarization process&lt;/a&gt; when it is ingested in Druid. It is
-unnecessarily expensive to store raw event data, so instead Druid rolls
-ingested data up to some time granularity.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;https://lh6.googleusercontent.com/O2YefUQdRdmCTXzh6xdxthD0VJY0Vq96DTXkhhPVAL_JXaJ1JuAWfFaxZDSmf9NDZgrmHS61RMFLqivacqsOw7evy1Ff73KNb1MdjoLchpCwc-YE8d9eCLiAAA&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;In practice, we see tremendous reductions in data volume by summarizing our
-&lt;a href=&quot;http://strataconf.com/stratany2013/public/schedule/detail/30045&quot;&gt;data&lt;/a&gt;. For a given summarized row, we can maintain HLL objects
-where each object represents the estimated number of unique elements for a
-column of that row.&lt;/p&gt;
-
-&lt;p&gt;When the summarization granularity is sufficiently small, only a limited number
-of unique elements may be seen for a dimension. In this case, a given HLL
-object may have registers that contain no values. The HLL registers are thus
-‘sparsely’ populated.&lt;/p&gt;
-
-&lt;p&gt;Our normal storage representation of HLL stores 2 register values per byte. In
-the sparse representation, we instead store the explicit indexes of buckets
-that have valid values in them as (index, value) pairs. When the sparse
-representation exceeds the size of the normal or ‘dense’ representation (1027
-bytes), we can switch to using only the dense representation. Our actual
-implementation uses a heuristic to determine when this switch occurs, but the
-idea is the same. In practice, many dimensions in real world data sets are of
-low cardinality, and this optimization can greatly reduce storage versus only
-storing the dense representation.&lt;/p&gt;
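-
-&lt;p&gt;A minimal sketch of that switch (again with illustrative names; Druid’s
-actual layout and heuristic differ) might look like:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span&gt;&lt;/span&gt;import java.nio.ByteBuffer;
-
-// Sketch: keep explicit (index, value) pairs while sparse, and pack two
-// 4-bit registers per byte once the pairs would outgrow the dense form.
-public class SparseOrDense {
-  static final int DENSE_BYTES = 1027; // 1024 register bytes + offset + floating max
-
-  static byte[] store(short[] indexes, byte[] values) {
-    if (indexes.length * 3 &amp;lt; DENSE_BYTES) {    // 2-byte index + 1-byte value per pair
-      ByteBuffer buf = ByteBuffer.allocate(indexes.length * 3);
-      for (int i = 0; i &amp;lt; indexes.length; i++) {
-        buf.putShort(indexes[i]).put(values[i]);  // sparse: explicit pairs
-      }
-      return buf.array();
-    }
-    byte[] dense = new byte[DENSE_BYTES];
-    for (int i = 0; i &amp;lt; indexes.length; i++) {
-      int shift = (indexes[i] % 2) * 4;           // two registers per byte
-      dense[indexes[i] / 2] |= (values[i] &amp;amp; 0xF) &amp;lt;&amp;lt; shift;
-    }
-    return dense;
-  }
-}
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;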
-
-&lt;h2 id=&quot;faster-lookups&quot;&gt;Faster Lookups&lt;/h2&gt;
-
-&lt;p&gt;One of the simpler optimizations that we implemented for faster cardinality
-calculations was to use lookups for register values. Instead of computing the
-actual register value by summing the register offset with the stored register
-value, we instead perform a lookup into a precalculated map. Similarly, to
-determine the number of zeros in a register value, we created a secondary
-lookup table. Given the number of registers we have, the cost of storing these
-lookup tables is near trivial. This problem is often known as the &lt;a href=&quot;http://en.wikipedia.org/wiki/Hamming_weight&quot;&gt;Hamming
-Weight problem&lt;/a&gt;.&lt;/p&gt;
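-
-&lt;p&gt;For illustration, here is one way such tables could be built (the names and
-sizes are ours, not Druid’s). Each packed byte holds two registers, so both its
-contribution to the harmonic sum and its count of zero-valued registers can be
-precomputed for each of the 256 possible byte values:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span&gt;&lt;/span&gt;// Sketch: per-byte lookup tables replace per-register arithmetic.
-public class RegisterLookups {
-  static final double[][] SUM = new double[64][256]; // [offset][packed byte]
-  static final int[] ZERO_REGISTERS = new int[256];
-
-  static {
-    for (int b = 0; b &amp;lt; 256; b++) {
-      int lo = b &amp;amp; 0xF;                   // low 4-bit register
-      int hi = (b &amp;gt;&amp;gt;&amp;gt; 4) &amp;amp; 0xF;          // high 4-bit register
-      ZERO_REGISTERS[b] = (lo == 0 ? 1 : 0) + (hi == 0 ? 1 : 0);
-      for (int offset = 0; offset &amp;lt; 64; offset++) {
-        SUM[offset][b] = Math.pow(2, -(offset + lo)) + Math.pow(2, -(offset + hi));
-      }
-    }
-  }
-
-  static double harmonicSum(byte[] packed, int offset) {
-    double sum = 0;
-    for (byte b : packed) {
-      sum += SUM[offset][b &amp;amp; 0xFF];       // one lookup instead of two pow() calls
-    }
-    return sum;
-  }
-}
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;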
-
-&lt;h2 id=&quot;lessons&quot;&gt;Lessons&lt;/h2&gt;
-
-&lt;p&gt;Many of our optimizations came out of necessity, both to provide the
-interactive query latencies that Druid users have come to expect, and to keep
-our storage costs reasonable. If you have any further improvements to our
-optimizations, please share them with us! We strongly believe that as data sets
-get increasingly larger, estimation algorithms are key to keeping query times
-acceptable. The approximate algorithm space remains relatively new, but it is
-something we can build together.&lt;/p&gt;
-
-&lt;p&gt;For more information on Druid, please visit &lt;a href=&quot;http://druid.io/&quot;&gt;druid.io&lt;/a&gt; and follow
-&lt;a href=&quot;https://twitter.com/druidio&quot;&gt;@druidio&lt;/a&gt;. We’d also like to thank Eric Tschetter and Xavier Léauté
-for their contributions to this work.  Featured image courtesy of &lt;a href=&quot;http://donasdays.blogspot.com/2012/10/are-you-sprinter-or-long-distance-runner.html&quot;&gt;Donna L
-Martin&lt;/a&gt;.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>RDruid and Twitterstream</title>
-		<link href="http://druid.io/blog/2014/02/03/rdruid-and-twitterstream.html"/>
-		<updated>2014-02-03T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2014/02/03/rdruid-and-twitterstream</id>
-                <author><name>Igal Levy</name></author>
-                <summary type="html">&lt;p&gt;What if you could combine a statistical analysis language with the power of an analytics database for instant insights into realtime data? You&amp;#39;d be able to draw conclusions from analyzing data streams at the speed of now. That&amp;#39;s what combining the prowess of a &lt;a href=&quot;http://druid.io&quot;&gt;Druid database&lt;/a&gt; with the power of &lt;a href=&quot;http://www.r-project.org&quot;&gt;R&lt;/a&gt; can do.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;What if you could combine a statistical analysis language with the power of an analytics database for instant insights into realtime data? You&amp;#39;d be able to draw conclusions from analyzing data streams at the speed of now. That&amp;#39;s what combining the prowess of a &lt;a href=&quot;http://druid.io&quot;&gt;Druid database&lt;/a&gt; with the power of &lt;a href=&quot;http://www.r-project.org&quot;&gt;R&lt;/a&gt; can do.&lt;/p&gt;
-
-&lt;p&gt;In this blog, we&amp;#39;ll look at how to bring streamed realtime data into R using nothing more than a laptop, an Internet connection, and open-source applications. And we&amp;#39;ll do it with &lt;em&gt;only one&lt;/em&gt; Druid node.&lt;/p&gt;
-
-&lt;h2 id=&quot;what-youll-need&quot;&gt;What You&amp;#39;ll Need&lt;/h2&gt;
-
-&lt;p&gt;You&amp;#39;ll need to download and unpack &lt;a href=&quot;http://static.druid.io/artifacts/releases/druid-services-0.6.52-bin.tar.gz&quot;&gt;Druid&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;Get the &lt;a href=&quot;http://www.r-project.org/&quot;&gt;R application&lt;/a&gt; for your platform.
-We also recommend using &lt;a href=&quot;http://www.rstudio.com/&quot;&gt;RStudio&lt;/a&gt; as the R IDE, which is what we used to run R.&lt;/p&gt;
-
-&lt;p&gt;You&amp;#39;ll also need a free Twitter account to be able to get a sample of streamed Twitter data.&lt;/p&gt;
-
-&lt;h2 id=&quot;set-up-the-twitterstream&quot;&gt;Set Up the Twitterstream&lt;/h2&gt;
-
-&lt;p&gt;First, register with the Twitter API. Log in at the &lt;a href=&quot;https://dev.twitter.com/apps/new&quot;&gt;Twitter developer&amp;#39;s site&lt;/a&gt; (you can use your normal Twitter credentials) and fill out the form for creating an application; use any website and callback URL to complete the form. &lt;/p&gt;
-
-&lt;p&gt;Make note of the API credentials that are then generated. Later you&amp;#39;ll need to enter them when prompted by the Twitter-example startup script, or save them in a &lt;code&gt;twitter4j.properties&lt;/code&gt; file (nicer if you ever restart the server). If using a properties file, save it under &lt;code&gt;$DRUID_HOME/examples/twitter&lt;/code&gt;. The file should contain the following (using your real keys):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;oauth.consumerKey=&amp;lt;yourTwitterConsumerKey&amp;gt;
-oauth.consumerSecret=&amp;lt;yourTwitterConsumerSecret&amp;gt;
-oauth.accessToken=&amp;lt;yourTwitterAccessToken&amp;gt;
-oauth.accessTokenSecret=&amp;lt;yourTwitterAccessTokenSecret&amp;gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;start-up-the-realtime-node&quot;&gt;Start Up the Realtime Node&lt;/h2&gt;
-
-&lt;p&gt;From the Druid home directory, start the Druid Realtime node:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;$DRUID_HOME/run_example_server.sh
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;When prompted, you&amp;#39;ll choose the &amp;quot;twitter&amp;quot; example. If you&amp;#39;re using the properties file, the server should start right up. Otherwise, you&amp;#39;ll have to answer the prompts with the credentials you obtained from Twitter. &lt;/p&gt;
-
-&lt;p&gt;After the Realtime node starts successfully, you should see &amp;quot;Connected_to_Twitter&amp;quot; printed, as well as messages similar to the following:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;2014-01-13 19:35:59,646 INFO [chief-twitterstream] druid.examples.twitter.TwitterSpritzerFirehoseFactory - nextRow() has returned 1,000 InputRows
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;This indicates that the Druid Realtime node is ingesting tweets in realtime.&lt;/p&gt;
-
-&lt;h2 id=&quot;set-up-r&quot;&gt;Set Up R&lt;/h2&gt;
-
-&lt;p&gt;Install and load the following packages:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;install.packages(&amp;quot;devtools&amp;quot;)
-install.packages(&amp;quot;ggplot2&amp;quot;)
-
-library(&amp;quot;devtools&amp;quot;)
-
-install_github(&amp;quot;RDruid&amp;quot;, &amp;quot;metamx&amp;quot;)
-
-library(RDruid)
-library(ggplot2)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Now tell RDruid where to find the Realtime node:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;druid &amp;lt;- druid.url(&amp;quot;localhost:8083&amp;quot;)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;querying-the-realtime-node&quot;&gt;Querying the Realtime Node&lt;/h2&gt;
-
-&lt;p&gt;&lt;a href=&quot;http://druid.io/docs/latest/Tutorial:-All-About-Queries.html&quot;&gt;Druid queries&lt;/a&gt; are in the format of JSON objects, but in R they&amp;#39;ll have a different format. Let&amp;#39;s look at this with a simple query that will give the time range of the Twitter data currently in our Druid node:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;&amp;gt; druid.query.timeBoundary(druid, dataSource=&amp;quot;twitterstream&amp;quot;, intervals=interval(ymd(20140101), ymd(20141231)), verbose=&amp;quot;true&amp;quot;)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Let&amp;#39;s break this query down:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;&lt;code&gt;druid.query.timeBoundary&lt;/code&gt; &amp;ndash; The RDruid query that finds the earliest and latest timestamps on data in Druid, within a specified interval.&lt;/li&gt;
-&lt;li&gt;&lt;code&gt;druid&lt;/code&gt; and &lt;code&gt;dataSource&lt;/code&gt; &amp;ndash; Specify the location of the Druid node and the name of the Twitter data stream.&lt;/li&gt;
-&lt;li&gt;&lt;code&gt;intervals&lt;/code&gt; &amp;ndash; The interval we&amp;#39;re looking in. Our choice is more than wide enough to encompass any data we&amp;#39;ve received from Twitter.&lt;/li&gt;
-&lt;li&gt;&lt;code&gt;verbose&lt;/code&gt; &amp;ndash; The response should also print the JSON object that is posted to the Realtime node, that node&amp;#39;s HTTP response, and possibly other information besides the actual response to the query.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;By making this a verbose query, we can take a look at the JSON object that RDruid creates from our R query and will post to the Druid node:&lt;/p&gt;
-
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;{
-    &amp;quot;dataSource&amp;quot; : &amp;quot;twitterstream&amp;quot;,
-    &amp;quot;intervals&amp;quot; : [
-        &amp;quot;2014-01-01T00:00:00.000+00:00/2014-12-31T00:00:00.000+00:00&amp;quot;
-    ],
-    &amp;quot;queryType&amp;quot; : &amp;quot;timeBoundary&amp;quot;
-}
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-
-&lt;p&gt;This is the type of query that Druid can understand. Now let&amp;#39;s look at the rest of the post and response:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;* Adding handle: conn: 0x7fa1eb723800
-* Adding handle: send: 0
-* Adding handle: recv: 0
-* Curl_addHandleToPipeline: length: 1
-* - Conn 2 (0x7fa1eb723800) send_pipe: 1, recv_pipe: 0
-* About to connect() to localhost port 8083 (#2)
-*   Trying ::1...
-* Connected to localhost (::1) port 8083 (#2)
-&amp;gt; POST /druid/v2/ HTTP/1.1
-Host: localhost:8083
-Accept: */*
-Accept-Encoding: gzip
-Content-Type: application/json
-Content-Length: 151
-
-* upload completely sent off: 151 out of 151 bytes
-&amp;lt; HTTP/1.1 200 OK
-&amp;lt; Content-Type: application/x-javascript
-&amp;lt; Transfer-Encoding: chunked
-* Server Jetty(8.1.11.v20130520) is not blacklisted
-&amp;lt; Server: Jetty(8.1.11.v20130520)
-&amp;lt; 
-* Connection #2 to host localhost left intact
-                  minTime                   maxTime 
-&amp;quot;2014-01-25 00:52:00 UTC&amp;quot; &amp;quot;2014-01-25 01:35:00 UTC&amp;quot; 
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;At the very end comes the response to our query, a minTime and maxTime, the boundaries to our data set.&lt;/p&gt;
-
-&lt;h3 id=&quot;more-complex-queries&quot;&gt;More Complex Queries&lt;/h3&gt;
-
-&lt;p&gt;Now let&amp;#39;s look at some real Twitter data. Say we are interested in the number of tweets per language during that time period. We need to do an aggregation via a groupBy query (see the RDruid help in RStudio):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;druid.query.groupBy(druid, dataSource=&amp;quot;twitterstream&amp;quot;, 
-                    interval(ymd(&amp;quot;2014-01-01&amp;quot;), ymd(&amp;quot;2015-01-01&amp;quot;)), 
-                    granularity=granularity(&amp;quot;P1D&amp;quot;), 
-                    aggregations = (tweets = sum(metric(&amp;quot;tweets&amp;quot;))), 
-                    dimensions = &amp;quot;lang&amp;quot;, 
-                    verbose=&amp;quot;true&amp;quot;)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;We see some new arguments in this query:&lt;/p&gt;
-
-&lt;ul&gt;
-&lt;li&gt;&lt;code&gt;granularity&lt;/code&gt; &amp;ndash; This sets the time period for each aggregation (in ISO 8601). Since all our data is in one day and we don&amp;#39;t care about breaking down by hour or minute, we choose per-day granularity.&lt;/li&gt;
-&lt;li&gt;&lt;code&gt;aggregations&lt;/code&gt; &amp;ndash; This is where we specify and name the metrics that we&amp;#39;re interested in summing up. We want tweets, and it just so happens that this metric is named &amp;quot;tweets&amp;quot; as it&amp;#39;s mapped from the Twitter API, so we&amp;#39;ll keep that name as the column head for our output.&lt;/li&gt;
-&lt;li&gt;&lt;code&gt;dimensions&lt;/code&gt; &amp;ndash; Here&amp;#39;s the actual type of data we&amp;#39;re interested in. Tweets are identified by language in their metadata (using ISO 639 language codes). We use the name of the dimension, &amp;quot;lang&amp;quot;, to slice the data along language.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Here&amp;#39;s the actual output:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;{
-    &amp;quot;intervals&amp;quot; : [
-        &amp;quot;2014-01-01T00:00:00.000+00:00/2015-01-01T00:00:00.000+00:00&amp;quot;
-    ],
-    &amp;quot;aggregations&amp;quot; : [
-        {
-            &amp;quot;type&amp;quot; : &amp;quot;doubleSum&amp;quot;,
-            &amp;quot;name&amp;quot; : &amp;quot;tweets&amp;quot;,
-            &amp;quot;fieldName&amp;quot; : &amp;quot;tweets&amp;quot;
-        }
-    ],
-    &amp;quot;dataSource&amp;quot; : &amp;quot;twitterstream&amp;quot;,
-    &amp;quot;filter&amp;quot; : null,
-    &amp;quot;having&amp;quot; : null,
-    &amp;quot;granularity&amp;quot; : {
-        &amp;quot;type&amp;quot; : &amp;quot;period&amp;quot;,
-        &amp;quot;period&amp;quot; : &amp;quot;P1D&amp;quot;,
-        &amp;quot;origin&amp;quot; : null,
-        &amp;quot;timeZone&amp;quot; : null
-    },
-    &amp;quot;dimensions&amp;quot; : [
-        &amp;quot;lang&amp;quot;
-    ],
-    &amp;quot;postAggregations&amp;quot; : null,
-    &amp;quot;limitSpec&amp;quot; : null,
-    &amp;quot;queryType&amp;quot; : &amp;quot;groupBy&amp;quot;,
-    &amp;quot;context&amp;quot; : null
-}
-* Adding handle: conn: 0x7fa1eb767600
-* Adding handle: send: 0
-* Adding handle: recv: 0
-* Curl_addHandleToPipeline: length: 1
-* - Conn 3 (0x7fa1eb767600) send_pipe: 1, recv_pipe: 0
-* About to connect() to localhost port 8083 (#3)
-*   Trying ::1...
-* Connected to localhost (::1) port 8083 (#3)
-&amp;gt; POST /druid/v2/ HTTP/1.1
-Host: localhost:8083
-Accept: */*
-Accept-Encoding: gzip
-Content-Type: application/json
-Content-Length: 489
-
-* upload completely sent off: 489 out of 489 bytes
-&amp;lt; HTTP/1.1 200 OK
-&amp;lt; Content-Type: application/x-javascript
-&amp;lt; Transfer-Encoding: chunked
-* Server Jetty(8.1.11.v20130520) is not blacklisted
-&amp;lt; Server: Jetty(8.1.11.v20130520)
-&amp;lt; 
-* Connection #3 to host localhost left intact
-    timestamp tweets  lang
-1  2014-01-25   6476    ar
-2  2014-01-25      1    bg
-3  2014-01-25     22    ca
-4  2014-01-25     10    cs
-5  2014-01-25     21    da
-6  2014-01-25    311    de
-7  2014-01-25     23    el
-8  2014-01-25  74842    en
-9  2014-01-25     20 en-GB
-10 2014-01-25    690 en-gb
-11 2014-01-25  22920    es
-12 2014-01-25      2    eu
-13 2014-01-25      2    fa
-14 2014-01-25     10    fi
-15 2014-01-25     36   fil
-16 2014-01-25   1521    fr
-17 2014-01-25      9    gl
-18 2014-01-25     15    he
-19 2014-01-25      1    hi
-20 2014-01-25      5    hu
-21 2014-01-25   3809    id
-22 2014-01-25      4    in
-23 2014-01-25    256    it
-24 2014-01-25  19748    ja
-25 2014-01-25   1079    ko
-26 2014-01-25      1    ms
-27 2014-01-25     19   msa
-28 2014-01-25    243    nl
-29 2014-01-25     24    no
-30 2014-01-25    113    pl
-31 2014-01-25  12707    pt
-32 2014-01-25      3    ro
-33 2014-01-25   1606    ru
-34 2014-01-25      1    sr
-35 2014-01-25     76    sv
-36 2014-01-25    532    th
-37 2014-01-25   1415    tr
-38 2014-01-25     30    uk
-39 2014-01-25      6 xx-lc
-40 2014-01-25      1 zh-CN
-41 2014-01-25     30 zh-cn
-42 2014-01-25     34 zh-tw
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;This gives an idea of what languages dominate Twitter (at least for the given time range). For visualization, you can use a library like ggplot2. Try the &lt;code&gt;geom_bar&lt;/code&gt; function to quickly produce a basic bar chart of the data. First, assign the result of the query above to a dataframe (let&amp;#39;s call it &lt;code&gt;tweet_langs&lt;/code&gt; in this example), then subset it to keep languages with more than a thousand tweets:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;major_tweet_langs &amp;lt;- subset(tweet_langs, tweets &amp;gt; 1000)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Then create the chart:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;ggplot(major_tweet_langs, aes(x=lang, y=tweets)) + geom_bar(stat=&amp;quot;identity&amp;quot;)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;You can refine this query with more aggregations and post aggregations (math within the results). For example, to find out how many rows in Druid the data for each of those languages takes, use:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;druid.query.groupBy(druid, dataSource=&amp;quot;twitterstream&amp;quot;, 
-                    interval(ymd(&amp;quot;2014-01-01&amp;quot;), ymd(&amp;quot;2015-01-01&amp;quot;)), 
-                    granularity=granularity(&amp;quot;all&amp;quot;), 
-                    aggregations = list(rows = druid.count(), 
-                                        tweets = sum(metric(&amp;quot;tweets&amp;quot;))), 
-                    dimensions = &amp;quot;lang&amp;quot;, 
-                    verbose=&amp;quot;true&amp;quot;)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;metrics-and-dimensions&quot;&gt;Metrics and Dimensions&lt;/h2&gt;
-
-&lt;p&gt;How do you find out what metrics and dimensions are available to query? You can find the metrics in &lt;code&gt;$DRUID_HOME/examples/twitter/twitter_realtime.spec&lt;/code&gt;. The dimensions are not as apparent. There&amp;#39;s an easy way to query for them from a certain type of Druid node, but not from a Realtime node, which leaves the less-appetizing approach of digging through &lt;a href=&quot;https://github.com/metamx/druid/blob/druid-0.5.x/examples/src/main/java/druid/exa [...]
-
-&lt;ul&gt;
-&lt;li&gt;&amp;quot;first_hashtag&amp;quot;&lt;/li&gt;
-&lt;li&gt;&amp;quot;user_time_zone&amp;quot;&lt;/li&gt;
-&lt;li&gt;&amp;quot;user_location&amp;quot;&lt;/li&gt;
-&lt;li&gt;&amp;quot;is_retweet&amp;quot;&lt;/li&gt;
-&lt;li&gt;&amp;quot;is_viral&amp;quot;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Some interesting analyses on current events could be done using these dimensions and metrics. For example, you could filter on specific hashtags for events that happen to be spiking at the time:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;druid.query.groupBy(druid, dataSource=&amp;quot;twitterstream&amp;quot;, 
-                interval(ymd(&amp;quot;2014-01-01&amp;quot;), ymd(&amp;quot;2015-01-01&amp;quot;)), 
-                granularity=granularity(&amp;quot;P1D&amp;quot;), 
-                aggregations = (tweets = sum(metric(&amp;quot;tweets&amp;quot;))), 
-                filter =
-                    dimension(&amp;quot;first_hashtag&amp;quot;) %~% &amp;quot;academyawards&amp;quot; |
-                    dimension(&amp;quot;first_hashtag&amp;quot;) %~% &amp;quot;oscars&amp;quot;,
-                dimensions   = list(&amp;quot;first_hashtag&amp;quot;))
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;See the &lt;a href=&quot;https://github.com/metamx/RDruid/wiki/Examples&quot;&gt;RDruid wiki&lt;/a&gt; for more examples.&lt;/p&gt;
-
-&lt;p&gt;The point to remember is that this data is being streamed into Druid and brought into R via RDruid in realtime. With an R script, for example, the data could be continuously queried, updated, and analyzed.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Querying Your Data</title>
-		<link href="http://druid.io/blog/2013/11/04/querying-your-data.html"/>
-		<updated>2013-11-04T00:00:00-08:00</updated>
-                <id>http://druid.io/blog/2013/11/04/querying-your-data</id>
-                <author><name>Russell Jurney</name></author>
-                <summary type="html">&lt;p&gt;Before we start querying Druid, we&amp;#39;re going to finish setting up a complete cluster on localhost. In our previous posts, we set up a Realtime node. In this tutorial we will also set up the other Druid node types: Compute, Master, and Broker.&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;Before we start querying Druid, we&amp;#39;re going to finish setting up a complete cluster on localhost. In our previous posts, we set up a Realtime node. In this tutorial we will also set up the other Druid node types: Compute, Master, and Broker.&lt;/p&gt;
-
-&lt;h2 id=&quot;booting-a-broker-node&quot;&gt;Booting a Broker Node&lt;/h2&gt;
-
-&lt;ol&gt;
-&lt;li&gt;Set up a config file at config/broker/runtime.properties that looks like this: &lt;a href=&quot;https://gist.github.com/rjurney/5818837&quot;&gt;https://gist.github.com/rjurney/5818837&lt;/a&gt;&lt;/li&gt;
-&lt;li&gt;Run the broker node:&lt;/li&gt;
-&lt;/ol&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;java -Xmx256m -Duser.timezone&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTC -Dfile.encoding&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTF-8 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--Ddruid.realtime.specFile&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;realtime.spec &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--classpath services/target/druid-services-0.5.6-SNAPSHOT-selfcontained.jar:config/broker &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
-com.metamx.druid.http.BrokerMain
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;booting-a-master-node&quot;&gt;Booting a Master Node&lt;/h2&gt;
-
-&lt;ol&gt;
-&lt;li&gt;Set up a config file at config/master/runtime.properties that looks like this: &lt;a href=&quot;https://gist.github.com/rjurney/5818870&quot;&gt;https://gist.github.com/rjurney/5818870&lt;/a&gt;&lt;/li&gt;
-&lt;li&gt;Run the master node:&lt;/li&gt;
-&lt;/ol&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;java -Xmx256m -Duser.timezone&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTC -Dfile.encoding&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTF-8 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--classpath services/target/druid-services-0.5.6-SNAPSHOT-selfcontained.jar:config/master &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
-com.metamx.druid.http.MasterMain
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;booting-a-realtime-node&quot;&gt;Booting a Realtime Node&lt;/h2&gt;
-
-&lt;ol&gt;
-&lt;li&gt;&lt;p&gt;Set up a config file at config/realtime/runtime.properties that looks like this: &lt;a href=&quot;https://gist.github.com/rjurney/5818774&quot;&gt;https://gist.github.com/rjurney/5818774&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
-&lt;li&gt;&lt;p&gt;Set up a realtime.spec file like this: &lt;a href=&quot;https://gist.github.com/rjurney/5818779&quot;&gt;https://gist.github.com/rjurney/5818779&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
-&lt;li&gt;&lt;p&gt;Run the realtime node:&lt;/p&gt;&lt;/li&gt;
-&lt;/ol&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;java -Xmx256m -Duser.timezone&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTC -Dfile.encoding&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTF-8 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--Ddruid.realtime.specFile&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;realtime.spec &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--classpath services/target/druid-services-0.5.6-SNAPSHOT-selfcontained.jar:config/realtime &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
-com.metamx.druid.realtime.RealtimeMain
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h2 id=&quot;booting-a-compute-node&quot;&gt;Booting a Compute Node&lt;/h2&gt;
-
-&lt;ol&gt;
-&lt;li&gt;Set up a config file at config/compute/runtime.properties that looks like this: &lt;a href=&quot;https://gist.github.com/rjurney/5818885&quot;&gt;https://gist.github.com/rjurney/5818885&lt;/a&gt;&lt;/li&gt;
-&lt;li&gt;Run the compute node:&lt;/li&gt;
-&lt;/ol&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;java -Xmx256m -Duser.timezone&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTC -Dfile.encoding&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;UTF-8 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--classpath services/target/druid-services-0.5.6-SNAPSHOT-selfcontained.jar:config/compute &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
-com.metamx.druid.http.ComputeMain
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h1 id=&quot;querying-your-data&quot;&gt;Querying Your Data&lt;/h1&gt;
-
-&lt;p&gt;Now that we have a complete cluster set up on localhost, we need to load data. To do so, refer to &lt;a href=&quot;http://druid.io/blog/2013/08/30/loading-data.html&quot;&gt;Loading Your Data&lt;/a&gt;. Having done that, it&amp;#39;s time to query our data!&lt;/p&gt;
-
-&lt;h2 id=&quot;querying-different-nodes&quot;&gt;Querying Different Nodes&lt;/h2&gt;
-
-&lt;p&gt;Druid is a shared-nothing system, and there are three ways to query it: against the Realtime, Compute, or Broker node. Querying a Realtime node returns only realtime data, and querying a Compute node returns only historical segments. Querying the Broker queries both realtime and historical segments and composes an overall result; this is the normal mode of operation for queries in Druid.&lt;/p&gt;
-
-&lt;h3 id=&quot;construct-a-query&quot;&gt;Construct a Query&lt;/h3&gt;
-
-&lt;p&gt;For how this query is constructed, see &amp;quot;Querying Against the realtime.spec&amp;quot; below.&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;queryType&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;groupBy&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;dataSource&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;druidtest&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;granularity&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;all&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;dimensions&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[],&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregations&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
-        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;rows&amp;quot;&lt;/span&gt;&lt;span class=& [...]
-        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;longSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;imps&amp;quot;&lt;/span&gt;&lt;span class [...]
-        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;doubleSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt;&lt;span class [...]
-    &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;intervals&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T00:00/2020-01-01T00&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h3 id=&quot;querying-the-realtime-node&quot;&gt;Querying the Realtime Node&lt;/h3&gt;
-
-&lt;p&gt;Run our query against port 8080:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;curl -X POST &lt;span class=&quot;s2&quot;&gt;&amp;quot;http://localhost:8080/druid/v2/?pretty&amp;quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--H &lt;span class=&quot;s1&quot;&gt;&amp;#39;content-type: application/json&amp;#39;&lt;/span&gt; -d @query.body
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;See our result:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;version&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;v1&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T00:00:00.000Z&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;event&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;imps&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;15000.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;rows&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h3 id=&quot;querying-the-compute-node&quot;&gt;Querying the Compute Node&lt;/h3&gt;
-
-&lt;p&gt;Run the query against port 8082:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;curl -X POST &lt;span class=&quot;s2&quot;&gt;&amp;quot;http://localhost:8082/druid/v2/?pretty&amp;quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--H &lt;span class=&quot;s1&quot;&gt;&amp;#39;content-type: application/json&amp;#39;&lt;/span&gt; -d @query.body
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;And get (similar to):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;version&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;v1&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T00:00:00.000Z&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;event&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;imps&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;27&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;77000.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;rows&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h3 id=&quot;querying-both-nodes-via-the-broker&quot;&gt;Querying both Nodes via the Broker&lt;/h3&gt;
-
-&lt;p&gt;Run the query against port 8083:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span&gt;&lt;/span&gt;curl -X POST &lt;span class=&quot;s2&quot;&gt;&amp;quot;http://localhost:8083/druid/v2/?pretty&amp;quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
--H &lt;span class=&quot;s1&quot;&gt;&amp;#39;content-type: application/json&amp;#39;&lt;/span&gt; -d @query.body
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;And get:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;version&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;v1&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;timestamp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T00:00:00.000Z&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;event&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;imps&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;15000.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-    &lt;span class=&quot;nt&quot;&gt;&amp;quot;rows&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
-  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Now that we know which nodes can be queried (although you should usually use the Broker node), let&amp;#39;s learn how to find out what queries are available.&lt;/p&gt;
-
-&lt;h2 id=&quot;querying-against-the-realtime-spec&quot;&gt;Querying Against the realtime.spec&lt;/h2&gt;
-
-&lt;p&gt;How are we to know what queries we can run? Although &lt;a href=&quot;http://druid.io/docs/latest/Querying.html&quot;&gt;Querying&lt;/a&gt; is a helpful index, to get a handle on querying our data we need to look at our Realtime node&amp;#39;s realtime.spec file:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[{&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;schema&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;dataSource&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;druidtest&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-               &lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregators&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&l [...]
-                                  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;doubleSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;wp&amp;quot;&lt; [...]
-               &lt;span class=&quot;nt&quot;&gt;&amp;quot;indexGranularity&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;minute&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-           &lt;span class=&quot;nt&quot;&gt;&amp;quot;shardSpec&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;none&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;config&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;maxRowsInMemory&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;500000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-               &lt;span class=&quot;nt&quot;&gt;&amp;quot;intermediatePersistPeriod&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;PT10m&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;firehose&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;kafka-0.7.2&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                 &lt;span class=&quot;nt&quot;&gt;&amp;quot;consumerProps&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;zk.connect&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;localhost:2181&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                                     &lt;span class=&quot;nt&quot;&gt;&amp;quot;zk.connectiontimeout.ms&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;15000&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                                     &lt;span class=&quot;nt&quot;&gt;&amp;quot;zk.sessiontimeout.ms&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;15000&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                                     &lt;span class=&quot;nt&quot;&gt;&amp;quot;zk.synctime.ms&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;5000&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                                     &lt;span class=&quot;nt&quot;&gt;&amp;quot;groupid&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;topic-pixel-local&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                                     &lt;span class=&quot;nt&quot;&gt;&amp;quot;fetch.size&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;1048586&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                                     &lt;span class=&quot;nt&quot;&gt;&amp;quot;autooffset.reset&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;largest&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                                     &lt;span class=&quot;nt&quot;&gt;&amp;quot;autocommit.enable&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;false&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-                 &lt;span class=&quot;nt&quot;&gt;&amp;quot;feed&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;druidtest&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                 &lt;span class=&quot;nt&quot;&gt;&amp;quot;parser&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;timestampSpec&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;column&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class [...]
-                              &lt;span class=&quot;nt&quot;&gt;&amp;quot;data&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;format&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;json&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-                              &lt;span class=&quot;nt&quot;&gt;&amp;quot;dimensionExclusions&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt;
-  &lt;span class=&quot;nt&quot;&gt;&amp;quot;plumber&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;realtime&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                &lt;span class=&quot;nt&quot;&gt;&amp;quot;windowPeriod&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;PT10m&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                &lt;span class=&quot;nt&quot;&gt;&amp;quot;segmentGranularity&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;hour&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                &lt;span class=&quot;nt&quot;&gt;&amp;quot;basePersistDirectory&amp;quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;/tmp/realtime/basePersist&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
-                &lt;span class=&quot;nt&quot;&gt;&amp;quot;rejectionPolicy&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;messageTime&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-
-&lt;span class=&quot;p&quot;&gt;}]&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h3 id=&quot;datasource&quot;&gt;dataSource&lt;/h3&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;dataSource&amp;quot;&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;druidtest&amp;quot;&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Our dataSource gives the name of the relation/table, or &amp;#39;source of data&amp;#39;, to query; note that it is the same in both our realtime.spec and query.body!&lt;/p&gt;
-
-&lt;h3 id=&quot;aggregations&quot;&gt;aggregations&lt;/h3&gt;
-
-&lt;p&gt;Note the &lt;a href=&quot;http://druid.io/docs/latest/Aggregations.html&quot;&gt;aggregations&lt;/a&gt; in our query:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;    &lt;span class=&quot;s2&quot;&gt;&amp;quot;aggregations&amp;quot;&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;
-        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;rows&amp;quot;&lt;/span&gt;&lt;span class=& [...]
-        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;longSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;imps&amp;quot;&lt;/span&gt;&lt;span class [...]
-        &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;doubleSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt;&lt;span class [...]
-    &lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;,&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;This matches up to the aggregators in the schema of our realtime.spec!&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;aggregators&amp;quot;&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span cla [...]
-                                  &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;type&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;doubleSum&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;wp&amp;quot;&lt; [...]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;h3 id=&quot;dimensions&quot;&gt;dimensions&lt;/h3&gt;
-
-&lt;p&gt;Let&amp;#39;s look back at our actual records (from &lt;a href=&quot;http://druid.io/blog/2013/08/30/loading-data.html&quot;&gt;Loading Your Data&lt;/a&gt;):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;&lt;span&gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;utcdt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T01:01:01&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp; [...]
-&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;utcdt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T01:01:02&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt [...]
-&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;utcdt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T01:01:03&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt [...]
-&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;utcdt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T01:01:04&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt [...]
-&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;quot;utcdt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;2010-01-01T01:01:05&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;&amp;quot;wp&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt [...]
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Note that our data has two dimensions in addition to our primary metric, wp: &amp;#39;gender&amp;#39; and &amp;#39;age&amp;#39;. We can specify these in our query! Below, we add the dimension &amp;#39;age&amp;#39;.&lt;/p&gt;
-<div class="highlight"><pre><code class="language-json">{
-    "queryType": "groupBy",
-    "dataSource": "druidtest",
-    "granularity": "all",
-    "dimensions": ["age"],
-    "aggregations": [
-        {"type": "count", "name": "rows"},
-        {"type": "longSum", "name": "imps", "fieldName": "imps"},
-        {"type": "doubleSum", "name": "wp", "fieldName": "wp"}
-    ],
-    "intervals": ["2010-01-01T00:00/2020-01-01T00"]
-}
-</code></pre></div>
-<p>Which gets us grouped data in return!</p>
-<div class="highlight"><pre><code class="language-json">[ {
-  "version" : "v1",
-  "timestamp" : "2010-01-01T00:00:00.000Z",
-  "event" : {
-    "imps" : 1,
-    "age" : "100",
-    "wp" : 1000.0,
-    "rows" : 1
-  }
-}, {
-  "version" : "v1",
-  "timestamp" : "2010-01-01T00:00:00.000Z",
-  "event" : {
-    "imps" : 1,
-    "age" : "20",
-    "wp" : 3000.0,
-    "rows" : 1
-  }
-}, {
-  "version" : "v1",
-  "timestamp" : "2010-01-01T00:00:00.000Z",
-  "event" : {
-    "imps" : 1,
-    "age" : "30",
-    "wp" : 4000.0,
-    "rows" : 1
-  }
-}, {
-  "version" : "v1",
-  "timestamp" : "2010-01-01T00:00:00.000Z",
-  "event" : {
-    "imps" : 1,
-    "age" : "40",
-    "wp" : 5000.0,
-    "rows" : 1
-  }
-}, {
-  "version" : "v1",
-  "timestamp" : "2010-01-01T00:00:00.000Z",
-  "event" : {
-    "imps" : 1,
-    "age" : "50",
-    "wp" : 2000.0,
-    "rows" : 1
-  }
-} ]
-</code></pre></div>
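-<p>To try a query like this yourself, you can POST it to your cluster over HTTP. The sketch below assumes the query above is saved in a file named group_by.json (a name chosen here for illustration) and that a broker or standalone realtime node is listening on localhost:8083; the host and port depend on your setup:</p>
-<div class="highlight"><pre><code class="language-text">curl -X POST 'http://localhost:8083/druid/v2/?pretty' \
-  -H 'Content-Type: application/json' \
-  -d @group_by.json
-</code></pre></div>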
-<h3 id="filtering">Filtering</h3>
-
-<p>Now that we've seen our dimensions, we can also filter on them:</p>
-<div class="highlight"><pre><code class="language-json">{
-    "queryType": "groupBy",
-    "dataSource": "druidtest",
-    "granularity": "all",
-    "filter": {
-        "type": "selector",
-        "dimension": "gender",
-        "value": "male"
-    },
-    "aggregations": [
-        {"type": "count", "name": "rows"},
-        {"type": "longSum", "name": "imps", "fieldName": "imps"},
-        {"type": "doubleSum", "name": "wp", "fieldName": "wp"}
-    ],
-    "intervals": ["2010-01-01T00:00/2020-01-01T00"]
-}
-</code></pre></div>
-<p>Which gets us just the male rows, aggregated together:</p>
-<div class="highlight"><pre><code class="language-json">[ {
-  "version" : "v1",
-  "timestamp" : "2010-01-01T00:00:00.000Z",
-  "event" : {
-    "imps" : 3,
-    "wp" : 9000.0,
-    "rows" : 3
-  }
-} ]
-</code></pre></div>
-<p>Check out <a href="http://druid.io/docs/latest/Filters.html">Filters</a> for more.</p>
-
-<h2 id="learn-more">Learn More</h2>
-
-<p>Finally, you can learn more in the <a href="http://druid.io/docs/latest/Querying.html">Querying</a> documentation!</p>
-</content>
-	</entry>
-	
-	<entry>
-		<title>Druid at XLDB</title>
-		<link href="http://druid.io/blog/2013/09/20/druid-at-xldb.html"/>
-		<updated>2013-09-20T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2013/09/20/druid-at-xldb</id>
-                <author><name>Russell Jurney</name></author>
-                <summary type="html">&lt;p&gt;We recently attended &lt;a href=&quot;http://www.xldb.org/&quot;&gt;Stanford XLDB&lt;/a&gt; and the experience was a blast. Once a year, XLDB invites speakers from different organizations to discuss the challenges of and solutions to dealing with Xtreme (with an X!) data sets. This year, Jeff Dean dropped knowledge bombs about architecting scalable systems, Michael Stonebraker provided inspiring advice about growing open source projects, CERN [...]
-</summary>
-		<content type="html">&lt;p&gt;We recently attended &lt;a href=&quot;http://www.xldb.org/&quot;&gt;Stanford XLDB&lt;/a&gt; and the experience was a blast. Once a year, XLDB invites speakers from different organizations to discuss the challenges of and solutions to dealing with Xtreme (with an X!) data sets. This year, Jeff Dean dropped knowledge bombs about architecting scalable systems, Michael Stonebraker provided inspiring advice about growing open source projects, CERN explained how [...]
-
-&lt;p&gt;We attended XLDB to teach our very first Druid tutorial session. Battling an alarm clock that went off far too early (for engineers anyway) and braving the insanity that is highway 101 morning traffic, most of us &lt;em&gt;almost&lt;/em&gt; managed to show up on time for our session.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;http://distilleryimage3.ak.instagram.com/ce5ff7c4197111e3b2e322000a1f9a5c_7.jpg&quot; alt=&quot;Druid Users at XLDB&quot;&gt;&lt;/p&gt;
-
-&lt;p&gt;The focus of our tutorial was to educate people on why we built Druid, how Druid is architected, and how to build applications on top of it. The tutorial has several hands-on sections about spinning up a Druid cluster and loading data into it. For &lt;a href=&quot;http://www.r-project.org/&quot;&gt;R&lt;/a&gt; enthusiasts, there is a section on building an R application for data analysis with Druid. Check out our slides below:&lt;/p&gt;
-
-&lt;script async=&quot;&quot; class=&quot;speakerdeck-embed&quot; data-id=&quot;50c52830fc7301302f630ada113e7e19&quot; data-ratio=&quot;1.72972972972973&quot; src=&quot;//speakerdeck.com/assets/embed.js&quot;&gt;&lt;/script&gt;
-
-&lt;p&gt;We are constantly trying to improve the Druid educational process. In the future, we hope to refine and repeat this talk at other cool conferences.&lt;/p&gt;
-</content>
-	</entry>
-	
-	<entry>
-		<title>Launching Druid With Apache Whirr</title>
-		<link href="http://druid.io/blog/2013/09/19/launching-druid-with-apache-whirr.html"/>
-		<updated>2013-09-19T00:00:00-07:00</updated>
-                <id>http://druid.io/blog/2013/09/19/launching-druid-with-apache-whirr</id>
-                <author><name>Russell Jurney</name></author>
-                <summary type="html">&lt;p&gt;Without Whirr, to launch a Druid cluster, you&amp;#39;d have to provision machines yourself and then install each node type manually. This process is outlined &lt;a href=&quot;https://github.com/metamx/druid/wiki/Tutorial%3A-The-Druid-Cluster&quot;&gt;here&lt;/a&gt;. With Whirr, you can boot a Druid cluster by editing a simple configuration file and then issuing a single command!&lt;/p&gt;
-</summary>
-		<content type="html">&lt;p&gt;Without Whirr, to launch a Druid cluster, you&amp;#39;d have to provision machines yourself and then install each node type manually. This process is outlined &lt;a href=&quot;https://github.com/metamx/druid/wiki/Tutorial%3A-The-Druid-Cluster&quot;&gt;here&lt;/a&gt;. With Whirr, you can boot a Druid cluster by editing a simple configuration file and then issuing a single command!&lt;/p&gt;
-
-&lt;h2 id=&quot;about-druid&quot;&gt;About Druid&lt;/h2&gt;
-
-&lt;p&gt;Druid is a rockin&amp;#39; exploratory analytical data store capable of interactive queries over big data in real time, as data is ingested. Druid cost-effectively drives tens of billions of events per day for the &lt;a href=&quot;http://www.metamarkets.com&quot;&gt;Metamarkets&lt;/a&gt; platform, and Metamarkets is committed to building Druid in open source.&lt;/p&gt;
-
-&lt;h2 id=&quot;about-apache-whirr&quot;&gt;About Apache Whirr&lt;/h2&gt;
-
-&lt;p&gt;Apache Whirr is a set of libraries for running cloud services. It lets you boot clusters of distributed systems for testing and experimentation with just a few simple commands.&lt;/p&gt;
-
-&lt;h2 id=&quot;installing-whirr&quot;&gt;Installing Whirr&lt;/h2&gt;
-
-&lt;p&gt;Until Druid support is part of an official Apache Whirr release (expected in a month or two), you&amp;#39;ll need to clone the code from &lt;a href=&quot;https://github.com/rjurney/whirr/tree/trunk&quot;&gt;https://github.com/rjurney/whirr/tree/trunk&lt;/a&gt; and build Whirr yourself.&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;git clone git@github.com:rjurney/whirr.git
-cd whirr
-git checkout trunk
-mvn clean install -Dmaven.test.failure.ignore=true
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
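-&lt;p&gt;Once the build finishes, a quick sanity check that the CLI works is to print its version (a minimal check; the exact output depends on your build):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;bin/whirr version
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;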
-&lt;h2 id=&quot;configuring-your-cloud-provider&quot;&gt;Configuring your Cloud Provider&lt;/h2&gt;
-
-&lt;p&gt;You&amp;#39;ll need to set these environment variables:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;export WHIRR_PROVIDER=aws-ec2
-export WHIRR_IDENTITY=$AWS_ACCESS_KEY_ID
-export WHIRR_CREDENTIAL=$AWS_SECRET_ACCESS_KEY
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
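-&lt;p&gt;If you&amp;#39;d rather not export these in every shell session, Whirr can also read credentials from a file at ~/.whirr/credentials; a sketch, assuming the PROVIDER/IDENTITY/CREDENTIAL key format described in the Whirr quick start:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;mkdir -p ~/.whirr
-cat &amp;gt; ~/.whirr/credentials &amp;lt;&amp;lt;EOF
-PROVIDER=aws-ec2
-IDENTITY=$AWS_ACCESS_KEY_ID
-CREDENTIAL=$AWS_SECRET_ACCESS_KEY
-EOF
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;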
-&lt;h2 id=&quot;build-properties&quot;&gt;build.properties&lt;/h2&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;cat recipes/druid.properties
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Much of the configuration is self-explanatory:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;# Change the cluster name here
-whirr.cluster-name=druid
-
-# Change the number of machines in the cluster here
-whirr.instance-templates=1 zookeeper+druid-mysql+druid-master+druid-broker+druid-compute+druid-realtime
-# whirr.instance-templates=3 zookeeper,1 druid-mysql,2 druid-realtime,2 druid-broker,2 druid-master,5 druid-compute
-
-# Which version of druid to load
-whirr.druid.version=0.5.54
-
-# S3 bucket to store segments in
-whirr.druid.pusher.s3.bucket=dummy_s3_bucket
-
-# The realtime.spec file to use to configure a realtime node
-# whirr.druid.realtime.spec.path=/path/to/druid/examples/config/realtime/realtime.spec
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;Note that you can change a cluster&amp;#39;s composition with the whirr.instance-templates parameter, which enables you to boot clusters large or small. At least one zookeeper node and one druid-mysql node are required.&lt;/p&gt;
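-&lt;p&gt;For example, a sketch of a somewhat larger template that still satisfies the zookeeper and druid-mysql requirement (the counts here are illustrative; adjust to taste):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;whirr.instance-templates=1 zookeeper,1 druid-mysql,1 druid-master,1 druid-broker,3 druid-compute,1 druid-realtime
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;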
-
-&lt;h2 id=&quot;launching-a-druid-cluster-with-whirr&quot;&gt;Launching a Druid Cluster with Whirr&lt;/h2&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;bin/whirr launch-cluster --config recipes/druid.properties
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
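-&lt;p&gt;Launching takes several minutes. To see the instances Whirr started, along with their roles, you can run list-cluster (a standard Whirr subcommand; exact output varies by version):&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;bin/whirr list-cluster --config recipes/druid.properties
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;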
-&lt;p&gt;When the cluster is ready, SSH instructions will be printed and we can connect to and use the cluster. For more on using a Druid cluster, see &lt;a href=&quot;https://github.com/metamx/druid/wiki/Querying-your-data&quot;&gt;here&lt;/a&gt;. To destroy the cluster when we&amp;#39;re done, run:&lt;/p&gt;
-&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&lt;span&gt;&lt;/span&gt;bin/whirr destroy-cluster --config recipes/druid.properties
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
-&lt;p&gt;We hope Apache Whirr makes experimenting with Druid easier than ever!&lt;/p&gt;
-</content>
-	</entry>
-	
-</feed>
diff --git a/robots.txt b/robots.txt
index a958ccd..f024d6b 100644
--- a/robots.txt
+++ b/robots.txt
@@ -1,4 +1,4 @@
-# robots.txt for http://druid.io
+# robots.txt for http://druid.apache.org
 
 # Keep robots from crawling old Druid doc versions
 
diff --git a/technology.html b/technology.html
index 5e36c33..8c59be2 100644
--- a/technology.html
+++ b/technology.html
@@ -214,7 +214,7 @@ Druid connects to a source of raw data, typically a message bus such as Apache K
   <img src="img/diagram-4.png" style="max-width: 580px;">
 </div>
 
-<p>For more information, please visit <a href="http://druid.io/docs/latest/ingestion/index.html">our docs page</a>.</p>
+<p>For more information, please visit <a href="/docs/latest/ingestion/index.html">our docs page</a>.</p>
 
 <h2 id="storage">Storage</h2>
 
@@ -232,7 +232,7 @@ This pre-aggregation step is known as <a href="/docs/latest/tutorials/tutorial-r
   <img src="img/diagram-5.png" style="max-width: 800px;">
 </div>
 
-<p>For more information, please visit <a href="http://druid.io/docs/latest/design/segments.html">our docs page</a>.</p>
+<p>For more information, please visit <a href="/docs/latest/design/segments.html">our docs page</a>.</p>
 
 <h2 id="querying">Querying</h2>
 
@@ -243,7 +243,7 @@ In addition to standard SQL operators, Druid supports unique operators that leve
   <img src="img/diagram-6.png" style="max-width: 580px;">
 </div>
 
-<p>For more information, please visit <a href="http://druid.io/docs/latest/querying/querying.html">our docs page</a>.</p>
+<p>For more information, please visit <a href="/docs/latest/querying/querying.html">our docs page</a>.</p>
 
 <h2 id="architecture">Architecture</h2>
 
@@ -259,7 +259,7 @@ For example, an operator can dedicate more resources to Druid’s ingestion proc
   <img src="img/diagram-7.png" style="max-width: 620px;">
 </div>
 
-<p>For more information, please visit <a href="http://druid.io/docs/latest/design/index.html">our docs page</a>.</p>
+<p>For more information, please visit <a href="/docs/latest/design/index.html">our docs page</a>.</p>
 
 <h2 id="operations">Operations</h2>
 
@@ -301,7 +301,7 @@ As such, Druid possesses several features to ensure uptime and no data loss.</p>
   </div>
 </div>
 
-<p>For more information, please visit <a href="http://druid.io/docs/latest/operations/recommendations.html">our docs page</a>.</p>
+<p>For more information, please visit <a href="/docs/latest/operations/recommendations.html">our docs page</a>.</p>
 
     </div>
   </div>


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org