Posted to commits@druid.apache.org by fj...@apache.org on 2020/01/02 23:32:24 UTC
[druid-website-src] 01/01: a bunch of wording changes

This is an automated email from the ASF dual-hosted git repository.

fjy pushed a commit to branch b
in repository https://gitbox.apache.org/repos/asf/druid-website-src.git

commit 219091f197b6b0f6420ef4e3477188ac95d52199
Author: fjy <fa...@gmail.com>
AuthorDate: Thu Jan 2 15:32:13 2020 -0800

    a bunch of wording changes
---
 faq.md        | 72 ++++++++++++++++++++++++++++++++---------------------------
 index.html    |  2 +-
 technology.md |  2 +-
 use-cases.md  | 56 +++++++++++++---------------------------------
 4 files changed, 56 insertions(+), 76 deletions(-)

diff --git a/faq.md b/faq.md
index 8fd64c8..6334c79 100644
--- a/faq.md
+++ b/faq.md
@@ -17,37 +17,38 @@ traditional data warehouses cannot.
 
 Druid offers the following advantages over traditional data warehouses:
 
-* Low latency streaming ingest, and direct integration with messages buses such as
-Apache Kafka.
-* Time-based partitioning, which enables performant time-based
-queries.
-* Fast search and filter, for fast ad-hoc slice and dice.
-* Minimal schema design, and native support for semi-structured and nested data.
-
-Consider using Druid over a data warehouse if you have streaming data, and
-require low-latency ingest as well as low-latency queries. Also consider Druid
-if you need ad-hoc analytics. Druid is great for slice and dice and drill
-downs. Druid is also often used over a data warehouse to power interactive
-applications, where support for high concurrency queries is required.
-
-### Is Druid a SQL-on-Hadoop solution? When should I use Druid over Presto/Hive?
-
-Druid supports SQL and can load data from Hadoop, but is not a SQL-on-Hadoop
-system. Modern SQL-on-Hadoop solutions are used for the same use cases as data
-warehouses, except they are designed for architectures where compute and
-storage are separated systems, and data is loaded from storage into the compute
-layer as needed by queries.
-
-The previous section on Druid vs data warehouses also applies to Druid versus
-SQL-on-Hadoop solutions.
+* Much lower latency for OLAP-style queries
+* Much lower latency for data ingest (both streaming and batch)
+* Out-of-the-box integration with Apache Kafka, AWS Kinesis, HDFS, AWS S3, and more
+* Time-based partitioning, which enables performant time-based queries
+* Fast search and filter, for fast slice and dice
+* Minimal schema design and native support for semi-structured and nested data
+
+Consider using Druid to augment your data warehouse if your use case requires:
+
+* Powering a user-facing application
+* Low-latency query response with high concurrency
+* Instant data visibility
+* Fast ad-hoc slice and dice
+* Streaming data
+
+To summarize, Druid shines when the use case involves real-time analytics and
+where the end-user (technical or not) wants to apply numerous queries in rapid
+succession to explore or better understand data trends. 
 
 ### Is Druid a log aggregation/log search system? When should I use Druid over Elastic/Splunk?
 
-Druid uses inverted indexes (in particular, compressed bitmaps) for fast searching and filtering, but it is not generally considered a search system.
-While Druid contains many features commonly found in search systems, such as the ability to stream in structured and semi-structured data and the ability to search and filter the data, Druid isn’t commonly used to ingest text logs and run full text search queries over the text logs.
-However, Druid is often used to ingest and analyze semi-structured data such as JSON.
+Druid uses inverted indexes (in particular, compressed bitmaps) for fast
+searching and filtering, but it is not generally considered a search system.
+While Druid contains many features commonly found in search systems, such as
+the ability to stream in structured and semi-structured data and the ability to
+search and filter the data, Druid isn’t commonly used to ingest text logs and
+run full text search queries over the text logs.  However, Druid is often used
+to ingest and analyze semi-structured data such as JSON.
 
-Druid at its core is an analytics engine and as such, it can support numerical aggregations, groupBys (including multi-dimensional groupBys), and other analytic workloads faster and more efficiently than search systems.
+Druid at its core is an analytics engine and as such, it can support numerical
+aggregations, groupBys (including multi-dimensional groupBys), and other
+analytic workloads faster and more efficiently than search systems.
 
 ### Is Druid a timeseries database? When should I use Druid over InfluxDB/OpenTSDB/Prometheus?
 
@@ -62,17 +63,22 @@ from analytics databases and search systems, it can significantly
 outperformance TSDBs when grouping, searching, and filtering on tags that are
 not time, or when computing complex metrics such as histograms and quantiles.
 
+### Does Druid separate storage and compute?
+
+Druid creates an indexed copy of raw data that is highly optimized for
+analytic queries. Druid runs queries over this indexed data, called a ['segment'](/docs/latest/design/segments.html)
+in Druid, and does not pull raw data from an external storage system as needed
+by queries. 
 
 ### How is Druid deployed?
 
 Druid can be deployed on commodity hardware in any *NIX based environment.
-A Druid cluster consists of several different processes, each designed to do a small set of things very well (ingestion, querying, coordination, etc).
-Many of these processes can be co-located and deployed together on the same hardware as described [here](/docs/latest/tutorials/quickstart).
-
-Druid was initially created in the cloud, and runs well in AWS, GCP, Azure, and other cloud environments.
+A Druid cluster consists of several different services, each designed to do a small set of things very well (ingestion, querying, coordination, etc).
+Many of these services can be co-located and deployed together on the same hardware as described [here](/docs/latest/tutorials/quickstart).
 
+Druid was designed for the cloud, and runs well in AWS, GCP, Azure, and other cloud environments.
 
-### Where does Druid fit in my existing Hadoop-based data stack?
+### Where does Druid fit in my big data stack?
 
 Druid typically connects to a source of raw data such as a message bus such as Apache Kafka, or a filesystem such as HDFS.
 Druid ingests an optimized, column-oriented, indexed copy of your data and serves analytics workloads on top of it.
@@ -96,7 +102,7 @@ disk and memory and extend the amount of data a single node can load up to the
 size of its disks.
 
 Individual Historicals can be configured with the maximum amount of data
-they should be given.  Coupled with the Coordinator’s ability to assign data to
+they should be given. Coupled with the Coordinator’s ability to assign data to
 different “tiers” based on different query requirements, Druid is essentially a
 system that can be configured across a wide spectrum of performance
 requirements. All data can be in memory and processed, or data can be heavily
diff --git a/index.html b/index.html
index 80a5707..3f48cbb 100644
--- a/index.html
+++ b/index.html
@@ -57,7 +57,7 @@ canonical: 'https://druid.apache.org/'
           </p>
         </div>
         <div class="feature">
-          <span class="fa fa-globe fa"></span>
+          <span class="fa fa-cloud fa"></span>
           <h5>Deploy in AWS/GCP/Azure, hybrid clouds, Kubernetes, and bare metal</h5>
           <p>
             Druid can be deployed in any *NIX environment on commodity hardware, both in the cloud and on premise. Deploying Druid is easy: scaling up and down is as simple as adding and removing Druid services.
diff --git a/technology.md b/technology.md
index f26ea2e..8524c3d 100644
--- a/technology.md
+++ b/technology.md
@@ -6,7 +6,7 @@ canonical: 'https://druid.apache.org/technology'
 ---
 
 Apache Druid is an open source distributed data store.
-Druid’s core design combines ideas from [OLAP/analytic databases](https://en.wikipedia.org/wiki/Online_analytical_processing), [timeseries databases](https://en.wikipedia.org/wiki/Time_series_database), and [search systems](https://en.wikipedia.org/wiki/Full-text_search) to create a unified system for a broad range of [use cases](/use-cases). Druid merges key characteristics of each of the 3 systems into its ingestion layer, storage format, querying layer, and core architecture.
+Druid’s core design combines ideas from [data warehouses](https://en.wikipedia.org/wiki/Data_warehouse), [timeseries databases](https://en.wikipedia.org/wiki/Time_series_database), and [search systems](https://en.wikipedia.org/wiki/Full-text_search) to create a unified system for real-time analytics for a broad range of [use cases](/use-cases). Druid merges key characteristics of each of the 3 systems into its ingestion layer, storage format, querying layer, and core architecture.
 
 <div class="image-large">
   <img src="img/diagram-2.png" style="max-width: 360px">
diff --git a/use-cases.md b/use-cases.md
index a84c5d7..633341c 100644
--- a/use-cases.md
+++ b/use-cases.md
@@ -5,47 +5,21 @@ sectionid: use-cases
 canonical: 'https://druid.apache.org/use-cases'
 ---
 
-## Streaming and operational data
-
-Apache Druid generally works well with any event-oriented, clickstream, timeseries, or telemetry data, especially streaming datasets from [Apache Kafka](https://kafka.apache.org/).
-Druid provides [exactly once consumption semantics](/docs/latest/development/extensions-core/kafka-ingestion) from Apache Kafka and is commonly used as a sink for event-oriented Kafka topics.
-
-Druid also works well for batch data sets.
-Organizations have deployed Druid to accelerate queries and power applications where the input data is one or more static files.
-Druid is a great fit if you are developing a user-facing application and you want your users to be able to self service their own questions.
-
-Some common high level use cases of Druid include:
-
-<div class="features">
-  <div class="feature">
-    <span class="fa fa-rocket fa"></span>
-    <h5>Analyze performance</h5>
-    <p>
-      Create interactive dashboards with full drill down capabilities. Analyze performance of digital products, track mobile app usage, or monitor site reliability.
-    </p>
-  </div>
-  <div class="feature">
-    <span class="fa fa-exclamation-triangle fa"></span>
-    <h5>Diagnose problems</h5>
-    <p>
-      Find the root cause of issues. Troubleshoot netflow bottlenecks, analyze security threats, or diagnose software crashes.
-    </p>
-  </div>
-  <div class="feature">
-    <span class="fa fa-users fa"></span>
-    <h5>Find commonalities</h5>
-    <p>
-      Find common attributes among events. Identify shared components in defective products, or determine patterns in top performing products.
-    </p>
-  </div>
-  <div class="feature">
-    <span class="fa fa-money-bill-wave-alt fa"></span>
-    <h5>Increase efficiency</h5>
-    <p>
-      Improve product engagement. Optimize ad-spend in digital marketing campaigns or increase user engagement in online products.
-    </p>
-  </div>
-</div>
+## Real-time analytics and intelligence
+
+Apache Druid is a database that is most often used for powering use cases where real-time ingest, fast query performance, and high uptime are important. As such, Druid is commonly used for powering GUIs of analytical applications, or as a backend for highly-concurrent APIs that need fast aggregations. Druid works best with event-oriented data.
+
+Common application areas for Druid include:
+
+* Clickstream analytics (web and mobile analytics)
+* Risk/fraud analysis
+* Network telemetry analytics (network performance monitoring)
+* Server metrics storage
+* Supply chain analytics (manufacturing metrics)
+* Application performance metrics
+* Business intelligence / OLAP
+
+Some of these use cases are described in more detail below. For an overview of technical differentiation, please see the [FAQ](/faq).
 
 ## User activity and behavior
 

