Posted to commits@calcite.apache.org by jh...@apache.org on 2015/06/01 19:56:28 UTC
[14/19] incubator-calcite git commit: [CALCITE-355] Web site
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/doc/stream.md
----------------------------------------------------------------------
diff --git a/doc/stream.md b/doc/stream.md
deleted file mode 100644
index 4052ac1..0000000
--- a/doc/stream.md
+++ /dev/null
@@ -1,631 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# Calcite SQL extensions for streaming
-
-## Introduction
-
-Streams are collections of records that flow continuously, and forever.
-Unlike tables, they are not typically stored on disk, but flow over the
-network and are held for short periods of time in memory.
-
-Streams complement tables because they represent what is happening in the
-present and future of the enterprise whereas tables represent the past.
-It is very common for a stream to be archived into a table.
-
-Like tables, you often want to query streams in a high-level language
-based on relational algebra, validated according to a schema, and optimized
-to take advantage of available resources and algorithms.
-
-Calcite's SQL is an extension to standard SQL, not another 'SQL-like' language.
-The distinction is important, for several reasons:
-* Streaming SQL is easy to learn for anyone who knows regular SQL.
-* The semantics are clear, because we aim to produce the same results on a
- stream as if the same data were in a table.
-* You can write queries that combine streams and tables (or the history of
- a stream, which is basically an in-memory table).
-* Lots of existing tools can generate standard SQL.
-
-## An example schema
-
-Our streaming SQL examples use the following schema:
-* `Orders (rowtime, productId, orderId, units)` - a stream and a table
-* `Products (rowtime, productId, name)` - a table
-* `Shipments (rowtime, orderId)` - a stream
-
-## A simple query
-
-Let's start with the simplest streaming query:
-
-```sql
-SELECT STREAM *
-FROM Orders;
-
- rowtime | productId | orderId | units
-----------+-----------+---------+-------
- 10:17:00 | 30 | 5 | 4
- 10:17:05 | 10 | 6 | 1
- 10:18:05 | 20 | 7 | 2
- 10:18:07 | 30 | 8 | 20
- 11:02:00 | 10 | 9 | 6
- 11:04:00 | 10 | 10 | 1
- 11:09:30 | 40 | 11 | 12
- 11:24:11 | 10 | 12 | 4
-```
-
-This query reads all columns and rows from the `Orders` stream.
-Like any streaming query, it never terminates. It outputs a record whenever
-a record arrives in `Orders`.
-
-Type `Control-C` to terminate the query.
-
-The `STREAM` keyword is the main extension in streaming SQL. It tells the
-system that you are interested in incoming orders, not existing ones. The query
-
-```sql
-SELECT *
-FROM Orders;
-
- rowtime | productId | orderId | units
-----------+-----------+---------+-------
- 08:30:00 | 10 | 1 | 3
- 08:45:10 | 20 | 2 | 1
- 09:12:21 | 10 | 3 | 10
- 09:27:44 | 30 | 4 | 2
-
-4 records returned.
-```
-
-is also valid, but will print out all existing orders and then terminate. We
-call it a *relational* query, as opposed to *streaming*. It has traditional
-SQL semantics.
-
-`Orders` is special, in that it has both a stream and a table. If you try to run
-a streaming query on a table, or a relational query on a stream, Calcite gives
-an error:
-
-```sql
-> SELECT * FROM Shipments;
-ERROR: Cannot convert stream 'SHIPMENTS' to a table
-
-> SELECT STREAM * FROM Products;
-ERROR: Cannot convert table 'PRODUCTS' to a stream
-```
-
-## Filtering rows
-
-Just as in regular SQL, you use a `WHERE` clause to filter rows:
-
-```sql
-SELECT STREAM *
-FROM Orders
-WHERE units > 3;
-
- rowtime | productId | orderId | units
-----------+-----------+---------+-------
- 10:17:00 | 30 | 5 | 4
- 10:18:07 | 30 | 8 | 20
- 11:02:00 | 10 | 9 | 6
- 11:09:30 | 40 | 11 | 12
- 11:24:11 | 10 | 12 | 4
-```
-
-## Projecting expressions
-
-Use expressions in the `SELECT` clause to choose which columns to return or
-to compute new values:
-
-```sql
-SELECT STREAM rowtime,
- 'An order for ' || units || ' '
- || CASE units WHEN 1 THEN 'unit' ELSE 'units' END
- || ' of product #' || productId AS description
-FROM Orders;
-
- rowtime | description
-----------+---------------------------------------
- 10:17:00 | An order for 4 units of product #30
- 10:17:05 | An order for 1 unit of product #10
- 10:18:05 | An order for 2 units of product #20
- 10:18:07 | An order for 20 units of product #30
- 11:02:00 | An order for 6 units of product #10
- 11:04:00 | An order for 1 unit of product #10
- 11:09:30 | An order for 12 units of product #40
- 11:24:11 | An order for 4 units of product #10
-```
-
-We recommend that you always include the `rowtime` column in the `SELECT`
-clause. Having a sorted timestamp in each stream and streaming query makes it
-possible to do advanced calculations later, such as `GROUP BY` and `JOIN`.
-
-## Tumbling windows
-
-There are several ways to compute aggregate functions on streams. The
-differences are:
-* How many rows come out for each row in?
-* Does each incoming value appear in one total, or more?
-* What defines the "window", the set of rows that contribute to a given output row?
-* Is the result a stream or a relation?
-
-First we'll look at a *tumbling window*, which is defined by a streaming
-`GROUP BY`. Here is an example:
-
-```sql
-SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime,
- productId,
- COUNT(*) AS c,
- SUM(units) AS units
-FROM Orders
-GROUP BY FLOOR(rowtime TO HOUR), productId;
-
- rowtime | productId | c | units
-----------+-----------+---------+-------
- 10:00:00 | 30 | 2 | 24
- 10:00:00 | 10 | 1 | 1
- 10:00:00 | 20 | 1 | 2
- 11:00:00 | 10 | 3 | 11
- 11:00:00 | 40 | 1 | 12
-```
-
-The result is a stream. At 11 o'clock, Calcite emits a sub-total for every
-`productId` that had an order since 10 o'clock. At 12 o'clock, it will emit
-sub-totals for the orders that occurred between 11:00 and 12:00. Each input
-row contributes to only one output row.
-
-How did Calcite know that the 10:00:00 sub-totals were complete at 11:00:00,
-so that it could emit them? It knows that `rowtime` is increasing, and it knows
-that `FLOOR(rowtime TO HOUR)` is also increasing. So, once it has seen a row
-at or after 11:00:00, it will never see a row that will contribute to a 10:00:00
-total.
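-
-The same reasoning applies to finer time units. As a minimal sketch, assuming
-`FLOOR(rowtime TO MINUTE)` is treated as monotonic in the same way as the
-hourly expression above, a one-minute tumbling window would look like this:
-
-```sql
-SELECT STREAM FLOOR(rowtime TO MINUTE) AS rowtime,
-  productId,
-  COUNT(*) AS c,
-  SUM(units) AS units
-FROM Orders
-GROUP BY FLOOR(rowtime TO MINUTE), productId;
-```
-
-Here the 10:17:00 sub-totals could be emitted as soon as a row stamped
-10:18:00 or later arrives.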
-
-A column or expression that is increasing or decreasing is said to be
-*monotonic*. Without a monotonic expression in the `GROUP BY` clause, Calcite is
-not able to make progress, and it will not allow the query:
-
-```sql
-> SELECT STREAM productId,
-> COUNT(*) AS c,
-> SUM(units) AS units
-> FROM Orders
-> GROUP BY productId;
-ERROR: Streaming aggregation requires at least one monotonic expression in GROUP BY clause
-```
-
-Monotonic columns need to be declared in the schema. The monotonicity is
-enforced when records enter the stream and assumed by queries that read from
-that stream. We recommend that you give each stream a timestamp column called
-`rowtime`, but you can declare others, `orderId`, for example.
-
-## Filtering after aggregation
-
-As in standard SQL, you can apply a `HAVING` clause to filter rows emitted by
-a streaming `GROUP BY`:
-
-```sql
-SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime,
- productId
-FROM Orders
-GROUP BY FLOOR(rowtime TO HOUR), productId
-HAVING COUNT(*) > 2 OR SUM(units) > 10;
-
- rowtime | productId
-----------+-----------
- 10:00:00 | 30
- 11:00:00 | 10
- 11:00:00 | 40
-```
-
-## Sub-queries, views and SQL's closure property
-
-The previous `HAVING` query can be expressed using a `WHERE` clause on a
-sub-query:
-
-```sql
-SELECT STREAM rowtime, productId
-FROM (
- SELECT FLOOR(rowtime TO HOUR) AS rowtime,
- productId,
- COUNT(*) AS c,
- SUM(units) AS su
- FROM Orders
- GROUP BY FLOOR(rowtime TO HOUR), productId)
-WHERE c > 2 OR su > 10;
-
- rowtime | productId
-----------+-----------
- 10:00:00 | 30
- 11:00:00 | 10
- 11:00:00 | 40
-```
-
-`HAVING` was introduced in the early days of SQL, when a way was needed to
-perform a filter *after* aggregation. (Recall that `WHERE` filters rows before
-they enter the `GROUP BY` clause.)
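-
-The two clauses can appear in the same streaming query. A minimal sketch,
-using only the example `Orders` schema: `WHERE` discards single-unit orders
-before they are grouped, and `HAVING` then keeps only the hourly groups that
-still contain more than one order:
-
-```sql
-SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime,
-  productId,
-  COUNT(*) AS c
-FROM Orders
-WHERE units > 1
-GROUP BY FLOOR(rowtime TO HOUR), productId
-HAVING COUNT(*) > 1;
-```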
-
-Since then, SQL has become a mathematically closed language, which means that
-any operation you can perform on a table you can also perform on a query.
-
-The *closure property* of SQL is extremely powerful. Not only does it render
-`HAVING` obsolete (or, at least, reduce it to syntactic sugar), it makes views
-possible:
-
-```sql
-CREATE VIEW HourlyOrderTotals (rowtime, productId, c, su) AS
- SELECT FLOOR(rowtime TO HOUR),
- productId,
- COUNT(*),
- SUM(units)
- FROM Orders
- GROUP BY FLOOR(rowtime TO HOUR), productId;
-
-SELECT STREAM rowtime, productId
-FROM HourlyOrderTotals
-WHERE c > 2 OR su > 10;
-
- rowtime | productId
-----------+-----------
- 10:00:00 | 30
- 11:00:00 | 10
- 11:00:00 | 40
-```
-
-Sub-queries in the `FROM` clause are sometimes referred to as "inline views",
-but really, nested queries are more fundamental. Views are just a convenient
-way to carve your SQL into manageable chunks.
-
-Many people find that nested queries and views are even more useful on streams
-than they are on relations. Streaming queries are pipelines of
-operators all running continuously, and often those pipelines get quite long.
-Nested queries and views help to express and manage those pipelines.
-
-And, by the way, a `WITH` clause can accomplish the same as a sub-query or
-a view:
-
-```sql
-WITH HourlyOrderTotals (rowtime, productId, c, su) AS (
- SELECT FLOOR(rowtime TO HOUR),
- productId,
- COUNT(*),
- SUM(units)
- FROM Orders
- GROUP BY FLOOR(rowtime TO HOUR), productId)
-SELECT STREAM rowtime, productId
-FROM HourlyOrderTotals
-WHERE c > 2 OR su > 10;
-
- rowtime | productId
-----------+-----------
- 10:00:00 | 30
- 11:00:00 | 10
- 11:00:00 | 40
-```
-
-## Converting between streams and relations
-
-Look back at the definition of the `HourlyOrderTotals` view.
-Is the view a stream or a relation?
-
-It does not contain the `STREAM` keyword, so it is a relation.
-However, it is a relation that can be converted into a stream.
-
-You can use it in both relational and streaming queries:
-
-```sql
-/* A relation; will query the historic Orders table.
-   Returns the largest number of product #10 ever sold in one hour. */
-SELECT max(su)
-FROM HourlyOrderTotals
-WHERE productId = 10;
-
-/* A stream; will query the Orders stream.
-   Returns every hour in which at least one product #10 was sold. */
-SELECT STREAM rowtime
-FROM HourlyOrderTotals
-WHERE productId = 10;
-```
-
-This approach is not limited to views and sub-queries.
-Following the approach set out in CQL [<a href="#ref1">1</a>], every query
-in streaming SQL is defined as a relational query and converted to a stream
-using the `STREAM` keyword in the top-most `SELECT`.
-
-If the `STREAM` keyword is present in sub-queries or view definitions, it has no
-effect.
-
-At query preparation time, Calcite figures out whether the relations referenced
-in the query can be converted to streams or historical relations.
-
-Sometimes a stream makes available some of its history (say the last 24 hours of
-data in an Apache Kafka [<a href="#ref2">2</a>] topic)
-but not all. At run time, Calcite figures out whether there is sufficient
-history to run the query, and if not, gives an error.
-
-## Hopping windows
-
-Previously we saw how to define a tumbling window using a `GROUP BY` clause.
-Each record contributed to a single sub-total record, the one containing its
-hour and product id.
-
-But suppose we want to emit, every hour, the number of each product ordered over
-the past three hours. To do this, we use `SELECT ... OVER` and a sliding window
-to combine multiple tumbling windows.
-
-```sql
-SELECT STREAM rowtime,
- productId,
- SUM(su) OVER w AS su,
- SUM(c) OVER w AS c
-FROM HourlyOrderTotals
-WINDOW w AS (
- ORDER BY rowtime
- PARTITION BY productId
- RANGE INTERVAL '2' HOUR PRECEDING);
-```
-
-This query uses the `HourlyOrderTotals` view defined previously.
-The 2 hour interval combines the totals timestamped 09:00:00, 10:00:00 and
-11:00:00 for a particular product into a single total timestamped 11:00:00 and
-summarizing orders for that product between 09:00:00 and 12:00:00.
-
-## Limitations of tumbling and hopping windows
-
-In the present syntax, we acknowledge that it is not easy to create certain
-kinds of windows.
-
-First, let's consider tumbling windows over complex periods.
-
-The `FLOOR` and `CEIL` functions make it easy to create a tumbling window that
-emits on a whole time unit (say every hour, or every minute) but less easy to
-emit, say, every 15 minutes. One could imagine an extension to the `FLOOR`
-function that emits unique values on just about any periodic basis (say in 11
-minute intervals starting from midnight of the current day).
-
-Next, let's consider hopping windows whose retention period is not a multiple
-of its emission period. Say we want to output, at the top of each hour, the
-orders for the previous 7,007 seconds. If we were to simulate this hopping
-window using a sliding window over a tumbling window, as before, we would have
-to sum lots of 1-second windows (because 3,600 and 7,007 are co-prime).
-This is a lot of effort for both the system and the person writing the query.
-
-Calcite could perhaps solve this by generalizing `GROUP BY` syntax, but we would
-be destroying the principle that an input row into a `GROUP BY` appears in
-precisely one output row.
-
-Calcite's SQL extensions for streaming queries are evolving. As we learn more
-about how people wish to query streams, we plan to make the language more
-expressive while remaining compatible with standard SQL and consistent with
-its principles, look and feel.
-
-## Sorting
-
-The story for `ORDER BY` is similar to `GROUP BY`.
-The syntax looks like regular SQL, but Calcite must be sure that it can deliver
-timely results. It therefore requires a monotonic expression on the leading edge
-of your `ORDER BY` key.
-
-```sql
-SELECT STREAM FLOOR(rowtime TO hour) AS rowtime, productId, orderId, units
-FROM Orders
-ORDER BY FLOOR(rowtime TO hour) ASC, units DESC;
-
- rowtime | productId | orderId | units
-----------+-----------+---------+-------
- 10:00:00 | 30 | 8 | 20
- 10:00:00 | 30 | 5 | 4
- 10:00:00 | 20 | 7 | 2
- 10:00:00 | 10 | 6 | 1
- 11:00:00 | 40 | 11 | 12
- 11:00:00 | 10 | 9 | 6
- 11:00:00 | 10 | 12 | 4
- 11:00:00 | 10 | 10 | 1
-```
-
-Most queries will return results in the order that they were inserted,
-because the engine is using streaming algorithms, but you should not rely on it.
-For example, consider this:
-
-```sql
-SELECT STREAM *
-FROM Orders
-WHERE productId = 10
-UNION ALL
-SELECT STREAM *
-FROM Orders
-WHERE productId = 30;
-
- rowtime | productId | orderId | units
-----------+-----------+---------+-------
- 10:17:05 | 10 | 6 | 1
- 10:17:00 | 30 | 5 | 4
- 10:18:07 | 30 | 8 | 20
- 11:02:00 | 10 | 9 | 6
- 11:04:00 | 10 | 10 | 1
- 11:24:11 | 10 | 12 | 4
-```
-
-The rows with `productId` = 30 are apparently out of order, probably because
-the `Orders` stream was partitioned on `productId` and the partitioned streams
-sent their data at different times.
-
-If you require a particular ordering, add an explicit `ORDER BY`:
-
-```sql
-SELECT STREAM *
-FROM Orders
-WHERE productId = 10
-UNION ALL
-SELECT STREAM *
-FROM Orders
-WHERE productId = 30
-ORDER BY rowtime;
-
- rowtime | productId | orderId | units
-----------+-----------+---------+-------
- 10:17:00 | 30 | 5 | 4
- 10:17:05 | 10 | 6 | 1
- 10:18:07 | 30 | 8 | 20
- 11:02:00 | 10 | 9 | 6
- 11:04:00 | 10 | 10 | 1
- 11:24:11 | 10 | 12 | 4
-```
-
-Calcite will probably implement the `UNION ALL` by merging using `rowtime`,
-which is only slightly less efficient.
-
-You only need to add an `ORDER BY` to the outermost query. If you need to,
-say, perform `GROUP BY` after a `UNION ALL`, Calcite will add an `ORDER BY`
-implicitly, in order to make the `GROUP BY` algorithm possible.
-
-## Table constructor
-
-The `VALUES` clause creates an inline table with a given set of rows.
-
-Streaming is disallowed. The set of rows never changes, and therefore a stream
-would never return any rows.
-
-```sql
-> SELECT STREAM * FROM (VALUES (1, 'abc'));
-
-ERROR: Cannot stream VALUES
-```
-
-## Sliding windows
-
-Standard SQL features so-called "analytic functions" that can be used in the
-`SELECT` clause. Unlike `GROUP BY`, these do not collapse records. For each
-record that goes in, one record comes out. But the aggregate function is based
-on a window of many rows.
-
-Let's look at an example.
-
-```sql
-SELECT STREAM rowtime,
- productId,
- units,
- SUM(units) OVER (ORDER BY rowtime RANGE INTERVAL '1' HOUR PRECEDING) unitsLastHour
-FROM Orders;
-```
-
-The feature packs a lot of power with little effort. You can have multiple
-functions in the `SELECT` clause, based on multiple window specifications.
-
-The following example returns orders whose average order size over the last
-10 minutes is greater than the average order size for the last week.
-
-```sql
-SELECT STREAM *
-FROM (
- SELECT STREAM rowtime,
- productId,
- units,
- AVG(units) OVER product (RANGE INTERVAL '10' MINUTE PRECEDING) AS m10,
- AVG(units) OVER product (RANGE INTERVAL '7' DAY PRECEDING) AS d7
- FROM Orders
- WINDOW product AS (
- ORDER BY rowtime
- PARTITION BY productId))
-WHERE m10 > d7;
-```
-
-For conciseness, here we use a syntax where you partially define a window
-using a `WINDOW` clause and then refine the window in each `OVER` clause.
-You could also define all windows in the `WINDOW` clause, or all windows inline,
-if you wish.
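-
-As a sketch of the first alternative, the inner query could name every window
-in the `WINDOW` clause. The window names `m10w` and `d7w` are hypothetical, the
-base window lists `PARTITION BY` before `ORDER BY` as standard SQL does, and
-windowed aggregation on streams may not be available in a given Calcite
-release:
-
-```sql
-SELECT STREAM rowtime,
-  productId,
-  units,
-  AVG(units) OVER m10w AS m10,
-  AVG(units) OVER d7w AS d7
-FROM Orders
-WINDOW product AS (
-    PARTITION BY productId
-    ORDER BY rowtime),
-  m10w AS (product RANGE INTERVAL '10' MINUTE PRECEDING),
-  d7w AS (product RANGE INTERVAL '7' DAY PRECEDING);
-```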
-
-But the real power goes beyond syntax. Behind the scenes, this query is
-maintaining two tables, and adding and removing values from sub-totals using
-FIFO queues. But you can access those tables without introducing a join
-into the query.
-
-Some other features of the windowed aggregation syntax:
-* You can define windows based on row count.
-* The window can reference rows that have not yet arrived.
- (The stream will wait until they have arrived).
-* You can compute order-dependent functions such as `RANK` and median.
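-
-As an illustration of the first point, a row-count window uses `ROWS` in place
-of `RANGE`. This is a sketch in standard SQL window syntax; the alias
-`avgLast10` is hypothetical, and streaming support for it is an assumption:
-
-```sql
-SELECT STREAM rowtime,
-  productId,
-  units,
-  AVG(units) OVER (
-    PARTITION BY productId
-    ORDER BY rowtime
-    ROWS 9 PRECEDING) AS avgLast10
-FROM Orders;
-```
-
-`ROWS 9 PRECEDING` covers the current row and the nine before it, so each
-average is taken over at most ten orders for the same product.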
-
-## Cascading windows
-
-What if we want a query that returns a result for every record, like a
-sliding window, but resets totals on a fixed time period, like a
-tumbling window? Such a pattern is called a *cascading window*. Here
-is an example:
-
-```sql
-SELECT STREAM rowtime,
- productId,
- units,
- SUM(units) OVER (PARTITION BY FLOOR(rowtime TO HOUR)) AS unitsSinceTopOfHour
-FROM Orders;
-```
-
-It looks similar to a sliding window query, but the monotonic
-expression occurs within the `PARTITION BY` clause of the window. As
-the rowtime moves from 10:59:59 to 11:00:00, `FLOOR(rowtime TO
-HOUR)` changes from 10:00:00 to 11:00:00, and therefore a new
-partition starts. The first row to arrive in the new hour will start a
-new total; the second row will have a total that consists of two rows,
-and so on.
-
-Calcite knows that the old partition will never be used again, so it
-removes all sub-totals for that partition from its internal storage.
-
-Analytic functions that use cascading and sliding windows can be
-combined in the same query.
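-
-For example, the two windowed sums shown earlier can sit side by side. This is
-a sketch that reuses the expressions from the sliding and cascading examples;
-it assumes both forms are supported by the engine in use:
-
-```sql
-SELECT STREAM rowtime,
-  productId,
-  units,
-  SUM(units) OVER (ORDER BY rowtime RANGE INTERVAL '1' HOUR PRECEDING) AS unitsLastHour,
-  SUM(units) OVER (PARTITION BY FLOOR(rowtime TO HOUR)) AS unitsSinceTopOfHour
-FROM Orders;
-```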
-
-## State of the stream
-
-Not all concepts in this article have been implemented in Calcite.
-And others may be implemented in Calcite but not in a particular adapter
-such as Samza SQL [<a href="#ref3">3</a>].
-
-### Implemented
-* Streaming SELECT, WHERE, GROUP BY, HAVING, UNION ALL, ORDER BY
-* FLOOR and CEILING functions
-* Monotonicity
-* Streaming VALUES is disallowed
-
-### Not implemented
-* Stream-to-stream JOIN
-* Stream-to-table JOIN
-* Stream on view
-* Streaming UNION ALL with ORDER BY (merge)
-* Relational query on stream
-* Streaming windowed aggregation (sliding and cascading windows)
-* Check that STREAM in sub-queries and views is ignored
-* Check that streaming ORDER BY cannot have OFFSET or LIMIT
-* Limited history; at run time, check that there is sufficient history
- to run the query.
-
-### To do in this document
-* Re-visit whether you can stream VALUES
-* OVER clause to define window on stream
-* Windowed aggregation
-* Punctuation
-* Stream-to-table join
-  * Stream-to-table join where table is changing
-* Stream-to-stream join
-* Relational queries on streams (e.g. "pie chart" query)
-* Diagrams for various window types
-
-## References
-
-* [<a name="ref1">1</a>]
- <a href="http://ilpubs.stanford.edu:8090/758/">Arasu, Arvind and Babu,
- Shivnath and Widom, Jennifer (2003) The CQL Continuous Query
- Language: Semantic Foundations and Query Execution</a>.
-* [<a name="ref2">2</a>]
- <a href="http://kafka.apache.org/documentation.html">Apache Kafka</a>.
-* [<a name="ref3">3</a>] <a href="http://samza.apache.org">Apache Samza</a>.
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/doc/tutorial.md
----------------------------------------------------------------------
diff --git a/doc/tutorial.md b/doc/tutorial.md
deleted file mode 100644
index 91ddef1..0000000
--- a/doc/tutorial.md
+++ /dev/null
@@ -1,753 +0,0 @@
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to you under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-# CSV Adapter Tutorial
-
-Calcite-example-CSV is a fully functional adapter for
-<a href="https://github.com/apache/incubator-calcite">Calcite</a> that reads
-text files in
-<a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV
-(comma-separated values)</a> format. It is remarkable that a couple of
-hundred lines of Java code are sufficient to provide full SQL query
-capability.
-
-CSV also serves as a template for building adapters to other
-data formats. Even though there are not many lines of code, it covers
-several important concepts:
-* user-defined schema using SchemaFactory and Schema interfaces;
-* declaring schemas in a model JSON file;
-* declaring views in a model JSON file;
-* user-defined table using the Table interface;
-* determining the record type of a table;
-* a simple implementation of Table, using the ScannableTable interface, that
- enumerates all rows directly;
-* a more advanced implementation that implements FilterableTable, and can
- filter out rows according to simple predicates;
-* advanced implementation of Table, using TranslatableTable, that translates
- to relational operators using planner rules.
-
-## Download and build
-
-You need Java (1.7 or higher; 1.8 preferred), git and maven (3.2.1 or later).
-
-```bash
-$ git clone https://github.com/apache/incubator-calcite.git
-$ cd incubator-calcite
-$ mvn install -DskipTests -Dcheckstyle.skip=true
-$ cd example/csv
-```
-
-## First queries
-
-Now let's connect to Calcite using
-<a href="https://github.com/julianhyde/sqlline">sqlline</a>, a SQL shell
-that is included in this project.
-
-```bash
-$ ./sqlline
-sqlline> !connect jdbc:calcite:model=target/test-classes/model.json admin admin
-```
-
-(If you are running Windows, the command is `sqlline.bat`.)
-
-Execute a metadata query:
-
-```bash
-sqlline> !tables
-+------------+--------------+-------------+---------------+----------+------+
-| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE |
-+------------+--------------+-------------+---------------+----------+------+
-| null | SALES | DEPTS | TABLE | null | null |
-| null | SALES | EMPS | TABLE | null | null |
-| null | SALES | HOBBIES | TABLE | null | null |
-| null | metadata | COLUMNS | SYSTEM_TABLE | null | null |
-| null | metadata | TABLES | SYSTEM_TABLE | null | null |
-+------------+--------------+-------------+---------------+----------+------+
-```
-
-(JDBC experts, note: sqlline's <code>!tables</code> command is just executing
-<a href="http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getTables(java.lang.String, java.lang.String, java.lang.String, java.lang.String[])"><code>DatabaseMetaData.getTables()</code></a>
-behind the scenes.
-It has other commands to query JDBC metadata, such as <code>!columns</code> and <code>!describe</code>.)
-
-As you can see there are 5 tables in the system: tables
-<code>EMPS</code>, <code>DEPTS</code> and <code>HOBBIES</code> in the current
-<code>SALES</code> schema, and <code>COLUMNS</code> and
-<code>TABLES</code> in the system <code>metadata</code> schema. The
-system tables are always present in Calcite, but the other tables are
-provided by the specific implementation of the schema; in this case,
-the <code>EMPS</code> and <code>DEPTS</code> tables are based on the
-<code>EMPS.csv</code> and <code>DEPTS.csv</code> files in the
-<code>target/test-classes</code> directory.
-
-Let's execute some queries on those tables, to show that Calcite is providing
-a full implementation of SQL. First, a table scan:
-
-```bash
-sqlline> SELECT * FROM emps;
-+--------+--------+---------+---------+----------------+--------+-------+---+
-| EMPNO | NAME | DEPTNO | GENDER | CITY | EMPID | AGE | S |
-+--------+--------+---------+---------+----------------+--------+-------+---+
-| 100 | Fred | 10 | | | 30 | 25 | t |
-| 110 | Eric | 20 | M | San Francisco | 3 | 80 | n |
-| 110 | John | 40 | M | Vancouver | 2 | null | f |
-| 120 | Wilma | 20 | F | | 1 | 5 | n |
-| 130 | Alice | 40 | F | Vancouver | 2 | null | f |
-+--------+--------+---------+---------+----------------+--------+-------+---+
-```
-
-Now JOIN and GROUP BY:
-
-```bash
-sqlline> SELECT d.name, COUNT(*)
-. . . .> FROM emps AS e JOIN depts AS d ON e.deptno = d.deptno
-. . . .> GROUP BY d.name;
-+------------+---------+
-| NAME | EXPR$1 |
-+------------+---------+
-| Sales | 1 |
-| Marketing | 2 |
-+------------+---------+
-```
-
-Last, the VALUES operator generates a single row, and is a convenient
-way to test expressions and SQL built-in functions:
-
-```bash
-sqlline> VALUES CHAR_LENGTH('Hello, ' || 'world!');
-+---------+
-| EXPR$0 |
-+---------+
-| 13 |
-+---------+
-```
-
-Calcite has many other SQL features. We don't have time to cover them
-here. Write some more queries to experiment.
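-
-As a starting point, here is a sketch of a query that filters and sorts the
-`EMPS` table, using only columns shown in the table scan above:
-
-```sql
-SELECT name, city
-FROM emps
-WHERE deptno = 20
-ORDER BY name;
-```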
-
-## Schema discovery
-
-Now, how did Calcite find these tables? Remember, core Calcite does not
-know anything about CSV files. (As a "database without a storage
-layer", Calcite doesn't know about any file formats.) Calcite knows about
-those tables because we told it to run code in the calcite-example-csv
-project.
-
-There are a couple of steps in that chain. First, we define a schema
-based on a schema factory class in a model file. Then the schema
-factory creates a schema, and the schema creates several tables, each
-of which knows how to get data by scanning a CSV file. Last, after
-Calcite has parsed the query and planned it to use those tables, Calcite
-invokes the tables to read the data as the query is being
-executed. Now let's look at those steps in more detail.
-
-On the JDBC connect string we gave the path of a model in JSON
-format. Here is the model:
-
-```json
-{
- version: '1.0',
- defaultSchema: 'SALES',
- schemas: [
- {
- name: 'SALES',
- type: 'custom',
- factory: 'org.apache.calcite.adapter.csv.CsvSchemaFactory',
- operand: {
- directory: 'target/test-classes/sales'
- }
- }
- ]
-}
-```
-
-The model defines a single schema called 'SALES'. The schema is
-powered by a plugin class,
-<a href="../example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchemaFactory.java">org.apache.calcite.adapter.csv.CsvSchemaFactory</a>,
-which is part of the
-calcite-example-csv project and implements the Calcite interface
-<a href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/schema/SchemaFactory.html">SchemaFactory</a>.
-Its <code>create</code> method instantiates a
-schema, passing in the <code>directory</code> argument from the model file:
-
-```java
-public Schema create(SchemaPlus parentSchema, String name,
- Map<String, Object> operand) {
- String directory = (String) operand.get("directory");
- String flavorName = (String) operand.get("flavor");
- CsvTable.Flavor flavor;
- if (flavorName == null) {
- flavor = CsvTable.Flavor.SCANNABLE;
- } else {
- flavor = CsvTable.Flavor.valueOf(flavorName.toUpperCase());
- }
- return new CsvSchema(
- new File(directory),
- flavor);
-}
-```
-
-Driven by the model, the schema factory instantiates a single schema
-called 'SALES'. The schema is an instance of
-<a href="../example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvSchema.java">org.apache.calcite.adapter.csv.CsvSchema</a>
-and implements the Calcite interface <a
-href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/schema/Schema.html">Schema</a>.
-
-A schema's job is to produce a list of tables. (It can also list sub-schemas and
-table-functions, but these are advanced features and calcite-example-csv does
-not support them.) The tables implement Calcite's
-<a href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/schema/Table.html">Table</a>
-interface. <code>CsvSchema</code> produces tables that are instances of
-<a href="../example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTable.java">CsvTable</a>
-and its sub-classes.
-
-Here is the relevant code from <code>CsvSchema</code>, overriding the
-<code><a href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/schema/impl/AbstractSchema.html#getTableMap()">getTableMap()</a></code>
-method in the <code>AbstractSchema</code> base class.
-
-```java
-protected Map<String, Table> getTableMap() {
- // Look for files in the directory ending in ".csv", ".csv.gz", ".json",
- // ".json.gz".
- File[] files = directoryFile.listFiles(
- new FilenameFilter() {
- public boolean accept(File dir, String name) {
- final String nameSansGz = trim(name, ".gz");
- return nameSansGz.endsWith(".csv")
- || nameSansGz.endsWith(".json");
- }
- });
- if (files == null) {
- System.out.println("directory " + directoryFile + " not found");
- files = new File[0];
- }
- // Build a map from table name to table; each file becomes a table.
- final ImmutableMap.Builder<String, Table> builder = ImmutableMap.builder();
- for (File file : files) {
- String tableName = trim(file.getName(), ".gz");
- final String tableNameSansJson = trimOrNull(tableName, ".json");
- if (tableNameSansJson != null) {
- JsonTable table = new JsonTable(file);
- builder.put(tableNameSansJson, table);
- continue;
- }
- tableName = trim(tableName, ".csv");
- final Table table = createTable(file);
- builder.put(tableName, table);
- }
- return builder.build();
-}
-
-/** Creates different sub-type of table based on the "flavor" attribute. */
-private Table createTable(File file) {
- switch (flavor) {
- case TRANSLATABLE:
- return new CsvTranslatableTable(file, null);
- case SCANNABLE:
- return new CsvScannableTable(file, null);
- case FILTERABLE:
- return new CsvFilterableTable(file, null);
- default:
- throw new AssertionError("Unknown flavor " + flavor);
- }
-}
-```
-
-The schema scans the directory, finds all files whose names end
-with ".csv" or ".json" (optionally followed by ".gz"), and creates
-tables for them. In this case, the directory
-is <code>target/test-classes/sales</code> and contains files
-<code>EMPS.csv</code> and <code>DEPTS.csv</code>, which become
-the tables <code>EMPS</code> and <code>DEPTS</code>.
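The <code>getTableMap()</code> excerpt calls two helpers, <code>trim</code> and <code>trimOrNull</code>, that it does not show. Here is a minimal sketch of what they might look like, assuming only the behavior implied by how they are used: strip a suffix if it is present, with <code>trimOrNull</code> returning null when it is absent. The class name <code>SuffixTrim</code> and the method <code>tableNameFor</code> are hypothetical names chosen for illustration; they are not taken from the Calcite source.

```java
/** Hypothetical helper class; a sketch, not the Calcite source. */
public class SuffixTrim {
  /** Returns s without the suffix, or null if s does not end with it. */
  public static String trimOrNull(String s, String suffix) {
    return s.endsWith(suffix)
        ? s.substring(0, s.length() - suffix.length())
        : null;
  }

  /** Returns s without the suffix, or s unchanged if the suffix is absent. */
  public static String trim(String s, String suffix) {
    String trimmed = trimOrNull(s, suffix);
    return trimmed != null ? trimmed : s;
  }

  /** Derives a table name from a file name, following the steps in getTableMap(). */
  public static String tableNameFor(String fileName) {
    String name = trim(fileName, ".gz");          // drop optional ".gz"
    String sansJson = trimOrNull(name, ".json");  // JSON files are checked first
    return sansJson != null ? sansJson : trim(name, ".csv");
  }

  public static void main(String[] args) {
    System.out.println(tableNameFor("EMPS.csv"));
    System.out.println(tableNameFor("DEPTS.csv.gz"));
    System.out.println(tableNameFor("SALES.json"));
  }
}
```

Under these assumptions, <code>EMPS.csv</code> and <code>DEPTS.csv.gz</code> map to the table names <code>EMPS</code> and <code>DEPTS</code>, which matches the tables described above.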
-
-## Tables and views in schemas
-
-Note how we did not need to define any tables in the model; the schema
-generated the tables automatically.
-
-You can define extra tables,
-beyond those that are created automatically,
-using the <code>tables</code> property of a schema.
-
-Let's see how to create
-an important and useful type of table, namely a view.
-
-A view looks like a table when you are writing a query, but it doesn't store data.
-It derives its result by executing a query.
-The view is expanded while the query is being planned, so the query planner
-can often perform optimizations like removing expressions from the SELECT
-clause that are not used in the final result.
-
-Here is a schema that defines a view:
-
-```json
-{
- version: '1.0',
- defaultSchema: 'SALES',
- schemas: [
- {
- name: 'SALES',
- type: 'custom',
- factory: 'org.apache.calcite.adapter.csv.CsvSchemaFactory',
- operand: {
- directory: 'target/test-classes/sales'
- },
- tables: [
- {
- name: 'FEMALE_EMPS',
- type: 'view',
- sql: 'SELECT * FROM emps WHERE gender = \'F\''
- }
- ]
- }
- ]
-}
-```
-
-The line <code>type: 'view'</code> tags <code>FEMALE_EMPS</code> as a view,
-as opposed to a regular table or a custom table.
-Note that single-quotes within the view definition are escaped using a
-back-slash, in the normal way for JSON.
-
-JSON doesn't make it easy to author long strings, so Calcite supports an
-alternative syntax. If your view has a long SQL statement, you can instead
-supply a list of lines rather than a single string:
-
-```json
- {
- name: 'FEMALE_EMPS',
- type: 'view',
- sql: [
- 'SELECT * FROM emps',
- 'WHERE gender = \'F\''
- ]
- }
-```
-
-Now that we have defined a view, we can use it in queries just as if it were a table:
-
-```sql
-sqlline> SELECT e.name, d.name FROM female_emps AS e JOIN depts AS d on e.deptno = d.deptno;
-+--------+------------+
-| NAME | NAME |
-+--------+------------+
-| Wilma | Marketing |
-+--------+------------+
-```
-
-## Custom tables
-
-Custom tables are tables whose implementation is driven by user-defined code.
-They don't need to live in a custom schema.
-
-There is an example in <code>model-with-custom-table.json</code>:
-
-```json
-{
- version: '1.0',
- defaultSchema: 'CUSTOM_TABLE',
- schemas: [
- {
- name: 'CUSTOM_TABLE',
- tables: [
- {
- name: 'EMPS',
- type: 'custom',
- factory: 'org.apache.calcite.adapter.csv.CsvTableFactory',
- operand: {
- file: 'target/test-classes/sales/EMPS.csv.gz',
- flavor: "scannable"
- }
- }
- ]
- }
- ]
-}
-```
-
-We can query the table in the usual way:
-
-```sql
-sqlline> !connect jdbc:calcite:model=target/test-classes/model-with-custom-table.json admin admin
-sqlline> SELECT empno, name FROM custom_table.emps;
-+--------+--------+
-| EMPNO | NAME |
-+--------+--------+
-| 100 | Fred |
-| 110 | Eric |
-| 110 | John |
-| 120 | Wilma |
-| 130 | Alice |
-+--------+--------+
-```
-
-The schema is a regular one, and contains a custom table powered by
-<a href="../example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableFactory.java">org.apache.calcite.adapter.csv.CsvTableFactory</a>,
-which implements the Calcite interface
-<a href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/schema/TableFactory.html">TableFactory</a>.
-Its <code>create</code> method instantiates a <code>CsvScannableTable</code>,
-passing in the <code>file</code> argument from the model file:
-
-```java
-public CsvTable create(SchemaPlus schema, String name,
- Map<String, Object> map, RelDataType rowType) {
- String fileName = (String) map.get("file");
- final File file = new File(fileName);
- final RelProtoDataType protoRowType =
- rowType != null ? RelDataTypeImpl.proto(rowType) : null;
- return new CsvScannableTable(file, protoRowType);
-}
-```
-
-Implementing a custom table is often a simpler alternative to implementing
-a custom schema. Both approaches might end up creating a similar implementation
-of the <code>Table</code> interface, but for the custom table you don't
-need to implement metadata discovery. (<code>CsvTableFactory</code>
-creates a <code>CsvScannableTable</code>, just as <code>CsvSchema</code> does,
-but the table implementation does not scan the filesystem for .csv files.)
-
-Custom tables require more work for the author of the model (the author
-needs to specify each table and its file explicitly) but also give the author
-more control (say, providing different parameters for each table).
-
-## Comments in models
-
-Models can include comments using `/* ... */` and `//` syntax:
-
-```json
-{
- version: '1.0',
- /* Multi-line
- comment. */
- defaultSchema: 'CUSTOM_TABLE',
- // Single-line comment.
- schemas: [
- ..
- ]
-}
-```
-
-(Comments are not standard JSON, but are a harmless extension.)
-
-## Optimizing queries using planner rules
-
-The table implementations we have seen so far are fine as long as the tables
-don't contain a great deal of data. But if your customer table has, say, a
-hundred columns and a million rows, you would rather that the system did not
-retrieve all of the data for every query. You would like Calcite to negotiate
-with the adapter and find a more efficient way of accessing the data.
-
-This negotiation is a simple form of query optimization. Calcite supports query
-optimization by adding <i>planner rules</i>. Planner rules operate by
-looking for patterns in the query parse tree (for instance a project on top
-of a certain kind of table), and replacing the matched nodes in the tree
-with a new set of nodes that implement the optimization.
-
-Planner rules are also extensible, like schemas and tables. So, if you have a
-data store that you want to access via SQL, you first define a custom table or
-schema, and then you define some rules to make the access efficient.
-
-To see this in action, let's use a planner rule to access
-a subset of columns from a CSV file. Let's run the same query against two very
-similar schemas:
-
-```sql
-sqlline> !connect jdbc:calcite:model=target/test-classes/model.json admin admin
-sqlline> explain plan for select name from emps;
-+-----------------------------------------------------+
-| PLAN |
-+-----------------------------------------------------+
-| EnumerableCalcRel(expr#0..9=[{inputs}], NAME=[$t1]) |
-| EnumerableTableAccessRel(table=[[SALES, EMPS]]) |
-+-----------------------------------------------------+
-sqlline> !connect jdbc:calcite:model=target/test-classes/smart.json admin admin
-sqlline> explain plan for select name from emps;
-+-----------------------------------------------------+
-| PLAN |
-+-----------------------------------------------------+
-| EnumerableCalcRel(expr#0..9=[{inputs}], NAME=[$t1]) |
-| CsvTableScan(table=[[SALES, EMPS]]) |
-+-----------------------------------------------------+
-```
-
-What causes the difference in plan? Let's follow the trail of evidence. In the
-<code>smart.json</code> model file, there is just one extra line:
-
-```json
-flavor: "translatable"
-```
-
-This causes a <code>CsvSchema</code> to be created with
-<code>flavor = TRANSLATABLE</code>,
-and its <code>createTable</code> method creates instances of
-<a href="../example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTranslatableTable.java">CsvTranslatableTable</a>
-rather than a <code>CsvScannableTable</code>.
-
-<code>CsvTranslatableTable</code> implements the
-<code><a href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/schema/TranslatableTable.html#toRel()">TranslatableTable.toRel()</a></code>
-method to create
-<a href="../example/csv/src/main/java/org/apache/calcite/adapter/csv/CsvTableScan.java">CsvTableScan</a>.
-Table scans are the leaves of a query operator tree.
-The usual implementation is
-<code><a href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/adapter/enumerable/EnumerableTableScan.html">EnumerableTableScan</a></code>,
-but we have created a distinctive sub-type that will cause rules to fire.
-
-Here is the rule in its entirety:
-
-```java
-public class CsvProjectTableScanRule extends RelOptRule {
- public static final CsvProjectTableScanRule INSTANCE =
- new CsvProjectTableScanRule();
-
- private CsvProjectTableScanRule() {
- super(
- operand(Project.class,
- operand(CsvTableScan.class, none())),
- "CsvProjectTableScanRule");
- }
-
- @Override
- public void onMatch(RelOptRuleCall call) {
- final Project project = call.rel(0);
- final CsvTableScan scan = call.rel(1);
- int[] fields = getProjectFields(project.getProjects());
- if (fields == null) {
- // Project contains expressions more complex than just field references.
- return;
- }
- call.transformTo(
- new CsvTableScan(
- scan.getCluster(),
- scan.getTable(),
- scan.csvTable,
- fields));
- }
-
- private int[] getProjectFields(List<RexNode> exps) {
- final int[] fields = new int[exps.size()];
- for (int i = 0; i < exps.size(); i++) {
- final RexNode exp = exps.get(i);
- if (exp instanceof RexInputRef) {
- fields[i] = ((RexInputRef) exp).getIndex();
- } else {
- return null; // not a simple projection
- }
- }
- return fields;
- }
-}
-```
-
-The constructor declares the pattern of relational expressions that will cause
-the rule to fire.
-
-The <code>onMatch</code> method generates a new relational expression and calls
-<code><a href="http://www.hydromatic.net/calcite/apidocs/org/apache/calcite/plan/RelOptRuleCall.html#transformTo(org.apache.calcite.rel.RelNode)">RelOptRuleCall.transformTo()</a></code>
-to indicate that the rule has fired successfully.
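To make the match-and-replace idea concrete without pulling in Calcite itself, here is a toy model of the same rule in plain Java. Everything in it is a simplification for illustration: <code>ToyProjectScanRule</code>, <code>Scan</code> and <code>Project</code> are hypothetical stand-ins, not Calcite classes, and an expression is modeled as an <code>Integer</code> when it is a plain field reference and as any other object when it is something more complex.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of a project-on-scan rule; a sketch, not Calcite's API. */
public class ToyProjectScanRule {
  /** Leaf node: a scan that reads only the listed fields (null means all). */
  public static final class Scan {
    public final String table;
    public final int[] fields;

    public Scan(String table, int[] fields) {
      this.table = table;
      this.fields = fields;
    }
  }

  /** A projection over a scan; an Integer expression is a field reference. */
  public static final class Project {
    public final List<?> exps;
    public final Scan input;

    public Project(List<?> exps, Scan input) {
      this.exps = exps;
      this.input = input;
    }
  }

  /** Returns the referenced field indexes, or null if any expression is complex. */
  static int[] projectFields(List<?> exps) {
    final int[] fields = new int[exps.size()];
    for (int i = 0; i < exps.size(); i++) {
      final Object exp = exps.get(i);
      if (exp instanceof Integer) {
        fields[i] = (Integer) exp;
      } else {
        return null; // not a simple projection
      }
    }
    return fields;
  }

  /** Returns a narrower scan replacing the project, or null if the rule does not apply. */
  public static Scan apply(Project project) {
    final int[] fields = projectFields(project.exps);
    if (fields == null) {
      return null;
    }
    return new Scan(project.input.table, fields);
  }

  public static void main(String[] args) {
    Scan all = new Scan("EMPS", null);
    Scan narrowed = apply(new Project(Arrays.asList(1), all));
    System.out.println(Arrays.toString(narrowed.fields));
  }
}
```

In this toy version, returning null plays the role of leaving <code>onMatch</code> without calling <code>transformTo</code>: the original tree simply stays as it was.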
-
-## The query optimization process
-
-There's a lot to say about how clever Calcite's query planner is, but we won't
-say it here. The cleverness is designed to take the burden off you, the writer
-of planner rules.
-
-First, Calcite doesn't fire rules in a prescribed order. The query optimization
-process follows many branches of a branching tree, much as a chess-playing
-program examines many possible sequences of moves. If rules A and B both match a
-given section of the query operator tree, then Calcite can fire both.
-
-Second, Calcite uses cost in choosing between plans, but the cost model doesn't
-prevent rules from firing even if their results seem more expensive in the
-short term.
-
-Many optimizers have a linear optimization scheme. Faced with a choice between
-rule A and rule B, as above, such an optimizer needs to choose immediately. It
-might have a policy such as "apply rule A to the whole tree, then apply rule B
-to the whole tree", or apply a cost-based policy, applying the rule that
-produces the cheaper result.
-
-Calcite doesn't require such compromises.
-This makes it simple to combine various sets of rules.
-If, say, you want to combine rules to recognize materialized views with rules to
-read from CSV and JDBC source systems, you just give Calcite the set of all
-rules and tell it to go at it.
-
-Calcite does use a cost model. The cost model decides which plan to ultimately
-use, and sometimes to prune the search tree to prevent the search space from
-exploding, but it never forces you to choose between rule A and rule B. This is
-important, because it avoids falling into local minima in the search space that
-are not actually optimal.
-
-Also (you guessed it) the cost model is pluggable, as are the table and query
-operator statistics it is based upon. But that can be a subject for later.
-
-## JDBC adapter
-
-The JDBC adapter maps a schema in a JDBC data source as a Calcite schema.
-
-For example, this schema reads from a MySQL "foodmart" database:
-
-```json
-{
- version: '1.0',
- defaultSchema: 'FOODMART',
- schemas: [
- {
- name: 'FOODMART',
- type: 'custom',
- factory: 'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory',
- operand: {
- jdbcDriver: 'com.mysql.jdbc.Driver',
- jdbcUrl: 'jdbc:mysql://localhost/foodmart',
- jdbcUser: 'foodmart',
- jdbcPassword: 'foodmart'
- }
- }
- ]
-}
-```
-
-(The FoodMart database will be familiar to those of you who have used
-the Mondrian OLAP engine, because it is Mondrian's main test data
-set. To load the data set, follow <a
-href="http://mondrian.pentaho.com/documentation/installation.php#2_Set_up_test_data">Mondrian's
-installation instructions</a>.)
-
-<b>Current limitations</b>: The JDBC adapter currently only pushes
-down table scan operations; all other processing (filtering, joins,
-aggregations and so forth) occurs within Calcite. Our goal is to push
-down as much processing as possible to the source system, translating
-syntax, data types and built-in functions as we go. If a Calcite query
-is based on tables from a single JDBC database, in principle the whole
-query should go to that database. If tables are from multiple JDBC
-sources, or a mixture of JDBC and non-JDBC, Calcite will use the most
-efficient distributed query approach that it can.
-
-## The cloning JDBC adapter
-
-The cloning JDBC adapter creates a hybrid database. The data is
-sourced from a JDBC database but is read into in-memory tables the
-first time each table is accessed. Calcite evaluates queries based on
-those in-memory tables, effectively a cache of the database.
-
-For example, the following model reads tables from a MySQL
-"foodmart" database:
-
-```json
-{
- version: '1.0',
- defaultSchema: 'FOODMART_CLONE',
- schemas: [
- {
- name: 'FOODMART_CLONE',
- type: 'custom',
- factory: 'org.apache.calcite.adapter.clone.CloneSchema$Factory',
- operand: {
- jdbcDriver: 'com.mysql.jdbc.Driver',
- jdbcUrl: 'jdbc:mysql://localhost/foodmart',
- jdbcUser: 'foodmart',
- jdbcPassword: 'foodmart'
- }
- }
- ]
-}
-```
-
-Another technique is to build a clone schema on top of an existing
-schema. You use the <code>source</code> property to reference a schema
-defined earlier in the model, like this:
-
-```json
-{
- version: '1.0',
- defaultSchema: 'FOODMART_CLONE',
- schemas: [
- {
- name: 'FOODMART',
- type: 'custom',
- factory: 'org.apache.calcite.adapter.jdbc.JdbcSchema$Factory',
- operand: {
- jdbcDriver: 'com.mysql.jdbc.Driver',
- jdbcUrl: 'jdbc:mysql://localhost/foodmart',
- jdbcUser: 'foodmart',
- jdbcPassword: 'foodmart'
- }
- },
- {
- name: 'FOODMART_CLONE',
- type: 'custom',
- factory: 'org.apache.calcite.adapter.clone.CloneSchema$Factory',
- operand: {
- source: 'FOODMART'
- }
- }
- ]
-}
-```
-
-You can use this approach to create a clone schema on any type of
-schema, not just JDBC.
-
-The cloning adapter isn't the be-all and end-all. We plan to develop
-more sophisticated caching strategies, and a more complete and
-efficient implementation of in-memory tables, but for now the cloning
-JDBC adapter shows what is possible and allows us to try out our
-initial implementations.
-
-## Further topics
-
-### Defining a custom schema
-
-(To be written.)
-
-### Modifying data
-
-How to enable DML operations (INSERT, UPDATE and DELETE) on your schema.
-
-(To be written.)
-
-### Calling conventions
-
-(To be written.)
-
-### Statistics and cost
-
-(To be written.)
-
-### Defining and using user-defined functions
-
-(To be written.)
-
-### Defining tables in a schema
-
-(To be written.)
-
-### Defining custom tables
-
-(To be written.)
-
-### Built-in SQL implementation
-
-How does Calcite implement SQL, if an adapter does not implement all of the core
-relational operators?
-
-(To be written.)
-
-### Table functions
-
-(To be written.)
-
-## Further resources
-
-* <a href="http://calcite.incubator.apache.org">Apache Calcite</a> home
- page
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/.gitignore
----------------------------------------------------------------------
diff --git a/site/.gitignore b/site/.gitignore
new file mode 100644
index 0000000..09c86a2
--- /dev/null
+++ b/site/.gitignore
@@ -0,0 +1,2 @@
+.sass-cache
+Gemfile.lock
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/Gemfile
----------------------------------------------------------------------
diff --git a/site/Gemfile b/site/Gemfile
new file mode 100644
index 0000000..77ef869
--- /dev/null
+++ b/site/Gemfile
@@ -0,0 +1,3 @@
+source 'https://rubygems.org'
+gem 'github-pages'
+gem 'rouge'
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/README.md
----------------------------------------------------------------------
diff --git a/site/README.md b/site/README.md
new file mode 100644
index 0000000..85efa1d
--- /dev/null
+++ b/site/README.md
@@ -0,0 +1,37 @@
+# Apache Calcite docs site
+
+This directory contains the code for the Apache Calcite (incubating) web site,
+[calcite.incubator.apache.org](https://calcite.incubator.apache.org/).
+
+## Setup
+
+1. `cd site`
+2. `svn co https://svn.apache.org/repos/asf/incubator/calcite/site target`
+3. `sudo apt-get install rubygems ruby2.1-dev zlib1g-dev` (linux)
+4. `sudo gem install bundler github-pages jekyll`
+5. `bundle install`
+
+## Add javadoc
+
+1. `cd ..`
+2. `mvn -DskipTests site`
+3. `mv target/site/apidocs site/target`
+
+## Running locally
+
+Before opening a pull request, you can preview your contributions by
+running the following from within the `site` directory:
+
+1. `bundle exec jekyll serve`
+2. Open [http://localhost:4000](http://localhost:4000)
+
+## Pushing to site
+
+1. `cd site/target`
+2. `svn status`
+3. You'll need to `svn add` any new files
+4. `svn ci`
+
+Within a few minutes, svnpubsub should kick in and you'll be able to
+see the results at
+[calcite.incubator.apache.org](https://calcite.incubator.apache.org/).
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_config.yml
----------------------------------------------------------------------
diff --git a/site/_config.yml b/site/_config.yml
new file mode 100644
index 0000000..a47de15
--- /dev/null
+++ b/site/_config.yml
@@ -0,0 +1,12 @@
+markdown: kramdown
+permalink: /news/:year/:month/:day/:title/
+excerpt_separator: ""
+
+repository: https://github.com/apache/incubator-calcite
+destination: target
+exclude: [README.md,Gemfile*]
+keep_files: [".git", ".svn", "apidocs"]
+
+collections:
+ docs:
+ output: true
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_data/contributors.yml
----------------------------------------------------------------------
diff --git a/site/_data/contributors.yml b/site/_data/contributors.yml
new file mode 100644
index 0000000..fc44c1e
--- /dev/null
+++ b/site/_data/contributors.yml
@@ -0,0 +1,58 @@
+- name: Alan Gates
+ apacheId: gates
+ githubId: alanfgates
+ role: Mentor
+- name: Aman Sinha
+ apacheId: amansinha
+ githubId: amansinha100
+ role: Committer
+- name: Ashutosh Chauhan
+ apacheId: hashutosh
+ githubId: ashutoshc
+ role: Champion
+- name: Chris Wensel
+ apacheId: cwensel
+ githubId: cwensel
+ role: PMC
+- name: James R. Taylor
+ apacheId: jamestaylor
+ githubId: JamesRTaylor
+ role: PMC
+- name: Jacques Nadeau
+ apacheId: jacques
+ githubId: jacques-n
+ role: PMC
+- name: Jesús Camacho Rodríguez
+ apacheId: jcamacho
+ githubId: jcamachor
+ role: Committer
+- name: Jinfeng Ni
+ apacheId: jni
+ githubId: jinfengni
+ role: Committer
+- name: John Pullokkaran
+ apacheId: jpullokk
+ githubId: jpullokkaran
+ role: Committer
+- name: Julian Hyde
+ apacheId: jhyde
+ githubId: julianhyde
+ role: PMC
+ homepage: http://people.apache.org/~jhyde
+- name: Nick Dimiduk
+ apacheId: ndimiduk
+ githubId: ndimiduk
+ role: Committer
+- name: Steven Noels
+ apacheId: stevenn
+ githubId: stevenn
+ role: Mentor
+- name: Ted Dunning
+ apacheId: tdunning
+ githubId: tdunning
+ role: Mentor
+ avatar: https://www.mapr.com/sites/default/files/otherpageimages/ted-circle-80.png
+- name: Vladimir Sitnikov
+ apacheId: vladimirsitnikov
+ githubId: vlsi
+ role: PMC
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_data/docs.yml
----------------------------------------------------------------------
diff --git a/site/_data/docs.yml b/site/_data/docs.yml
new file mode 100644
index 0000000..d016b2d
--- /dev/null
+++ b/site/_data/docs.yml
@@ -0,0 +1,25 @@
+- title: Overview
+ docs:
+ - index
+ - tutorial
+ - algebra
+
+- title: Advanced
+ docs:
+ - adapter
+ - avatica
+ - stream
+ - lattice
+
+- title: Reference
+ docs:
+ - reference
+ - model
+ - howto
+
+- title: Meta
+ docs:
+ - downloads
+ - history
+ - contributing
+ - api
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_docs/adapter.md
----------------------------------------------------------------------
diff --git a/site/_docs/adapter.md b/site/_docs/adapter.md
new file mode 100644
index 0000000..af0d66c
--- /dev/null
+++ b/site/_docs/adapter.md
@@ -0,0 +1,21 @@
+---
+layout: docs
+title: Adapters
+permalink: /docs/adapter.html
+---
+
+## Adapters
+
+* <a href="https://github.com/apache/incubator-drill">Apache Drill adapter</a>
+* Cascading adapter (<a href="https://github.com/Cascading/lingual">Lingual</a>)
+* CSV adapter (example/csv)
+* JDBC adapter (part of <a href="/apidocs/org/apache/calcite/adapter/jdbc/package-summary.html">calcite-core</a>)
+* MongoDB adapter (<a href="/apidocs/org/apache/calcite/adapter/mongodb/package-summary.html">calcite-mongodb</a>)
+* Spark adapter (<a href="/apidocs/org/apache/calcite/adapter/spark/package-summary.html">calcite-spark</a>)
+* Splunk adapter (<a href="/apidocs/org/apache/calcite/adapter/splunk/package-summary.html">calcite-splunk</a>)
+* Eclipse Memory Analyzer (MAT) adapter (<a href="https://github.com/vlsi/mat-calcite-plugin">mat-calcite-plugin</a>)
+
+## Drivers
+
+* <a href="/apidocs/org/apache/calcite/jdbc/package-summary.html">JDBC driver</a>
+
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_docs/algebra.md
----------------------------------------------------------------------
diff --git a/site/_docs/algebra.md b/site/_docs/algebra.md
new file mode 100644
index 0000000..f14f114
--- /dev/null
+++ b/site/_docs/algebra.md
@@ -0,0 +1,22 @@
+---
+layout: docs
+title: Algebra
+permalink: /docs/algebra.html
+---
+
+Relational algebra is at the heart of Calcite. Every query is
+represented as a tree of relational operators. You can translate from
+SQL to relational algebra, or you can build the tree directly.
+
+Planner rules transform expression trees using mathematical identities
+that preserve semantics. For example, it is valid to push a filter
+into an input of an inner join if the filter does not reference
+columns from the other input.
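The filter example can be sketched as a simple check on column indexes. This is only an illustration under stated assumptions: the join's output columns are taken to be the left input's columns followed by the right input's, the filter is summarized by the list of column indexes it references, and <code>FilterPushDownCheck</code> with its two methods is a hypothetical helper, not part of Calcite.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; a sketch of the validity check, not Calcite's API. */
public class FilterPushDownCheck {
  /** True if the filter references only columns of the join's left input. */
  public static boolean canPushToLeft(List<Integer> filterColumns,
      int leftFieldCount) {
    for (int column : filterColumns) {
      if (column >= leftFieldCount) {
        return false; // references a column from the right input
      }
    }
    return true;
  }

  /** True if the filter references only columns of the join's right input. */
  public static boolean canPushToRight(List<Integer> filterColumns,
      int leftFieldCount) {
    for (int column : filterColumns) {
      if (column < leftFieldCount) {
        return false; // references a column from the left input
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Left input has 3 columns (0..2); right input columns start at 3.
    System.out.println(canPushToLeft(Arrays.asList(0, 2), 3));
    System.out.println(canPushToLeft(Arrays.asList(1, 4), 3));
    System.out.println(canPushToRight(Arrays.asList(3, 4), 3));
  }
}
```

A filter that mixes columns from both sides passes neither check and has to stay above the join (or become part of the join condition).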
+
+Calcite optimizes queries by repeatedly applying planner rules to a
+relational expression. A cost model guides the process, and the
+planner engine generates an alternative expression that has the same
+semantics as the original but a lower cost.
+
+The planning process is extensible. You can add your own relational
+operators, planner rules, cost model, and statistics.
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_docs/api.md
----------------------------------------------------------------------
diff --git a/site/_docs/api.md b/site/_docs/api.md
new file mode 100644
index 0000000..5ab031a
--- /dev/null
+++ b/site/_docs/api.md
@@ -0,0 +1,22 @@
+---
+title: API
+layout: external
+external_url: /apidocs
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_docs/avatica.md
----------------------------------------------------------------------
diff --git a/site/_docs/avatica.md b/site/_docs/avatica.md
new file mode 100644
index 0000000..0458ece
--- /dev/null
+++ b/site/_docs/avatica.md
@@ -0,0 +1,102 @@
+---
+layout: docs
+title: Avatica
+permalink: /docs/avatica.html
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+Avatica is a framework for building JDBC and ODBC drivers for databases,
+and an RPC wire protocol.
+
+![Avatica Architecture](https://raw.githubusercontent.com/julianhyde/share/master/slides/avatica-architecture.png)
+
+Avatica's Java binding has very few dependencies.
+Even though it is part of Apache Calcite, it does not depend on other parts of
+Calcite. It depends only on JDK 1.7+ and Jackson.
+
+Avatica's wire protocol is JSON over HTTP.
+The Java implementation uses Jackson to convert request/response command
+objects to/from JSON.
+
+Avatica-Server is a Java implementation of Avatica RPC.
+It embeds the Jetty HTTP server.
+
+Core concepts:
+
+* Meta is a local API sufficient to implement any Avatica provider
+* Factory creates implementations of the JDBC classes (Driver, Connection,
+ Statement, ResultSet) on top of a Meta
+* Service is an interface that implements the functions of Meta in terms
+ of request and response command objects
+
+## JDBC
+
+Avatica implements JDBC by means of Factory.
+Factory creates implementations of the JDBC classes (Driver, Connection,
+Statement, PreparedStatement, ResultSet) on top of a Meta.
+
+## ODBC
+
+Work has not started on Avatica ODBC.
+
+Avatica ODBC would use the same wire protocol and could use the same server
+implementation in Java. The ODBC client would be written in C or C++.
+
+Since the Avatica protocol abstracts many of the differences between providers,
+the same ODBC client could be used for different databases.
+
+## Project structure
+
+We know that it is important that client libraries have minimal dependencies.
+
+Avatica is currently part of Apache Calcite.
+It does not depend upon any other part of Calcite.
+At some point Avatica could become a separate project.
+
+Packages:
+
+* [org.apache.calcite.avatica](/apidocs/org/apache/calcite/avatica/package-summary.html) Core framework
+* [org.apache.calcite.avatica.remote](/apidocs/org/apache/calcite/avatica/remote/package-summary.html) JDBC driver that uses remote procedure calls
+* [org.apache.calcite.avatica.server](/apidocs/org/apache/calcite/avatica/server/package-summary.html) HTTP server
+* [org.apache.calcite.avatica.util](/apidocs/org/apache/calcite/avatica/util/package-summary.html) Utilities
+
+## Status
+
+### Implemented
+
+* Create connection, create statement, metadata, prepare, bind, execute, fetch
+* RPC using JSON over HTTP
+* Local implementation
+* Implementation over an existing JDBC driver
+* Composite RPCs (combining several requests into one round trip)
+ * Execute-Fetch
+ * Metadata-Fetch (metadata calls such as getTables return all rows)
+
+### Not implemented
+
+* ODBC
+* RPCs
+ * CloseStatement
+ * CloseConnection
+* Composite RPCs
+ * CreateStatement-Prepare
+ * CloseStatement-CloseConnection
+ * Prepare-Execute-Fetch (Statement.executeQuery should fetch first N rows)
+* Remove statements from statement table
+* DML (INSERT, UPDATE, DELETE)
+* Statement.execute applied to SELECT statement
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_docs/contributing.md
----------------------------------------------------------------------
diff --git a/site/_docs/contributing.md b/site/_docs/contributing.md
new file mode 100644
index 0000000..8fbdd6d
--- /dev/null
+++ b/site/_docs/contributing.md
@@ -0,0 +1,72 @@
+---
+layout: docs
+title: Contributing
+permalink: /docs/contributing.html
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+We welcome contributions.
+
+If you are planning to make a large contribution, talk to us first! It
+helps to agree on the general approach. Log a
+[JIRA case](https://issues.apache.org/jira/browse/CALCITE) for your
+proposed feature or start a discussion on the dev list.
+
+Fork the GitHub repository, and create a branch for your feature.
+
+Develop your feature and test cases, and make sure that
+`mvn install` succeeds. (Run extra tests if your change warrants it.)
+
+Commit your change to your branch, and use a comment that starts with
+the JIRA case number, like this:
+
+{% highlight text %}
+[CALCITE-345] AssertionError in RexToLixTranslator comparing to date literal
+{% endhighlight %}
+
+If your change had multiple commits, use `git rebase -i master` to
+combine them into a single commit, and to bring your code up to date
+with the latest on the main line.
+
+Then push your commit(s) to GitHub, and create a pull request from
+your branch to the incubator-calcite master branch. Update the JIRA case
+to reference your pull request, and a committer will review your
+changes.
+
+## Getting started
+
+Calcite is a community, so the first step to joining the project is to introduce yourself.
+Join the [developers list](http://mail-archives.apache.org/mod_mbox/incubator-calcite-dev/)
+and send an email.
+
+If you have the chance to attend a [meetup](http://www.meetup.com/Apache-Calcite/),
+or meet [members of the community](http://calcite.incubator.apache.org/team-list.html)
+at a conference, that's also great.
+
+Choose an initial task to work on. It should be something really simple,
+such as a bug fix or a [Jira task that we have labeled
+"newbie"](https://issues.apache.org/jira/issues/?jql=labels%20%3D%20newbie%20%26%20project%20%3D%20Calcite%20%26%20status%20%3D%20Open).
+Follow the [contributing guidelines](#contributing) to get your change committed.
+
+After you have made several useful contributions we may
+[invite you to become a committer](https://community.apache.org/contributors/).
+We value all contributions that help to build a vibrant community, not just code.
+You can contribute by testing the code, helping verify a release,
+writing documentation or the web site,
+or just by answering questions on the list.
http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/site/_docs/downloads.md
----------------------------------------------------------------------
diff --git a/site/_docs/downloads.md b/site/_docs/downloads.md
new file mode 100644
index 0000000..efcff43
--- /dev/null
+++ b/site/_docs/downloads.md
@@ -0,0 +1,53 @@
+---
+layout: docs
+title: Downloads
+permalink: /docs/downloads.html
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+Calcite is released as a source artifact, and also through Maven.
+
+# Source releases
+
+Release | Date | Commit | Notes | Download
+:--------------- | :--------- | :------- | :---- | :-------
+{% for post in site.categories.release %}{{ post.version }} | {{ post.date | date_to_string }} | <a href="https://github.com/apache/incubator-calcite/commit/{{ post.sha }}">{{ post.sha }}</a> | <a href="history.html#{{ post.tag }}">notes</a> | <a href="http://{% if forloop.index0 < 2 %}www.apache.org/dyn/closer.cgi{% else %}archive.apache.org/dist{% endif %}/incubator/calcite/{% if post.fullVersion %}{{ post.fullVersion }}{% else %}apache-calcite-{{ post.version }}{% endif %}">src</a>
+ {% endfor %}
+
+# Maven artifacts
+
+Add the following to the dependencies section of your `pom.xml` file:
+
+{% for post in site.categories.release limit:1 %}
+{% assign current_release = post %}
+{% endfor %}
+
+{% highlight xml %}
+<dependencies>
+ <dependency>
+ <groupId>org.apache.calcite</groupId>
+ <artifactId>calcite-core</artifactId>
+ <version>{{ current_release.version }}</version>
+ </dependency>
+</dependencies>
+{% endhighlight %}
+
+Also include `<dependency>` elements for any extension modules you
+need: `calcite-mongodb`, `calcite-spark`, `calcite-splunk`, and so
+forth.
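+
+As a sketch, the entry for the MongoDB adapter might look like the
+following. It assumes the adapter is published under the same `groupId`
+and version as `calcite-core`; check the artifact's coordinates before
+relying on it.
+
+{% highlight xml %}
+<!-- Assumed coordinates: same groupId and version as calcite-core -->
+<dependency>
+  <groupId>org.apache.calcite</groupId>
+  <artifactId>calcite-mongodb</artifactId>
+  <version>{{ current_release.version }}</version>
+</dependency>
+{% endhighlight %}
+
+Place it alongside the `calcite-core` entry, inside the same
+`<dependencies>` element.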
\ No newline at end of file