Posted to commits@asterixdb.apache.org by bu...@apache.org on 2016/10/03 19:00:51 UTC

asterixdb git commit: Making the SQL++ reference manual a bit more generic in how it reads.

Repository: asterixdb
Updated Branches:
  refs/heads/master f7f3a7f2b -> c7a8a1505


Making the SQL++ reference manual a bit more generic in how it reads.

Change-Id: I184ede1398de3190b60bec2947d826bdc5278594
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1237
Sonar-Qube: Jenkins <je...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <je...@fulliautomatix.ics.uci.edu>
Reviewed-by: Yingyi Bu <bu...@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/asterixdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/asterixdb/commit/c7a8a150
Tree: http://git-wip-us.apache.org/repos/asf/asterixdb/tree/c7a8a150
Diff: http://git-wip-us.apache.org/repos/asf/asterixdb/diff/c7a8a150

Branch: refs/heads/master
Commit: c7a8a15056212367680dff2b133d4a025a5d7a3b
Parents: f7f3a7f
Author: Mike Carey <dt...@gmail.com>
Authored: Sun Oct 2 22:34:05 2016 -0700
Committer: Yingyi Bu <bu...@gmail.com>
Committed: Mon Oct 3 12:00:24 2016 -0700

----------------------------------------------------------------------
 .../src/main/markdown/sqlpp/1_intro.md          | 19 ++++++++--
 .../src/main/markdown/sqlpp/2_expr.md           | 18 +++++-----
 .../src/main/markdown/sqlpp/3_query.md          |  2 +-
 .../src/main/markdown/sqlpp/4_ddl.md            | 38 ++++++++++----------
 4 files changed, 46 insertions(+), 31 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
index 808d713..fdc04cb 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
@@ -19,7 +19,22 @@
 
 # <a id="Introduction">1. Introduction</a><font size="3"/>
 
-This document is intended as a reference guide to the full syntax and semantics of the SQL++ Query Language, a SQL-inspired language for working with semistructured data. SQL++ has much in common with SQL, but there are also differences due to the data model that the language is designed to serve. (SQL was designed in the 1970's for interacting with the flat, schema-ified world of relational databases, while SQL++ is designed for the nested, schema-less/schema-optional world of modern NoSQL systems.) In particular, SQL++ in the context of Apache AsterixDB is intended for working with the Asterix Data Model (ADM), which is a data model aimed at a superset of JSON with an enriched and flexible type system.
+This document is intended as a reference guide to the full syntax and semantics of
+the SQL++ Query Language, a SQL-inspired language for working with semistructured data.
+SQL++ has much in common with SQL, but some differences do exist due to the different
+data models that the two languages were designed to serve.
+SQL was designed in the 1970's for interacting with the flat, schema-ified world of
+relational databases, while SQL++ is much newer and targets the nested, schema-optional
+(or even schema-less) world of modern NoSQL systems.
 
-New AsterixDB users are encouraged to read and work through the (friendlier) guide "AsterixDB 101: An ADM and SQL++ Primer" before attempting to make use of this document. In addition, readers are advised to read and understand the Asterix Data Model (ADM) reference guide since a basic understanding of ADM concepts is a prerequisite to understanding SQL++. In what follows, we detail the features of the SQL++ language in a grammar-guided manner: we list and briefly explain each of the productions in the SQL++ grammar, offering examples (and results) for clarity.
+In the context of Apache AsterixDB, SQL++ is intended for working with the Asterix Data Model (ADM),
+a data model based on a superset of JSON with an enriched and flexible type system.
+New AsterixDB users are encouraged to read and work through the (much friendlier) guide
+"AsterixDB 101: An ADM and SQL++ Primer" before attempting to make use of this document.
+In addition, readers are advised to read through the Asterix Data Model (ADM) reference guide
+first as well, as an understanding of the data model is a prerequisite to understanding SQL++.
+
+In what follows, we detail the features of the SQL++ language in a grammar-guided manner.
+We list and briefly explain each of the productions in the SQL++ grammar, offering examples
+(and results) for clarity.
 

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
index c2bab77..732daa4 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
@@ -21,7 +21,7 @@
 
     Expression ::= OperatorExpression | CaseExpression | QuantifiedExpression
 
-SQL++ is a highly composable expression language. Each SQL++ expression returns zero or more Asterix Data Model (ADM) instances. There are three major kinds of expressions in SQL++. At the topmost level, a SQL++ expression can be an OperatorExpression (similar to a mathematical expression), a ConditionalExpression (to choose between alternative values), or a QuantifiedExpression (which yields a boolean value). Each will be detailed as we explore the full SQL++ grammar.
+SQL++ is a highly composable expression language. Each SQL++ expression returns zero or more data model instances. There are three major kinds of expressions in SQL++. At the topmost level, a SQL++ expression can be an OperatorExpression (similar to a mathematical expression), a ConditionalExpression (to choose between alternative values), or a QuantifiedExpression (which yields a boolean value). Each will be detailed as we explore the full SQL++ grammar.
 
 ## <a id="Primary_expressions">Primary Expressions</a>
 
@@ -29,9 +29,9 @@ SQL++ is a highly composable expression language. Each SQL++ expression returns
                   | VariableReference
                   | ParenthesizedExpression
                   | FunctionCallExpression
-                  | Constructor
+		  | Constructor
 
-The most basic building block for any SQL++ expression is PrimaryExpression. This can be a simple literal (constant) value, a reference to a query variable that is in scope, a parenthesized expression, a function call, or a newly constructed instance of the Asterix Data Model (such as a newly constructed ADM record or list of ADM instances).
+The most basic building block for any SQL++ expression is PrimaryExpression. This can be a simple literal (constant) value, a reference to a query variable that is in scope, a parenthesized expression, a function call, or a newly constructed instance of the data model (such as a newly constructed record or list of data model instances).
 
 ### <a id="Literals">Literals</a>
 
@@ -75,7 +75,7 @@ Different from standard SQL, double quotes play the same role as single quotes a
     <LETTER>    ::= ["A" - "Z", "a" - "z"]
     DelimitedIdentifier   ::= "\`" (<ESCAPE_APOS> | ~["\'"])* "\`"
 
-A variable in SQL++ can be bound to any legal ADM value. A variable reference refers to the value to which an in-scope variable is bound. (E.g., a variable binding may originate from one of the `FROM`, `WITH` or `LET` clauses of a `SELECT` statement or from an input parameter in the context of a function body.) Backticks, e.g., \`id\`, are used for delimited identifiers. Delimiting is needed when a variable's desired name clashes with a SQL++ keyword or includes characters not allowed in regular identifiers.
+A variable in SQL++ can be bound to any legal data model value. A variable reference refers to the value to which an in-scope variable is bound. (E.g., a variable binding may originate from one of the `FROM`, `WITH` or `LET` clauses of a `SELECT` statement or from an input parameter in the context of a function body.) Backticks, e.g., \`id\`, are used for delimited identifiers. Delimiting is needed when a variable's desired name clashes with a SQL++ keyword or includes characters not allowed in regular identifiers.
 
 ##### Examples
 
@@ -100,7 +100,7 @@ The following expression evaluates to the value 2.
 
     FunctionCallExpression ::= FunctionName "(" ( Expression ( "," Expression )* )? ")"
 
-Functions are included in SQL++, like most languages, as a way to package useful functionality or to componentize complicated or reusable SQL++ computations. A function call is a legal SQL++ query expression that represents the ADM value resulting from the evaluation of its body expression with the given parameter bindings; the parameter value bindings can themselves be any SQL++ expressions.
+Functions are included in SQL++, like most languages, as a way to package useful functionality or to componentize complicated or reusable SQL++ computations. A function call is a legal SQL++ query expression that represents the value resulting from the evaluation of its body expression with the given parameter bindings; the parameter value bindings can themselves be any SQL++ expressions.
 
 The following example is a (built-in) function call expression whose value is 8.
 
@@ -116,7 +116,7 @@ The following example is a (built-in) function call expression whose value is 8.
     RecordConstructor        ::= "{" ( FieldBinding ( "," FieldBinding )* )? "}"
     FieldBinding             ::= Expression ":" Expression
 
-A major feature of SQL++ is its ability to construct new ADM data instances. This is accomplished using its constructors for each of the major ADM complex object structures, namely lists (ordered or unordered) and records. Ordered lists are like JSON arrays, while unordered lists have multiset (bag) semantics. Records are built from attributes that are field-name/field-value pairs, again like JSON. (See the AsterixDB Data Model document for more details on each.)
+A major feature of SQL++ is its ability to construct new data model instances. This is accomplished using its constructors for each of the model's complex object structures, namely lists (ordered or unordered) and records. Ordered lists are like JSON arrays, while unordered lists have multiset (bag) semantics. Records are built from attributes that are field-name/field-value pairs, again like JSON. (See the data model document for more details on each.)
 
 The following examples illustrate how to construct a new ordered list with 3 items, a new record with 2 fields, and a new unordered list with 4 items, respectively. List elements can be homogeneous (as in the first example), which is the common case, or they may be heterogeneous (as in the third example). The data values and field name values used to construct lists and records in constructors are all simply SQL++ expressions. Thus, the list elements, field names, and field values used in constructors can be simple literals or they can come from query variable references or even arbitrarily complex SQL++ expressions (subqueries).
 
@@ -125,8 +125,8 @@ The following examples illustrate how to construct a new ordered list with 3 ite
     [ 'a', 'b', 'c' ]
 
     {
-      'project name': 'AsterixDB',
-      'project members': [ 'vinayakb', 'dtabass', 'chenli', 'tsotras' ]
+      'project name': 'Hyracks',
+      'project members': [ 'vinayakb', 'dtabass', 'chenli', 'tsotras', 'tillw' ]
     }
 
     {{ 42, "forty-two!", { "rank": "Captain", "name": "America" }, 3.14159 }}
@@ -137,7 +137,7 @@ The following examples illustrate how to construct a new ordered list with 3 ite
     Field           ::= "." Identifier
     Index           ::= "[" ( Expression | "?" ) "]"
 
-Components of complex types in ADM are accessed via path expressions. Path access can be applied to the result of a SQL++ expression that yields an instance of  a complex type, e.g., a record or list instance. For records, path access is based on field names. For ordered lists, path access is based on (zero-based) array-style indexing. SQL++ also supports an "I'm feeling lucky" style index accessor, [?], for selecting an arbitrary element from an ordered list. Attempts to access non-existent fields or out-of-bound list elements produce the special value `MISSING`.
+Components of complex types in the data model are accessed via path expressions. Path access can be applied to the result of a SQL++ expression that yields an instance of  a complex type, e.g., a record or list instance. For records, path access is based on field names. For ordered lists, path access is based on (zero-based) array-style indexing. SQL++ also supports an "I'm feeling lucky" style index accessor, [?], for selecting an arbitrary element from an ordered list. Attempts to access non-existent fields or out-of-bound list elements produce the special value `MISSING`.
 
 The following examples illustrate field access for a record, index-based element access for an ordered list, and also a composition thereof.
 

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
index c6dcf61..bfe4f0e 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
@@ -72,7 +72,7 @@ The following shows the (rich) grammar for the `SELECT` statement in SQL++.
     OrderbyClause      ::= <ORDER> <BY> Expression ( <ASC> | <DESC> )? ( "," Expression ( <ASC> | <DESC> )? )*
     LimitClause        ::= <LIMIT> Expression ( <OFFSET> Expression )?
 
-In this section, we will make use of two stored collections of records (datasets in ADM parlance), `GleambookUsers` and `GleambookMessages`, in a series of running examples to explain `SELECT` queries. The contents of the example collections are as follows:
+In this section, we will make use of two stored collections of records (datasets), `GleambookUsers` and `GleambookMessages`, in a series of running examples to explain `SELECT` queries. The contents of the example collections are as follows:
 
 `GleambookUsers` collection:
 

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
index a2eebbd..217a670 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
@@ -30,15 +30,16 @@
                       | DeleteStatement
                       | Query ";"
 
-In addition to queries, the AsterixDB implementation of SQL++ supports statements for data definition and
-manipulation purposes as well as controlling the context to be used in evaluating SQL++ expressions.
-This section details the DDL and DML statements supported in the SQL++ language as realized in Apache AsterixDB.
+In addition to queries, an implementation of SQL++ needs to support statements for data definition
+and manipulation purposes as well as controlling the context to be used in evaluating SQL++ expressions.
+This section details the DDL and DML statements supported in the SQL++ language as realized today in
+Apache AsterixDB.
 
 ## <a id="Declarations">Declarations</a>
 
     DatabaseDeclaration ::= "USE" Identifier
 
-The world of data in an AsterixDB instance is organized into data namespaces called **dataverses**.
+At the uppermost level, the world of data is organized into data namespaces called **dataverses**.
 To set the default dataverse for a series of statements, the USE statement is provided in SQL++.
 
 As an example, the following statement sets the default dataverse to be "TinySocial".
@@ -116,15 +117,15 @@ The following example creates a new dataverse named TinySocial if one does not a
     OrderedListTypeDef   ::= "[" ( TypeExpr ) "]"
     UnorderedListTypeDef ::= "{{" ( TypeExpr ) "}}"
 
-The CREATE TYPE statement is used to create a new named ADM datatype.
-This type can then be used to create stored collections or utilized when defining one or more other ADM datatypes.
-Much more information about the Asterix Data Model (ADM) is available in the [data model reference guide](datamodel.html) to ADM.
+The CREATE TYPE statement is used to create a new named datatype.
+This type can then be used to create stored collections or utilized when defining one or more other datatypes.
+Much more information about the data model is available in the [data model reference guide](datamodel.html).
 A new type can be a record type, a renaming of another type, an ordered list type, or an unordered list type.
 A record type can be defined as being either open or closed.
 Instances of a closed record type are not permitted to contain fields other than those specified in the create type statement.
 Instances of an open record type may carry additional fields, and open is the default for new types if neither option is specified.
 
-The following example creates a new ADM record type called GleambookUser type.
+The following example creates a new record type called GleambookUser type.
 Since it is defined as (defaulting to) being an open type,
 instances will be permitted to contain more than what is specified in the type definition.
 The first four fields are essentially traditional typed name/value pairs (much like SQL fields).
@@ -142,7 +143,7 @@ The employment field is an ordered list of instances of another named record typ
       employment: [ EmploymentType ]
     };
 
-The next example creates a new ADM record type, closed this time, called MyUserTupleType.
+The next example creates a new record type, closed this time, called MyUserTupleType.
 Instances of this closed type will not be permitted to have extra fields,
 although the alias field is marked as optional and may thus be NULL or MISSING in legal instances of the type.
 Note that the type of the id field in the example is UUID.
@@ -177,7 +178,7 @@ This field type can be used if you want to have this field be an autogenerated-P
     CompactionPolicy     ::= Identifier
 
 The CREATE DATASET statement is used to create a new dataset.
-Datasets are named, unordered collections of ADM record type instances;
+Datasets are named, unordered collections of record type instances;
 they are where data lives persistently and are the usual targets for SQL++ queries.
 Datasets are typed, and the system ensures that their contents conform to their type definitions.
 An Internal dataset (the default kind) is a dataset whose content lives within and is managed by the system.
@@ -190,8 +191,8 @@ In this case, unlike other non-optional fields, a value for the auto-generated P
 
 Another advanced option, when creating an Internal dataset, is to specify the merge policy to control which of the
 underlying LSM storage components to be merged.
-(AsterixDB supports Log-Structured Merge tree based physical storage for Internal datasets.)
-Apache AsterixDB currently supports four different component merging policies that can be chosen per dataset:
+(The system supports Log-Structured Merge tree based physical storage for Internal datasets.)
+Currently the system supports four different component merging policies that can be chosen per dataset:
 no-merge, constant, prefix, and correlated-prefix.
 The no-merge policy simply never merges disk components.
 The constant policy merges disk components when the number of components reaches a constant number k that can be configured by the user.
@@ -200,14 +201,14 @@ It works by first trying to identify the smallest ordered (oldest to newest) seq
 If such a sequence exists, the components in the sequence are merged together to form a single component.
 Finally, the correlated-prefix policy is similar to the prefix policy, but it delegates the decision of merging the disk components of all the indexes in a dataset to the primary index.
 When the correlated-prefix policy decides that the primary index needs to be merged (using the same decision criteria as for the prefix policy), then it will issue successive merge requests on behalf of all other indexes associated with the same dataset.
-The default policy for AsterixDB is the prefix policy except when there is a filter on a dataset, where the preferred policy for filters is the correlated-prefix.
+The system's default policy is the prefix policy except when there is a filter on a dataset, where the preferred policy for filters is the correlated-prefix.
 
 Another advanced option shown in the syntax above, related to performance and mentioned above, is that a **filter** can optionally be created on a field to further optimize range queries with predicates on the filter's field.
 Filters allow some range queries to avoid searching all LSM components when the query conditions match the filter.
 (Refer to [Filter-Based LSM Index Acceleration](filters.html) for more information about filters.)
 
 An External dataset, in contrast to an Internal dataset, has data stored outside of the system's control.
-Files living in HDFS or in the local filesystem(s) of a cluster's nodes are currently supported in AsterixDB.
+Files living in HDFS or in the local filesystem(s) of a cluster's nodes are currently supported.
 External dataset support allows SQL++ queries to treat foreign data as though it were stored in the system,
 making it possible to query "legacy" file data (e.g., Hive data) without having to physically import it.
 When defining an External dataset, an appropriate adapter type must be selected for the desired external data.
@@ -369,7 +370,7 @@ The LOAD statement accepts the same adapters and the same parameters as discusse
 (See the [guide to external data](externaldata.html) for more information on the available adapters.)
 If a dataset has an auto-generated primary key field, the file to be imported should not include that field in it.
 
-The following example shows how to bulk load the GleambookUsers dataset from an external file containing data that has been prepared in ADM format.
+The following example shows how to bulk load the GleambookUsers dataset from an external file containing data that has been prepared in ADM (Asterix Data Model) format.
 
 ##### Example
 
@@ -390,7 +391,7 @@ value for that field in it.
 (The system will automatically extend the provided record with this additional field and a corresponding value.)
 Insertion will fail if the dataset already has data with the primary key value(s) being inserted.
 
-In AsterixDB, inserts are processed transactionally.
+Inserts are processed transactionally by the system.
 The transactional scope of each insert transaction is the insertion of a single object plus its affiliated secondary index entries (if any).
 If the query part of an insert returns a single object, then the INSERT statement will be a single, atomic transaction.
 If the query part returns multiple objects, each object being inserted will be treated as a separate transaction.
@@ -414,8 +415,7 @@ The following example illustrates a query-based upsert operation.
 
     UPSERT INTO UsersCopy (SELECT VALUE user FROM GleambookUsers user)
 
-*Editor's note: Upserts currently work in AQL but are apparently disabled at the moment in SQL++.
-(@Yingyi, is that indeed the case?)*
+*Editor's note: Upserts currently work in AQL but are not yet enabled (at the moment) in SQL++.*
 
 ### <a id="Deletes">DELETEs</a>
 
@@ -424,7 +424,7 @@ The following example illustrates a query-based upsert operation.
 The SQL++ DELETE statement is used to delete data from a target dataset.
 The data to be deleted is identified by a boolean expression involving the variable bound to the target dataset in the DELETE statement.
 
-Deletes in AsterixDB are processed transactionally.
+Deletes are processed transactionally by the system.
 The transactional scope of each delete transaction is the deletion of a single object plus its affiliated secondary index entries (if any).
 If the boolean expression for a delete identifies a single object, then the DELETE statement itself will be a single, atomic transaction.
 If the expression identifies multiple objects, then each object deleted will be handled as a separate transaction.