Posted to commits@drill.apache.org by dz...@apache.org on 2023/03/03 11:26:05 UTC

[drill-site] branch master updated (e3968908c -> 652aa27c1)

This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git


    from e3968908c Document new statistical functions.
     new e893ec7f0 Document new statistical functions (fixup).
     new 652aa27c1 Remove duplicated Drill 1.21 blog post and add cgivre to authors.json.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 _data/authors.json                                 |  8 ++-
 .../sql-reference/sql-functions/015-statistical.md | 37 +++++++++++
 blog/_posts/en/2023-02-21-drill-1.21.0-released.md | 23 -------
 .../en/2023-03-02-drill-1.21-announcement.md       | 73 ----------------------
 4 files changed, 44 insertions(+), 97 deletions(-)
 create mode 100644 _docs/en/sql-reference/sql-functions/015-statistical.md
 delete mode 100644 blog/_posts/en/2023-02-21-drill-1.21.0-released.md
 delete mode 100644 blog/_posts/en/2023-03-02-drill-1.21-announcement.md

[drill-site] 02/02: Remove duplicated Drill 1.21 blog post and add cgivre to authors.json.

Posted by dz...@apache.org.

dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git

commit 652aa27c131a2e70596c78704b33fe1ef039ee31
Author: James Turton <ja...@somecomputer.xyz>
AuthorDate: Fri Mar 3 13:24:16 2023 +0200

    Remove duplicated Drill 1.21 blog post and add cgivre to authors.json.
---
 _data/authors.json                                 |  8 ++-
 blog/_posts/en/2023-02-21-drill-1.21.0-released.md | 23 -------
 .../en/2023-03-02-drill-1.21-announcement.md       | 73 ----------------------
 3 files changed, 7 insertions(+), 97 deletions(-)

diff --git a/_data/authors.json b/_data/authors.json
index b1e58c740..a2caeec34 100644
--- a/_data/authors.json
+++ b/_data/authors.json
@@ -58,5 +58,11 @@
     "title": "Committer",
     "org": "Apache Software Foundation",
     "email": "laurent@apache.org"
-  }
+  },
+  "cgivre": {
+    "name": "Charles Givre",
+    "title": "PMC Chair and Committer",
+    "org": "Apache Drill",
+    "email": "cgivre@apache.org"
+  }
 }
diff --git a/blog/_posts/en/2023-02-21-drill-1.21.0-released.md b/blog/_posts/en/2023-02-21-drill-1.21.0-released.md
deleted file mode 100644
index 54a619092..000000000
--- a/blog/_posts/en/2023-02-21-drill-1.21.0-released.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-layout: post
-title: "Drill 1.21.0 Released"
-code: drill-1.21.0-released
-excerpt: Apache Drill 1.21.0 has been released.
-
-authors: ["jturton"]
----
-
-Today, we're happy to announce the availability of Drill 1.21.0. You can download it [here](https://drill.apache.org/download/).
-
-## This release provides the following new features:
-
-* A major upgrade of the parsing and planning library Calcite from 1.21 to 1.33 enabled by the elimination of Drill's fork of Calcite.
-* Upgrades of most format plugins to the internal EVF2 reader framework included support for provided schemas.
-* A new native Drill storage plugin enabling "Drill-on-Drill" federated deployments.
-* INSERT support, currently in the JDBC, Splunk and Google Sheets plugins.
-* New SQL syntax including filtered aggregates, PIVOT, UNPIVOT, INTERSECT and EXCEPT.
-* Support for new authentication modes in storage plugins including user translation for using different external credentials for different Drill users.
-* An overhaul of the implicit type casting logic for a more consistent user experience.
-* New functions and storage plugins including Delta Lake, Google Sheets, MS Access, threat hunting functions and statistical distribution functions.
-
-You can find a complete list of improvements and JIRAs resolved in the 1.21.0 release [here]({{site.baseurl}}/docs/apache-drill-1-21-0-release-notes/).
diff --git a/blog/_posts/en/2023-03-02-drill-1.21-announcement.md b/blog/_posts/en/2023-03-02-drill-1.21-announcement.md
deleted file mode 100644
index f610b3342..000000000
--- a/blog/_posts/en/2023-03-02-drill-1.21-announcement.md
+++ /dev/null
@@ -1,73 +0,0 @@
----
-layout: post
-title: "Announcing Drill 1.21!"
-code: drill-1.21-announcement
-excerpt: "Announcing Drill 1.21: New Connectors, Functions and Much Better Stability."
-
-authors: ["cgivre"]
----
-
-
-# Announcing Drill 1.21: New Connectors, Functions and Much Better Stability
-The Apache Drill PMC is pleased to announce a milestone release of Apache Drill. Since the last release of Drill the team has been hard at work quashing bugs and making overall functionality improvements. The TL;DR includes the following:
-
-* New connectors including Apache Iceberg, Delta Lake, Microsoft Access, GoogleSheets, and Box
-* Efficient cross-cloud query capability
-* Greatly improved access controls to include user translation support for all storage plugins
-* Greatly improved query planning and implicit casting.
-* New BI-focused SQL operators including `PIVOT`, `UNPIVOT`, `EXCEPT` and `INTERSECT`
-* New functions for computing regression lines and trends.
-* New and updated date manipulation functions.
-
-Overall, Drill 1.21 is much more capable and stable than previous versions. 
-
-## Calcite, We’re Back!
-Drill relies on another open source project, Apache Calcite for its query planning. The query planning process is a huge part of the overall functionality of Drill. Unfortunately, about three years ago, there were some issues in Calcite which forced Drill to fork it and rely on that fork. As a result, Drill was essentially stuck with a three year old query planner, but more importantly, bugs that were fixed in Calcite, as well as new capabilities were not finding their way into Drill. 
-
-That is no longer the case. Drill 1.21 is now running on the latest stable version of Calcite, version 1.33. As a result, we’ve been able to close countless JIRA tickets of various queries failing and other random bugs that were the result of query planning bugs.
-
-What this means for you as a user is that you’ll see much fewer queries failing and better overall performance in terms of speed and stability. You’ll see better optimizations being pushed down to JDBC data sources as well as support for BigQuery, Athena and other JDBC data sources. We hope to keep Drill away from Calcite forks so I hope that we will work with the Calcite community to keep our tools in sync.
-
-## Improved Implicit Casting Rules Reduce Schema Change Failures
-From this author’s perspective, one of the biggest improvements in Drill is one of the least noticeable and that is the result of improved implicit casting. One of Drill’s unique features is its ability to infer the structure, or schema of your data. However, this can be problematic when the schema changes. When I used to teach Drill, I used to have to spend a considerable amount of time teaching students how to cast data from one data type to another to ensure that the queries would succeed.
-
-When using the latest version of Drill, you’ll find that queries will work without the need for much if any casting. In short, they’ll do what you expect them to do. It’s really a high on magic functionality. 
-
-## Integrations with the Modern and Not-so-Modern Data Stack
-The new version of Drill features several new connectors and readers that will enable users to connect to the “modern data stack”, specifically support for Apache Iceberg and Delta Lake. 
-
-### Breaking the Iceberg
-Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Drill to safely work with the same tables, at the same time. In addition to being able to query data directly from Iceberg tables, Drill also allows users to query the Iceberg table metadata as well as snapshots.  [Complete documentation is available here](https://drill.apache.org/docs/iceberg-format-plugin/).
-
-### Querying Delta Lake
-Lest we offend someone, we’re not going to get into the debate between Iceberg and Delta lake (after all, let’s not argue about who killed whom), but Delta Lake, if you aren’t familiar with it, is another modern table format which allows ACID transactions, versioning etc. In version 1.21, Drill adds support for Delta Lake tables, so users can query Delta Lake tables as well as associated metadata. You can also query specific versions of files in delta lake.  [Complete documentation is av [...]
-
-### Accessing Access
-A surprising number of people use Microsoft Access as a database for their business data. With version 1.21, Apache Drill can now natively query Microsoft Access database files using Drill. This can be a major benefit for those looking to migrate data from Access into more modern formats such as parquet or even other relational databases. Drill will support Access files from version 1997 and up. 
-
-### Oh Sheets!
-In addition to all of the above, Drill can now query data directly from GoogleSheets. In addition to being able to query this data source, Drill can read, write, delete and append to GoogleSheets. Google doesn’t make it easy, so if this is a feature you are interested in, you’ll definitely want to [read the documentation here](https://drill.apache.org/docs/google-sheets-storage-plugin/).
-
-### Remote Data
-As you can see, Drill has significantly expanded the number of data sources and types that it can query. A part of this work has also been to improve the implementation behind filesystems. As a result, Drill can now query data stored on Dropbox, and Box. We added support for filesystems which use OAuth 2.0 for authorization so this means that more extended file systems are likely coming your way for the next release.
-
-## Greatly Improved Access Controls
-Managing access controls and credentials on a federated query engine is a complicated task. Drill has supported a concept called user impersonation which basically means that Drill can execute queries using the credentials of the logged in user. This concept works well for querying file systems such as Hadoop, and other data sources that have the same concepts, however it does not work at all with data sources that have different concepts of users, or in the case of OAuth enabled plugins [...]
-
-To answer this challenge, Drill 1.21 introduces the concept of user translation. The idea of user translation is that, when enabled, every user will have their own unique credentials for specific data sources. Thus, when that user queries a specific data source, that user’s credentials are used to execute the query. This is configurable on an individual data source basis. Ultimately, what this means is that you no longer have to create service accounts to access data via Drill. 
-
-## Drilling Across the Clouds
-While we’re on the subject of clouds, as you may be aware, Drill can query data stored in cloud-based file systems such as S3, Azure, GCP etc. One of the challenges however, is that if you have data stored in multiple clouds, it can become very inefficient to query this data, especially from the perspective of network IO. As of Drill 1.21, Drill adds a storage plugin which we are calling Drill on Drill.
-
-Let’s say that you had a Drill cluster in S3, but you had data in both S3 and Azure. With the new Drill on Drill capability, you could install an additional Drill cluster in Azure, then query both from either Drill cluster. The advantage is that the queries would be pushed down to the Drill cluster where the data resides. So if you query Azure from S3, you aren’t sending tons of data back and forth. 
-
-## Drill Now Supports More BI Operators
-While Drill held more or less to the SQL standard, it was missing some BI operators that had become commonplace among SQL platforms. Drill 1.21 introduces the `PIVOT` and `UNPIVOT` operators which convert rows to columns or vice versa, much in the same way a pivot table works in Excel. Additionally, we added set operators `INTERSECT` and `EXCEPT` which have become part of the SQL standard.
-
-## New Statistical Functions
-Drill 1.21 adds new SQL functions for statistical summaries including `kendall_correlation` for calculating correlation coefficients, `width_bucket` which is a SQL function for computing histograms and distributions, and two other functions for computing regression lines. 
-
-Lastly, we’ve also added additional date/time manipulation functions which will make working with dates significantly easier. 
-
-## What’s Next?
-The big question is where do we go from here? We’ve already started working on adding support for additional BI operators such as `CUBE`, `GROUPING SETS` and `ROLLUP`, as well as `REGEXP_EXTRACT`. Since the new version of Calcite has support for numerous optimizations around materialized views this is also something which is being discussed. If you like what you are seeing, please download Drill and try it out. Feedback is always welcome on the Drill slack channel or on our mailing lists [...]

[drill-site] 01/02: Document new statistical functions (fixup).

Posted by dz...@apache.org.

dzamo pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/drill-site.git

commit e893ec7f04c877d57f3b7602701a8b76e7c1f4ee
Author: James Turton <ja...@somecomputer.xyz>
AuthorDate: Fri Mar 3 13:24:03 2023 +0200

    Document new statistical functions (fixup).
---
 .../sql-reference/sql-functions/015-statistical.md | 37 ++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/_docs/en/sql-reference/sql-functions/015-statistical.md b/_docs/en/sql-reference/sql-functions/015-statistical.md
new file mode 100644
index 000000000..9e9497f76
--- /dev/null
+++ b/_docs/en/sql-reference/sql-functions/015-statistical.md
@@ -0,0 +1,37 @@
+---
+title: "Statistical"
+slug: "Statistical"
+parent: "SQL Functions"
+---
+
+Drill supports the scalar statistical functions shown in the following table.
+
+## Table of statistical functions
+
+| Function     | Return Type | Description                                                                                                                    |
+| ------------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| width_bucket | INT         | Returns the 1-based bucket index of the value after dividing the interval between min and max into the given number of buckets |
+
+## WIDTH_BUCKET
+
+Returns the 1-based bucket index of _value_ after dividing the interval between _min_ and _max_ into the given number of buckets. A _value_ that falls outside the given range is given an index of 0 (_value_ < _min_) or _buckets_ + 1 (_value_ > _max_).
+
+### WIDTH_BUCKET Syntax
+
+```
+WIDTH_BUCKET( value, min, max, buckets )
+```
+
+### WIDTH_BUCKET Examples
+
+```
+apache drill> select width_bucket(3, 0, 10, 5);
+EXPR$0  2
+
+1 row selected (0.201 seconds)
+
+apache drill> select width_bucket(1000, 0, 10, 5);
+EXPR$0  6
+
+1 row selected (0.131 seconds)
+```
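
The bucket arithmetic described in the WIDTH_BUCKET section of the diff above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Drill's implementation: the function below is a hypothetical stand-in, it assumes equal-width buckets and min < max, and it assumes that a value exactly equal to max lands in the overflow bucket (the convention used by some other SQL engines), which the Drill page does not state.

```python
import math


def width_bucket(value, lo, hi, buckets):
    """Hypothetical sketch of the documented WIDTH_BUCKET semantics.

    Assumes lo < hi and buckets >= 1. Treating value == hi as overflow
    is an assumption, not something the Drill docs confirm.
    """
    if buckets < 1 or lo >= hi:
        raise ValueError("need buckets >= 1 and lo < hi")
    if value < lo:
        return 0                      # below the range
    if value >= hi:
        return buckets + 1            # above the range (overflow bucket)
    # Position of value within [lo, hi), scaled to the bucket count,
    # then shifted to a 1-based index.
    return math.floor((value - lo) * buckets / (hi - lo)) + 1


# The two queries shown in the documentation examples
print(width_bucket(3, 0, 10, 5))      # 2
print(width_bucket(1000, 0, 10, 5))   # 6
```

With min = 0, max = 10 and 5 buckets, each bucket is 2 units wide, so the value 3 falls in the second bucket, and 1000 lies above the range and receives buckets + 1 = 6, matching the outputs in the examples above.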