Posted to commits@arrow.apache.org by gi...@apache.org on 2023/04/05 19:03:07 UTC
[arrow-datafusion] branch asf-site updated: Publish built docs triggered by 513f78b47c2105d0849bb8d7f9d33d6928338315
This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e85f69b2f3 Publish built docs triggered by 513f78b47c2105d0849bb8d7f9d33d6928338315
e85f69b2f3 is described below
commit e85f69b2f3f6bee2918ed5900004e9a5b9bd849d
Author: github-actions[bot] <gi...@users.noreply.github.com>
AuthorDate: Wed Apr 5 19:03:01 2023 +0000
Publish built docs triggered by 513f78b47c2105d0849bb8d7f9d33d6928338315
---
_sources/user-guide/cli.md.txt | 110 +++++++++++++++++++++++++++++++------
searchindex.js | 2 +-
user-guide/cli.html | 119 ++++++++++++++++++++++++++++++++++-------
3 files changed, 197 insertions(+), 34 deletions(-)
diff --git a/_sources/user-guide/cli.md.txt b/_sources/user-guide/cli.md.txt
index d3512a6dca..ef65561f28 100644
--- a/_sources/user-guide/cli.md.txt
+++ b/_sources/user-guide/cli.md.txt
@@ -180,15 +180,49 @@ STORED AS CSV
LOCATION '/path/to/aggregate_test_100.csv';
```
-## Querying S3 Data Sources
+## Registering S3 Data Sources
-The CLI can query data in S3 if the following environment variables are defined:
+[AWS S3](https://aws.amazon.com/s3/) data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
-- `AWS_DEFAULT_REGION`
-- `AWS_ACCESS_KEY_ID`
-- `AWS_SECRET_ACCESS_KEY`
+```sql
+CREATE EXTERNAL TABLE test
+STORED AS PARQUET
+OPTIONS(
+ 'access_key_id' '******',
+ 'secret_access_key' '******',
+ 'region' 'us-east-2'
+)
+LOCATION 's3://bucket/path/file.parquet';
+```
+
+The supported OPTIONS are:
+
+- access_key_id
+- secret_access_key
+- session_token
+- region
+
+It is also possible to simplify the SQL statement by supplying the credentials via environment variables.
+
+```bash
+$ export AWS_DEFAULT_REGION=us-east-2
+$ export AWS_SECRET_ACCESS_KEY=******
+$ export AWS_ACCESS_KEY_ID=******
+
+$ datafusion-cli
+DataFusion CLI v21.0.0
+❯ create external table test stored as parquet location 's3://bucket/path/file.parquet';
+0 rows in set. Query took 0.374 seconds.
+❯ select * from test;
++----------+----------+
+| column_1 | column_2 |
++----------+----------+
+| 1 | 2 |
++----------+----------+
+1 row in set. Query took 0.171 seconds.
+```
-Details of the environment variables that can be used are
+Details of the environment variables that can be used are:
- AWS_ACCESS_KEY_ID -> access_key_id
- AWS_SECRET_ACCESS_KEY -> secret_access_key
@@ -198,19 +232,56 @@ Details of the environment variables that can be used are
- AWS_CONTAINER_CREDENTIALS_RELATIVE_URI -> <https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html>
- AWS_ALLOW_HTTP -> set to "true" to permit HTTP connections without TLS
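When using temporary credentials, `AWS_SESSION_TOKEN` is exported alongside the key pair. A minimal sketch of the variables listed above, with placeholder values (not real credentials):

```shell
# Placeholder values for illustration only; real credentials come from
# your AWS account (e.g. from STS when using temporary credentials).
export AWS_DEFAULT_REGION=us-east-2
export AWS_ACCESS_KEY_ID=AKIAEXAMPLE
export AWS_SECRET_ACCESS_KEY=examplesecret
export AWS_SESSION_TOKEN=exampletoken  # only required for temporary credentials

# datafusion-cli started from this shell picks the variables up automatically.
echo "region=${AWS_DEFAULT_REGION}"
```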
-Example:
+## Registering OSS Data Sources
-```bash
-$ aws s3 cp test.csv s3://my-bucket/
-upload: ./test.csv to s3://my-bucket/test.csv
+[Alibaba Cloud OSS](https://www.alibabacloud.com/product/object-storage-service) data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
-$ export AWS_DEFAULT_REGION=us-east-2
-$ export AWS_SECRET_ACCESS_KEY=***************************
-$ export AWS_ACCESS_KEY_ID=**************
+```sql
+CREATE EXTERNAL TABLE test
+STORED AS PARQUET
+OPTIONS(
+ 'access_key_id' '******',
+ 'secret_access_key' '******',
+ 'endpoint' 'https://bucket.oss-cn-hangzhou.aliyuncs.com'
+)
+LOCATION 'oss://bucket/path/file.parquet';
+```
+
+The supported OPTIONS are:
+
+- access_key_id
+- secret_access_key
+- endpoint
+
+Note that the `endpoint` format for OSS must be: `https://{bucket}.{oss-region-endpoint}`
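The endpoint format can be illustrated with a small shell sketch; the bucket name and region endpoint below are placeholders, not a real OSS deployment:

```shell
# Hypothetical bucket name and OSS region endpoint, for illustration only.
BUCKET=mybucket
OSS_REGION_ENDPOINT=oss-cn-hangzhou.aliyuncs.com

# Combine them into the endpoint format expected by the OPTIONS clause.
ENDPOINT="https://${BUCKET}.${OSS_REGION_ENDPOINT}"
echo "$ENDPOINT"
```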
+
+## Registering GCS Data Sources
+
+[Google Cloud Storage](https://cloud.google.com/storage) data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
+
+```sql
+CREATE EXTERNAL TABLE test
+STORED AS PARQUET
+OPTIONS(
+ 'service_account_path' '/tmp/gcs.json'
+)
+LOCATION 'gs://bucket/path/file.parquet';
+```
+
+The supported OPTIONS are:
+
+- service_account_path -> location of service account file
+- service_account_key -> JSON serialized service account key
+- application_credentials_path -> location of application credentials file
+
+It is also possible to simplify the SQL statement by supplying the configuration via environment variables.
+
+```bash
+$ export GOOGLE_SERVICE_ACCOUNT=/tmp/gcs.json
$ datafusion-cli
-DataFusion CLI v14.0.0
-❯ create external table test stored as csv location 's3://my-bucket/test.csv';
+DataFusion CLI v21.0.0
+❯ create external table test stored as parquet location 'gs://bucket/path/file.parquet';
0 rows in set. Query took 0.374 seconds.
❯ select * from test;
+----------+----------+
@@ -221,6 +292,15 @@ DataFusion CLI v14.0.0
1 row in set. Query took 0.171 seconds.
```
+Details of the environment variables that can be used are:
+
+- GOOGLE_SERVICE_ACCOUNT: location of service account file
+- GOOGLE_SERVICE_ACCOUNT_PATH: (alias) location of service account file
+- SERVICE_ACCOUNT: (alias) location of service account file
+- GOOGLE_SERVICE_ACCOUNT_KEY: JSON serialized service account key
+- GOOGLE_BUCKET: bucket name
+- GOOGLE_BUCKET_NAME: (alias) bucket name
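As an alternative to pointing at a key file, the serialized key itself can be exported via `GOOGLE_SERVICE_ACCOUNT_KEY`. A sketch with a placeholder key (a real key is the JSON file downloaded from the Google Cloud console for the service account):

```shell
# Placeholder key for illustration only; it is not a usable credential.
export GOOGLE_SERVICE_ACCOUNT_KEY='{"type": "service_account", "project_id": "my-project"}'

# Sanity-check that the variable holds valid JSON before starting datafusion-cli.
echo "$GOOGLE_SERVICE_ACCOUNT_KEY" | python3 -c 'import json, sys; print(json.load(sys.stdin)["type"])'
```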
+
## Commands
Available commands inside DataFusion CLI are:
diff --git a/searchindex.js b/searchindex.js
index a7f9d46f2e..ad809a2944 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "user-guide/cli", "user-guide/comparison", "user-guide/configs", "user-guide/dataframe", "user-guide/example-usage", "user-guide/expressions [...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "user-guide/cli", "user-guide/comparison", "user-guide/configs", "user-guide/dataframe", "user-guide/example-usage", "user-guide/expressions [...]
\ No newline at end of file
diff --git a/user-guide/cli.html b/user-guide/cli.html
index 44428c5a45..891bce081e 100644
--- a/user-guide/cli.html
+++ b/user-guide/cli.html
@@ -352,8 +352,18 @@
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#querying-s3-data-sources">
- Querying S3 Data Sources
+ <a class="reference internal nav-link" href="#registering-s3-data-sources">
+ Registering S3 Data Sources
+ </a>
+ </li>
+ <li class="toc-h2 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#registering-oss-data-sources">
+ Registering OSS Data Sources
+ </a>
+ </li>
+ <li class="toc-h2 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#registering-gcs-data-sources">
+ Registering GCS Data Sources
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
@@ -560,15 +570,45 @@ via <code class="docutils literal notranslate"><span class="pre">CREATE</span> <
</pre></div>
</div>
</section>
-<section id="querying-s3-data-sources">
-<h2>Querying S3 Data Sources<a class="headerlink" href="#querying-s3-data-sources" title="Permalink to this heading">¶</a></h2>
-<p>The CLI can query data in S3 if the following environment variables are defined:</p>
+<section id="registering-s3-data-sources">
+<h2>Registering S3 Data Sources<a class="headerlink" href="#registering-s3-data-sources" title="Permalink to this heading">¶</a></h2>
+<p><a class="reference external" href="https://aws.amazon.com/s3/">AWS S3</a> data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement.</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
+<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
+<span class="k">OPTIONS</span><span class="p">(</span>
+<span class="w"> </span><span class="s1">'access_key_id'</span><span class="w"> </span><span class="s1">'******'</span><span class="p">,</span>
+<span class="w"> </span><span class="s1">'secret_access_key'</span><span class="w"> </span><span class="s1">'******'</span><span class="p">,</span>
+<span class="w"> </span><span class="s1">'region'</span><span class="w"> </span><span class="s1">'us-east-2'</span>
+<span class="p">)</span>
+<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'s3://bucket/path/file.parquet'</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The supported OPTIONS are:</p>
<ul class="simple">
-<li><p><code class="docutils literal notranslate"><span class="pre">AWS_DEFAULT_REGION</span></code></p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">AWS_ACCESS_KEY_ID</span></code></p></li>
-<li><p><code class="docutils literal notranslate"><span class="pre">AWS_SECRET_ACCESS_KEY</span></code></p></li>
+<li><p>access_key_id</p></li>
+<li><p>secret_access_key</p></li>
+<li><p>session_token</p></li>
+<li><p>region</p></li>
</ul>
-<p>Details of the environment variables that can be used are</p>
+<p>It is also possible to simplify the SQL statement by supplying the credentials via environment variables.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">AWS_DEFAULT_REGION</span><span class="o">=</span>us-east-2
+$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span>******
+$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span>******
+
+$<span class="w"> </span>datafusion-cli
+DataFusion<span class="w"> </span>CLI<span class="w"> </span>v21.0.0
+❯<span class="w"> </span>create<span class="w"> </span>external<span class="w"> </span>table<span class="w"> </span><span class="nb">test</span><span class="w"> </span>stored<span class="w"> </span>as<span class="w"> </span>parquet<span class="w"> </span>location<span class="w"> </span><span class="s1">'s3://bucket/path/file.parquet'</span><span class="p">;</span>
+<span class="m">0</span><span class="w"> </span>rows<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.374<span class="w"> </span>seconds.
+❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span>test<span class="p">;</span>
++----------+----------+
+<span class="p">|</span><span class="w"> </span>column_1<span class="w"> </span><span class="p">|</span><span class="w"> </span>column_2<span class="w"> </span><span class="p">|</span>
++----------+----------+
+<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
++----------+----------+
+<span class="m">1</span><span class="w"> </span>row<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.171<span class="w"> </span>seconds.
+</pre></div>
+</div>
+<p>Details of the environment variables that can be used are:</p>
<ul class="simple">
<li><p>AWS_ACCESS_KEY_ID -> access_key_id</p></li>
<li><p>AWS_SECRET_ACCESS_KEY -> secret_access_key</p></li>
@@ -578,17 +618,51 @@ via <code class="docutils literal notranslate"><span class="pre">CREATE</span> <
<li><p>AWS_CONTAINER_CREDENTIALS_RELATIVE_URI -> <a class="reference external" href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html">https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html</a></p></li>
<li><p>AWS_ALLOW_HTTP -> set to “true” to permit HTTP connections without TLS</p></li>
</ul>
-<p>Example:</p>
-<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>aws<span class="w"> </span>s3<span class="w"> </span>cp<span class="w"> </span>test.csv<span class="w"> </span>s3://my-bucket/
-upload:<span class="w"> </span>./test.csv<span class="w"> </span>to<span class="w"> </span>s3://my-bucket/test.csv
-
-$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">AWS_DEFAULT_REGION</span><span class="o">=</span>us-east-2
-$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">AWS_SECRET_ACCESS_KEY</span><span class="o">=</span>***************************
-$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">AWS_ACCESS_KEY_ID</span><span class="o">=</span>**************
+</section>
+<section id="registering-oss-data-sources">
+<h2>Registering OSS Data Sources<a class="headerlink" href="#registering-oss-data-sources" title="Permalink to this heading">¶</a></h2>
+<p><a class="reference external" href="https://www.alibabacloud.com/product/object-storage-service">Alibaba Cloud OSS</a> data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement.</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
+<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
+<span class="k">OPTIONS</span><span class="p">(</span>
+<span class="w"> </span><span class="s1">'access_key_id'</span><span class="w"> </span><span class="s1">'******'</span><span class="p">,</span>
+<span class="w"> </span><span class="s1">'secret_access_key'</span><span class="w"> </span><span class="s1">'******'</span><span class="p">,</span>
+<span class="w"> </span><span class="s1">'endpoint'</span><span class="w"> </span><span class="s1">'https://bucket.oss-cn-hangzhou.aliyuncs.com'</span>
+<span class="p">)</span>
+<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'oss://bucket/path/file.parquet'</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The supported OPTIONS are:</p>
+<ul class="simple">
+<li><p>access_key_id</p></li>
+<li><p>secret_access_key</p></li>
+<li><p>endpoint</p></li>
+</ul>
+<p>Note that the <code class="docutils literal notranslate"><span class="pre">endpoint</span></code> format for OSS must be: <code class="docutils literal notranslate"><span class="pre">https://{bucket}.{oss-region-endpoint}</span></code></p>
+</section>
+<section id="registering-gcs-data-sources">
+<h2>Registering GCS Data Sources<a class="headerlink" href="#registering-gcs-data-sources" title="Permalink to this heading">¶</a></h2>
+<p><a class="reference external" href="https://cloud.google.com/storage">Google Cloud Storage</a> data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement.</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">test</span>
+<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
+<span class="k">OPTIONS</span><span class="p">(</span>
+<span class="w"> </span><span class="s1">'service_account_path'</span><span class="w"> </span><span class="s1">'/tmp/gcs.json'</span>
+<span class="p">)</span>
+<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'gs://bucket/path/file.parquet'</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>The supported OPTIONS are:</p>
+<ul class="simple">
+<li><p>service_account_path -> location of service account file</p></li>
+<li><p>service_account_key -> JSON serialized service account key</p></li>
+<li><p>application_credentials_path -> location of application credentials file</p></li>
+</ul>
+<p>It is also possible to simplify the SQL statement by supplying the configuration via environment variables.</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">export</span><span class="w"> </span><span class="nv">GOOGLE_SERVICE_ACCOUNT</span><span class="o">=</span>/tmp/gcs.json
$<span class="w"> </span>datafusion-cli
-DataFusion<span class="w"> </span>CLI<span class="w"> </span>v14.0.0
-❯<span class="w"> </span>create<span class="w"> </span>external<span class="w"> </span>table<span class="w"> </span><span class="nb">test</span><span class="w"> </span>stored<span class="w"> </span>as<span class="w"> </span>csv<span class="w"> </span>location<span class="w"> </span><span class="s1">'s3://my-bucket/test.csv'</span><span class="p">;</span>
+DataFusion<span class="w"> </span>CLI<span class="w"> </span>v21.0.0
+❯<span class="w"> </span>create<span class="w"> </span>external<span class="w"> </span>table<span class="w"> </span><span class="nb">test</span><span class="w"> </span>stored<span class="w"> </span>as<span class="w"> </span>parquet<span class="w"> </span>location<span class="w"> </span><span class="s1">'gs://bucket/path/file.parquet'</span><span class="p">;</span>
<span class="m">0</span><span class="w"> </span>rows<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.374<span class="w"> </span>seconds.
❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span>test<span class="p">;</span>
+----------+----------+
@@ -599,6 +673,15 @@ DataFusion<span class="w"> </span>CLI<span class="w"> </span>v14.0.0
<span class="m">1</span><span class="w"> </span>row<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.171<span class="w"> </span>seconds.
</pre></div>
</div>
+<p>Details of the environment variables that can be used are:</p>
+<ul class="simple">
+<li><p>GOOGLE_SERVICE_ACCOUNT: location of service account file</p></li>
+<li><p>GOOGLE_SERVICE_ACCOUNT_PATH: (alias) location of service account file</p></li>
+<li><p>SERVICE_ACCOUNT: (alias) location of service account file</p></li>
+<li><p>GOOGLE_SERVICE_ACCOUNT_KEY: JSON serialized service account key</p></li>
+<li><p>GOOGLE_BUCKET: bucket name</p></li>
+<li><p>GOOGLE_BUCKET_NAME: (alias) bucket name</p></li>
+</ul>
</section>
<section id="commands">
<h2>Commands<a class="headerlink" href="#commands" title="Permalink to this heading">¶</a></h2>