Posted to commits@arrow.apache.org by gi...@apache.org on 2023/09/19 10:42:52 UTC
[arrow-datafusion] branch asf-site updated: Publish built docs triggered by bb6c57f81fc3648530ec81ac2a636e55b91238ae
This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 2412dc4041 Publish built docs triggered by bb6c57f81fc3648530ec81ac2a636e55b91238ae
2412dc4041 is described below
commit 2412dc40413e04b3c8c184f5723e4762f5e9ff9b
Author: github-actions[bot] <gi...@users.noreply.github.com>
AuthorDate: Tue Sep 19 10:42:46 2023 +0000
Publish built docs triggered by bb6c57f81fc3648530ec81ac2a636e55b91238ae
---
_sources/user-guide/cli.md.txt | 112 +++++++++++++++++++++++----------------
searchindex.js | 2 +-
user-guide/cli.html | 117 ++++++++++++++++++++++++-----------------
3 files changed, 136 insertions(+), 95 deletions(-)
diff --git a/_sources/user-guide/cli.md.txt b/_sources/user-guide/cli.md.txt
index e3a8cd74c3..e1f332baf3 100644
--- a/_sources/user-guide/cli.md.txt
+++ b/_sources/user-guide/cli.md.txt
@@ -23,49 +23,6 @@ The DataFusion CLI is a command-line interactive SQL utility for executing
queries against any supported data files. It is a convenient way to
try DataFusion's SQL support with your own data.
-## Example
-
-Create a CSV file to query.
-
-```shell
-$ echo "a,b" > data.csv
-$ echo "1,2" >> data.csv
-```
-
-Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
-
-```shell
-$ datafusion-cli
-DataFusion CLI v17.0.0
-❯ select * from 'data.csv';
-+---+---+
-| a | b |
-+---+---+
-| 1 | 2 |
-+---+---+
-1 row in set. Query took 0.007 seconds.
-```
-
-You can also query directories of files with compatible schemas:
-
-```shell
-$ ls data_dir/
-data.csv data2.csv
-```
-
-```shell
-$ datafusion-cli
-DataFusion CLI v16.0.0
-❯ select * from 'data_dir';
-+---+---+
-| a | b |
-+---+---+
-| 3 | 4 |
-| 1 | 2 |
-+---+---+
-2 rows in set. Query took 0.007 seconds.
-```
-
## Installation
### Install and run using Cargo
@@ -131,17 +88,64 @@ OPTIONS:
-V, --version Print version information
```
-## Selecting files directly
+## Querying data from the files directly
Files can be queried directly by enclosing the file or
directory name in single `'` quotes as shown in the example.
+## Example
+
+Create a CSV file to query.
+
+```shell
+$ echo "a,b" > data.csv
+$ echo "1,2" >> data.csv
+```
+
+Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
+
+```shell
+$ datafusion-cli
+DataFusion CLI v17.0.0
+❯ select * from 'data.csv';
++---+---+
+| a | b |
++---+---+
+| 1 | 2 |
++---+---+
+1 row in set. Query took 0.007 seconds.
+```
+
+You can also query directories of files with compatible schemas:
+
+```shell
+$ ls data_dir/
+data.csv data2.csv
+```
+
+```shell
+$ datafusion-cli
+DataFusion CLI v16.0.0
+❯ select * from 'data_dir';
++---+---+
+| a | b |
++---+---+
+| 3 | 4 |
+| 1 | 2 |
++---+---+
+2 rows in set. Query took 0.007 seconds.
+```
+
+## Creating external tables
+
It is also possible to create a table backed by files by explicitly
-via `CREATE EXTERNAL TABLE` as shown below.
+via `CREATE EXTERNAL TABLE` as shown below. Filemask wildcards are supported.
## Registering Parquet Data Sources
-Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files.
+Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema information will be derived automatically.
+
+Register a single Parquet file as a data source:
```sql
CREATE EXTERNAL TABLE taxi
@@ -149,6 +153,22 @@ STORED AS PARQUET
LOCATION '/mnt/nyctaxi/tripdata.parquet';
```
+Register a folder of Parquet files as a data source. All files in the folder must be valid Parquet files.
+
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/';
+```
+
+Register a folder as a Parquet data source, using a wildcard to select the files to read:
+
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/*.parquet';
+```
+
## Registering CSV Data Sources
CSV data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
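
Note on the relocated example: it lists `data_dir/` but never shows how the directory is built. A minimal sketch of that setup follows. The file contents are an assumption inferred from the query output shown in the diff (rows `1,2` and `3,4`); they are not part of the commit.

```shell
# Build the two-file directory used by the directory-query example.
# Assumption: contents are inferred from the query output in the diff.
mkdir -p data_dir
printf 'a,b\n1,2\n' > data_dir/data.csv
printf 'a,b\n3,4\n' > data_dir/data2.csv

# List the directory to confirm both files are present.
ls data_dir/
```

With the directory in place, the documented query `select * from 'data_dir';` can be run from inside `datafusion-cli`.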
diff --git a/searchindex.js b/searchindex.js
index 02ff2d66a4..47e9676b70 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "library-user-guide/adding-udfs", "library-user-guide/building-logical-plans", "library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "library-user-guide/adding-udfs", "library-user-guide/building-logical-plans", "library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
diff --git a/user-guide/cli.html b/user-guide/cli.html
index 8557f32b9c..857acb8420 100644
--- a/user-guide/cli.html
+++ b/user-guide/cli.html
@@ -346,11 +346,6 @@
<nav id="bd-toc-nav">
<ul class="visible nav section-nav flex-column">
- <li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#example">
- Example
- </a>
- </li>
<li class="toc-h2 nav-item toc-entry">
<a class="reference internal nav-link" href="#installation">
Installation
@@ -379,8 +374,18 @@
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
- <a class="reference internal nav-link" href="#selecting-files-directly">
- Selecting files directly
+ <a class="reference internal nav-link" href="#querying-data-from-the-files-directly">
+ Querying data from the files directly
+ </a>
+ </li>
+ <li class="toc-h2 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#example">
+ Example
+ </a>
+ </li>
+ <li class="toc-h2 nav-item toc-entry">
+ <a class="reference internal nav-link" href="#creating-external-tables">
+ Creating external tables
</a>
</li>
<li class="toc-h2 nav-item toc-entry">
@@ -469,43 +474,6 @@
<p>The DataFusion CLI is a command-line interactive SQL utility for executing
queries against any supported data files. It is a convenient way to
try DataFusion’s SQL support with your own data.</p>
-<section id="example">
-<h2>Example<a class="headerlink" href="#example" title="Link to this heading">¶</a></h2>
-<p>Create a CSV file to query.</p>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"a,b"</span><span class="w"> </span>><span class="w"> </span>data.csv
-$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"1,2"</span><span class="w"> </span>>><span class="w"> </span>data.csv
-</pre></div>
-</div>
-<p>Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)</p>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
-DataFusion<span class="w"> </span>CLI<span class="w"> </span>v17.0.0
-❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">'data.csv'</span><span class="p">;</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="m">1</span><span class="w"> </span>row<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
-</pre></div>
-</div>
-<p>You can also query directories of files with compatible schemas:</p>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ls<span class="w"> </span>data_dir/
-data.csv<span class="w"> </span>data2.csv
-</pre></div>
-</div>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
-DataFusion<span class="w"> </span>CLI<span class="w"> </span>v16.0.0
-❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">'data_dir'</span><span class="p">;</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="p">|</span>
-<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="m">2</span><span class="w"> </span>rows<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
-</pre></div>
-</div>
-</section>
<section id="installation">
<h2>Installation<a class="headerlink" href="#installation" title="Link to this heading">¶</a></h2>
<section id="install-and-run-using-cargo">
@@ -568,21 +536,74 @@ OPTIONS:
</pre></div>
</div>
</section>
-<section id="selecting-files-directly">
-<h2>Selecting files directly<a class="headerlink" href="#selecting-files-directly" title="Link to this heading">¶</a></h2>
+<section id="querying-data-from-the-files-directly">
+<h2>Querying data from the files directly<a class="headerlink" href="#querying-data-from-the-files-directly" title="Link to this heading">¶</a></h2>
<p>Files can be queried directly by enclosing the file or
directory name in single <code class="docutils literal notranslate"><span class="pre">'</span></code> quotes as shown in the example.</p>
+</section>
+<section id="example">
+<h2>Example<a class="headerlink" href="#example" title="Link to this heading">¶</a></h2>
+<p>Create a CSV file to query.</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"a,b"</span><span class="w"> </span>><span class="w"> </span>data.csv
+$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"1,2"</span><span class="w"> </span>>><span class="w"> </span>data.csv
+</pre></div>
+</div>
+<p>Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
+DataFusion<span class="w"> </span>CLI<span class="w"> </span>v17.0.0
+❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">'data.csv'</span><span class="p">;</span>
++---+---+
+<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="m">1</span><span class="w"> </span>row<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
+</pre></div>
+</div>
+<p>You can also query directories of files with compatible schemas:</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ls<span class="w"> </span>data_dir/
+data.csv<span class="w"> </span>data2.csv
+</pre></div>
+</div>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
+DataFusion<span class="w"> </span>CLI<span class="w"> </span>v16.0.0
+❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">'data_dir'</span><span class="p">;</span>
++---+---+
+<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="p">|</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="p">|</span>
+<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="m">2</span><span class="w"> </span>rows<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
+</pre></div>
+</div>
+</section>
+<section id="creating-external-tables">
+<h2>Creating external tables<a class="headerlink" href="#creating-external-tables" title="Link to this heading">¶</a></h2>
<p>It is also possible to create a table backed by files by explicitly
-via <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> as shown below.</p>
+via <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> as shown below. Filemask wildcards are supported.</p>
</section>
<section id="registering-parquet-data-sources">
<h2>Registering Parquet Data Sources<a class="headerlink" href="#registering-parquet-data-sources" title="Link to this heading">¶</a></h2>
-<p>Parquet data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement. It is not necessary to provide schema information for Parquet files.</p>
+<p>Parquet data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement. The schema information will be derived automatically.</p>
+<p>Register a single Parquet file as a data source:</p>
<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'/mnt/nyctaxi/tripdata.parquet'</span><span class="p">;</span>
</pre></div>
</div>
+<p>Register a folder of Parquet files as a data source. All files in the folder must be valid Parquet files.</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
+<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
+<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'/mnt/nyctaxi/'</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>Register a folder as a Parquet data source, using a wildcard to select the files to read:</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
+<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
+<span class="k">LOCATION</span><span class="w"> </span><span class="s1">'/mnt/nyctaxi/*.parquet'</span><span class="p">;</span>
+</pre></div>
+</div>
</section>
<section id="registering-csv-data-sources">
<h2>Registering CSV Data Sources<a class="headerlink" href="#registering-csv-data-sources" title="Link to this heading">¶</a></h2>