Posted to commits@arrow.apache.org by gi...@apache.org on 2023/09/19 10:42:52 UTC

[arrow-datafusion] branch asf-site updated: Publish built docs triggered by bb6c57f81fc3648530ec81ac2a636e55b91238ae

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 2412dc4041 Publish built docs triggered by bb6c57f81fc3648530ec81ac2a636e55b91238ae
2412dc4041 is described below

commit 2412dc40413e04b3c8c184f5723e4762f5e9ff9b
Author: github-actions[bot] <gi...@users.noreply.github.com>
AuthorDate: Tue Sep 19 10:42:46 2023 +0000

    Publish built docs triggered by bb6c57f81fc3648530ec81ac2a636e55b91238ae
---
 _sources/user-guide/cli.md.txt | 112 +++++++++++++++++++++++----------------
 searchindex.js                 |   2 +-
 user-guide/cli.html            | 117 ++++++++++++++++++++++++-----------------
 3 files changed, 136 insertions(+), 95 deletions(-)

diff --git a/_sources/user-guide/cli.md.txt b/_sources/user-guide/cli.md.txt
index e3a8cd74c3..e1f332baf3 100644
--- a/_sources/user-guide/cli.md.txt
+++ b/_sources/user-guide/cli.md.txt
@@ -23,49 +23,6 @@ The DataFusion CLI is a command-line interactive SQL utility for executing
 queries against any supported data files. It is a convenient way to
 try DataFusion's SQL support with your own data.
 
-## Example
-
-Create a CSV file to query.
-
-```shell
-$ echo "a,b" > data.csv
-$ echo "1,2" >> data.csv
-```
-
-Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
-
-```shell
-$ datafusion-cli
-DataFusion CLI v17.0.0
-❯ select * from 'data.csv';
-+---+---+
-| a | b |
-+---+---+
-| 1 | 2 |
-+---+---+
-1 row in set. Query took 0.007 seconds.
-```
-
-You can also query directories of files with compatible schemas:
-
-```shell
-$ ls data_dir/
-data.csv   data2.csv
-```
-
-```shell
-$ datafusion-cli
-DataFusion CLI v16.0.0
-❯ select * from 'data_dir';
-+---+---+
-| a | b |
-+---+---+
-| 3 | 4 |
-| 1 | 2 |
-+---+---+
-2 rows in set. Query took 0.007 seconds.
-```
-
 ## Installation
 
 ### Install and run using Cargo
@@ -131,17 +88,64 @@ OPTIONS:
     -V, --version                           Print version information
 ```
 
-## Selecting files directly
+## Querying data from the files directly
 
 Files can be queried directly by enclosing the file or
 directory name in single `'` quotes as shown in the example.
 
+## Example
+
+Create a CSV file to query.
+
+```shell
+$ echo "a,b" > data.csv
+$ echo "1,2" >> data.csv
+```
+
+Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)
+
+```shell
+$ datafusion-cli
+DataFusion CLI v17.0.0
+❯ select * from 'data.csv';
++---+---+
+| a | b |
++---+---+
+| 1 | 2 |
++---+---+
+1 row in set. Query took 0.007 seconds.
+```
+
+You can also query directories of files with compatible schemas:
+
+```shell
+$ ls data_dir/
+data.csv   data2.csv
+```
+
+```shell
+$ datafusion-cli
+DataFusion CLI v16.0.0
+❯ select * from 'data_dir';
++---+---+
+| a | b |
++---+---+
+| 3 | 4 |
+| 1 | 2 |
++---+---+
+2 rows in set. Query took 0.007 seconds.
+```
+
+## Creating external tables
+
 It is also possible to create a table backed by files by explicitly
-via `CREATE EXTERNAL TABLE` as shown below.
+via `CREATE EXTERNAL TABLE` as shown below. Filemask wildcards supported
 
 ## Registering Parquet Data Sources
 
-Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. It is not necessary to provide schema information for Parquet files.
+Parquet data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement. The schema information will be derived automatically.
+
+Register a single file parquet datasource
 
 ```sql
 CREATE EXTERNAL TABLE taxi
@@ -149,6 +153,22 @@ STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/tripdata.parquet';
 ```
 
+Register a single folder parquet datasource. All files inside must be valid parquet files!
+
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/';
+```
+
+Register a single folder parquet datasource by specifying a wildcard for files to read
+
+```sql
+CREATE EXTERNAL TABLE taxi
+STORED AS PARQUET
+LOCATION '/mnt/nyctaxi/*.parquet';
+```
+
 ## Registering CSV Data Sources
 
 CSV data sources can be registered by executing a `CREATE EXTERNAL TABLE` SQL statement.
diff --git a/searchindex.js b/searchindex.js
index 02ff2d66a4..47e9676b70 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "library-user-guide/adding-udfs", "library-user-guide/building-logical-plans", "library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "library-user-guide/adding-udfs", "library-user-guide/building-logical-plans", "library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
diff --git a/user-guide/cli.html b/user-guide/cli.html
index 8557f32b9c..857acb8420 100644
--- a/user-guide/cli.html
+++ b/user-guide/cli.html
@@ -346,11 +346,6 @@
 
 <nav id="bd-toc-nav">
     <ul class="visible nav section-nav flex-column">
- <li class="toc-h2 nav-item toc-entry">
-  <a class="reference internal nav-link" href="#example">
-   Example
-  </a>
- </li>
  <li class="toc-h2 nav-item toc-entry">
   <a class="reference internal nav-link" href="#installation">
    Installation
@@ -379,8 +374,18 @@
   </a>
  </li>
  <li class="toc-h2 nav-item toc-entry">
-  <a class="reference internal nav-link" href="#selecting-files-directly">
-   Selecting files directly
+  <a class="reference internal nav-link" href="#querying-data-from-the-files-directly">
+   Querying data from the files directly
+  </a>
+ </li>
+ <li class="toc-h2 nav-item toc-entry">
+  <a class="reference internal nav-link" href="#example">
+   Example
+  </a>
+ </li>
+ <li class="toc-h2 nav-item toc-entry">
+  <a class="reference internal nav-link" href="#creating-external-tables">
+   Creating external tables
   </a>
  </li>
  <li class="toc-h2 nav-item toc-entry">
@@ -469,43 +474,6 @@
 <p>The DataFusion CLI is a command-line interactive SQL utility for executing
 queries against any supported data files. It is a convenient way to
 try DataFusion’s SQL support with your own data.</p>
-<section id="example">
-<h2>Example<a class="headerlink" href="#example" title="Link to this heading">¶</a></h2>
-<p>Create a CSV file to query.</p>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;a,b&quot;</span><span class="w"> </span>&gt;<span class="w"> </span>data.csv
-$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;1,2&quot;</span><span class="w"> </span>&gt;&gt;<span class="w"> </span>data.csv
-</pre></div>
-</div>
-<p>Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)</p>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
-DataFusion<span class="w"> </span>CLI<span class="w"> </span>v17.0.0
-❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">&#39;data.csv&#39;</span><span class="p">;</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="m">1</span><span class="w"> </span>row<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
-</pre></div>
-</div>
-<p>You can also query directories of files with compatible schemas:</p>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ls<span class="w"> </span>data_dir/
-data.csv<span class="w">   </span>data2.csv
-</pre></div>
-</div>
-<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
-DataFusion<span class="w"> </span>CLI<span class="w"> </span>v16.0.0
-❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">&#39;data_dir&#39;</span><span class="p">;</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="p">|</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="p">|</span>
-<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
-+---+---+
-<span class="m">2</span><span class="w"> </span>rows<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
-</pre></div>
-</div>
-</section>
 <section id="installation">
 <h2>Installation<a class="headerlink" href="#installation" title="Link to this heading">¶</a></h2>
 <section id="install-and-run-using-cargo">
@@ -568,21 +536,74 @@ OPTIONS:
 </pre></div>
 </div>
 </section>
-<section id="selecting-files-directly">
-<h2>Selecting files directly<a class="headerlink" href="#selecting-files-directly" title="Link to this heading">¶</a></h2>
+<section id="querying-data-from-the-files-directly">
+<h2>Querying data from the files directly<a class="headerlink" href="#querying-data-from-the-files-directly" title="Link to this heading">¶</a></h2>
 <p>Files can be queried directly by enclosing the file or
 directory name in single <code class="docutils literal notranslate"><span class="pre">'</span></code> quotes as shown in the example.</p>
+</section>
+<section id="example">
+<h2>Example<a class="headerlink" href="#example" title="Link to this heading">¶</a></h2>
+<p>Create a CSV file to query.</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;a,b&quot;</span><span class="w"> </span>&gt;<span class="w"> </span>data.csv
+$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;1,2&quot;</span><span class="w"> </span>&gt;&gt;<span class="w"> </span>data.csv
+</pre></div>
+</div>
+<p>Query that single file (the CLI also supports parquet, compressed csv, avro, json and more)</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
+DataFusion<span class="w"> </span>CLI<span class="w"> </span>v17.0.0
+❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">&#39;data.csv&#39;</span><span class="p">;</span>
++---+---+
+<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="m">1</span><span class="w"> </span>row<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
+</pre></div>
+</div>
+<p>You can also query directories of files with compatible schemas:</p>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>ls<span class="w"> </span>data_dir/
+data.csv<span class="w">   </span>data2.csv
+</pre></div>
+</div>
+<div class="highlight-shell notranslate"><div class="highlight"><pre><span></span>$<span class="w"> </span>datafusion-cli
+DataFusion<span class="w"> </span>CLI<span class="w"> </span>v16.0.0
+❯<span class="w"> </span><span class="k">select</span><span class="w"> </span>*<span class="w"> </span>from<span class="w"> </span><span class="s1">&#39;data_dir&#39;</span><span class="p">;</span>
++---+---+
+<span class="p">|</span><span class="w"> </span>a<span class="w"> </span><span class="p">|</span><span class="w"> </span>b<span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="p">|</span><span class="w"> </span><span class="m">3</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="p">|</span>
+<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
++---+---+
+<span class="m">2</span><span class="w"> </span>rows<span class="w"> </span><span class="k">in</span><span class="w"> </span>set.<span class="w"> </span>Query<span class="w"> </span>took<span class="w"> </span><span class="m">0</span>.007<span class="w"> </span>seconds.
+</pre></div>
+</div>
+</section>
+<section id="creating-external-tables">
+<h2>Creating external tables<a class="headerlink" href="#creating-external-tables" title="Link to this heading">¶</a></h2>
 <p>It is also possible to create a table backed by files by explicitly
-via <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> as shown below.</p>
+via <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> as shown below. Filemask wildcards supported</p>
 </section>
 <section id="registering-parquet-data-sources">
 <h2>Registering Parquet Data Sources<a class="headerlink" href="#registering-parquet-data-sources" title="Link to this heading">¶</a></h2>
-<p>Parquet data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement. It is not necessary to provide schema information for Parquet files.</p>
+<p>Parquet data sources can be registered by executing a <code class="docutils literal notranslate"><span class="pre">CREATE</span> <span class="pre">EXTERNAL</span> <span class="pre">TABLE</span></code> SQL statement. The schema information will be derived automatically.</p>
+<p>Register a single file parquet datasource</p>
 <div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
 <span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
 <span class="k">LOCATION</span><span class="w"> </span><span class="s1">&#39;/mnt/nyctaxi/tripdata.parquet&#39;</span><span class="p">;</span>
 </pre></div>
 </div>
+<p>Register a single folder parquet datasource. All files inside must be valid parquet files!</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
+<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
+<span class="k">LOCATION</span><span class="w"> </span><span class="s1">&#39;/mnt/nyctaxi/&#39;</span><span class="p">;</span>
+</pre></div>
+</div>
+<p>Register a single folder parquet datasource by specifying a wildcard for files to read</p>
+<div class="highlight-sql notranslate"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">EXTERNAL</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">taxi</span>
+<span class="n">STORED</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">PARQUET</span>
+<span class="k">LOCATION</span><span class="w"> </span><span class="s1">&#39;/mnt/nyctaxi/*.parquet&#39;</span><span class="p">;</span>
+</pre></div>
+</div>
 </section>
 <section id="registering-csv-data-sources">
 <h2>Registering CSV Data Sources<a class="headerlink" href="#registering-csv-data-sources" title="Link to this heading">¶</a></h2>