You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@drill.apache.org by br...@apache.org on 2015/07/08 01:07:39 UTC

[1/2] drill git commit: add Jason's examples

Repository: drill
Updated Branches:
  refs/heads/gh-pages 558caec08 -> 02af103d9


add Jason's examples


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/a9b1831f
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/a9b1831f
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/a9b1831f

Branch: refs/heads/gh-pages
Commit: a9b1831f13b34476a820cd9f34ee663b7323cc7c
Parents: c7e53cb
Author: Kristine Hahn <kh...@maprtech.com>
Authored: Mon Jul 6 13:10:53 2015 -0700
Committer: Kristine Hahn <kh...@maprtech.com>
Committed: Mon Jul 6 13:13:10 2015 -0700

----------------------------------------------------------------------
 .../060-text-files-csv-tsv-psv.md               |  89 ++++++++++++++++---
 _docs/img/csv_no_header.png                     | Bin 0 -> 10877 bytes
 _docs/img/csv_with_comments.png                 | Bin 0 -> 14602 bytes
 _docs/img/csv_with_escape.png                   | Bin 0 -> 14171 bytes
 _docs/img/csv_with_header.png                   | Bin 0 -> 12451 bytes
 5 files changed, 79 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/a9b1831f/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
----------------------------------------------------------------------
diff --git a/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md b/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
index b3dce16..5b0d874 100644
--- a/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
+++ b/_docs/data-sources-and-file-formats/060-text-files-csv-tsv-psv.md
@@ -6,7 +6,7 @@ parent: "Data Sources and File Formats"
 Best practices for reading text files are:
 
 * Select data from particular columns  
-* CAST data  
+* Cast data
 * Use a distributed file system  
 
 ### Select Data from Particular Columns
@@ -29,16 +29,16 @@ You can also improve performance by casting the VARCHAR data to INT, FLOAT, DATE
 Using a distributed file system, such as HDFS, instead of a local file system to query the files also improves performance because currently Drill does not split files on block splits.
 
 ## Configuring Drill to Read Text Files
-In the storage plugin configuration, you can set the following attributes that affect how Drill reads CSV, TSV, PSV (comma-, tab-, pipe-separated) files.  ["List of Attributes and Definitions"]({{site.baseurl}}/docs/plugin-configuration-basics/#list-of-attributes-and-definitions):
+In the storage plugin configuration, you can set the following attributes that affect how Drill reads CSV, TSV, PSV (comma-, tab-, pipe-separated) files.  
 
 * String lineDelimiter = "\n";  
   One or more characters used to denote a new record. Allows reading files with windows line endings.  
 * char fieldDelimiter = ',';  
   A single character used to separate each value.  
 * char quote = '"';  
-  A single character used to start/end a quoted value.  
+  A single character used to start/end a value enclosed in quotation marks.  
 * char escape = '"';  
-  A single character used to escape a quote inside of a value.  
+  A single character used to escape a quototation mark inside of a value.  
 * char comment = '#';  
   A single character used to denote a comment line.  
 * boolean skipFirstLine = false;  
@@ -49,9 +49,7 @@ For more information about storage plugin configuration, see ["List of Attribute
 You can deal with a mix of text files with and without headers either by creating two separate format plugins or by creating two format plugins within the same storage plugin. The former approach is typically easier than the latter.
 
 ### Creating Two Separate Format Plugins
-Format plugins are associated with a particular storage plugin. Storage plugins define a root directory that Drill targets when using the storage plugin. You can define separate storage plugins for different root directories, and define each of the format attributes to match the files stored below that directory. All files can use the .csv extension.
-
-For example:
+Format plugins are associated with a particular storage plugin. Storage plugins define a root directory that Drill targets when using the storage plugin. You can define separate storage plugins for different root directories, and define each of the format attributes to match the files stored below that directory. All files can use the .csv extension, as shown in the following example:
 
 Storage Plugin A
 
@@ -78,9 +76,7 @@ Storage Plugin B
     },
 
 ### Creating Two Format Plugins within the Same Storage Plugin
-Give a different extension to files with a header and to files without a header, and use a storage plugin that looks something like the following example. This method requires renaming some files to use the csv2 extension.
-
-For example:
+Give a different extension to files with a header and to files without a header, and use a storage plugin that looks something like the following example. This method requires renaming some files to use the csv2 extension, as shown in the following example:
 
     "csv": {
       "type": "text",
@@ -99,3 +95,76 @@ For example:
       "delimiter": ","
     },
 
+## Examples of Querying Text Files
+The examples in this section show the results of querying CSV files that use and do not use a header, include comments, and use an escape character:
+
+### Using a Header in a File
+
+![CSV with header]({{ site.baseurl }}/docs/img/csv_with_header.png)
+
+    0: jdbc:drill:zk=local> SELECT * FROM dfs.`/tmp/csv_with_header.csv2`;
+    +------------------------+
+    |        columns         |
+    +------------------------+
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    +------------------------+
+
+### Not Using a Header in a File
+
+![CSV no header]({{ site.baseurl }}/docs/img/csv_no_header.png)
+
+    0: jdbc:drill:zk=local> SELECT * FROM dfs.`/tmp/csv_no_header.csv`;
+    +------------------------+
+    |        columns         |
+    +------------------------+
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    +------------------------+
+    7 rows selected (0.112 seconds)
+
+### Escaping a Character in a File
+
+![CSV with escape]({{ site.baseurl }}/docs/img/csv_with_escape.png)
+
+    0: jdbc:drill:zk=local> SELECT * FROM dfs.`/tmp/csv_with_escape.csv`;
+    +------------------------------------------------------------------------+
+    |                                columns                                 |
+    +------------------------------------------------------------------------+
+    | ["hello","1","2","3 \" double quote is the default escape character"]  |
+    | ["hello","1","2","3"]                                                  |
+    | ["hello","1","2","3"]                                                  |
+    | ["hello","1","2","3"]                                                  |
+    | ["hello","1","2","3"]                                                  |
+    | ["hello","1","2","3"]                                                  |
+    | ["hello","1","2","3"]                                                  |
+    +------------------------------------------------------------------------+
+    7 rows selected (0.104 seconds)
+
+### Adding Comments to a File
+
+![CSV with comments]({{ site.baseurl }}/docs/img/csv_with_comments.png)
+
+    0: jdbc:drill:zk=local> SELECT * FROM dfs.`/tmp/csv_with_comments.csv2`;
+    +------------------------+
+    |        columns         |
+    +------------------------+
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    | ["hello","1","2","3"]  |
+    +------------------------+
+    7 rows selected (0.111 seconds)

http://git-wip-us.apache.org/repos/asf/drill/blob/a9b1831f/_docs/img/csv_no_header.png
----------------------------------------------------------------------
diff --git a/_docs/img/csv_no_header.png b/_docs/img/csv_no_header.png
new file mode 100644
index 0000000..121e250
Binary files /dev/null and b/_docs/img/csv_no_header.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/a9b1831f/_docs/img/csv_with_comments.png
----------------------------------------------------------------------
diff --git a/_docs/img/csv_with_comments.png b/_docs/img/csv_with_comments.png
new file mode 100644
index 0000000..3f23eb7
Binary files /dev/null and b/_docs/img/csv_with_comments.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/a9b1831f/_docs/img/csv_with_escape.png
----------------------------------------------------------------------
diff --git a/_docs/img/csv_with_escape.png b/_docs/img/csv_with_escape.png
new file mode 100644
index 0000000..31b5d1f
Binary files /dev/null and b/_docs/img/csv_with_escape.png differ

http://git-wip-us.apache.org/repos/asf/drill/blob/a9b1831f/_docs/img/csv_with_header.png
----------------------------------------------------------------------
diff --git a/_docs/img/csv_with_header.png b/_docs/img/csv_with_header.png
new file mode 100644
index 0000000..d15b3c6
Binary files /dev/null and b/_docs/img/csv_with_header.png differ


[2/2] drill git commit: Merge remote-tracking branch 'remotes/bbevens/gh-pages' into gh-pages

Posted by br...@apache.org.
Merge remote-tracking branch 'remotes/bbevens/gh-pages' into gh-pages


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/02af103d
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/02af103d
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/02af103d

Branch: refs/heads/gh-pages
Commit: 02af103d9b790323a884748fab5293197f992036
Parents: 558caec a9b1831
Author: Bridget Bevens <bb...@maprtech.com>
Authored: Tue Jul 7 16:06:41 2015 -0700
Committer: Bridget Bevens <bb...@maprtech.com>
Committed: Tue Jul 7 16:06:41 2015 -0700

----------------------------------------------------------------------
 .../060-text-files-csv-tsv-psv.md               |  89 ++++++++++++++++---
 _docs/img/csv_no_header.png                     | Bin 0 -> 10877 bytes
 _docs/img/csv_with_comments.png                 | Bin 0 -> 14602 bytes
 _docs/img/csv_with_escape.png                   | Bin 0 -> 14171 bytes
 _docs/img/csv_with_header.png                   | Bin 0 -> 12451 bytes
 5 files changed, 79 insertions(+), 10 deletions(-)
----------------------------------------------------------------------