You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2017/07/20 17:34:00 UTC

[jira] [Created] (CALCITE-1896) sqlsh: SQL access to shell commands, as a shell command

Julian Hyde created CALCITE-1896:
------------------------------------

             Summary: sqlsh: SQL access to shell commands, as a shell command
                 Key: CALCITE-1896
                 URL: https://issues.apache.org/jira/browse/CALCITE-1896
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde
            Assignee: Julian Hyde


Bourne shell has a toolkit of commands that allow you to do relational processing: grep = WHERE, sort = ORDER BY, uniq = SELECT DISTINCT, head = OFFSET/LIMIT, and so forth. Shell hackers are accustomed to writing pipelines. For example, to find the 3 largest files one would type

{noformat}
$ find . -type f -print0 |xargs -0 ls -l  | sort -nr -k 5 | head -3

-rw-r--r--  1 jhyde  staff  416028 Jul 16 10:06 ./core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java
-rw-r--r--  1 jhyde  staff  325727 Jun 28 11:48 ./core/src/test/java/org/apache/calcite/test/JdbcTest.java
-rw-r--r--  1 jhyde  staff  325106 Jul 16 10:06 ./core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml
{noformat}

SQL is not quite as concise but nevertheless:

{noformat}
$ sqlsh select \* from files where not directory order by size desc limit 3
{noformat}

{{sqlsh}} is "SQL shell", a wrapper around Calcite JDBC. {{files}} is a view backed by a table function that calls "find .". Other commands:
* {{du}} table function - e.g. {{select path, size_k from du}}
* {{git_ls_files}} table function calls git-ls-files - e.g. {{select * from git_ls_files}}
* {{ps}} table function calls {{ps aux}} - e.g. {{select * from ps}}
* {{wc}} function calls {{wc}} - e.g. {{select path, lineCount from git_ls_files cross apply wc(path)}}

SQL would run in a lexical mode which is case-sensitive, and identifiers are not upper-cased if not quoted.

We could consider allowing shell-safe characters such as '-' in unquoted identifiers. (It's difficult to quote identifiers in SQL if the SQL has already passed through bash's quote handling, and had double-quotes and single-quotes removed.)

It gets really interesting when commands accept arguments, and Calcite pushes down filters to become those arguments. For example, in

{code}sqlsh select distinct author from git_ls_files join git_commit_files using \(path\) join git_commits using \(commit\){code}

Calcite should rewrite to use {{git log}}.

When accessing files, some kinds of files have implicit names (and types) for fields, and some don't. You should be able to access fields by name or position; the following are equivalent:

{noformat}
sqlsh select gid from /etc/passwd where uid = 100
sqlsh select $4 from /etc/passwd where $3 = 100
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)