You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2017/07/25 07:13:18 UTC

OS adapter and sqlsh

I had an idea last week and implemented it quickly over the weekend. You know how bash hackers write pipelines of operations like grep, sort, uniq, sed? Those are basically relational operations, but the pipelines are difficult to write because you’re dealing with space-separated strings. So, my idea was to allow people to write the same pipelines using SQL. Which meant making SQL easily available from the command line, and making the data sources of those operations (shell commands such as du, ps, git log) available as tables.

I call this the OS adapter, and the script that launches SQL from the command line is sqlsh. To find the 5 most prolific committers you’d type

$ git log | grep Author: | sort | uniq -c | sort -nr | head -5

and now you can instead type

$ ./sqlsh select author, count\(\*\) from git_commits group by 1 order by 2 desc limit 5

and Calcite reads from the same data source and executes the query using its operators.

It’s ready to commit. Can someone please review https://issues.apache.org/jira/browse/CALCITE-1896 <https://issues.apache.org/jira/browse/CALCITE-1896>?

It would be great to get contributions to this. Adding new data sources (/etc/passwd, netstat, the file system, apt, the maven repo) should be fairly straightforward. 

Julian

Re: OS adapter and sqlsh

Posted by Josh Elser <el...@apache.org>.

Thanks, Julian!

Alas, if only OSX had remotely similar tooling as Linux :)

On 7/26/17 1:36 AM, Julian Hyde wrote:
> Nice! Thanks Josh. I'll commit it when I commit the other stuff
> (keeping you as author of course).
> 
> I see that your "netstat" table has different column names on macOS
> and Linux. I pondered that one. I can see arguments both ways.
> 
> The principle of least surprise says give the user -- likely an
> administrator -- the names local to their system. In other words I
> think you're right.
> 
> I just pushed a 'files' command. It only works on Linux currently.
> Linux's find has a nice '-printf' option, but for macOS I'll need to
> do 'find | xargs stat'.
> 
> Julian
> 
> 
> 
> On Tue, Jul 25, 2017 at 2:13 PM, Josh Elser <el...@apache.org> wrote:
>> Not sure how useful it is, but it's something! Wrote a little function to
>> wrap `vmstat` and got it working on OSX and linux.
>>
>> https://github.com/joshelser/calcite/commit/01980a96acab47be2692d188ece2a6fa6bec08ae
>>
>> Would be happy to see it land in Calcite, either with your commit, Julian,
>> or I can do it after.
>>
>>
>> On 7/25/17 11:32 AM, Josh Elser wrote:
>>>
>>> I (think?) I have some free time, so let me play around with this today :)
>>>
>>> On 7/25/17 3:13 AM, Julian Hyde wrote:
>>>>
>>>> I had an idea last week and implemented it quickly over the weekend. You
>>>> know how bash hackers write pipelines of operations like grep, sort, uniq,
>>>> sed? Those are basically relational operations, but the pipelines are
>>>> difficult to write because you’re dealing with space-separated strings. So,
>>>> my idea was to allow people to write the same pipelines using SQL. Which
>>>> meant making SQL easily available from the command line, and making the data
>>>> sources of those operations (shell commands such as du, ps, git log)
>>>> available as tables.
>>>>
>>>> I call this the OS adapter, and the script that launches SQL from the
>>>> command line is sqlsh. To find the 5 most prolific committers you’d type
>>>>
>>>> $ git log | grep Author: | sort | uniq -c | sort -nr | head -5
>>>>
>>>> and now you can instead type
>>>>
>>>> $ ./sqlsh select author, count\(\*\) from git_commits group by 1 order by
>>>> 2 desc limit 5
>>>>
>>>> and Calcite reads from the same data source and executes the query using
>>>> its operators.
>>>>
>>>> It’s ready to commit. Can someone please review
>>>> https://issues.apache.org/jira/browse/CALCITE-1896
>>>> <https://issues.apache.org/jira/browse/CALCITE-1896>?
>>>>
>>>> It would be great to get contributions to this. Adding new data sources
>>>> (/etc/passwd, netstat, the file system, apt, the maven repo) should be
>>>> fairly straightforward.
>>>>
>>>> Julian
>>>>
>>>>
>>>>
>>>>
>>

Re: OS adapter and sqlsh

Posted by Julian Hyde <jh...@apache.org>.

Nice! Thanks Josh. I'll commit it when I commit the other stuff
(keeping you as author of course).

I see that your "netstat" table has different column names on macOS
and Linux. I pondered that one. I can see arguments both ways.

The principle of least surprise says give the user -- likely an
administrator -- the names local to their system. In other words I
think you're right.

I just pushed a 'files' command. It only works on Linux currently.
Linux's find has a nice '-printf' option, but for macOS I'll need to
do 'find | xargs stat'.

Julian



On Tue, Jul 25, 2017 at 2:13 PM, Josh Elser <el...@apache.org> wrote:
> Not sure how useful it is, but it's something! Wrote a little function to
> wrap `vmstat` and got it working on OSX and linux.
>
> https://github.com/joshelser/calcite/commit/01980a96acab47be2692d188ece2a6fa6bec08ae
>
> Would be happy to see it land in Calcite, either with your commit, Julian,
> or I can do it after.
>
>
> On 7/25/17 11:32 AM, Josh Elser wrote:
>>
>> I (think?) I have some free time, so let me play around with this today :)
>>
>> On 7/25/17 3:13 AM, Julian Hyde wrote:
>>>
>>> I had an idea last week and implemented it quickly over the weekend. You
>>> know how bash hackers write pipelines of operations like grep, sort, uniq,
>>> sed? Those are basically relational operations, but the pipelines are
>>> difficult to write because you’re dealing with space-separated strings. So,
>>> my idea was to allow people to write the same pipelines using SQL. Which
>>> meant making SQL easily available from the command line, and making the data
>>> sources of those operations (shell commands such as du, ps, git log)
>>> available as tables.
>>>
>>> I call this the OS adapter, and the script that launches SQL from the
>>> command line is sqlsh. To find the 5 most prolific committers you’d type
>>>
>>> $ git log | grep Author: | sort | uniq -c | sort -nr | head -5
>>>
>>> and now you can instead type
>>>
>>> $ ./sqlsh select author, count\(\*\) from git_commits group by 1 order by
>>> 2 desc limit 5
>>>
>>> and Calcite reads from the same data source and executes the query using
>>> its operators.
>>>
>>> It’s ready to commit. Can someone please review
>>> https://issues.apache.org/jira/browse/CALCITE-1896
>>> <https://issues.apache.org/jira/browse/CALCITE-1896>?
>>>
>>> It would be great to get contributions to this. Adding new data sources
>>> (/etc/passwd, netstat, the file system, apt, the maven repo) should be
>>> fairly straightforward.
>>>
>>> Julian
>>>
>>>
>>>
>>>
>

Re: OS adapter and sqlsh

Posted by Josh Elser <el...@apache.org>.

Not sure how useful it is, but it's something! Wrote a little function 
to wrap `vmstat` and got it working on OSX and linux.

https://github.com/joshelser/calcite/commit/01980a96acab47be2692d188ece2a6fa6bec08ae

Would be happy to see it land in Calcite, either with your commit, 
Julian, or I can do it after.

On 7/25/17 11:32 AM, Josh Elser wrote:
> I (think?) I have some free time, so let me play around with this today :)
> 
> On 7/25/17 3:13 AM, Julian Hyde wrote:
>> I had an idea last week and implemented it quickly over the weekend. 
>> You know how bash hackers write pipelines of operations like grep, 
>> sort, uniq, sed? Those are basically relational operations, but the 
>> pipelines are difficult to write because you’re dealing with 
>> space-separated strings. So, my idea was to allow people to write the 
>> same pipelines using SQL. Which meant making SQL easily available from 
>> the command line, and making the data sources of those operations 
>> (shell commands such as du, ps, git log) available as tables.
>>
>> I call this the OS adapter, and the script that launches SQL from the 
>> command line is sqlsh. To find the 5 most prolific committers you’d type
>>
>> $ git log | grep Author: | sort | uniq -c | sort -nr | head -5
>>
>> and now you can instead type
>>
>> $ ./sqlsh select author, count\(\*\) from git_commits group by 1 order 
>> by 2 desc limit 5
>>
>> and Calcite reads from the same data source and executes the query 
>> using its operators.
>>
>> It’s ready to commit. Can someone please review 
>> https://issues.apache.org/jira/browse/CALCITE-1896 
>> <https://issues.apache.org/jira/browse/CALCITE-1896>?
>>
>> It would be great to get contributions to this. Adding new data 
>> sources (/etc/passwd, netstat, the file system, apt, the maven repo) 
>> should be fairly straightforward.
>>
>> Julian
>>
>>
>>
>>

Re: OS adapter and sqlsh

Posted by Julian Hyde <jh...@apache.org>.

Yeah, someone on twitter mentioned osquery. I'd never heard of it. But
I guess I'm not in osquery's target audience, namely sys admins.

On Tue, Jul 25, 2017 at 9:04 AM, Jacques Nadeau <ja...@apache.org> wrote:
> This is cool. Not really the same but reminds me a bit of osquery?
>
> https://osquery.io/
>
> Wonder what kind of integration could happen there...
>
> On Tue, Jul 25, 2017 at 8:32 AM, Josh Elser <el...@apache.org> wrote:
>
>> I (think?) I have some free time, so let me play around with this today :)
>>
>>
>> On 7/25/17 3:13 AM, Julian Hyde wrote:
>>
>>> I had an idea last week and implemented it quickly over the weekend. You
>>> know how bash hackers write pipelines of operations like grep, sort, uniq,
>>> sed? Those are basically relational operations, but the pipelines are
>>> difficult to write because you’re dealing with space-separated strings. So,
>>> my idea was to allow people to write the same pipelines using SQL. Which
>>> meant making SQL easily available from the command line, and making the
>>> data sources of those operations (shell commands such as du, ps, git log)
>>> available as tables.
>>>
>>> I call this the OS adapter, and the script that launches SQL from the
>>> command line is sqlsh. To find the 5 most prolific committers you’d type
>>>
>>> $ git log | grep Author: | sort | uniq -c | sort -nr | head -5
>>>
>>> and now you can instead type
>>>
>>> $ ./sqlsh select author, count\(\*\) from git_commits group by 1 order by
>>> 2 desc limit 5
>>>
>>> and Calcite reads from the same data source and executes the query using
>>> its operators.
>>>
>>> It’s ready to commit. Can someone please review
>>> https://issues.apache.org/jira/browse/CALCITE-1896 <
>>> https://issues.apache.org/jira/browse/CALCITE-1896>?
>>>
>>> It would be great to get contributions to this. Adding new data sources
>>> (/etc/passwd, netstat, the file system, apt, the maven repo) should be
>>> fairly straightforward.
>>>
>>> Julian
>>>
>>>
>>>
>>>
>>>

Re: OS adapter and sqlsh

Posted by Jacques Nadeau <ja...@apache.org>.

This is cool. Not really the same but reminds me a bit of osquery?

https://osquery.io/

Wonder what kind of integration could happen there...

On Tue, Jul 25, 2017 at 8:32 AM, Josh Elser <el...@apache.org> wrote:

> I (think?) I have some free time, so let me play around with this today :)
>
>
> On 7/25/17 3:13 AM, Julian Hyde wrote:
>
>> I had an idea last week and implemented it quickly over the weekend. You
>> know how bash hackers write pipelines of operations like grep, sort, uniq,
>> sed? Those are basically relational operations, but the pipelines are
>> difficult to write because you’re dealing with space-separated strings. So,
>> my idea was to allow people to write the same pipelines using SQL. Which
>> meant making SQL easily available from the command line, and making the
>> data sources of those operations (shell commands such as du, ps, git log)
>> available as tables.
>>
>> I call this the OS adapter, and the script that launches SQL from the
>> command line is sqlsh. To find the 5 most prolific committers you’d type
>>
>> $ git log | grep Author: | sort | uniq -c | sort -nr | head -5
>>
>> and now you can instead type
>>
>> $ ./sqlsh select author, count\(\*\) from git_commits group by 1 order by
>> 2 desc limit 5
>>
>> and Calcite reads from the same data source and executes the query using
>> its operators.
>>
>> It’s ready to commit. Can someone please review
>> https://issues.apache.org/jira/browse/CALCITE-1896 <
>> https://issues.apache.org/jira/browse/CALCITE-1896>?
>>
>> It would be great to get contributions to this. Adding new data sources
>> (/etc/passwd, netstat, the file system, apt, the maven repo) should be
>> fairly straightforward.
>>
>> Julian
>>
>>
>>
>>
>>

Re: OS adapter and sqlsh

Posted by Josh Elser <el...@apache.org>.

I (think?) I have some free time, so let me play around with this today :)

On 7/25/17 3:13 AM, Julian Hyde wrote:
> I had an idea last week and implemented it quickly over the weekend. You know how bash hackers write pipelines of operations like grep, sort, uniq, sed? Those are basically relational operations, but the pipelines are difficult to write because you’re dealing with space-separated strings. So, my idea was to allow people to write the same pipelines using SQL. Which meant making SQL easily available from the command line, and making the data sources of those operations (shell commands such as du, ps, git log) available as tables.
> 
> I call this the OS adapter, and the script that launches SQL from the command line is sqlsh. To find the 5 most prolific committers you’d type
> 
> $ git log | grep Author: | sort | uniq -c | sort -nr | head -5
> 
> and now you can instead type
> 
> $ ./sqlsh select author, count\(\*\) from git_commits group by 1 order by 2 desc limit 5
> 
> and Calcite reads from the same data source and executes the query using its operators.
> 
> It’s ready to commit. Can someone please review https://issues.apache.org/jira/browse/CALCITE-1896 <https://issues.apache.org/jira/browse/CALCITE-1896>?
> 
> It would be great to get contributions to this. Adding new data sources (/etc/passwd, netstat, the file system, apt, the maven repo) should be fairly straightforward.
> 
> Julian
> 
> 
> 
>