Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/23 01:06:52 UTC

[GitHub] [arrow-datafusion] matthewmturner opened a new pull request #1875: Add new option for running sql file and keeping df-cli open

matthewmturner opened a new pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #1872 
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   Added a new command line option to `datafusion-cli` that lets you execute a SQL file (for example, one containing DDL) and then continue using the REPL.
   
   For now I called the option `run`, but that isn't a great name.
   
   Is it acceptable to change the existing `file` option? For example, we could have:
   
   `file-exit` -> Exits after running file
   `file-run` -> Continues to repl after running file
   
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   Maybe a changed option. If the existing option stays as it is, then just a new command line option.
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1875: Add new option for running sql file and keeping df-cli open

Posted by GitBox <gi...@apache.org>.
matthewmturner commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1056409316


   @alamb I'm also interested in your thoughts on this, as I know you have mentioned using datafusion-cli.





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1875: Add new option for running sql file and keeping df-cli open

matthewmturner commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1054689747


   @Jimexist any additional thoughts on the above? If you are OK with it, then I'll move forward with the following:
   
   no option => startup with ~/.datafusionrc
   `-f` / `--file` => existing use case, exits after running, aligned with psql
   `-r` / `--rc` => startup with selected file; I don't think the `-r` option is currently used by psql, so it shouldn't cause confusion
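   A minimal sketch of how the three startup cases above could be resolved, using only the standard library. `StartupMode` and `parse_startup` are hypothetical names for illustration, not datafusion-cli's actual code; a real CLI would more likely use an argument-parsing crate than this hand-rolled match.

   ```rust
   use std::path::PathBuf;

   /// How the CLI should start (hypothetical type for this sketch).
   #[derive(Debug, PartialEq)]
   enum StartupMode {
       /// No option: run `~/.datafusionrc` (if present), then open the REPL.
       DefaultRc,
       /// `-f` / `--file`: run the file, then exit (like `psql -f`).
       RunFileAndExit(PathBuf),
       /// `-r` / `--rc`: run the given file, then open the REPL.
       RunRcAndStay(PathBuf),
   }

   /// Parse the arguments that follow the program name.
   fn parse_startup(args: &[&str]) -> Result<StartupMode, String> {
       let mut iter = args.iter();
       match iter.next() {
           None => Ok(StartupMode::DefaultRc),
           Some(&flag) => {
               // Both supported flags take exactly one file argument.
               let path = iter
                   .next()
                   .ok_or_else(|| format!("{flag} requires a file argument"))?;
               if iter.next().is_some() {
                   return Err("too many arguments".to_string());
               }
               match flag {
                   "-f" | "--file" => Ok(StartupMode::RunFileAndExit(PathBuf::from(*path))),
                   "-r" | "--rc" => Ok(StartupMode::RunRcAndStay(PathBuf::from(*path))),
                   other => Err(format!("unknown option: {other}")),
               }
           }
       }
   }
   ```

   Keeping `-f` and `-r` as separate variants makes the exit-versus-REPL decision explicit in the type rather than depending on a boolean flag.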





[GitHub] [arrow-datafusion] alamb commented on pull request #1875: Add support for `~/.datafusionrc` and cli option for overriding it to datafusion-cli

alamb commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1057910450


   Thanks for sticking with this @matthewmturner 





[GitHub] [arrow-datafusion] alamb merged pull request #1875: Add support for `~/.datafusionrc` and cli option for overriding it to datafusion-cli

alamb merged pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875


   





[GitHub] [arrow-datafusion] alamb commented on pull request #1875: Add new option for running sql file and keeping df-cli open

alamb commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1057113073


   Taking inspiration from `psql`: it implements a `\i` command (not a command line argument) for this use case. It *includes* a file:
   
   https://www.postgresql.org/docs/13/app-psql.html
   
   > \i or \include filename
   > Reads input from the file filename and executes it as though it had been typed on the keyboard.
   > 
   > If filename is - (hyphen), then standard input is read until an EOF indication or \q meta-command. This can be used to intersperse interactive input with input from files. Note that Readline behavior will be used only if it is active at the outermost level.
   
   It appears that mysql has a similar command, `source` or `\.`, that does the same thing
   
   https://dev.mysql.com/doc/refman/8.0/en/mysql-batch-commands.html
   
   
   Thus I would recommend implementing a `\i` command (and possibly also a `source` command)
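   A minimal sketch of how a REPL loop could recognise such include commands before handing a line to the SQL parser. `Include` and `parse_include` are hypothetical names. Treating `-` as standard input follows the psql text quoted above; applying that to the MySQL spellings as well is an assumption of this sketch.

   ```rust
   /// What an include command asks for (hypothetical type for this sketch).
   #[derive(Debug, PartialEq)]
   enum Include {
       /// Read and execute statements from the named file.
       File(String),
       /// `\i -`: read from standard input until EOF, as psql describes.
       Stdin,
   }

   /// Recognise psql-style `\i` / `\include` and MySQL-style `source` / `\.`.
   /// Returns `None` if the line is not an include command with an argument.
   fn parse_include(line: &str) -> Option<Include> {
       let line = line.trim();
       // `\include` is listed before `\i`; the whitespace check below also stops
       // `\i` from matching the start of `\include`.
       let rest = ["\\include", "\\i", "source", "\\."]
           .iter()
           .find_map(|cmd| {
               line.strip_prefix(*cmd)
                   .filter(|r| r.starts_with(char::is_whitespace))
           })?;
       match rest.trim() {
           "" => None,
           "-" => Some(Include::Stdin),
           path => Some(Include::File(path.to_string())),
       }
   }
   ```

   A REPL would call this on each input line and, on `Some(..)`, feed the file's contents through the same execution path as typed input.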
   
   





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1875: Add new option for running sql file and keeping df-cli open

matthewmturner commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1057365831


   Just to ensure we're aligned: the PR adds both a `-r <file>` option and a default lookup of `~/.datafusionrc`.
   
   I also think that adding the `\i` command still makes sense. Since it still helps achieve the intent of the issue, I think it's OK to just add it to this PR. Let me know if you think otherwise.





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1875: Add new option for running sql file and keeping df-cli open

matthewmturner commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1057129101


   I thought we were actually already taking inspiration from psql; see below:
   
   ```
   psqlrc and ~/.psqlrc
   
       Unless it is passed an -X option, psql attempts to read and execute commands from the system-wide startup file (psqlrc) and then the user's personal startup file (~/.psqlrc), after connecting to the database but before accepting normal commands. These files can be used to set up the client and/or the server to taste, typically with \set and SET commands.
   
       The system-wide startup file is named psqlrc and is sought in the installation's “system configuration” directory, which is most reliably identified by running pg_config --sysconfdir. By default this directory will be ../etc/ relative to the directory containing the PostgreSQL executables. The name of this directory can be set explicitly via the PGSYSCONFDIR environment variable.
   
       The user's personal startup file is named .psqlrc and is sought in the invoking user's home directory. On Windows, which lacks such a concept, the personal startup file is named %APPDATA%\postgresql\psqlrc.conf. The location of the user's startup file can be set explicitly via the PSQLRC environment variable.
   
       Both the system-wide startup file and the user's personal startup file can be made psql-version-specific by appending a dash and the PostgreSQL major or minor release number to the file name, for example ~/.psqlrc-9.2 or ~/.psqlrc-9.2.5. The most specific version-matching file will be read in preference to a non-version-specific file.
   ```
   
   Unless your point is that the above is more for options / configuration than for actual SQL / DDL, which I think makes sense... I can make that update.
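   For reference, the version-specific lookup in the quoted psql docs (a versioned file is preferred over the plain one) can be sketched as follows. `rc_candidates` and `pick_rc` are hypothetical helpers. The existence check is passed in as a closure so the ordering can be tested without touching the filesystem, and nothing here implies datafusion-cli would adopt version-specific rc files.

   ```rust
   /// Candidate startup files, most specific first, mirroring the quoted rule:
   /// `~/.psqlrc-9.2.5` (minor release), then `~/.psqlrc-9.2` (major release), then `~/.psqlrc`.
   fn rc_candidates(base: &str, full_version: &str, major_version: &str) -> Vec<String> {
       vec![
           format!("{base}-{full_version}"),
           format!("{base}-{major_version}"),
           base.to_string(),
       ]
   }

   /// Return the first candidate for which `exists` reports true.
   fn pick_rc(
       base: &str,
       full_version: &str,
       major_version: &str,
       exists: impl Fn(&str) -> bool,
   ) -> Option<String> {
       rc_candidates(base, full_version, major_version)
           .into_iter()
           .find(|candidate| exists(candidate.as_str()))
   }
   ```

   A real caller might pass `|p| std::path::Path::new(p).exists()` as the check.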





[GitHub] [arrow-datafusion] houqp commented on pull request #1875: Add new option for running sql file and keeping df-cli open

houqp commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1048496216


   I would be interested to hear what @Jimexist has to say about it. I am cool with `file-exit` and `file-run` rename. Other options are:
   
   1. use `bootstrap` as the argument name
   2. start using subcommands; then we can have `cli run QUERY_FILE` and `cli console --file QUERY_FILE`.
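   Option 2 could look roughly like the following sketch. `Command` and `parse_command` are hypothetical names; the subcommand names `run` and `console` come from the suggestion above. A real implementation would more likely use an argument-parsing crate with subcommand support than slice patterns.

   ```rust
   /// Hypothetical subcommand layout for option 2.
   #[derive(Debug, PartialEq)]
   enum Command {
       /// `cli run QUERY_FILE`: execute the file and exit.
       Run { file: String },
       /// `cli console [--file QUERY_FILE]`: optionally execute a file, then open the REPL.
       Console { file: Option<String> },
   }

   /// Match the arguments after the program name against the supported shapes.
   fn parse_command(args: &[&str]) -> Result<Command, String> {
       match args {
           ["run", file] => Ok(Command::Run { file: file.to_string() }),
           ["console"] => Ok(Command::Console { file: None }),
           ["console", "--file", file] => Ok(Command::Console {
               file: Some(file.to_string()),
           }),
           _ => Err(format!("unrecognised arguments: {args:?}")),
       }
   }
   ```

   The appeal of subcommands is that "run and exit" and "run and stay" become different verbs instead of two similarly named flags.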





[GitHub] [arrow-datafusion] Jimexist commented on pull request #1875: Add new option for running sql file and keeping df-cli open

Jimexist commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1049596993


   I'm thinking of how this can be done in a more generic way.
   
   If you think of `psql`, on startup it reads and executes commands and SQL statements from `.psqlrc`, which makes it a good place to put common init setup for your local dev experience.
   
   Probably we should read from `.datafusionrc` by default, and that's where you'd want to put `\pset` commands and common DDL commands?
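   A rough sketch of how such a file could be split into backslash meta-commands and SQL statements before execution. `RcItem` and `split_rc` are hypothetical. The splitter is deliberately naive: it assumes each statement ends with `;` at the end of a line and does not handle `;` inside string literals or trailing comments.

   ```rust
   /// One item from a startup file (hypothetical type for this sketch).
   #[derive(Debug, PartialEq)]
   enum RcItem {
       /// A backslash meta-command such as `\pset ...`.
       Meta(String),
       /// A SQL statement, possibly spanning several lines.
       Sql(String),
   }

   fn split_rc(contents: &str) -> Vec<RcItem> {
       let mut items = Vec::new();
       let mut buf = String::new();
       for line in contents.lines() {
           let trimmed = line.trim();
           // Skip blank lines and full-line SQL comments.
           if trimmed.is_empty() || trimmed.starts_with("--") {
               continue;
           }
           // Meta-commands are only recognised between statements.
           if buf.is_empty() && trimmed.starts_with('\\') {
               items.push(RcItem::Meta(trimmed.to_string()));
               continue;
           }
           // Accumulate statement text, joining lines with a single space.
           if !buf.is_empty() {
               buf.push(' ');
           }
           buf.push_str(trimmed);
           if trimmed.ends_with(';') {
               items.push(RcItem::Sql(std::mem::take(&mut buf)));
           }
       }
       // A trailing statement without `;` is still returned.
       if !buf.is_empty() {
           items.push(RcItem::Sql(buf));
       }
       items
   }
   ```

   Each `Meta` item would go to the CLI's command handler and each `Sql` item to the query engine, in file order.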





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1875: Add new option for running sql file and keeping df-cli open

matthewmturner commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1049338082


   Agreed, I would like @Jimexist's opinion. I've renamed to `file-run` and `file-exit` for now.





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1875: Add support for `~/.datafusionrc` and cli option for overriding it to datafusion-cli

matthewmturner commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1057400058


   @alamb thanks much for the review and feedback.
   
   @Jimexist are we good to go on your end?





[GitHub] [arrow-datafusion] alamb commented on pull request #1875: Add new option for running sql file and keeping df-cli open

alamb commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1057367315


   > I also think that adding the \i command still makes sense. Since its still helps achieve the intent of the issue i think its ok to just add it to this PR. Lmk if you think otherwise.
   
   
   I think it would also be fine to add as a follow-on PR. Updating this PR's description and title is important, though, as a way to communicate the change to others.





[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1875: Add new option for running sql file and keeping df-cli open

matthewmturner commented on pull request #1875:
URL: https://github.com/apache/arrow-datafusion/pull/1875#issuecomment-1049994245


   @Jimexist I think that's a great idea. The only thing I'll add, which I personally would find useful, is the option to provide a custom `.datafusionrc` file when starting up (the default can of course be `~/.datafusionrc`).
   
   To give you some context, I'm expecting to set up different environments and would like to be able to choose which one to use without having them all bundled. For example, I could have one for my main job with DDL for those datasets, and another focused on Arrow development where I load db-benchmark, parquet-testing, and tpch data, etc. In that case maybe we could have something like:
   
   no option => startup with `~/.datafusionrc`
   `-f / --file` => existing use case, exits after running, aligned with psql
   `-r / --rc` => startup with selected file; I don't think the `-r` option is currently used by psql, so it shouldn't cause confusion
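   A minimal sketch of the rc-file resolution in the list above, assuming an explicit `-r` path takes precedence over `~/.datafusionrc`. `rc_to_run` is hypothetical. The home directory and the existence check are passed in so the logic is testable, and the policy of erroring only when an explicitly named file is missing is an assumption of this sketch.

   ```rust
   use std::path::{Path, PathBuf};

   /// Decide which startup file (if any) to run before opening the REPL.
   /// - explicit `-r/--rc` path: must exist, otherwise report an error;
   /// - no option: use `<home>/.datafusionrc` if it exists, else run nothing.
   fn rc_to_run(
       explicit: Option<PathBuf>,
       home: Option<PathBuf>,
       exists: impl Fn(&Path) -> bool,
   ) -> Result<Option<PathBuf>, String> {
       match explicit {
           Some(p) if exists(p.as_path()) => Ok(Some(p)),
           Some(p) => Err(format!("rc file not found: {}", p.display())),
           None => Ok(home
               .map(|h| h.join(".datafusionrc"))
               .filter(|p| exists(p.as_path()))),
       }
   }
   ```

   A real caller might pass `std::env::var_os("HOME").map(PathBuf::from)` for `home` and `|p: &Path| p.exists()` for the check. Skipping a missing default file silently keeps the no-option case working for users who never create one.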
   

