You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Aakcht (via GitHub)" <gi...@apache.org> on 2023/02/03 08:48:33 UTC

[GitHub] [airflow] Aakcht opened a new pull request, #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Aakcht opened a new pull request, #29347:
URL: https://github.com/apache/airflow/pull/29347

   This PR adds possibility to set cmd_timeout on ssh connection(including setting it to `''` meaning no timeout). I tried to keep the previous behavior of SSH hook/operator the same. 
   
   closes: #29282
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Aakcht commented on a diff in pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Aakcht (via GitHub)" <gi...@apache.org>.
Aakcht commented on code in PR #29347:
URL: https://github.com/apache/airflow/pull/29347#discussion_r1111214874


##########
airflow/providers/ssh/hooks/ssh.py:
##########
@@ -173,6 +182,14 @@ def __init__(
                 if "conn_timeout" in extra_options and self.conn_timeout is None:
                     self.conn_timeout = int(extra_options["conn_timeout"])
 
+                if self.cmd_timeout is None:
+                    if "cmd_timeout" in extra_options:
+                        self.cmd_timeout = (
+                            int(extra_options["cmd_timeout"]) if extra_options["cmd_timeout"] else None
+                        )
+                    else:
+                        self.cmd_timeout = CMD_TIMEOUT
+

Review Comment:
   Hi, @Taragolis  ! Good point about inconsistency between direct hook parameters and connection parameters - I changed  it to `negative value means no timeout` (and now it's possible to set it both in the hook and in the connection).
   
   However I'm not sure I agree with NOTSET approach. It will (somewhat) bring back the issue with inconsistency between hook/connection parameters when you try to turn off cmd_timeout. Because in this case:
   
   - 	To pass "no timeout" via hook parameters you should pass `None` as a hook argument.
   - 	To pass "no timeout" via connection parameters you should specify `null` in connection extra: `{"cmd_timeout": null}`. (It's possible to use empty string or negative values instead of null here, but I feel like it'll be even more inconsistent with hook parameters).
   
   I fill like specifying a parameter in connection extra and setting it to null is a bit counterintuitive. If you think it's ok, I'll change it to NOTSET approach, but I feel like negative values both in hook and connection for `no timeout` would be a little more clear.
   
   The unittests are already present at https://github.com/apache/airflow/pull/29347/files#diff-8a4026fba1b53549c99ae768b6308aefddee93e54f3c8105aa454a15ee7cc075R824 .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #29347:
URL: https://github.com/apache/airflow/pull/29347#discussion_r1111203656


##########
airflow/providers/ssh/hooks/ssh.py:
##########
@@ -173,6 +182,14 @@ def __init__(
                 if "conn_timeout" in extra_options and self.conn_timeout is None:
                     self.conn_timeout = int(extra_options["conn_timeout"])
 
+                if self.cmd_timeout is None:
+                    if "cmd_timeout" in extra_options:
+                        self.cmd_timeout = (
+                            int(extra_options["cmd_timeout"]) if extra_options["cmd_timeout"] else None
+                        )
+                    else:
+                        self.cmd_timeout = CMD_TIMEOUT
+

Review Comment:
   Just for a record, we have a specific sentinel type and value, which use when None as default value for argument is not an option.
   
   https://github.com/apache/airflow/blob/d67ac5932dabbf06ae733fc57b48491a8029b8c2/airflow/utils/types.py#L28-L44
   
   
   So potentially we could use it here, e.g.:
   
   If hook argument cmd_timeout is NOTSET and Connection extra argument set than use from connection
   If hook argument cmd_timeout is NOTSET and Connection extra argument **not set** than use CMD_TIMEOUT 
   If hook argument set to any value rather than NOTSET than use this value and ignore value from connection
   
   
   And of course we need to tests all combination in unittests, just for sure that we do not break anything as well as we correctly process every possible case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis merged pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis merged PR #29347:
URL: https://github.com/apache/airflow/pull/29347


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Aakcht commented on a diff in pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Aakcht (via GitHub)" <gi...@apache.org>.
Aakcht commented on code in PR #29347:
URL: https://github.com/apache/airflow/pull/29347#discussion_r1111290081


##########
airflow/providers/ssh/hooks/ssh.py:
##########
@@ -173,6 +182,14 @@ def __init__(
                 if "conn_timeout" in extra_options and self.conn_timeout is None:
                     self.conn_timeout = int(extra_options["conn_timeout"])
 
+                if self.cmd_timeout is None:
+                    if "cmd_timeout" in extra_options:
+                        self.cmd_timeout = (
+                            int(extra_options["cmd_timeout"]) if extra_options["cmd_timeout"] else None
+                        )
+                    else:
+                        self.cmd_timeout = CMD_TIMEOUT
+

Review Comment:
   As per suggestion, changed to `None/null means no timeout`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Aakcht commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Aakcht (via GitHub)" <gi...@apache.org>.
Aakcht commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1434481893

   Hi, @ferruzzi do you have any questions left? I'd be happy to help. 
   
   @eladkal @potiuk I will be glad could take a look at this PR too, cause I see you reviewed #27184, which is related to this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1431843459

   I think I need a little more context here.   It looks like this PR is adding the option to set the command timeout for the session rather than for each command.  Is that correct?  And if the user sets a command timeout in a specific command then that will override the session default you've set?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Aakcht commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Aakcht (via GitHub)" <gi...@apache.org>.
Aakcht commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1436048563

   Looks like the unsuccessful check is not related to this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Aakcht commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Aakcht (via GitHub)" <gi...@apache.org>.
Aakcht commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1439863842

   Hi, @pollynesterovich . This PR is not related to the chart since it added a fix to airflow provider. So as I understand, for it to get to the chart, firstly a new 3.5.0 [SSH provider version](https://airflow.apache.org/docs/apache-airflow-providers-ssh/stable/index.html#changelog) should get released. After that you'll have two options:
   * you can manually [build a new airflow docker image](https://airflow.apache.org/docs/docker-stack/build.html#quick-start-scenarios-of-image-extending) with the new SSH provider version and specify this image in the [chart installation parameters](https://airflow.apache.org/docs/helm-chart/stable/parameters-ref.html#images).
   * you can wait till a new airflow version and airflow docker image with this version will get released (airflow images usually include latest released by that time SSH provider version). Then you should add this new docker image to chart installation parameters manually. (Or you can wait a bit more, then a new chart version will also get released, where new airflow docker image will be set in installation parameters by default).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1436056370

   @Taragolis  - are you ok ? This one LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ferruzzi commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "ferruzzi (via GitHub)" <gi...@apache.org>.
ferruzzi commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1442274063

   @Taragolis Thanks for picking up the review on this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #29347:
URL: https://github.com/apache/airflow/pull/29347#discussion_r1111243238


##########
airflow/providers/ssh/hooks/ssh.py:
##########
@@ -173,6 +182,14 @@ def __init__(
                 if "conn_timeout" in extra_options and self.conn_timeout is None:
                     self.conn_timeout = int(extra_options["conn_timeout"])
 
+                if self.cmd_timeout is None:
+                    if "cmd_timeout" in extra_options:
+                        self.cmd_timeout = (
+                            int(extra_options["cmd_timeout"]) if extra_options["cmd_timeout"] else None
+                        )
+                    else:
+                        self.cmd_timeout = CMD_TIMEOUT
+

Review Comment:
   > To pass "no timeout" via hook parameters you should pass None as a hook argument.
   To pass "no timeout" via connection parameters you should specify null in connection extra: {"cmd_timeout": null}. (It's possible to use empty string or negative values instead of null here, but I feel like it'll be even more inconsistent with hook parameters).
   
   I can't see any inconsistent here if you want so set no timeout, there is no problem to set it to `None` or json `null` explicitly, rather than use negative values which converted to `None` anyway. 
   
   NOTSET use as sentinel in situation when None value itself it is a legit value but for some reason we can't provide `None` as default value, and in this case it is 100% legit because  [paramico use `None` as no timeout](https://docs.paramiko.org/en/stable/api/client.html#paramiko.client.SSHClient.exec_command).
   
   Potentially this discussion about `spaces vs tabs`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] pollynesterovich commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "pollynesterovich (via GitHub)" <gi...@apache.org>.
pollynesterovich commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1438908550

   Please add these changes to the helm chart of the latest version 1.8.0 so it affects the work of dags.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on code in PR #29347:
URL: https://github.com/apache/airflow/pull/29347#discussion_r1111094280


##########
airflow/providers/ssh/hooks/ssh.py:
##########
@@ -173,6 +182,14 @@ def __init__(
                 if "conn_timeout" in extra_options and self.conn_timeout is None:
                     self.conn_timeout = int(extra_options["conn_timeout"])
 
+                if self.cmd_timeout is None:
+                    if "cmd_timeout" in extra_options:
+                        self.cmd_timeout = (
+                            int(extra_options["cmd_timeout"]) if extra_options["cmd_timeout"] else None
+                        )
+                    else:
+                        self.cmd_timeout = CMD_TIMEOUT
+

Review Comment:
   This seems like some inconsistent between direct hook parameters and Connection parameters.
   You can turn off cmd timeout only by connection, but not by hook parameters, it always fallback to `CMD_TIMEOUT`
   
   Usual direct hook parameters override connection



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Aakcht commented on pull request #29347: SSH Provider: Add cmd_timeout to ssh connection extra

Posted by "Aakcht (via GitHub)" <gi...@apache.org>.
Aakcht commented on PR #29347:
URL: https://github.com/apache/airflow/pull/29347#issuecomment-1431948430

   @ferruzzi I'd say that this PR is about the timeout for waiting for command reply. For example, consider the command 
   ```bash
   sleep 10
   ``` 
   (as actually used in the tests for this PR). If `cmd_timeout`>10, the command should execute successfully, if `cmd_timeout`<10 it'll fail.
    <br /> 
    <br /> 
   But this PR doesn't actually add this `cmd_timeout` and instead fixes a bit different problem. The `cmd_timeout` was already added before (in #27184), but only for SSH hook/operator. In that previous PR the `cmd_timeout` value was set by default to 10 seconds (instead of "no timeout" which was the previous behavior). 
   
   So this PR fixes two issues that appeared after #27184:
   
   1) It adds possibility to configure `cmd_timeout` not only in SSH hook/SSH operator, but also in SSH connection(so now you don't have to modify your DAG code to set it).
   2) It adds possibility to set it to `''` in SSH connection which means "no timeout" and restore the behavior that was in place before #27184.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org