You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/13 13:38:27 UTC

[GitHub] [airflow] potiuk commented on a diff in pull request #28322: Fix output of error in K8s venv installation

potiuk commented on code in PR #28322:
URL: https://github.com/apache/airflow/pull/28322#discussion_r1047166815


##########
dev/breeze/src/airflow_breeze/utils/kubernetes_utils.py:
##########
@@ -309,14 +309,10 @@ def _install_packages_in_k8s_virtualenv(with_constraints: bool):
                 f"constraints-{sys.version_info.major}.{sys.version_info.minor}.txt",
             ]
         )
-    install_packages_result = run_command(
-        install_command,
-        check=False,
-        capture_output=True,
-    )
+    install_packages_result = run_command(install_command, check=False, capture_output=True, text=True)

Review Comment:
   Yes. I rarely do something without earlier reasoning :)
   
   The main reason is `subptoces.run` compatibility. Breeze'\s `run_command` is rather thin wrapper over "subprocess.run"
   
   Similarly as in other cases, all the other parameters are transparently passed to underlying Popen calls. 
   
   I only get few parameters that add extra behaviour on verbose/dry_run and better display on the CI (for example allow to manage "group output"), and add a feew more flags that might change the behaviour, depending on the context. It also allows to handle `dry-run` and simulate execution of the command without actually doing it.
   
   The `--dry-run`  and `--verbose` flags are documented here: https://github.com/apache/airflow/blob/main/dev/breeze/doc/adr/0011-unified-communication-with-the-users.md
   
   > We should also support --verbose flag to display all the commands being executed, and the --dry-run flag to only display commands and not execute the commands. This will allow to diagnose any problems that the user might have.
   
   
   This is the signature of the function:
   
   ```
   def run_command(
       cmd: list[str],
       title: str | None = None,
       *,
       check: bool = True,
       no_output_dump_on_exception: bool = False,
       env: Mapping[str, str] | None = None,
       cwd: Path | None = None,
       input: str | None = None,
       output: Output | None = None,
       output_outside_the_group: bool = False,
       verbose_override: bool | None = None,
       dry_run_override: bool | None = None,
       **kwargs,
   ) -> RunCommandResult:
   ```
   
   And this runs this under-the-hood:
   
   ```
   subprocess.run(cmd, input=input, check=check, env=cmd_env, cwd=workdir, **kwargs)
   ```
   
   The subprocess `text` parameter behaviour is rather subtle. From the docs https://docs.python.org/3/library/subprocess.html
   
   > If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the [io.TextIOWrapper](https://docs.python.org/3/library/io.html#io.TextIOWrapper) default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.
   
   I guess the main reason for Python maintainer to choose text as not being the default is performance and memory usage. our outputs are often quite huge (especially when there are errors dumped), so I think  this is a good choice. 
   
   U actually considered changing this behaviour, but I decided compatibility with the underlying run method's behaviour and performance reasons are worth occasional forgetting to add text=True when needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org