You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/05 22:21:29 UTC

[GitHub] [airflow] potiuk opened a new pull request, #24236: Add limited progress output for tests

potiuk opened a new pull request, #24236:
URL: https://github.com/apache/airflow/pull/24236

   This is the first step to run breeze tests in parallel in CI.
   This flag adds "limited progress" output when running tests
   which means that the runnig tests will just pring single line with
   percent progress, but when it completes, the whole output is
   printed in a CI group - colored depending on status.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragement file, named `{pr_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #24236:
URL: https://github.com/apache/airflow/pull/24236


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1147269715

   Note - that this one is optimized for our CI usage. We cannot keep log output in memory, because we often miss memory on GitHub Public runners. That's why each test "run" will simply write the output to a temporary file and we are using this file as the "progress status" source. By writing directly to file rather than getting the logs piped to our "managing" process we save precious memory for the actual test execution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
josh-fell commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r896907152


##########
dev/breeze/src/airflow_breeze/commands/configuration_and_maintenance_commands.py:
##########
@@ -157,9 +157,9 @@ def cleanup(verbose: bool, dry_run: bool, github_repository: str, all: bool, ans
         )
         images = command_result.stdout.splitlines() if command_result and command_result.stdout else []
         if images:
-            get_console().print("[light_blue]Removing images:[/]")
+            get_console().print("[info]Removing images:[/]")

Review Comment:
   Should these use the new Enum too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r896863170


##########
dev/breeze/src/airflow_breeze/utils/ci_group.py:
##########
@@ -22,7 +22,7 @@
 
 
 @contextmanager
-def ci_group(title: str, enabled: bool = True):
+def ci_group(title: str, enabled: bool = True, message_type: str = "[info]"):

Review Comment:
   Right. Good point. Otherwise [/]  - might be wrong.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
josh-fell commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r896852912


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -122,10 +226,19 @@ def docker_compose_tests(
 )
 @option_dry_run
 @option_verbose
+@option_python
+@option_backend
+@option_postgres_version
+@option_mysql_version
+@option_mssql_version
 @option_integration
+@click.option(
+    '--limit-progress-output',
+    help="Limit progress to percentage only and just show the summary when tests complete.",
+    is_flag=True,
+)
 @click.argument('extra_pytest_args', nargs=-1, type=click.UNPROCESSED)
 @click.option(
-    "-tt",

Review Comment:
   Just curious, why this removal?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Bowrna commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
Bowrna commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r893250391


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -149,11 +268,39 @@ def tests(
         os.environ["LIST_OF_INTEGRATION_TESTS_TO_RUN"] = ' '.join(list(integration))
     if db_reset:
         os.environ["DB_RESET"] = "true"
-
-    exec_shell_params = ShellParams(verbose=verbose, dry_run=dry_run)
+    exec_shell_params = ShellParams(
+        verbose=verbose,
+        dry_run=dry_run,
+        python=python,
+        backend=backend,
+        postgres_version=postgres_version,
+        mysql_version=mysql_version,
+        mssql_version=mssql_version,
+    )
     env_variables = get_env_variables_for_docker_commands(exec_shell_params)
     perform_environment_checks(verbose=verbose)
     cmd = ['docker-compose', 'run', '--service-ports', '--rm', 'airflow']
     cmd.extend(list(extra_pytest_args))
-    result = run_command(cmd, verbose=verbose, dry_run=dry_run, env=env_variables, check=False)
+    version = (
+        mssql_version
+        if backend == "mssql"
+        else mysql_version
+        if backend == "mysql"
+        else postgres_version
+        if backend == "postgres"
+        else "none"
+    )
+    if limit_progress_output:
+        result = run_with_progress(
+            cmd=cmd,
+            env_variables=env_variables,
+            test_type=test_type,
+            python=python,
+            backend=backend,
+            version=version,
+            verbose=verbose,
+            dry_run=dry_run,
+        )
+    else:

Review Comment:
   @potiuk why does `limit_progress_output` check alone need to have `python`, `backend`, `version` passed?  I could see it's used for logging alone. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
josh-fell commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r896856289


##########
dev/breeze/src/airflow_breeze/utils/ci_group.py:
##########
@@ -22,7 +22,7 @@
 
 
 @contextmanager
-def ci_group(title: str, enabled: bool = True):
+def ci_group(title: str, enabled: bool = True, message_type: str = "[info]"):

Review Comment:
   Should the arg passed just be the type name rather than also require the square brackets?
   
   ```python
   def ci_group(title: str, enabled: bool = True, message_type: str = "info"):
       ....
   
   # Handle print formatting later
   get_console().print(f"[{message_type}]{title}[/]")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1147468337

   Tt's green and good next step to get rid of the bash finally 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Bowrna commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
Bowrna commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1148439436

   @potiuk I didn't make much progress in the issue https://github.com/apache/airflow/issues/23538 as I was stuck on what I have to do next in the PR to proceed further. So I started focussing on other PR's and missed updating any progress on [that](https://github.com/apache/airflow/issues/23538) issue.
   
   Thanks for this PR. I will read this and start focussing on completing [this](https://github.com/apache/airflow/issues/23538) issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r894749991


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -112,6 +129,93 @@ def docker_compose_tests(
     sys.exit(return_code)
 
 
+class MonitoringThread(Thread):
+    """Thread class with a stop() method. The thread itself has to check
+    regularly for the stopped() condition."""
+
+    def __init__(self, title: str, file_name: str):
+        super().__init__(target=self.peek_percent_at_last_lines_of_file, daemon=True)
+        self._stop_event = Event()
+        self.title = title
+        self.file_name = file_name
+
+    def peek_percent_at_last_lines_of_file(self) -> None:
+        max_line_length = 400
+        matcher = re.compile(r"^.*\[([^\]]*)\]$")
+        while not self.stopped():
+            if os.path.exists(self.file_name):
+                try:
+                    with open(self.file_name, 'rb') as temp_f:
+                        temp_f.seek(-(max_line_length * 2), os.SEEK_END)
+                        tail = temp_f.read().decode()
+                    try:
+                        two_last_lines = tail.splitlines()[-2:]
+                        previous_no_ansi_line = escape_ansi(two_last_lines[0])
+                        m = matcher.match(previous_no_ansi_line)
+                        if m:
+                            get_console().print(f"[info]{self.title}:[/] {m.group(1).strip()}")
+                            print(f"\r{two_last_lines[0]}\r")
+                            print(f"\r{two_last_lines[1]}\r")
+                    except IndexError:
+                        pass
+                except OSError as e:
+                    if e.errno == errno.EINVAL:
+                        pass
+                    else:
+                        raise
+            sleep(5)
+
+    def stop(self):
+        self._stop_event.set()
+
+    def stopped(self):
+        return self._stop_event.is_set()
+
+
+def escape_ansi(line):
+    ansi_escape = re.compile(r'(?:\x1B[@-_]|[\x80-\x9F])[0-?]*[ -/]*[@-~]')
+    return ansi_escape.sub('', line)
+
+
+def run_with_progress(
+    cmd: List[str],
+    env_variables: Dict[str, str],
+    test_type: str,
+    python: str,
+    backend: str,
+    version: str,
+    verbose: bool,
+    dry_run: bool,
+) -> RunCommandResult:
+    title = f"Running tests: {test_type}, Python: {python}, Backend: {backend}:{version}"
+    try:
+        with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as f:
+            get_console().print(f"[info]Starting test = {title}[/]")
+            thread = MonitoringThread(title=title, file_name=f.name)
+            thread.start()
+            try:
+                result = run_command(
+                    cmd,
+                    verbose=verbose,
+                    dry_run=dry_run,
+                    env=env_variables,
+                    check=False,
+                    stdout=f,
+                    stderr=subprocess.STDOUT,

Review Comment:
   It is in captured in the same file. The stderr = `subprocess.STDOUT` means literally "send the output to the same place where stdout is sent". If you want to send it to stdout you would like to send it to `sys.stdout` instead. 
   
   See https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess.stdout
   
   > If you ran the process with stderr=subprocess.STDOUT, stdout and stderr will be combined in this attribute, and [stderr](https://docs.python.org/3/library/subprocess.html#subprocess.CompletedProcess.stderr) will be None.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Bowrna commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
Bowrna commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r894117097


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -112,6 +129,93 @@ def docker_compose_tests(
     sys.exit(return_code)
 
 
+class MonitoringThread(Thread):
+    """Thread class with a stop() method. The thread itself has to check
+    regularly for the stopped() condition."""
+
+    def __init__(self, title: str, file_name: str):
+        super().__init__(target=self.peek_percent_at_last_lines_of_file, daemon=True)
+        self._stop_event = Event()
+        self.title = title
+        self.file_name = file_name
+
+    def peek_percent_at_last_lines_of_file(self) -> None:
+        max_line_length = 400
+        matcher = re.compile(r"^.*\[([^\]]*)\]$")
+        while not self.stopped():
+            if os.path.exists(self.file_name):
+                try:
+                    with open(self.file_name, 'rb') as temp_f:
+                        temp_f.seek(-(max_line_length * 2), os.SEEK_END)
+                        tail = temp_f.read().decode()
+                    try:
+                        two_last_lines = tail.splitlines()[-2:]
+                        previous_no_ansi_line = escape_ansi(two_last_lines[0])
+                        m = matcher.match(previous_no_ansi_line)
+                        if m:
+                            get_console().print(f"[info]{self.title}:[/] {m.group(1).strip()}")
+                            print(f"\r{two_last_lines[0]}\r")
+                            print(f"\r{two_last_lines[1]}\r")
+                    except IndexError:
+                        pass
+                except OSError as e:
+                    if e.errno == errno.EINVAL:
+                        pass
+                    else:
+                        raise
+            sleep(5)
+
+    def stop(self):
+        self._stop_event.set()
+
+    def stopped(self):
+        return self._stop_event.is_set()
+
+
+def escape_ansi(line):
+    ansi_escape = re.compile(r'(?:\x1B[@-_]|[\x80-\x9F])[0-?]*[ -/]*[@-~]')
+    return ansi_escape.sub('', line)
+
+
+def run_with_progress(
+    cmd: List[str],
+    env_variables: Dict[str, str],
+    test_type: str,
+    python: str,
+    backend: str,
+    version: str,
+    verbose: bool,
+    dry_run: bool,
+) -> RunCommandResult:
+    title = f"Running tests: {test_type}, Python: {python}, Backend: {backend}:{version}"
+    try:
+        with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as f:
+            get_console().print(f"[info]Starting test = {title}[/]")
+            thread = MonitoringThread(title=title, file_name=f.name)
+            thread.start()
+            try:
+                result = run_command(
+                    cmd,
+                    verbose=verbose,
+                    dry_run=dry_run,
+                    env=env_variables,
+                    check=False,
+                    stdout=f,
+                    stderr=subprocess.STDOUT,
+                )
+            finally:
+                thread.stop()
+                thread.join()
+        with ci_group(
+            f"Result of {title}", message_type="[success]" if result.returncode == 0 else "[error]"
+        ):
+            with open(f.name) as f:
+                shutil.copyfileobj(f, sys.stdout)

Review Comment:
   @potiuk Why do we need to copy file to stdout rather than simply printing the file content out?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r894758526


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -149,11 +268,39 @@ def tests(
         os.environ["LIST_OF_INTEGRATION_TESTS_TO_RUN"] = ' '.join(list(integration))
     if db_reset:
         os.environ["DB_RESET"] = "true"
-
-    exec_shell_params = ShellParams(verbose=verbose, dry_run=dry_run)
+    exec_shell_params = ShellParams(
+        verbose=verbose,
+        dry_run=dry_run,
+        python=python,
+        backend=backend,
+        postgres_version=postgres_version,
+        mysql_version=mysql_version,
+        mssql_version=mssql_version,
+    )
     env_variables = get_env_variables_for_docker_commands(exec_shell_params)
     perform_environment_checks(verbose=verbose)
     cmd = ['docker-compose', 'run', '--service-ports', '--rm', 'airflow']
     cmd.extend(list(extra_pytest_args))
-    result = run_command(cmd, verbose=verbose, dry_run=dry_run, env=env_variables, check=False)
+    version = (
+        mssql_version
+        if backend == "mssql"
+        else mysql_version
+        if backend == "mysql"
+        else postgres_version
+        if backend == "postgres"
+        else "none"
+    )
+    if limit_progress_output:
+        result = run_with_progress(
+            cmd=cmd,
+            env_variables=env_variables,
+            test_type=test_type,
+            python=python,
+            backend=backend,
+            version=version,
+            verbose=verbose,
+            dry_run=dry_run,
+        )
+    else:

Review Comment:
   It's not the "limit progess output". It's the "test" command that missed it. The version of the tests command we had just did not have those parameters and always used defaults. Which was fine to run quick testing locally but if we are going to use "tests" command in CI, we need to test many combinations of tests (i.e. running mysql version 8 on Python 3.7). or running (Postgres 10 on python 3.10). That's why we need to specify those as parameters here (and that's why I added it to ShellParams - because they will determine which CI image (python 3.7 - 3.10) will be used for the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1148445653

   > @potiuk I didn't make much progress in the issue #23538 as I was stuck on what I have to do next in the PR to proceed further. So I started focussing on other PR's and missed updating any progress on [that](https://github.com/apache/airflow/issues/23538) issue.
   > 
   > Thanks for this PR. I will read this and start focussing on completing [this](https://github.com/apache/airflow/issues/23538) issue.
   
   Yep. I just implemented it as I thought it might be useful (and got some fun while doing it :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add limited progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1146893943

   cc: @Bowrna  -> here is a part of the "parallel" test implementation. Not yet complete, but shoudl be closer to implement running tests in parallel and should be easy to parallelise it with #23715 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1154493917

   Need it to improve testing speed in CI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r897293582


##########
dev/breeze/src/airflow_breeze/commands/configuration_and_maintenance_commands.py:
##########
@@ -157,9 +157,9 @@ def cleanup(verbose: bool, dry_run: bool, github_repository: str, all: bool, ans
         )
         images = command_result.stdout.splitlines() if command_result and command_result.stdout else []
         if images:
-            get_console().print("[light_blue]Removing images:[/]")
+            get_console().print("[info]Removing images:[/]")

Review Comment:
   Hmm. I think in most cases "[info]...." is a bit better than f"{INFO.value}....." + import :).
   
   The only place where it makes sense to use enum is where it is programmatically passed (as it is in ci_group as parameter), then it's worth to use enum  :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1150458336

   PR Looking for some approval here :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add limited progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1147240209

   Last version has a bit better-suited output for Parallel CI tests:
   
   ![Screenshot from 2022-06-06 11-17-28](https://user-images.githubusercontent.com/595491/172132987-33d37d1e-838a-4ac2-a670-5563a305b2d5.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1152776485

   An approval would be nice :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r896861965


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -122,10 +226,19 @@ def docker_compose_tests(
 )
 @option_dry_run
 @option_verbose
+@option_python
+@option_backend
+@option_postgres_version
+@option_mysql_version
+@option_mssql_version
 @option_integration
+@click.option(
+    '--limit-progress-output',
+    help="Limit progress to percentage only and just show the summary when tests complete.",
+    is_flag=True,
+)
 @click.argument('extra_pytest_args', nargs=-1, type=click.UNPROCESSED)
 @click.option(
-    "-tt",

Review Comment:
   It was a mistake :). Generally the direction I steer breeze to  is that longer options are preferred, and shorter ones are only for things that you are using very, very often. Short options are generally evil IMHO and they are only justified if you use them enough time that tyou want to get muscle memory kick in. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r894755846


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -112,6 +129,93 @@ def docker_compose_tests(
     sys.exit(return_code)
 
 
+class MonitoringThread(Thread):
+    """Thread class with a stop() method. The thread itself has to check
+    regularly for the stopped() condition."""
+
+    def __init__(self, title: str, file_name: str):
+        super().__init__(target=self.peek_percent_at_last_lines_of_file, daemon=True)
+        self._stop_event = Event()
+        self.title = title
+        self.file_name = file_name
+
+    def peek_percent_at_last_lines_of_file(self) -> None:
+        max_line_length = 400
+        matcher = re.compile(r"^.*\[([^\]]*)\]$")
+        while not self.stopped():
+            if os.path.exists(self.file_name):
+                try:
+                    with open(self.file_name, 'rb') as temp_f:
+                        temp_f.seek(-(max_line_length * 2), os.SEEK_END)

Review Comment:
   Beause I want to find last two lines in the file very efficiently without reading the whole file in memory and without loosing time on reading the whole file. from the beginning (it can be 20-30MB file so it might take few seconds to read it and it will consume I/O and memory)
   
   I really do not want to read the whole file to find the last two lines - this is all I care about and I can survive if I won't succeed - at worst I will not print progress. No big deal.. And it's generally a difficult problem to solve, because in order to be sure you'd have to read the whole file to be absolutely sure to print last two lines. For example if you have big file with two lines - each of them 10000 chars, you would have to print 20000 characters to prin the line.
   
   So I am cheating "a bit" here. I assume the line is not longer than X. I go to the END of the file (SEEK.END), then i go back 2 *X and read only those maximum 2*X characters. This is much faster and takes much less memory than reading the whole file nd being "sure" on the other hand, I KNOW our lines are not too long in most cases and actually I only care  about two last lines if they contain Pytest output:
   
   ```
   test1 ................................................................................ [20%]
   test2 ......................................................................
   ```
   
   So I KNOW the lines I am interested at are not long. And in case I have longer lines - too bad (the output does nothing if ti does not find the [X%] in the output.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Bowrna commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
Bowrna commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r894113605


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -112,6 +129,93 @@ def docker_compose_tests(
     sys.exit(return_code)
 
 
+class MonitoringThread(Thread):
+    """Thread class with a stop() method. The thread itself has to check
+    regularly for the stopped() condition."""
+
+    def __init__(self, title: str, file_name: str):
+        super().__init__(target=self.peek_percent_at_last_lines_of_file, daemon=True)
+        self._stop_event = Event()
+        self.title = title
+        self.file_name = file_name
+
+    def peek_percent_at_last_lines_of_file(self) -> None:
+        max_line_length = 400
+        matcher = re.compile(r"^.*\[([^\]]*)\]$")
+        while not self.stopped():
+            if os.path.exists(self.file_name):
+                try:
+                    with open(self.file_name, 'rb') as temp_f:
+                        temp_f.seek(-(max_line_length * 2), os.SEEK_END)
+                        tail = temp_f.read().decode()
+                    try:
+                        two_last_lines = tail.splitlines()[-2:]
+                        previous_no_ansi_line = escape_ansi(two_last_lines[0])
+                        m = matcher.match(previous_no_ansi_line)
+                        if m:
+                            get_console().print(f"[info]{self.title}:[/] {m.group(1).strip()}")
+                            print(f"\r{two_last_lines[0]}\r")
+                            print(f"\r{two_last_lines[1]}\r")
+                    except IndexError:
+                        pass
+                except OSError as e:
+                    if e.errno == errno.EINVAL:
+                        pass
+                    else:
+                        raise
+            sleep(5)
+
+    def stop(self):
+        self._stop_event.set()
+
+    def stopped(self):
+        return self._stop_event.is_set()
+
+
+def escape_ansi(line):
+    ansi_escape = re.compile(r'(?:\x1B[@-_]|[\x80-\x9F])[0-?]*[ -/]*[@-~]')
+    return ansi_escape.sub('', line)
+
+
+def run_with_progress(
+    cmd: List[str],
+    env_variables: Dict[str, str],
+    test_type: str,
+    python: str,
+    backend: str,
+    version: str,
+    verbose: bool,
+    dry_run: bool,
+) -> RunCommandResult:
+    title = f"Running tests: {test_type}, Python: {python}, Backend: {backend}:{version}"
+    try:
+        with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as f:
+            get_console().print(f"[info]Starting test = {title}[/]")
+            thread = MonitoringThread(title=title, file_name=f.name)
+            thread.start()
+            try:
+                result = run_command(
+                    cmd,
+                    verbose=verbose,
+                    dry_run=dry_run,
+                    env=env_variables,
+                    check=False,
+                    stdout=f,
+                    stderr=subprocess.STDOUT,

Review Comment:
   @potiuk the `stderr` part is sent to STDOUT and not captured in the file right? So if there is an error in the process, we print directly rather than capturing from the file. Am I right in understanding that part?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Bowrna commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
Bowrna commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r895174787


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -112,6 +129,93 @@ def docker_compose_tests(
     sys.exit(return_code)
 
 
+class MonitoringThread(Thread):
+    """Thread class with a stop() method. The thread itself has to check
+    regularly for the stopped() condition."""
+
+    def __init__(self, title: str, file_name: str):
+        super().__init__(target=self.peek_percent_at_last_lines_of_file, daemon=True)
+        self._stop_event = Event()
+        self.title = title
+        self.file_name = file_name
+
+    def peek_percent_at_last_lines_of_file(self) -> None:
+        max_line_length = 400
+        matcher = re.compile(r"^.*\[([^\]]*)\]$")
+        while not self.stopped():
+            if os.path.exists(self.file_name):
+                try:
+                    with open(self.file_name, 'rb') as temp_f:
+                        temp_f.seek(-(max_line_length * 2), os.SEEK_END)

Review Comment:
   This is a great tip. I will save this and use it when needed :) Thanks @potiuk 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r894747966


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -112,6 +129,93 @@ def docker_compose_tests(
     sys.exit(return_code)
 
 
+class MonitoringThread(Thread):
+    """Thread class with a stop() method. The thread itself has to check
+    regularly for the stopped() condition."""
+
+    def __init__(self, title: str, file_name: str):
+        super().__init__(target=self.peek_percent_at_last_lines_of_file, daemon=True)
+        self._stop_event = Event()
+        self.title = title
+        self.file_name = file_name
+
+    def peek_percent_at_last_lines_of_file(self) -> None:
+        max_line_length = 400
+        matcher = re.compile(r"^.*\[([^\]]*)\]$")
+        while not self.stopped():
+            if os.path.exists(self.file_name):
+                try:
+                    with open(self.file_name, 'rb') as temp_f:
+                        temp_f.seek(-(max_line_length * 2), os.SEEK_END)
+                        tail = temp_f.read().decode()
+                    try:
+                        two_last_lines = tail.splitlines()[-2:]
+                        previous_no_ansi_line = escape_ansi(two_last_lines[0])
+                        m = matcher.match(previous_no_ansi_line)
+                        if m:
+                            get_console().print(f"[info]{self.title}:[/] {m.group(1).strip()}")
+                            print(f"\r{two_last_lines[0]}\r")
+                            print(f"\r{two_last_lines[1]}\r")
+                    except IndexError:
+                        pass
+                except OSError as e:
+                    if e.errno == errno.EINVAL:
+                        pass
+                    else:
+                        raise
+            sleep(5)
+
+    def stop(self):
+        self._stop_event.set()
+
+    def stopped(self):
+        return self._stop_event.is_set()
+
+
+def escape_ansi(line):
+    ansi_escape = re.compile(r'(?:\x1B[@-_]|[\x80-\x9F])[0-?]*[ -/]*[@-~]')
+    return ansi_escape.sub('', line)
+
+
+def run_with_progress(
+    cmd: List[str],
+    env_variables: Dict[str, str],
+    test_type: str,
+    python: str,
+    backend: str,
+    version: str,
+    verbose: bool,
+    dry_run: bool,
+) -> RunCommandResult:
+    title = f"Running tests: {test_type}, Python: {python}, Backend: {backend}:{version}"
+    try:
+        with tempfile.NamedTemporaryFile(mode='w+t', delete=False) as f:
+            get_console().print(f"[info]Starting test = {title}[/]")
+            thread = MonitoringThread(title=title, file_name=f.name)
+            thread.start()
+            try:
+                result = run_command(
+                    cmd,
+                    verbose=verbose,
+                    dry_run=dry_run,
+                    env=env_variables,
+                    check=False,
+                    stdout=f,
+                    stderr=subprocess.STDOUT,
+                )
+            finally:
+                thread.stop()
+                thread.join()
+        with ci_group(
+            f"Result of {title}", message_type="[success]" if result.returncode == 0 else "[error]"
+        ):
+            with open(f.name) as f:
+                shutil.copyfileobj(f, sys.stdout)

Review Comment:
   The `copyfileobj` sends directly content of the file to stdeout without any buffering and reading it to memory. 
   If you read the content to a string and print it -  it will be loaded in memory first. the output can be potentially very long (20-30MB) and when you load it to string - this is how much memory you wil use. And we already had a lot of troubles with our tests exceeding memory available to run tests (that's why some tests are not run when we have mssql or mysql database).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1155539365

   The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Bowrna commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
Bowrna commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r894111895


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -112,6 +129,93 @@ def docker_compose_tests(
     sys.exit(return_code)
 
 
+class MonitoringThread(Thread):
+    """Thread class with a stop() method. The thread itself has to check
+    regularly for the stopped() condition."""
+
+    def __init__(self, title: str, file_name: str):
+        super().__init__(target=self.peek_percent_at_last_lines_of_file, daemon=True)
+        self._stop_event = Event()
+        self.title = title
+        self.file_name = file_name
+
+    def peek_percent_at_last_lines_of_file(self) -> None:
+        max_line_length = 400
+        matcher = re.compile(r"^.*\[([^\]]*)\]$")
+        while not self.stopped():
+            if os.path.exists(self.file_name):
+                try:
+                    with open(self.file_name, 'rb') as temp_f:
+                        temp_f.seek(-(max_line_length * 2), os.SEEK_END)

Review Comment:
   @potiuk Could you explain me why you are doing this part? I can understand that you are trying to read the last lines of file, but i don't understand why you are passing `-(max_line_length * 2)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r896893089


##########
dev/breeze/src/airflow_breeze/utils/ci_group.py:
##########
@@ -22,7 +22,7 @@
 
 
 @contextmanager
-def ci_group(title: str, enabled: bool = True):
+def ci_group(title: str, enabled: bool = True, message_type: str = "[info]"):

Review Comment:
   Fixed it and actually even converted it into an Enum :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] josh-fell commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
josh-fell commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r896867006


##########
dev/breeze/src/airflow_breeze/commands/testing_commands.py:
##########
@@ -122,10 +226,19 @@ def docker_compose_tests(
 )
 @option_dry_run
 @option_verbose
+@option_python
+@option_backend
+@option_postgres_version
+@option_mysql_version
+@option_mssql_version
 @option_integration
+@click.option(
+    '--limit-progress-output',
+    help="Limit progress to percentage only and just show the summary when tests complete.",
+    is_flag=True,
+)
 @click.argument('extra_pytest_args', nargs=-1, type=click.UNPROCESSED)
 @click.option(
-    "-tt",

Review Comment:
   Right on, makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24236: Add CI-friendly progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24236:
URL: https://github.com/apache/airflow/pull/24236#discussion_r897293582


##########
dev/breeze/src/airflow_breeze/commands/configuration_and_maintenance_commands.py:
##########
@@ -157,9 +157,9 @@ def cleanup(verbose: bool, dry_run: bool, github_repository: str, all: bool, ans
         )
         images = command_result.stdout.splitlines() if command_result and command_result.stdout else []
         if images:
-            get_console().print("[light_blue]Removing images:[/]")
+            get_console().print("[info]Removing images:[/]")

Review Comment:
   Hmm. I think in most cases "[info]...." is a bit better than f"{INFO.value}....." + import :).
   
   The only place where it makes sense to use enum is where it is programmatically passed (as it is in ci_group as parameter, then it's worth to use enum  :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add limited progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1146894563

   Example output:
   
   <img width="1394" alt="Screenshot 2022-06-06 at 00 24 21" src="https://user-images.githubusercontent.com/595491/172073072-940c2ff6-1d9d-411b-9d20-48d8b26f6964.png">
   
   * Initially only the single lines are printed with "progress" (44%) 
   * but when the whole tests complete, the output of tests is printed in the "foldable" group (In this case "Result of ...." ) 
   * in the output you see a  "green" line, but in CI it will be a "folded" group of output (same as in the current tests)
   * the color of the group is "green" when tests succeed or "red" if they fail (same as in current tests)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24236: Add limited progress output for tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24236:
URL: https://github.com/apache/airflow/pull/24236#issuecomment-1146895154

   Similar output for bigger number of tests when we see how the progress works:
   
   <img width="825" alt="Screenshot 2022-06-06 at 00 30 56" src="https://user-images.githubusercontent.com/595491/172073298-3cbe9cad-7375-4151-8148-d89e5b20be79.png">
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org