You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/28 18:13:43 UTC

[GitHub] [airflow] Taragolis opened a new pull request, #28631: Clear DB between each separate providers tests

Taragolis opened a new pull request, #28631:
URL: https://github.com/apache/airflow/pull/28631

   Follow-up: https://github.com/apache/airflow/pull/28337#issuecomment-1348771336
   
   Some hacks which uses in PR
   1. `module` scope with parsing "provider name" instead of `package` scope. pytest decide that package is `airflow` and run only once
   2. Use separate clear tests helpers instead of `airflow.utils.db.resetdb()`. locally cause some errors with duplicate constraint.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058530854


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   Another comment here @Taragolis . Maybe we do not need that at all. Don't we want to do it for ALL packages that are "leaves" - not only for providers? I thin that would be a nice compromise between isolation and speed of execution



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058571297


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   See the videos above (that's the native env).. It is noticeable, a lot. I hate having to wait even 0.5 to run single test with Ctrl+R. Plus some people have auto-rerrunable tests (there are plugins that run tests automatically after saving). We cannot make that experience so much worse.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058568341


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   1 second difference with this approach only in breeze in MacOS, in native environment differences even less
   
   ```console
   
   1 passed in 0.21s
   1 passed in 0.23s
   1 passed in 0.24s
   ```
   
   ```console
   
   1 passed in 0.03s
   1 passed in 0.03s
   1 passed in 0.03s
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1367489959

   BTW. I do not see a noticeable impact on the times of execution of the tests ... Which is cool if this is really the case. Are we sure it works :) ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061586884


##########
tests/integration/api/auth/backend/test_kerberos_auth.py:
##########
@@ -46,19 +46,17 @@ def app_for_kerberos():
         yield app.create_app(testing=True)
 
 
-@pytest.fixture(scope="module", autouse=True)
+@pytest.fixture(scope="module")

Review Comment:
   I see. Right. Should be maybe change the name of the class fixture? to `initialize` or something? 
   I was under the impressuion (due to `_set_attrs` name) it was doing something different - and now it is not really setting the attributes. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061588331


##########
tests/api/common/test_mark_tasks.py:
##########
@@ -49,7 +49,7 @@ def dagbag():
     from airflow.models.dagbag import DagBag
 
     # Ensure the DAGs we are looking at from the DB are up-to-date
-    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=False)
+    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=True)

Review Comment:
   Yeah. Those changes are needed .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058531880


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   There is also one side-effect that we should take into account here.  This is fine when we run the tests in CI, but the way it is implemented now, it will also run the auto-fixture even if you run a single unit test. This will dramatically slow-down any iterations you do I am afraid (and it does not really matter if we do it only for Providers or for whole airflow).. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058530302


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   We can check for presence of provider.yaml. Or better - rely on `generated/provider_dependencies.json`  instead. the provider_dependencies.json is guaranteed to be updated by pre-commit and it is far easier to use than scanning folders, as well as it's intention is to remain stable when we refactor and move the providers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058628201


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   And also remove `yield` in this fixture we do not need any teardown, at least yet



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061516214


##########
tests/api/common/test_mark_tasks.py:
##########
@@ -49,7 +49,7 @@ def dagbag():
     from airflow.models.dagbag import DagBag
 
     # Ensure the DAGs we are looking at from the DB are up-to-date
-    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=False)
+    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=True)

Review Comment:
   Some API tests use example DAGs: `example_subdag_operator`, `example_trigger_target_dag` and others.
   Before changes in this PR all example serialised DAGs stored into database so we did not have issues with it but after we start cleanup DB before each module we have.
   
   I'd rather to say this is temporary solution as target we need to use DAGs from:
   - `tests/dags`
   - `tests/dags_corrupted`
   - `tests/dags_with_system_exit`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1059076499


##########
tests/conftest.py:
##########
@@ -196,6 +185,12 @@ def pytest_addoption(parser):
         ),
         metavar="COLUMNS",
     )
+    group.addoption(

Review Comment:
   Nice!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058574499


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   Also tried.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058572636


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > But the same kind of problem happened on multiple occasions for all kinds of tests, not only between providers. The mechanism is the same - data created in one test (in unrelated package) might impact tests from another package. So I think if we attempt to do such isolation, it should be done across the board.
   
   Then I suggest to do in a different way. Because such side effect outside of Providers could happened only when two PR merged and create this stuff
   
   1. Migrate all test to pytest (there is less then 100 modules remaining)
   2. Create separate pytest plugin with fixtures/context managers
   3. Migrate all current tests to use this fixtures, most of them already exists in [conftest.py](https://github.com/apache/airflow/blob/main/tests/conftest.py)
   4. Restrict manually create DAGs, TIs and etc and allow only by fixtures/context managers (more tricky part)
   5. All core functionality should have setup and teardown which cleanup (most of them already have)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058731243


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   Would it be a good idea to also clean after the provider tests? I feel like it’s generally a good principle for one thing to clean up one’s own stuffs. We could do
   
   1. Clean when entering the `providers` directory.
   2. Clean after each provider is finished.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1376181227

   Yeah. Let's merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1375637595

   Let's check with **public runners**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1059092118


##########
tests/conftest.py:
##########
@@ -88,18 +89,6 @@ def url_safe_serializer(secret_key) -> URLSafeSerializer:
     return URLSafeSerializer(secret_key)
 
 
-@pytest.fixture()

Review Comment:
   Yes, we could



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058561504


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > So one second per run it not a big deal and if developer run all tests in single module or just one test class, it would be even unnoticeable.
   
   One second is a lot for tests that do not require the DB at all - especially if you run them from IDE with re-running single tests after the whole suite is run,  The difference is - immediate green/red vs. waiting a second. I often forget to remove `--with-db-init` when I add it and this happens only for one run - the difference is really noticeable so I remove it as soon as I realise I left it.
   
   Compare those two: 
   
   With db reset:
   
   https://user-images.githubusercontent.com/595491/209865511-d2815c9f-0747-4445-adbe-0318705fbc1d.mov
   
   Without:
   
   https://user-images.githubusercontent.com/595491/209865517-0931689a-3491-43ec-824d-8db14100a775.mov
   
   > All others tests run all together, so if one tests from Core affect another then most probably we notice this in Core section
   
   Correct. But the same kind of problem happened on multiple occasions for all kinds of tests, not only between providers. The mechanism is the same - data created in one test (in unrelated package) might impact tests from another package. So I think if we attempt to do such isolation, it should be done across the board.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058622999


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   You convinced me.
   
   I just tested your solution (My videos were with --with-db-init) and yes. I agree it's not noticeable in PyCharm/IntelliJ, so you are likely right that Flask import and others add a lot. And yeah. Having DB reset in this case for every test is quite OK. It is actually likely that those flask imports cause the constraint problems anyway.
   
   Ttry to notice which one is which (it's visible, but barely).
   
   https://user-images.githubusercontent.com/595491/209874550-732ee92c-43dc-4fc1-b3f7-f0b365b0d281.mov
   
   https://user-images.githubusercontent.com/595491/209874540-7a7e5611-d888-45a8-b8fd-02f8df147385.mov
   
   And it has the nice benefit that it will help the users in many situations where they run single tests as well. This overhead is much smallee that I was used to from `--with-db-init` 
   
   However, I think few changes are needed:
   
   1) have a "clear_all()" in the test_utils.db and using it instead of having the list of methods in conftest.py  - this will help with future maintainability
   
   2) What's even more ... I **really** think we should apply it to all tests and completely apply it to the main conftest.py and do it for all modules - I think with such low overhead we can kill few birds with the same stone and the overall overhead will be miinimal. I just did this in main "conftest.py". I don't think we need to detect any providers at all and run it for **everything**
   
   ```
   @pytest.fixture(scope="module", autouse=True)
   def _clear_db_between_modules(request):
       from tests.test_utils import db
   
       """Clear DB between each separate provider package test runs."""
       print("!!!!!!!!Clearing DB!!!!!!!!")
       db.clear_db_runs()
       db.clear_db_datasets()
       db.clear_db_dags()
       db.clear_db_serialized_dags()
       db.clear_db_sla_miss()
       db.clear_db_pools()
       db.clear_db_connections()
       db.clear_db_variables()
       db.clear_db_dag_code()
       db.clear_db_callbacks()
       db.clear_rendered_ti_fields()
       db.clear_db_import_errors()
       db.clear_db_dag_warnings()
       db.clear_db_xcom()
       db.clear_db_logs()
       db.clear_db_jobs()
       db.clear_db_task_fail()
       db.clear_db_task_reschedule()
       db.clear_dag_specific_permissions()
       db.create_default_connections()
       db.set_default_pool_slots(128)
       yield
   ```
   
   And run it with -s 
   
   ```
   pytest -s tests/cli/ tests/core/
   ```
   
   And it looks like it works as expected:
   
   ```
   tests/cli/commands/test_info_command.py::TestPiiAnonymizer::test_should_remove_pii_from_path !!!!!!!!Clearing DB!!!!!!!!
   PASSED
   tests/cli/commands/test_info_command.py::TestPiiAnonymizer::test_should_remove_pii_from_url[postgresql+psycopg2://postgres:airflow@postgres/airflow-postgresql+psycopg2://p...s:PASSWORD@postgres/airflow] PASSED
   tests/cli/commands/test_info_command.py::TestPiiAnonymizer::test_should_remove_pii_from_url[postgresql+psycopg2://postgres@postgres/airflow-postgresql+psycopg2://p...s@postgres/airflow] PASSED
   tests/cli/commands/test_info_command.py::TestPiiAnonymizer::test_should_remove_pii_from_url[postgresql+psycopg2://:airflow@postgres/airflow-postgresql+psycopg2://:PASSWORD@postgres/airflow] PASSED
   tests/cli/commands/test_info_command.py::TestPiiAnonymizer::test_should_remove_pii_from_url[postgresql+psycopg2://postgres/airflow-postgresql+psycopg2://postgres/airflow] PASSED
   tests/cli/commands/test_info_command.py::TestAirflowInfo::test_airflow_info PASSED
   tests/cli/commands/test_info_command.py::TestAirflowInfo::test_system_info PASSED
   tests/cli/commands/test_info_command.py::TestAirflowInfo::test_paths_info PASSED
   tests/cli/commands/test_info_command.py::TestAirflowInfo::test_tools_info PASSED
   tests/cli/commands/test_info_command.py::TestAirflowInfo::test_show_info PASSED
   tests/cli/commands/test_info_command.py::TestAirflowInfo::test_show_info_anonymize PASSED
   tests/cli/commands/test_info_command.py::TestInfoCommandMockHttpx::test_show_info_anonymize_fileio PASSED
   tests/cli/commands/test_jobs_command.py::TestCliConfigList::test_should_report_success_for_one_working_scheduler !!!!!!!!Clearing DB!!!!!!!!
   PASSED
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058533615


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   Hm. I thinkwe can do it with simply `db.resetdb()`



##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   Hm. I think we can do it with simply `db.resetdb()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058576658


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   If you want I can take over (but not just yet - I need catch-up with other stuff :D )



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1367213468

   Also add option which disable auto cleanup db. Of course it is not affect explicit clear parts


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1371075461

   I am good with how it looks like now :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058531880


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   There is also one side-effect that we should take into account here.  This is fine when we run the tests in CI, but the way it is implemented now, it will also run the auto-fixture even if you run a single unit test. This will dramatically slow-down any iterations you do I am afraid (and it does not really matter if we do it only for Providers or for whole airflow).
   
   I think this can be solved by only doing this when we run:
   
   1) for the first time ever  (so some kind of check if db is initialized) 
   2) or always when we run in CI (`CI` env variable set to "true").
   
   
   The 1) would be nice for first-time users - we badly miss it now, the user needs to know that they have to run the tests with `--with-db-init` flag. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058532316


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   I think we could . Just need to scan directory until found one of the folder: `hooks`, `operators`, `sensors`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058578673


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > See the videos above (that's the native env).. It is noticeable, a lot.
   
   I think this happen because Init DB required load Flask and their dependencies.
   All other tests right not use reset db / init db for cleanup their states. Instead of just run direct queries by this module: https://github.com/apache/airflow/blob/main/tests/test_utils/db.py which run pretty fast
   
   I agree that the current solution a bit dumb but the main target right now reduce chance of side-effects between different providers tests, and the cost about 0.2 sec on single test or +0.2 (plus additional for not expensive logic) on single provider package.
   
   A lot of chance that this solution do not required as soon as we change structure of providers tests.
   
   For all tests we should consider to use different approach



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058824123


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   Another con of cleaning after the test - this is rarely used, but potentially not cleaning the DB after the test gives us the luxury that we can easily inspect the DB after the test completes - so for running individual tests, I think leaving the DB intact after the test has some nice debugging property as well.



##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   Another con of cleaning after the test - this is rarely used, but potentially not cleaning the DB after the test gives the developer the luxury that we can easily inspect the DB after the test completes - so for running individual tests, I think leaving the DB intact after the test has some nice debugging property as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058823025


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   We still will need to see how much it will impact the overall test time, but I am cautiously optimistic it won't be a lot. And we can move it up to package as well if we find module is too much.
   
   And this approach has really some interesting properties if we do such cleanup unconditionally before every module:
   
   1) We can stop worrying about any DB side-effects outside of the modules. We have all but guarantee that any side-effects will not cross the module boundaries
   
   2) Running all tests in a module is usually what people do - because it is easy. And whenever we see a suspicious side-effect, it will be super-easy to reproduce - just run the whole module - it will run identically in CI and locally because each module is isolated.
   
   3) When a new user checks out and run tests, they will not have to think about initializing the database - similarly when the database has changed. So far we had `--with-db-init` switch to let airflow reset the unit test DB but it was pretty brittle and not many people even knew about its existence. We will be able to remove that now - because every time you run a test you have guarantee to get a fresh DB with the current structure.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061593561


##########
tests/api/common/test_mark_tasks.py:
##########
@@ -49,7 +49,7 @@ def dagbag():
     from airflow.models.dagbag import DagBag
 
     # Ensure the DAGs we are looking at from the DB are up-to-date
-    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=False)
+    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=True)

Review Comment:
   > Also found that if we call DAG.sync_to_db() it no has any affect to database, so I change it to same method in DagBag object, I do not know is it expected or it is some kind of bug.
   
   Not sure about it. But I do not see it to have any noticeable effect on time of running the tests, so let's keep it this way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1376132311

   Looks GOOOD. Shall we ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058531880


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   There is also one side-effect that we should take into account here.  This is fine when we run the tests in CI, but the way it is implemented now, it will also run the auto-fixture even if you run a single unit test. This will dramatically slow-down any iterations you do I am afraid (and it does not really matter if we do it only for Providers or for whole airflow).
   
   I think this can be solved by only doing this when we run:
   
   1) for the first time ever  (so some kind of check if db is initialized) 
   2) or when we run in CI (`CI` env variable set to "true").
   
   
   The 1) would be nice for first-time users - we badly miss it now, the user needs to know that they have to run the tests with `--with-db-init` flag. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058531880


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   There is also one side-effect that we should take into account here.  This is fine when we run the tests in CI, but the way it is implemented now, it will also run the auto-fixture even if you run a single unit test. This will dramatically slow-down any iterations you do I am afraid (and it does not really matter if we do it only for Providers or for whole airflow).
   
   I think this can be solved by only doing this when we run for:
   
   1) the first time at all (so some kind of check if db is initialized) 
   2) or when we run in CI (`CI` env variable set to "true").
   
   
   The 1) would be nice for first-time users - we badly miss it now, the user needs to know that they have to run the tests with `--with-db-init` flag. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058569528


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   I think this is because of some import sequence @ephraimbuddy - does it ring a bell ?. It works for sure in 'airflow  db init` command, so we should be able to implement it here as well.  I think we **must** sort it out, because otherwise it's not future-proof - in case we add new stuff. 
   
   I can help sorting it out when we solve the other comments.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058576453


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   (BTW. Now you know why I have not tried it **just yet**, I kinda subconsciously expected it won't be super easy). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058601195


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > I hate having to wait even 0.5 s to run single test with Ctrl+R. Plus some people have auto-rerrunable tests (there are plugins that run tests automatically after saving). We cannot make that experience so much worse.
   
   This could be nice debate for many reason.
   
   First of all `pytest` itself started for about 1 second with load all plugins plus outputs might take a lot of resources. 
   
   ```console
   ❯ time pytest --help 
   ...
   pytest --help  1.21s user 0.21s system 76% cpu 1.851 total
   
   ❯ time pytest --help
   ...
   pytest --help  0.58s user 0.11s system 98% cpu 0.697 total
   
   ❯ time pytest --help
   ...
   pytest --help  0.59s user 0.12s system 87% cpu 0.816 total
   ```
   
   PyCharm also could add latency. Hopefully we not enable any coverage in adopts because PyCharm/IDEA could add additional 5-10 seconds per run in this case.
   
   Personally most of the time I spend in debugger when I try to check is it work as expected or try to figure out why it not working. And usually rerun all the module tests just for sure that last change not break anything new.
   
   As well as if we consider **X** it is a time for create something new then time for cover by tests plus fixing become **X * 5** with most of the time siting with hand on the face and told "Why on Earth you don't work as expected" or "This concept is garbage I need to rewrite my code because tests shows me not everything might work well".
   
   But this is a personal as well as all times only related to my setup, someone who has old laptop could additionally wait more than 0.2 second



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058626222


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   I could check also `request` fixture tomorrow, I thought it should contain session information, such as how many tests collected and if it equal (or less, I don't know how it possible) `1` then do not cleanup anything.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1059092039


##########
tests/integration/api/auth/backend/test_kerberos_auth.py:
##########
@@ -46,19 +46,17 @@ def app_for_kerberos():
         yield app.create_app(testing=True)
 
 
-@pytest.fixture(scope="module", autouse=True)
+@pytest.fixture(scope="module")

Review Comment:
   I move this usage into class fixture.
   
   ```python
       def _set_attrs(self, app_for_kerberos, dagbag_to_db):
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058557413


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   That what I tried initially
   
   > Use separate clear tests helpers instead of airflow.utils.db.resetdb(). locally cause some errors with duplicate constraints.
   
   <details>
     <summary>Long error traceback under the spoiler</summary>
   
   ``` console
   ❯ breeze --python 3.7 --backend postgres shell --db-reset 
   
   root@ba342530c0bc:/opt/airflow# pytest tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute tests/providers/slack/hooks/test_slack.py::TestSlackHook::test_token_property_deprecated 
   ================================================================================== test session starts ==================================================================================
   platform linux -- Python 3.7.16, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /usr/local/bin/python
   cachedir: .pytest_cache
   rootdir: /opt/airflow, configfile: pytest.ini
   plugins: cov-4.0.0, asyncio-0.20.3, rerunfailures-9.1.1, instafail-0.4.2, anyio-3.6.2, timeouts-1.2.1, xdist-3.1.0, requests-mock-1.10.0, capture-warnings-0.0.4, httpx-0.21.2, time-machine-2.8.2
   asyncio: mode=strict
   setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
   collected 2 items                                                                                                                                                                       
   
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute ERROR                                                                                           [ 50%]
   tests/providers/slack/hooks/test_slack.py::TestSlackHook::test_token_property_deprecated ERROR                                                                                    [100%]
   
   ======================================================================================== ERRORS =========================================================================================
   ___________________________________________________________________ ERROR at setup of TestDockerOperator.test_execute ___________________________________________________________________
   
   self = <sqlalchemy.engine.base.Connection object at 0xffff7dd28d50>, dialect = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>
   constructor = <bound method DefaultExecutionContext._init_ddl of <class 'sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2'>>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}, execution_options = immutabledict({'autocommit': True})
   args = (<sqlalchemy.dialects.postgresql.base.PGDDLCompiler object at 0xffff7dd36790>,), kw = {}, branched = <sqlalchemy.engine.base.Connection object at 0xffff7dd28d50>, yp = None
   conn = <sqlalchemy.pool.base._ConnectionFairy object at 0xffff7dd86190>, context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7dd86310>
   cursor = <cursor object at 0xffff7df4de50; closed: -1>, evt_handled = False
   
       def _execute_context(
           self,
           dialect,
           constructor,
           statement,
           parameters,
           execution_options,
           *args,
           **kw
       ):
           """Create an :class:`.ExecutionContext` and execute, returning
           a :class:`_engine.CursorResult`."""
       
           branched = self
           if self.__branch_from:
               # if this is a "branched" connection, do everything in terms
               # of the "root" connection, *except* for .close(), which is
               # the only feature that branching provides
               self = self.__branch_from
       
           if execution_options:
               yp = execution_options.get("yield_per", None)
               if yp:
                   execution_options = execution_options.union(
                       {"stream_results": True, "max_row_buffer": yp}
                   )
       
           try:
               conn = self._dbapi_connection
               if conn is None:
                   conn = self._revalidate_connection()
       
               context = constructor(
                   dialect, self, conn, execution_options, *args, **kw
               )
           except (exc.PendingRollbackError, exc.ResourceClosedError):
               raise
           except BaseException as e:
               self._handle_dbapi_exception(
                   e, util.text_type(statement), parameters, None, None
               )
       
           if (
               self._transaction
               and not self._transaction.is_active
               or (
                   self._nested_transaction
                   and not self._nested_transaction.is_active
               )
           ):
               self._invalid_transaction()
       
           elif self._trans_context_manager:
               TransactionalContext._trans_ctx_check(self)
       
           if self._is_future and self._transaction is None:
               self._autobegin()
       
           context.pre_exec()
       
           if dialect.use_setinputsizes:
               context._set_input_sizes()
       
           cursor, statement, parameters = (
               context.cursor,
               context.statement,
               context.parameters,
           )
       
           if not context.executemany:
               parameters = parameters[0]
       
           if self._has_events or self.engine._has_events:
               for fn in self.dispatch.before_cursor_execute:
                   statement, parameters = fn(
                       self,
                       cursor,
                       statement,
                       parameters,
                       context,
                       context.executemany,
                   )
       
           if self._echo:
       
               self._log_info(statement)
       
               stats = context._get_cache_stats()
       
               if not self.engine.hide_parameters:
                   self._log_info(
                       "[%s] %r",
                       stats,
                       sql_util._repr_params(
                           parameters, batches=10, ismulti=context.executemany
                       ),
                   )
               else:
                   self._log_info(
                       "[%s] [SQL parameters hidden due to hide_parameters=True]"
                       % (stats,)
                   )
       
           evt_handled = False
           try:
               if context.executemany:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_executemany:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_executemany(
                           cursor, statement, parameters, context
                       )
               elif not parameters and context.no_parameters:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute_no_params:
                           if fn(cursor, statement, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute_no_params(
                           cursor, statement, context
                       )
               else:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute(
   >                       cursor, statement, parameters, context
                       )
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7df4de50; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7dd86310>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       psycopg2.errors.DuplicateTable: relation "idx_ab_user_username" already exists
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: DuplicateTable
   
   The above exception was the direct cause of the following exception:
   
   request = <SubRequest '_clear_db_between_providers_tests' for <Function test_execute>>
   
       @pytest.fixture(scope="module", autouse=True)
       def _clear_db_between_providers_tests(request):
           """Clear DB between each separate provider package test runs."""
           # from tests.test_utils import db
       
           provider_name = get_test_provider_name(request.module)
           if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
               _CLEAR_DB_PROVIDERS.add(provider_name)
               from airflow.utils.db import resetdb
   >           resetdb()
   
   tests/providers/conftest.py:68: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   airflow/utils/session.py:75: in wrapper
       return func(*args, session=session, **kwargs)
   airflow/utils/db.py:1619: in resetdb
       initdb(session=session)
   airflow/utils/session.py:72: in wrapper
       return func(*args, **kwargs)
   airflow/utils/db.py:709: in initdb
       _create_db_from_orm(session=session)
   airflow/utils/db.py:692: in _create_db_from_orm
       Base.metadata.create_all(settings.engine)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py:4931: in create_all
       ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:3228: in _run_ddl_visitor
       conn._run_ddl_visitor(visitorcallable, element, **kwargs)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2211: in _run_ddl_visitor
       visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:855: in visit_metadata
       _is_metadata_operation=True,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:907: in visit_table
       self.traverse_single(index, create_ok=True)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:938: in visit_index
       self.connection.execute(CreateIndex(index))
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1380: in execute
       return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:81: in _execute_on_connection
       self, multiparams, params, execution_options
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1478: in _execute_ddl
       compiled,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1944: in _execute_context
       e, statement, parameters, cursor, context
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2125: in _handle_dbapi_exception
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
   /usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py:211: in raise_
       raise exception
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: in _execute_context
       cursor, statement, parameters, context
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7df4de50; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7dd86310>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_user_username" already exists
   E       
   E       [SQL: CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))]
   E       (Background on this error at: https://sqlalche.me/e/14/f405)
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: ProgrammingError
   --------------------------------------------------------------------------------- Captured stdout setup ---------------------------------------------------------------------------------
   ========================= AIRFLOW ==========================
   Home of the user: /root
   Airflow home /root/airflow
   Initializing the DB - first time after entering the container.
   You can force re-initialization the database by adding --with-db-init switch to run-tests.
   [2022-12-28 19:49:00,870] {db.py:1608} INFO - Dropping tables that exist
   [2022-12-28 19:49:01,287] {migration.py:205} INFO - Context impl PostgresqlImpl.
   [2022-12-28 19:49:01,287] {migration.py:212} INFO - Will assume transactional DDL.
   [2022-12-28 19:49:01,292] {migration.py:205} INFO - Context impl PostgresqlImpl.
   [2022-12-28 19:49:01,292] {migration.py:212} INFO - Will assume transactional DDL.
   --------------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------------
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   INFO  [alembic.runtime.migration] Running stamp_revision  -> 290244fb8b83
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): create_entry_group>, delete_entry_group already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): delete_entry_group>, create_entry_group already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): create_entry_gcs>, delete_entry already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): delete_entry>, create_entry_gcs already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): create_tag>, delete_tag already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): delete_tag>, create_tag already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): prepare_email>, send_email already registered for DAG: example_dag_decorator
   WARNI [airflow.task.operators] Dependency <Task(EmailOperator): send_email>, prepare_email already registered for DAG: example_dag_decorator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.www.fab_security.manager] No user yet created, use flask fab command to do it.
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   ---------------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------------
   INFO     airflow.utils.db:db.py:1608 Dropping tables that exist
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   ____________________________________________________________ ERROR at setup of TestSlackHook.test_token_property_deprecated _____________________________________________________________
   
   self = <sqlalchemy.engine.base.Connection object at 0xffff7e0a8e90>, dialect = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>
   constructor = <bound method DefaultExecutionContext._init_ddl of <class 'sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2'>>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}, execution_options = immutabledict({'autocommit': True})
   args = (<sqlalchemy.dialects.postgresql.base.PGDDLCompiler object at 0xffff7dd8a550>,), kw = {}, branched = <sqlalchemy.engine.base.Connection object at 0xffff7e0a8e90>, yp = None
   conn = <sqlalchemy.pool.base._ConnectionFairy object at 0xffff7ee13810>, context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7e001090>
   cursor = <cursor object at 0xffff7ed00550; closed: -1>, evt_handled = False
   
       def _execute_context(
           self,
           dialect,
           constructor,
           statement,
           parameters,
           execution_options,
           *args,
           **kw
       ):
           """Create an :class:`.ExecutionContext` and execute, returning
           a :class:`_engine.CursorResult`."""
       
           branched = self
           if self.__branch_from:
               # if this is a "branched" connection, do everything in terms
               # of the "root" connection, *except* for .close(), which is
               # the only feature that branching provides
               self = self.__branch_from
       
           if execution_options:
               yp = execution_options.get("yield_per", None)
               if yp:
                   execution_options = execution_options.union(
                       {"stream_results": True, "max_row_buffer": yp}
                   )
       
           try:
               conn = self._dbapi_connection
               if conn is None:
                   conn = self._revalidate_connection()
       
               context = constructor(
                   dialect, self, conn, execution_options, *args, **kw
               )
           except (exc.PendingRollbackError, exc.ResourceClosedError):
               raise
           except BaseException as e:
               self._handle_dbapi_exception(
                   e, util.text_type(statement), parameters, None, None
               )
       
           if (
               self._transaction
               and not self._transaction.is_active
               or (
                   self._nested_transaction
                   and not self._nested_transaction.is_active
               )
           ):
               self._invalid_transaction()
       
           elif self._trans_context_manager:
               TransactionalContext._trans_ctx_check(self)
       
           if self._is_future and self._transaction is None:
               self._autobegin()
       
           context.pre_exec()
       
           if dialect.use_setinputsizes:
               context._set_input_sizes()
       
           cursor, statement, parameters = (
               context.cursor,
               context.statement,
               context.parameters,
           )
       
           if not context.executemany:
               parameters = parameters[0]
       
           if self._has_events or self.engine._has_events:
               for fn in self.dispatch.before_cursor_execute:
                   statement, parameters = fn(
                       self,
                       cursor,
                       statement,
                       parameters,
                       context,
                       context.executemany,
                   )
       
           if self._echo:
       
               self._log_info(statement)
       
               stats = context._get_cache_stats()
       
               if not self.engine.hide_parameters:
                   self._log_info(
                       "[%s] %r",
                       stats,
                       sql_util._repr_params(
                           parameters, batches=10, ismulti=context.executemany
                       ),
                   )
               else:
                   self._log_info(
                       "[%s] [SQL parameters hidden due to hide_parameters=True]"
                       % (stats,)
                   )
       
           evt_handled = False
           try:
               if context.executemany:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_executemany:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_executemany(
                           cursor, statement, parameters, context
                       )
               elif not parameters and context.no_parameters:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute_no_params:
                           if fn(cursor, statement, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute_no_params(
                           cursor, statement, context
                       )
               else:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute(
   >                       cursor, statement, parameters, context
                       )
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7ed00550; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7e001090>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       psycopg2.errors.DuplicateTable: relation "idx_ab_user_username" already exists
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: DuplicateTable
   
   The above exception was the direct cause of the following exception:
   
   request = <SubRequest '_clear_db_between_providers_tests' for <Function test_token_property_deprecated>>
   
       @pytest.fixture(scope="module", autouse=True)
       def _clear_db_between_providers_tests(request):
           """Clear DB between each separate provider package test runs."""
           # from tests.test_utils import db
       
           provider_name = get_test_provider_name(request.module)
           if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
               _CLEAR_DB_PROVIDERS.add(provider_name)
               from airflow.utils.db import resetdb
   >           resetdb()
   
   tests/providers/conftest.py:68: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   airflow/utils/session.py:75: in wrapper
       return func(*args, session=session, **kwargs)
   airflow/utils/db.py:1619: in resetdb
       initdb(session=session)
   airflow/utils/session.py:72: in wrapper
       return func(*args, **kwargs)
   airflow/utils/db.py:709: in initdb
       _create_db_from_orm(session=session)
   airflow/utils/db.py:692: in _create_db_from_orm
       Base.metadata.create_all(settings.engine)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py:4931: in create_all
       ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:3228: in _run_ddl_visitor
       conn._run_ddl_visitor(visitorcallable, element, **kwargs)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2211: in _run_ddl_visitor
       visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:855: in visit_metadata
       _is_metadata_operation=True,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:907: in visit_table
       self.traverse_single(index, create_ok=True)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:938: in visit_index
       self.connection.execute(CreateIndex(index))
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1380: in execute
       return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:81: in _execute_on_connection
       self, multiparams, params, execution_options
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1478: in _execute_ddl
       compiled,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1944: in _execute_context
       e, statement, parameters, cursor, context
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2125: in _handle_dbapi_exception
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
   /usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py:211: in raise_
       raise exception
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: in _execute_context
       cursor, statement, parameters, context
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7ed00550; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7e001090>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_user_username" already exists
   E       
   E       [SQL: CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))]
   E       (Background on this error at: https://sqlalche.me/e/14/f405)
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: ProgrammingError
   --------------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------------
   INFO  [airflow.utils.db] Dropping tables that exist
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   ---------------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------------
   INFO     airflow.utils.db:db.py:1608 Dropping tables that exist
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   =================================================================================== warnings summary ====================================================================================
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute
     /opt/airflow/airflow/example_dags/example_sensor_decorator.py:64: RemovedInAirflow3Warning: Param `schedule_interval` is deprecated and will be removed in a future release. Please use `schedule` instead. 
       tutorial_etl_dag = example_sensor_decorator()
   
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute
     /opt/airflow/airflow/example_dags/example_subdag_operator.py:45: RemovedInAirflow3Warning: This class is deprecated. Please use `airflow.utils.task_group.TaskGroup`.
       subdag=subdag(DAG_NAME, "section-1", dag.default_args),
   
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute
     /opt/airflow/airflow/example_dags/example_subdag_operator.py:54: RemovedInAirflow3Warning: This class is deprecated. Please use `airflow.utils.task_group.TaskGroup`.
       subdag=subdag(DAG_NAME, "section-2", dag.default_args),
   
   -- Docs: https://docs.pytest.org/en/stable/warnings.html
   ================================================================================ short test summary info ================================================================================
   ERROR tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute - sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_user_user...
   ERROR tests/providers/slack/hooks/test_slack.py::TestSlackHook::test_token_property_deprecated - sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_us...
   ============================================================================= 3 warnings, 2 errors in 8.28s =============================================================================
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058561504


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > So one second per run it not a big deal and if developer run all tests in single module or just one test class, it would be even unnoticeable.
   
   One second is a lot for tests that do not require the DB at all - especially if you run them from IDE with re-running single tests after the whole suite is run,  The difference is - immediate green/red vs. waiting a second. I often forget to remove `--with-db-init` when I add it and this happens only for one run - the difference is really noticeable so I remove it as soon as I realise I left it.
   
   Compare those two: 
   
   With db init:
   
   https://user-images.githubusercontent.com/595491/209865511-d2815c9f-0747-4445-adbe-0318705fbc1d.mov
   
   Without:
   
   https://user-images.githubusercontent.com/595491/209865517-0931689a-3491-43ec-824d-8db14100a775.mov
   
   > All others tests run all together, so if one tests from Core affect another then most probably we notice this in Core section
   
   Correct. But the same kind of problem happened on multiple occasions for all kinds of tests, not only between providers. The mechanism is the same - data created in one test (in unrelated package) might impact tests from another package. So I think if we attempt to do such isolation, it should be done across the board.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061591346


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   Yes. We should leave it in.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058533615


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   Hm. I think we can do it with simply `db.resetdb()` (from airflow.utils)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058574439


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   I think som of those imports are needed to make it works (and no, it was not me who implemented it, and yes I also hate it ).



##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   I think some of those imports are needed to make it works (and no, it was not me who implemented it, and yes I also hate it ).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058633670


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > For long term we also need to consider get rid of most of the stuff in `test.test_utils` and replace them by equivalent as fixture/context manager/markers but this better complete after migrate remaining tests to `pytest` and document them.
   > 
   > Because most of the new contributors just copy-paste some stuff from one test to another without any idea how it actually work.
   
   Yep. Full agreement here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1367350124

   Fingers cross. Locally it run without any issues


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1367197168

   Ok, lets have a look.
   
   I change it and make it as module scoped autouse fixture


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058527580


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   Should/can we generate this list automatically?
   Hard coding has risk that we will forget to update the list when needed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058554942


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > but the way it is implemented now, it will also run the auto-fixture even if you run a single unit test. This will dramatically slow-down any iterations you do I am afraid (and it does not really matter if we do it only for Providers or for whole airflow).
   
   A lot of tests right now clear some part of DB between the run, so I can't say it dramatically affect of single test.
   
   
   Single test with fixture (3 runs) on Postgres in breeze
   ```console
   
   1 passed in 1.95s
   1 passed in 1.96s
   1 passed in 2.00s
   ```
   
   Single test without fixture (3 runs) on Postgres in breeze
   ```console
   
   1 passed in 1.02s
   1 passed in 0.98s
   1 passed in 0.99s
   ```
   
   So one second per run it not a big deal and if developer run all tests in single module or just one test class, it would be even unnoticeable.
   
   > Don't we want to do it for ALL packages that are "leaves" - not only for providers (including all airflow packages)?
   
   Initial problem happen because different providers packages could run separately for each other, e.g. SSH and Oracle and if developer do some stuff wrong we could find it only in main.
   
   All others tests run all together, so if one tests from Core affect another then most probably we notice this in Core section



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058561504


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > So one second per run it not a big deal and if developer run all tests in single module or just one test class, it would be even unnoticeable.
   
   One second is a lot for tests that do not require the DB at all - especially if you run them from IDE with re-running single tests after the whole suite is run,  The difference is - immediate green/red vs. waiting a second. I often forget to remove `--with-db-init` when I add it and this happens only for one run - the difference is really noticeable.
   
   Compare those two: 
   
   With db reset:
   
   https://user-images.githubusercontent.com/595491/209865511-d2815c9f-0747-4445-adbe-0318705fbc1d.mov
   
   Without:
   
   https://user-images.githubusercontent.com/595491/209865517-0931689a-3491-43ec-824d-8db14100a775.mov
   
   > All others tests run all together, so if one tests from Core affect another then most probably we notice this in Core section
   
   Correct. But the same kind of problem happened on multiple occasions for all kinds of tests, not only between providers. The mechanism is the same - data created in one test (in unrelated package) might impact tests from another package. So I think if we attempt to do such isolation, it should be done across the board.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058602023


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   We could do it any time. This PR just a temporary solution but unfortunetly some of the temporary solution could work for ages and become permanent



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058623426


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   I think it costs us almost nothing and is a **long term** solution if we do it this way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058561504


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   
   > So one second per run it not a big deal and if developer run all tests in single module or just one test class, it would be even unnoticeable.
   
   One second is a lot for tests that do not require the DB at all - especially if you run them from IDE with re-running single tests after the whole suite is run,  The difference is - immediate green/red vs. waiting a second. I often forget to remove `--with-db-init` when I add it and this happens only for one run - the difference is really noticeable.
   
   Compare those two: 
   
   With db reset:
   
   https://user-images.githubusercontent.com/595491/209865511-d2815c9f-0747-4445-adbe-0318705fbc1d.mov
   
   Without:
   
   https://user-images.githubusercontent.com/595491/209865517-0931689a-3491-43ec-824d-8db14100a775.mov
   
   > All others tests run all together, so if one tests from Core affect another then most probably we notice this in Core section
   
   But the same kind of problem happened on multiple occasions for all kinds of tests, not only between providers. The mechanism is the same - data created in one test (in unrelated package) might impact tests from another package. So I think if we attempt to do such isolation, it should be done across the board.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058576111


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   > Also tried.
   
   For some reason it works when we run --with-db-init. I think this is a **real** bug that we need to trace if it does not work here.  I recall similar errors raised by our users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058575510


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   > We also have reset_db fixture which not use anywhere, at least I can't find any usage of them
   
   Yep. Noticed it - we can remove it I think while doing it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058824123


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   Another con of cleaning - this is rarely used, but not cleaning the DB after the test gives us the luxury that we can easily inspect the DB after the test completes - so for running individual tests, I think leaving the DB intact after the test has some nice debugging property as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061596893


##########
tests/integration/api/auth/backend/test_kerberos_auth.py:
##########
@@ -46,19 +46,17 @@ def app_for_kerberos():
         yield app.create_app(testing=True)
 
 
-@pytest.fixture(scope="module", autouse=True)
+@pytest.fixture(scope="module")

Review Comment:
   Yeah. This is a good point.
   
   BTW we have quite a few tests classes where we use method (function) scoped fixture. And most of them has different name, so might be also a good point to make them more or less consistent. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058530854


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   Another comment here @Taragolis . Maybe we do not need that at all. Don't we want to do it for ALL packages that are "leaves" - not only for providers (including all airflow packages)? I think that would be a nice compromise between isolation and speed of execution



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058574076


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   This one works when we run `--with-db-init` so maybe you should simply just use this command @Taragolis  (from our conftest.py). 
   
   ```
   def initial_db_init():
       from flask import Flask
   
       from airflow.configuration import conf
       from airflow.utils import db
       from airflow.www.app import sync_appbuilder_roles
       from airflow.www.extensions.init_appbuilder import init_appbuilder
   
       db.resetdb()
       db.bootstrap_dagbag()
       # minimal app to add roles
       flask_app = Flask(__name__)
       flask_app.config["SQLALCHEMY_DATABASE_URI"] = conf.get("database", "SQL_ALCHEMY_CONN")
       init_appbuilder(flask_app)
       sync_appbuilder_roles(flask_app)
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058571297


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   See the videos above (that's the native env).. It is noticeable, a lot. I hate having to wait even 0.5 s to run single test with Ctrl+R. Plus some people have auto-rerrunable tests (there are plugins that run tests automatically after saving). We cannot make that experience so much worse.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058557413


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   That what I tried initially
   
   > Use separate clear tests helpers instead of airflow.utils.db.resetdb(). locally cause some errors with duplicate constraints.
   
   ``` console
   ❯ breeze --python 3.7 --backend postgres shell --db-reset 
   
   root@ba342530c0bc:/opt/airflow# pytest tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute tests/providers/slack/hooks/test_slack.py::TestSlackHook::test_token_property_deprecated 
   ================================================================================== test session starts ==================================================================================
   platform linux -- Python 3.7.16, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /usr/local/bin/python
   cachedir: .pytest_cache
   rootdir: /opt/airflow, configfile: pytest.ini
   plugins: cov-4.0.0, asyncio-0.20.3, rerunfailures-9.1.1, instafail-0.4.2, anyio-3.6.2, timeouts-1.2.1, xdist-3.1.0, requests-mock-1.10.0, capture-warnings-0.0.4, httpx-0.21.2, time-machine-2.8.2
   asyncio: mode=strict
   setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
   collected 2 items                                                                                                                                                                       
   
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute ERROR                                                                                           [ 50%]
   tests/providers/slack/hooks/test_slack.py::TestSlackHook::test_token_property_deprecated ERROR                                                                                    [100%]
   
   ======================================================================================== ERRORS =========================================================================================
   ___________________________________________________________________ ERROR at setup of TestDockerOperator.test_execute ___________________________________________________________________
   
   self = <sqlalchemy.engine.base.Connection object at 0xffff7dd28d50>, dialect = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>
   constructor = <bound method DefaultExecutionContext._init_ddl of <class 'sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2'>>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}, execution_options = immutabledict({'autocommit': True})
   args = (<sqlalchemy.dialects.postgresql.base.PGDDLCompiler object at 0xffff7dd36790>,), kw = {}, branched = <sqlalchemy.engine.base.Connection object at 0xffff7dd28d50>, yp = None
   conn = <sqlalchemy.pool.base._ConnectionFairy object at 0xffff7dd86190>, context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7dd86310>
   cursor = <cursor object at 0xffff7df4de50; closed: -1>, evt_handled = False
   
       def _execute_context(
           self,
           dialect,
           constructor,
           statement,
           parameters,
           execution_options,
           *args,
           **kw
       ):
           """Create an :class:`.ExecutionContext` and execute, returning
           a :class:`_engine.CursorResult`."""
       
           branched = self
           if self.__branch_from:
               # if this is a "branched" connection, do everything in terms
               # of the "root" connection, *except* for .close(), which is
               # the only feature that branching provides
               self = self.__branch_from
       
           if execution_options:
               yp = execution_options.get("yield_per", None)
               if yp:
                   execution_options = execution_options.union(
                       {"stream_results": True, "max_row_buffer": yp}
                   )
       
           try:
               conn = self._dbapi_connection
               if conn is None:
                   conn = self._revalidate_connection()
       
               context = constructor(
                   dialect, self, conn, execution_options, *args, **kw
               )
           except (exc.PendingRollbackError, exc.ResourceClosedError):
               raise
           except BaseException as e:
               self._handle_dbapi_exception(
                   e, util.text_type(statement), parameters, None, None
               )
       
           if (
               self._transaction
               and not self._transaction.is_active
               or (
                   self._nested_transaction
                   and not self._nested_transaction.is_active
               )
           ):
               self._invalid_transaction()
       
           elif self._trans_context_manager:
               TransactionalContext._trans_ctx_check(self)
       
           if self._is_future and self._transaction is None:
               self._autobegin()
       
           context.pre_exec()
       
           if dialect.use_setinputsizes:
               context._set_input_sizes()
       
           cursor, statement, parameters = (
               context.cursor,
               context.statement,
               context.parameters,
           )
       
           if not context.executemany:
               parameters = parameters[0]
       
           if self._has_events or self.engine._has_events:
               for fn in self.dispatch.before_cursor_execute:
                   statement, parameters = fn(
                       self,
                       cursor,
                       statement,
                       parameters,
                       context,
                       context.executemany,
                   )
       
           if self._echo:
       
               self._log_info(statement)
       
               stats = context._get_cache_stats()
       
               if not self.engine.hide_parameters:
                   self._log_info(
                       "[%s] %r",
                       stats,
                       sql_util._repr_params(
                           parameters, batches=10, ismulti=context.executemany
                       ),
                   )
               else:
                   self._log_info(
                       "[%s] [SQL parameters hidden due to hide_parameters=True]"
                       % (stats,)
                   )
       
           evt_handled = False
           try:
               if context.executemany:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_executemany:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_executemany(
                           cursor, statement, parameters, context
                       )
               elif not parameters and context.no_parameters:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute_no_params:
                           if fn(cursor, statement, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute_no_params(
                           cursor, statement, context
                       )
               else:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute(
   >                       cursor, statement, parameters, context
                       )
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7df4de50; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7dd86310>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       psycopg2.errors.DuplicateTable: relation "idx_ab_user_username" already exists
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: DuplicateTable
   
   The above exception was the direct cause of the following exception:
   
   request = <SubRequest '_clear_db_between_providers_tests' for <Function test_execute>>
   
       @pytest.fixture(scope="module", autouse=True)
       def _clear_db_between_providers_tests(request):
           """Clear DB between each separate provider package test runs."""
           # from tests.test_utils import db
       
           provider_name = get_test_provider_name(request.module)
           if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
               _CLEAR_DB_PROVIDERS.add(provider_name)
               from airflow.utils.db import resetdb
   >           resetdb()
   
   tests/providers/conftest.py:68: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   airflow/utils/session.py:75: in wrapper
       return func(*args, session=session, **kwargs)
   airflow/utils/db.py:1619: in resetdb
       initdb(session=session)
   airflow/utils/session.py:72: in wrapper
       return func(*args, **kwargs)
   airflow/utils/db.py:709: in initdb
       _create_db_from_orm(session=session)
   airflow/utils/db.py:692: in _create_db_from_orm
       Base.metadata.create_all(settings.engine)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py:4931: in create_all
       ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:3228: in _run_ddl_visitor
       conn._run_ddl_visitor(visitorcallable, element, **kwargs)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2211: in _run_ddl_visitor
       visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:855: in visit_metadata
       _is_metadata_operation=True,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:907: in visit_table
       self.traverse_single(index, create_ok=True)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:938: in visit_index
       self.connection.execute(CreateIndex(index))
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1380: in execute
       return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:81: in _execute_on_connection
       self, multiparams, params, execution_options
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1478: in _execute_ddl
       compiled,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1944: in _execute_context
       e, statement, parameters, cursor, context
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2125: in _handle_dbapi_exception
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
   /usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py:211: in raise_
       raise exception
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: in _execute_context
       cursor, statement, parameters, context
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7df4de50; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7dd86310>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_user_username" already exists
   E       
   E       [SQL: CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))]
   E       (Background on this error at: https://sqlalche.me/e/14/f405)
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: ProgrammingError
   --------------------------------------------------------------------------------- Captured stdout setup ---------------------------------------------------------------------------------
   ========================= AIRFLOW ==========================
   Home of the user: /root
   Airflow home /root/airflow
   Initializing the DB - first time after entering the container.
   You can force re-initialization the database by adding --with-db-init switch to run-tests.
   [2022-12-28 19:49:00,870] {db.py:1608} INFO - Dropping tables that exist
   [2022-12-28 19:49:01,287] {migration.py:205} INFO - Context impl PostgresqlImpl.
   [2022-12-28 19:49:01,287] {migration.py:212} INFO - Will assume transactional DDL.
   [2022-12-28 19:49:01,292] {migration.py:205} INFO - Context impl PostgresqlImpl.
   [2022-12-28 19:49:01,292] {migration.py:212} INFO - Will assume transactional DDL.
   --------------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------------
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   INFO  [alembic.runtime.migration] Running stamp_revision  -> 290244fb8b83
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): create_entry_group>, delete_entry_group already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): delete_entry_group>, create_entry_group already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): create_entry_gcs>, delete_entry already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): delete_entry>, create_entry_gcs already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): create_tag>, delete_tag already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(BashOperator): delete_tag>, create_tag already registered for DAG: example_complex
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): prepare_email>, send_email already registered for DAG: example_dag_decorator
   WARNI [airflow.task.operators] Dependency <Task(EmailOperator): send_email>, prepare_email already registered for DAG: example_dag_decorator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): print_the_context>, log_sql_query already registered for DAG: example_python_operator
   WARNI [airflow.task.operators] Dependency <Task(_PythonDecoratedOperator): log_sql_query>, print_the_context already registered for DAG: example_python_operator
   WARNI [airflow.www.fab_security.manager] No user yet created, use flask fab command to do it.
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   ---------------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------------
   INFO     airflow.utils.db:db.py:1608 Dropping tables that exist
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   ____________________________________________________________ ERROR at setup of TestSlackHook.test_token_property_deprecated _____________________________________________________________
   
   self = <sqlalchemy.engine.base.Connection object at 0xffff7e0a8e90>, dialect = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>
   constructor = <bound method DefaultExecutionContext._init_ddl of <class 'sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2'>>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}, execution_options = immutabledict({'autocommit': True})
   args = (<sqlalchemy.dialects.postgresql.base.PGDDLCompiler object at 0xffff7dd8a550>,), kw = {}, branched = <sqlalchemy.engine.base.Connection object at 0xffff7e0a8e90>, yp = None
   conn = <sqlalchemy.pool.base._ConnectionFairy object at 0xffff7ee13810>, context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7e001090>
   cursor = <cursor object at 0xffff7ed00550; closed: -1>, evt_handled = False
   
       def _execute_context(
           self,
           dialect,
           constructor,
           statement,
           parameters,
           execution_options,
           *args,
           **kw
       ):
           """Create an :class:`.ExecutionContext` and execute, returning
           a :class:`_engine.CursorResult`."""
       
           branched = self
           if self.__branch_from:
               # if this is a "branched" connection, do everything in terms
               # of the "root" connection, *except* for .close(), which is
               # the only feature that branching provides
               self = self.__branch_from
       
           if execution_options:
               yp = execution_options.get("yield_per", None)
               if yp:
                   execution_options = execution_options.union(
                       {"stream_results": True, "max_row_buffer": yp}
                   )
       
           try:
               conn = self._dbapi_connection
               if conn is None:
                   conn = self._revalidate_connection()
       
               context = constructor(
                   dialect, self, conn, execution_options, *args, **kw
               )
           except (exc.PendingRollbackError, exc.ResourceClosedError):
               raise
           except BaseException as e:
               self._handle_dbapi_exception(
                   e, util.text_type(statement), parameters, None, None
               )
       
           if (
               self._transaction
               and not self._transaction.is_active
               or (
                   self._nested_transaction
                   and not self._nested_transaction.is_active
               )
           ):
               self._invalid_transaction()
       
           elif self._trans_context_manager:
               TransactionalContext._trans_ctx_check(self)
       
           if self._is_future and self._transaction is None:
               self._autobegin()
       
           context.pre_exec()
       
           if dialect.use_setinputsizes:
               context._set_input_sizes()
       
           cursor, statement, parameters = (
               context.cursor,
               context.statement,
               context.parameters,
           )
       
           if not context.executemany:
               parameters = parameters[0]
       
           if self._has_events or self.engine._has_events:
               for fn in self.dispatch.before_cursor_execute:
                   statement, parameters = fn(
                       self,
                       cursor,
                       statement,
                       parameters,
                       context,
                       context.executemany,
                   )
       
           if self._echo:
       
               self._log_info(statement)
       
               stats = context._get_cache_stats()
       
               if not self.engine.hide_parameters:
                   self._log_info(
                       "[%s] %r",
                       stats,
                       sql_util._repr_params(
                           parameters, batches=10, ismulti=context.executemany
                       ),
                   )
               else:
                   self._log_info(
                       "[%s] [SQL parameters hidden due to hide_parameters=True]"
                       % (stats,)
                   )
       
           evt_handled = False
           try:
               if context.executemany:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_executemany:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_executemany(
                           cursor, statement, parameters, context
                       )
               elif not parameters and context.no_parameters:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute_no_params:
                           if fn(cursor, statement, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute_no_params(
                           cursor, statement, context
                       )
               else:
                   if self.dialect._has_events:
                       for fn in self.dialect.dispatch.do_execute:
                           if fn(cursor, statement, parameters, context):
                               evt_handled = True
                               break
                   if not evt_handled:
                       self.dialect.do_execute(
   >                       cursor, statement, parameters, context
                       )
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7ed00550; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7e001090>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       psycopg2.errors.DuplicateTable: relation "idx_ab_user_username" already exists
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: DuplicateTable
   
   The above exception was the direct cause of the following exception:
   
   request = <SubRequest '_clear_db_between_providers_tests' for <Function test_token_property_deprecated>>
   
       @pytest.fixture(scope="module", autouse=True)
       def _clear_db_between_providers_tests(request):
           """Clear DB between each separate provider package test runs."""
           # from tests.test_utils import db
       
           provider_name = get_test_provider_name(request.module)
           if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
               _CLEAR_DB_PROVIDERS.add(provider_name)
               from airflow.utils.db import resetdb
   >           resetdb()
   
   tests/providers/conftest.py:68: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   airflow/utils/session.py:75: in wrapper
       return func(*args, session=session, **kwargs)
   airflow/utils/db.py:1619: in resetdb
       initdb(session=session)
   airflow/utils/session.py:72: in wrapper
       return func(*args, **kwargs)
   airflow/utils/db.py:709: in initdb
       _create_db_from_orm(session=session)
   airflow/utils/db.py:692: in _create_db_from_orm
       Base.metadata.create_all(settings.engine)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/schema.py:4931: in create_all
       ddl.SchemaGenerator, self, checkfirst=checkfirst, tables=tables
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:3228: in _run_ddl_visitor
       conn._run_ddl_visitor(visitorcallable, element, **kwargs)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2211: in _run_ddl_visitor
       visitorcallable(self.dialect, self, **kwargs).traverse_single(element)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:855: in visit_metadata
       _is_metadata_operation=True,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:907: in visit_table
       self.traverse_single(index, create_ok=True)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/visitors.py:524: in traverse_single
       return meth(obj, **kw)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:938: in visit_index
       self.connection.execute(CreateIndex(index))
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1380: in execute
       return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
   /usr/local/lib/python3.7/site-packages/sqlalchemy/sql/ddl.py:81: in _execute_on_connection
       self, multiparams, params, execution_options
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1478: in _execute_ddl
       compiled,
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1944: in _execute_context
       e, statement, parameters, cursor, context
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:2125: in _handle_dbapi_exception
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
   /usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py:211: in raise_
       raise exception
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py:1901: in _execute_context
       cursor, statement, parameters, context
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   
   self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0xffff90993a10>, cursor = <cursor object at 0xffff7ed00550; closed: -1>
   statement = 'CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))', parameters = {}
   context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0xffff7e001090>
   
       def do_execute(self, cursor, statement, parameters, context=None):
   >       cursor.execute(statement, parameters)
   E       sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_user_username" already exists
   E       
   E       [SQL: CREATE UNIQUE INDEX idx_ab_user_username ON ab_user (lower(username))]
   E       (Background on this error at: https://sqlalche.me/e/14/f405)
   
   /usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py:736: ProgrammingError
   --------------------------------------------------------------------------------- Captured stderr setup ---------------------------------------------------------------------------------
   INFO  [airflow.utils.db] Dropping tables that exist
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
   INFO  [alembic.runtime.migration] Will assume transactional DDL.
   ---------------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------------
   INFO     airflow.utils.db:db.py:1608 Dropping tables that exist
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   INFO     alembic.runtime.migration:migration.py:205 Context impl PostgresqlImpl.
   INFO     alembic.runtime.migration:migration.py:212 Will assume transactional DDL.
   =================================================================================== warnings summary ====================================================================================
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute
     /opt/airflow/airflow/example_dags/example_sensor_decorator.py:64: RemovedInAirflow3Warning: Param `schedule_interval` is deprecated and will be removed in a future release. Please use `schedule` instead. 
       tutorial_etl_dag = example_sensor_decorator()
   
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute
     /opt/airflow/airflow/example_dags/example_subdag_operator.py:45: RemovedInAirflow3Warning: This class is deprecated. Please use `airflow.utils.task_group.TaskGroup`.
       subdag=subdag(DAG_NAME, "section-1", dag.default_args),
   
   tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute
     /opt/airflow/airflow/example_dags/example_subdag_operator.py:54: RemovedInAirflow3Warning: This class is deprecated. Please use `airflow.utils.task_group.TaskGroup`.
       subdag=subdag(DAG_NAME, "section-2", dag.default_args),
   
   -- Docs: https://docs.pytest.org/en/stable/warnings.html
   ================================================================================ short test summary info ================================================================================
   ERROR tests/providers/docker/operators/test_docker.py::TestDockerOperator::test_execute - sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_user_user...
   ERROR tests/providers/slack/hooks/test_slack.py::TestSlackHook::test_token_property_deprecated - sqlalchemy.exc.ProgrammingError: (psycopg2.errors.DuplicateTable) relation "idx_ab_us...
   ============================================================================= 3 warnings, 2 errors in 8.28s =============================================================================
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058629909


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   I could check also request fixture tomorrow, I thought it should contain session information, such as how many tests collected and if it equal (or less, I don't know how it possible) 1 then do not cleanup anything.
   
   > I think this is not really needed now - the overhead with using the test_utils classes is really low indeed. I believe we should enable it for all tests and compare execution time on CI and if it is < 5% overhead in total or so, we can leave it on for all modules.



##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   > I could check also request fixture tomorrow, I thought it should contain session information, such as how many tests collected and if it equal (or less, I don't know how it possible) 1 then do not cleanup anything.
   
   I think this is not really needed now - the overhead with using the test_utils classes is really low indeed. I believe we should enable it for all tests and compare execution time on CI and if it is < 5% overhead in total or so, we can leave it on for all modules.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1370748325

   @potiuk I removed all optional changes and kept only related to this PR: autofixture and changes in API tests + Kerberos Integration
   
   Also found that if we call `DAG.sync_to_db()` it no has any affect to database, so I change it to same method in DagBag object, I do not know is it expected or it is some kind of bug.
   
   Example of changes
   
   ```patch
   --- a/tests/api/client/test_local_client.py
   +++ b/tests/api/client/test_local_client.py
   @@ -44,7 +44,7 @@
    class TestLocalClient:
        @classmethod
        def setup_class(cls):
   -        DagBag(example_bash_operator.__file__).get_dag("example_bash_operator").sync_to_db()
   +        DagBag(example_bash_operator.__file__, include_examples=False).sync_to_db()
    
        def setup_method(self):
            clear_db_pools()
   ```
   
   Previously this part work because we do not clear this part of metadatabase in API tests and tests use initial database state.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ephraimbuddy commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061498720


##########
tests/api/common/test_mark_tasks.py:
##########
@@ -49,7 +49,7 @@ def dagbag():
     from airflow.models.dagbag import DagBag
 
     # Ensure the DAGs we are looking at from the DB are up-to-date
-    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=False)
+    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=True)

Review Comment:
   Why set i`nclude_examples=True`? Won't it cost some more time?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1059078347


##########
tests/conftest.py:
##########
@@ -88,18 +89,6 @@ def url_safe_serializer(secret_key) -> URLSafeSerializer:
     return URLSafeSerializer(secret_key)
 
 
-@pytest.fixture()

Review Comment:
   Maybe that one also could be better in a separate PR (for trackability)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058633336


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   For long term we also need to consider get rid of most of the stuff in `test.test_utils` and replace them by equivalent as fixture/context manager/markers but this better complete after migrate remaining tests to `pytest` and document them.
   
   Because most of the new contributors just copy-paste some stuff from one test to another without any idea how it actually work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis merged pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis merged PR #28631:
URL: https://github.com/apache/airflow/pull/28631


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058731243


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   Would it be a good idea to also clean after the provider tests? I feel like it’s generally a good principle for one thing to clean up one’s own stuffs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058819738


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   I tihnk we've came to proposal that with such (low) overhead we have we can clean up database not before/after every provider but before (after?) any module. As you can see from the attached videos above - overall overhead for running single test is neglectible if we do (~0.2 s is almost not noticeable). It will be far less if we run the whole group of tests.
   
   If we always clean before each module - this will work not only for providers, but also for regular airflow tests - with such a low overhead, we do not need to do cleanup because we have 100% guarantee, that the DB will be cleaned before runnin any test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058866477


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   I've better keep `--with-db-init` time to time devs still need to clear DB (something changed in structure).
   
   I have for this run configurations, but someone could find `--with-db-init` more useful rather this
   
   ![image](https://user-images.githubusercontent.com/3998685/209937487-8682130a-d1f9-474d-a5ba-5da8ce40b831.png)
   
   



##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,89 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+from functools import lru_cache
+from pathlib import Path
+
+import pytest
+
+from tests.test_utils import db
+
+_CLEAR_DB_PROVIDERS = set()
+
+
+@lru_cache(maxsize=None)
+def providers_packages():
+    """Get providers packages full qualname."""
+
+    current_dir = Path(__file__).absolute().parent
+    providers = set()
+    for root in current_dir.iterdir():
+        if not root.is_dir():
+            continue
+
+        providers_dirs = set()
+        for sentinel in {"hooks", "operators", "sensors"}:
+            providers_dirs = providers_dirs.union({p.parent for p in root.rglob(sentinel) if p.is_dir()})
+
+        if providers_dirs:
+            for d in providers_dirs:
+                providers.add(".".join(d.relative_to(current_dir).parts))
+        else:
+            providers.add(root.name)
+
+    return providers
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for provider in providers_packages():
+        if name.startswith(provider):
+            return provider
+    return None
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in _CLEAR_DB_PROVIDERS:
+        _CLEAR_DB_PROVIDERS.add(provider_name)
+        db.clear_db_runs()
+        db.clear_db_datasets()
+        db.clear_db_dags()
+        db.clear_db_serialized_dags()
+        db.clear_db_sla_miss()
+        db.clear_db_pools()
+        db.clear_db_connections()
+        db.clear_db_variables()
+        db.clear_db_dag_code()
+        db.clear_db_callbacks()
+        db.clear_rendered_ti_fields()
+        db.clear_db_import_errors()
+        db.clear_db_dag_warnings()
+        db.clear_db_xcom()
+        db.clear_db_logs()
+        db.clear_db_jobs()
+        db.clear_db_task_fail()
+        db.clear_db_task_reschedule()
+        db.clear_dag_specific_permissions()
+        db.create_default_connections()
+        db.set_default_pool_slots(128)
+    yield

Review Comment:
   I've better keep `--with-db-init` time to time devs still need to clear DB (something changed in structure).
   
   I have for this run configurations, but someone could find `--with-db-init` more useful rather then this
   
   ![image](https://user-images.githubusercontent.com/3998685/209937487-8682130a-d1f9-474d-a5ba-5da8ce40b831.png)
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058531880


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   There is also one side-effect that we should take into account here.  This is fine when we run the tests in CI, but the way it is implemented now, it will also run the auto-fixture even if you run a single unit test. This will dramatically slow-down any iterations you do I am afraid (and it does not really matter if we do it only for Providers or for whole airflow).
   
   I think this can be solved by only doing this when we run for:
   
   1) the first time at all (so some kind of check if db is initialized) 
   2) or when we run in CI (`CI` env variable set to "true").
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058532316


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {

Review Comment:
   I think we could just need to scan directory until found one of the folder: `hooks`, `operators`, `sensors`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058569528


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   I think this is because of some import sequence @ephraimbuddy - does it ring a bell ?. It works for I think we **must** sort it out, because otherwise it's not future-proof - in case we add new stuff. 
   
   I can help sorting it out when we solve the other comments.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB between each separate providers tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1058574945


##########
tests/providers/conftest.py:
##########
@@ -0,0 +1,75 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from __future__ import annotations
+
+import pytest
+
+from tests.test_utils import db
+
+# Providers with subpackages
+INNER_PROVIDERS = {
+    "alibaba",
+    "amazon",
+    "apache",
+    "atlassian",
+    "common",
+    "dbt",
+    "facebook",
+    "google",
+    "microsoft",
+}
+PROVIDERS_PACKAGES = set()
+
+
+def get_test_provider_name(m):
+    """Extract provider name from module full qualname."""
+    _, _, name = m.__name__.partition("providers.")
+    for inner_provider in INNER_PROVIDERS:
+        if name.startswith(inner_provider):
+            return ".".join(name.split(".", 2)[:2])
+    return name.split(".", 1)[0]
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db_between_providers_tests(request):
+    """Clear DB between each separate provider package test runs."""
+    provider_name = get_test_provider_name(request.module)
+    if provider_name and provider_name not in PROVIDERS_PACKAGES:
+        PROVIDERS_PACKAGES.add(provider_name)
+        db.clear_db_runs()

Review Comment:
   We also have `reset_db` fixture which not use anywhere, at least I can't find any usage of them
   
   https://github.com/apache/airflow/blob/69df1c5d9e27764daa3584d3da6b0647cda9093d/tests/conftest.py#L91-L100



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on PR #28631:
URL: https://github.com/apache/airflow/pull/28631#issuecomment-1367283217

   It is debugging time!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1059077247


##########
tests/conftest.py:
##########
@@ -856,3 +851,24 @@ def reset_logging_config():
 
     logging_config = import_string(settings.LOGGING_CLASS_PATH)
     logging.config.dictConfig(logging_config)
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db(request):
+    """Clear DB before each test module run."""
+    if not request.config.option.db_cleanup:
+        return
+    dist_option = getattr(request.config.option, "dist", "no")
+    if dist_option != "no" or hasattr(request.config, "workerinput"):
+        # Skip if pytest-xdist detected (controller or worker)
+        return
+
+    try:
+        clear_all()
+    except Exception as ex:
+        exc_name_parts = [type(ex).__name__]
+        exc_module = type(ex).__module__
+        if exc_module != "builtins":
+            exc_name_parts.insert(0, exc_module)
+        extra_msg = "" if request.config.option.db_init else ", try to run with flag --with-db-init"

Review Comment:
   I like the escalation here and informative message.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1059076939


##########
tests/conftest.py:
##########
@@ -856,3 +851,24 @@ def reset_logging_config():
 
     logging_config = import_string(settings.LOGGING_CLASS_PATH)
     logging.config.dictConfig(logging_config)
+
+
+@pytest.fixture(scope="module", autouse=True)
+def _clear_db(request):
+    """Clear DB before each test module run."""
+    if not request.config.option.db_cleanup:
+        return
+    dist_option = getattr(request.config.option, "dist", "no")

Review Comment:
   Cool. This is indeed important, though many tests will not work anyway with xdist. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1059077918


##########
tests/integration/api/auth/backend/test_kerberos_auth.py:
##########
@@ -46,19 +46,17 @@ def app_for_kerberos():
         yield app.create_app(testing=True)
 
 
-@pytest.fixture(scope="module", autouse=True)
+@pytest.fixture(scope="module")

Review Comment:
   I guess this is accidental fix that was needed? Can we maybe separate it out to another PR (just for trackability)? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on a diff in pull request #28631: Clear DB before individual module tests

Posted by GitBox <gi...@apache.org>.
Taragolis commented on code in PR #28631:
URL: https://github.com/apache/airflow/pull/28631#discussion_r1061598466


##########
tests/api/common/test_mark_tasks.py:
##########
@@ -49,7 +49,7 @@ def dagbag():
     from airflow.models.dagbag import DagBag
 
     # Ensure the DAGs we are looking at from the DB are up-to-date
-    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=False)
+    non_serialized_dagbag = DagBag(read_dags_from_db=False, include_examples=True)

Review Comment:
   > Yeah. Those changes are needed .
   
   Echo from the past. I trace back to initial commit from 2017 :rofl: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org