You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/27 23:11:01 UTC

[GitHub] [airflow] cchepelov opened a new pull request #10617: ensure a zipped DAG can locate the modules it provides

cchepelov opened a new pull request #10617:
URL: https://github.com/apache/airflow/pull/10617


   closes: #10616
   
   This PR provides a fix for the issue where DAG packages (zips) which provides possibly distinct versions of a support library might receive the wrong version from a different DAG
   
   To facilitate the review, it isn't yet squashed as requested; there are four commits:
   
   1. Unit tests that demonstrate the issue (and fail)
   2. a refactoring of DagBag.process_file to break up that function into more manageable pieces 
   3. a first-level resolution of the issue (effectively, every module loaded by a DAG within a zip is immediately taken off the global namespace, so that the next DAG pack can try again), which still fails in case a DAG zip attempts to provide a different version of an otherwise globally-supplied module (e.g. an updated tzdb or something like that)
   4. a correction to the previous, ensuring zipped DAG get loaded in a clean and neutral environment
   5. an improvement on the previous, in order to avoid unloading modules between files of the same DAG package (they will be intended to share dependencies if they have any)
   
   (Another issue that is fixed in passing is that previously, the sys.path would grow indefinitely each time the DagBag loads or reloads a zipped DAG)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708540035


   [The Workflow run](https://github.com/apache/airflow/actions/runs/306852311) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-682265588


   Does this problem also occur with the master branch? We do not accept changes to branch v1-10-*, if this change has not been merged into the master branch. After merging into the master branch, if possible, the change is cherry-picj to branch v1-10 by the release maintainer.
   
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] amr-noureldin commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
amr-noureldin commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-696144973


   @cchepelov do you have an ETA on when this fix will be ready? We are also impacted by the exact same issue, and I am wondering if we shall wait for the next Airflow release, or try to manually integrate your changes into our fork.
   
   Thanks anyway for your efforts here!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stale[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
stale[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-751240011


   This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708455268


   [The Workflow run](https://github.com/apache/airflow/actions/runs/306687994) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-709310346


   > > serves no end in 2.0.0 and onwards.
   > 
   > @ashb Are you saying this issue will not exist in 2.0.0?
   
   That is what I think, yes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: WIP: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708356052


   indeed @mik-laj ; although I still need a couple cleanups before it's ready for review. 
   
   Slack, ugh... 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708220874


   (why those conflicts? T_T I thought I had just rebased...)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-696208338


   My apologies, I’ve been sidetracked and could not yet finish the patch against the 2.0 branch (playing whack-a-mole with additional ’sys.path leaks’), although I’d reckon I’m 95% there.
   
   I hope to publish it tomorrow. 
   
   Kamil is right you probably ought to take the patch as I originally sent it and attempt to integrate it into your system if that works for you. 
   
   
   
   
   > Le 21 sept. 2020 à 16:45, Kamil Breguła <no...@github.com> a écrit :
   > 
   > 
   > if you can, please integrate this patch yourself. This PR target to Airflow 1.10 and we are focusing on Airflow 2,0 development. There is a long way to go before this change could be released in the 1.10 release,
   > 
   > First, we need to prepare an PR for Airflow 2.0. Once it gets merged we can work on a change for Airflow 1.10, but we have a lot of differences between releases, so we would have to prepare many other dependent changes. we have too much work on Airflow 2.0 to spend time on this bug. I would be happy to help if you would like to start working on a change for Airflow 2.0.
   > 
   > On other hand, we are getting closer to 2.0 and we also need to have some "nice feature" in 2.0 to get more people exited to migrate to it.
   > 
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub <https://github.com/apache/airflow/pull/10617#issuecomment-696162904>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALR7ME4YIM4KS5ZQU4DSD3SG5RI7ANCNFSM4QNQIZPQ>.
   > 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#discussion_r504734126



##########
File path: airflow/models/dagbag.py
##########
@@ -299,48 +300,120 @@ def _load_modules_from_file(self, filepath, safe_mode):
 
     def _load_modules_from_zip(self, filepath, safe_mode):
         mods = []
+
+        mod_overrides = []
         current_zip_file = zipfile.ZipFile(filepath)
+
+        # we do a first pass within the zip, in order to know which module names are provided by the zip itself at the
+        # root. Any such module name gets removed from the system path in order to ensure a successful override
+        # by the packaged zip, should the user want it.
         for zip_info in current_zip_file.infolist():
             head, _ = os.path.split(zip_info.filename)
             mod_name, ext = os.path.splitext(zip_info.filename)
             if ext not in [".py", ".pyc"]:
                 continue
             if head:
                 continue
+            mod_overrides.append(mod_name.replace('/','.').replace('\\', '.'))
 
-            if mod_name == '__init__':
-                self.log.warning("Found __init__.%s at root of %s", ext, filepath)
-
-            self.log.debug("Reading %s from %s", zip_info.filename, filepath)
+        modules_initial = set(sys.modules.keys())
 
-            if not might_contain_dag(zip_info.filename, safe_mode, current_zip_file):
-                # todo: create ignore list
-                # Don't want to spam user with skip messages
-                if not self.has_logged or True:
-                    self.has_logged = True
-                    self.log.info(
-                        "File %s:%s assumed to contain no DAGs. Skipping.",
-                        filepath, zip_info.filename
-                    )
+        for mod_key in modules_initial:
+            if mod_key in mod_overrides:
+                self.log.info("dropping module %s due to presence in mod_overrides" % mod_key)
+                del sys.modules[mod_key]
                 continue
+            if not self._is_system_module(mod_key):
+                self.log.info("dropping module %s due to _is_system_module=False" % mod_key)
+                del sys.modules[mod_key]
 
-            if mod_name in sys.modules:
-                del sys.modules[mod_name]
+        modules_before = set(sys.modules.keys())
+        path_before = list(sys.path)
 
-            try:
-                sys.path.insert(0, filepath)
-                current_module = importlib.import_module(mod_name)
-                mods.append(current_module)
-            except Exception as e:  # pylint: disable=broad-except
-                self.log.exception("Failed to import: %s", filepath)
-                if self.dagbag_import_error_tracebacks:
-                    self.import_errors[filepath] = traceback.format_exc(
-                        limit=-self.dagbag_import_error_traceback_depth
-                    )
-                else:
-                    self.import_errors[filepath] = str(e)
+        try:
+            for zip_info in current_zip_file.infolist():
+                head, _ = os.path.split(zip_info.filename)
+                mod_name, ext = os.path.splitext(zip_info.filename)
+                if ext not in [".py", ".pyc"]:
+                    continue
+                if head:
+                    continue
+
+                if mod_name == '__init__':
+                    self.log.warning("Found __init__.%s at root of %s", ext, filepath)
+
+                self.log.debug("Reading %s from %s", zip_info.filename, filepath)
+
+                if not might_contain_dag(zip_info.filename, safe_mode, current_zip_file):
+                    # todo: create ignore list
+                    # Don't want to spam user with skip messages
+                    if not self.has_logged or True:
+                        self.has_logged = True
+                        self.log.info(
+                            "File %s:%s assumed to contain no DAGs. Skipping.",
+                            filepath, zip_info.filename
+                        )
+                    continue
+
+                if mod_name in sys.modules:
+                    del sys.modules[mod_name]
+
+                try:
+                    if not filepath in sys.path:
+                        sys.path.insert(0, filepath)
+                    current_module = importlib.import_module(mod_name)
+                    mods.append(current_module)
+                except Exception as e:  # pylint: disable=broad-except
+                    self.log.exception("Failed to import: %s", filepath)
+                    if self.dagbag_import_error_tracebacks:
+                        self.import_errors[filepath] = traceback.format_exc(
+                            limit=-self.dagbag_import_error_traceback_depth
+                        )
+                    else:
+                        self.import_errors[filepath] = str(e)
+        finally:
+            loaded_modules = set(sys.modules.keys()) - modules_before
+            for mod_key in loaded_modules:
+                del sys.modules[mod_key]
+
+            sys.path = path_before
         return mods
 
+    def _is_system_module(self, module_name):
+        """
+        :return: True if the module comes from the platform (and ought to stay loaded) or might come from
+        the actions of a DAG (in which case it's safer to unload it before reloading it)
+        """
+        if module_name in ["sys", "types", "__main__", "websocket", "airflow"]:

Review comment:
       You're going to have to explain this in a lot more detail. -- there are many many more system modules than this :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10617: WIP: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708325423


   @cchepelov  If you have made a change to your PR but you haven't notified anyone and you don't have a reviewer, it is unlikely that the change will be noticed.
   See: 
   https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow
   If you have made a change to your PR, you must at least write a comment in your PR. Ideally, you should find a reviewer on Slack. It will be helpful to ask for a reviewer on the #development channel.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708497093


   [The Workflow run](https://github.com/apache/airflow/actions/runs/306829171) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on a change in pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on a change in pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#discussion_r504745282



##########
File path: airflow/models/dagbag.py
##########
@@ -299,48 +300,120 @@ def _load_modules_from_file(self, filepath, safe_mode):
 
     def _load_modules_from_zip(self, filepath, safe_mode):
         mods = []
+
+        mod_overrides = []
         current_zip_file = zipfile.ZipFile(filepath)
+
+        # we do a first pass within the zip, in order to know which module names are provided by the zip itself at the
+        # root. Any such module name gets removed from the system path in order to ensure a successful override
+        # by the packaged zip, should the user want it.
         for zip_info in current_zip_file.infolist():
             head, _ = os.path.split(zip_info.filename)
             mod_name, ext = os.path.splitext(zip_info.filename)
             if ext not in [".py", ".pyc"]:
                 continue
             if head:
                 continue
+            mod_overrides.append(mod_name.replace('/','.').replace('\\', '.'))
 
-            if mod_name == '__init__':
-                self.log.warning("Found __init__.%s at root of %s", ext, filepath)
-
-            self.log.debug("Reading %s from %s", zip_info.filename, filepath)
+        modules_initial = set(sys.modules.keys())
 
-            if not might_contain_dag(zip_info.filename, safe_mode, current_zip_file):
-                # todo: create ignore list
-                # Don't want to spam user with skip messages
-                if not self.has_logged or True:
-                    self.has_logged = True
-                    self.log.info(
-                        "File %s:%s assumed to contain no DAGs. Skipping.",
-                        filepath, zip_info.filename
-                    )
+        for mod_key in modules_initial:
+            if mod_key in mod_overrides:
+                self.log.info("dropping module %s due to presence in mod_overrides" % mod_key)
+                del sys.modules[mod_key]
                 continue
+            if not self._is_system_module(mod_key):
+                self.log.info("dropping module %s due to _is_system_module=False" % mod_key)
+                del sys.modules[mod_key]
 
-            if mod_name in sys.modules:
-                del sys.modules[mod_name]
+        modules_before = set(sys.modules.keys())
+        path_before = list(sys.path)
 
-            try:
-                sys.path.insert(0, filepath)
-                current_module = importlib.import_module(mod_name)
-                mods.append(current_module)
-            except Exception as e:  # pylint: disable=broad-except
-                self.log.exception("Failed to import: %s", filepath)
-                if self.dagbag_import_error_tracebacks:
-                    self.import_errors[filepath] = traceback.format_exc(
-                        limit=-self.dagbag_import_error_traceback_depth
-                    )
-                else:
-                    self.import_errors[filepath] = str(e)
+        try:
+            for zip_info in current_zip_file.infolist():
+                head, _ = os.path.split(zip_info.filename)
+                mod_name, ext = os.path.splitext(zip_info.filename)
+                if ext not in [".py", ".pyc"]:
+                    continue
+                if head:
+                    continue
+
+                if mod_name == '__init__':
+                    self.log.warning("Found __init__.%s at root of %s", ext, filepath)
+
+                self.log.debug("Reading %s from %s", zip_info.filename, filepath)
+
+                if not might_contain_dag(zip_info.filename, safe_mode, current_zip_file):
+                    # todo: create ignore list
+                    # Don't want to spam user with skip messages
+                    if not self.has_logged or True:
+                        self.has_logged = True
+                        self.log.info(
+                            "File %s:%s assumed to contain no DAGs. Skipping.",
+                            filepath, zip_info.filename
+                        )
+                    continue
+
+                if mod_name in sys.modules:
+                    del sys.modules[mod_name]
+
+                try:
+                    if not filepath in sys.path:
+                        sys.path.insert(0, filepath)
+                    current_module = importlib.import_module(mod_name)
+                    mods.append(current_module)
+                except Exception as e:  # pylint: disable=broad-except
+                    self.log.exception("Failed to import: %s", filepath)
+                    if self.dagbag_import_error_tracebacks:
+                        self.import_errors[filepath] = traceback.format_exc(
+                            limit=-self.dagbag_import_error_traceback_depth
+                        )
+                    else:
+                        self.import_errors[filepath] = str(e)
+        finally:
+            loaded_modules = set(sys.modules.keys()) - modules_before
+            for mod_key in loaded_modules:
+                del sys.modules[mod_key]
+
+            sys.path = path_before
         return mods
 
+    def _is_system_module(self, module_name):
+        """
+        :return: True if the module comes from the platform (and ought to stay loaded) or might come from
+        the actions of a DAG (in which case it's safer to unload it before reloading it)
+        """
+        if module_name in ["sys", "types", "__main__", "websocket", "airflow"]:

Review comment:
       fair; there are many more criteria than just being named one of these five modules. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-696162904


   if you can, please integrate this patch yourself.  This PR target to Airflow 1.10 and we are focusing on Airflow 2,0 development. There is a long way to go before this change could be released in the 1.10 release,
   
   First, we need to prepare an PR for Airflow 2.0.  Once it gets merged we can work on a change for Airflow 1.10, but we have a lot of differences between releases, so we would have to prepare many other dependent changes. we have too much work on Airflow 2.0 to spend time on this bug. I would be happy to help if you would like to start working on a change for Airflow 2.0.
   
   On other hand, we are getting closer to 2.0 and we also need to have some "nice feature" in 2.0 to get more people exited to migrate to it. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] amr-noureldin commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
amr-noureldin commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-696144973


   @cchepelov do you have an ETA on when this fix will be ready? We are also impacted by the exact same issue, and I am wondering if we shall wait for the next Airflow release, or try to manually integrate your changes into our fork.
   
   Thanks anyway for your efforts here!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708455763


   1.10.1 or 1.10.10?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-682326811


   Thank you for the prompt reply @mik-laj! 
   
   Looking at master:
   
   - the code has been refactored for legibility already, so there won't be any cherry-picking in either direction
   - the core part of the dag zip-loading code doesn't appear materially changed between 1.10.12 and master, so there is a high likelihood the issue affects master as well.
   
   I'll port the PR forward and push it again.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708494636


   [The Workflow run](https://github.com/apache/airflow/actions/runs/306824043) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] closed pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #10617:
URL: https://github.com/apache/airflow/pull/10617


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708512364


   Encountered the problem on 1.10.1 (which $work was using), reproduced it (and started to fix it) on 1.10.12, then ported it (and completed it) on master of late August, then rebased today
    
   The module-loading code was effectively identical on all four versions, with the same fundamental issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-696162904


   if you can, please integrate this patch yourself.  This PR target to Airflow 1.10 and we are focusing on Airflow 2,0 development. There is a long way to go before this change could be released in the 1.10 release,
   
   First, we need to prepare an PR for Airflow 2.0.  Once it gets merged we can work on a change for Airflow 1.10, but we have a lot of differences between releases, so we would have to prepare many other dependent changes. we have too much work on Airflow 2.0 to spend time on this bug. I would be happy to help if you would like to start working on a change for Airflow 2.0.
   
   On other hand, we are getting closer to 2.0 and we also need to have some "nice feature" in 2.0 to get more people exited to migrate to it. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708453080


   it affects task execution as well. 
   
   In certain cases, I've witnessed the (1.10.1) webserver happily pick up the correct module, and then some task executor instances appear fine, and then *somehow* another task execution instance loads things in a different order and then the wrong module version kicks in.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] abhilash1in commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
abhilash1in commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708918755


   @ashb Are you saying this issue will not exist in 2.0.0?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-682242239


   (CI failed on an error which doesn't seem related... Huh?:
   
   `     Running setup.py install for pyodbc: finished with status 'error'
       Complete output from command /usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-3b6bnkqc/pyodbc/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-3iim2fgk/install-record.txt --single-version-externally-managed --compile:
       running install
       running build
       running build_ext
       building 'pyodbc' extension
       creating build
       creating build/temp.linux-x86_64-3.8
       creating build/temp.linux-x86_64-3.8/src
       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPYODBC_VERSION=4.0.30 -I/usr/local/include/python3.8 -c src/buffer.cpp -o build/temp.linux-x86_64-3.8/src/buffer.o -Wno-write-strings
       In file included from src/buffer.cpp:12:
       src/pyodbc.h:56:10: fatal error: sql.h: No such file or directory
        #include <sql.h>
                 ^~~~~~~
       compilation terminated.
       error: command 'gcc' failed with exit status 1
       `


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708483521


   [The Workflow run](https://github.com/apache/airflow/actions/runs/306789100) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-682236471


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://apache-airflow-slack.herokuapp.com/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708515603


   Basically my concern here is that this is a lot of code, and not all python modules are safe to just delete from sys.module (specifically things with compiled components might not be happy with this) -- and the following two constraints should already hold in 1.10.12 with DAG serialization enabled, or in 2.0.0 always:
   
   - When parsing a dag file, it is done in a "single-use" process that parses one file, then exits.
   - When executing a task, the same caveat should apply - the dag is never loaded in a long-running/persistent python process.
   
   So my worry is this is a lot of code and "messing about" in python internals that serves no end in 2.0.0 and onwards.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708492564


   [The Workflow run](https://github.com/apache/airflow/actions/runs/306819054) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] abhilash1in edited a comment on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
abhilash1in edited a comment on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708918755


   > serves no end in 2.0.0 and onwards.
   
   @ashb Are you saying this issue will not exist in 2.0.0?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708512967


   Module loading code is the same, but the execution environment of tasks is _very_ different between the versions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-696208338


   My apologies, I’ve been sidetracked and could not yet finish the patch against the 2.0 branch (playing whack-a-mole with additional ’sys.path leaks’), although I’d reckon I’m 95% there.
   
   I hope to publish it tomorrow. 
   
   Kamil is right you probably ought to take the patch as I originally sent it and attempt to integrate it into your system if that works for you. 
   
   
   
   
   > Le 21 sept. 2020 à 16:45, Kamil Breguła <no...@github.com> a écrit :
   > 
   > 
   > if you can, please integrate this patch yourself. This PR target to Airflow 1.10 and we are focusing on Airflow 2,0 development. There is a long way to go before this change could be released in the 1.10 release,
   > 
   > First, we need to prepare an PR for Airflow 2.0. Once it gets merged we can work on a change for Airflow 1.10, but we have a lot of differences between releases, so we would have to prepare many other dependent changes. we have too much work on Airflow 2.0 to spend time on this bug. I would be happy to help if you would like to start working on a change for Airflow 2.0.
   > 
   > On other hand, we are getting closer to 2.0 and we also need to have some "nice feature" in 2.0 to get more people exited to migrate to it.
   > 
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub <https://github.com/apache/airflow/pull/10617#issuecomment-696162904>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AALR7ME4YIM4KS5ZQU4DSD3SG5RI7ANCNFSM4QNQIZPQ>.
   > 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] cchepelov removed a comment on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
cchepelov removed a comment on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-682242239


   (CI failed on an error which doesn't seem related... Huh?:
   
   `     Running setup.py install for pyodbc: finished with status 'error'
       Complete output from command /usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-3b6bnkqc/pyodbc/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-3iim2fgk/install-record.txt --single-version-externally-managed --compile:
       running install
       running build
       running build_ext
       building 'pyodbc' extension
       creating build
       creating build/temp.linux-x86_64-3.8
       creating build/temp.linux-x86_64-3.8/src
       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPYODBC_VERSION=4.0.30 -I/usr/local/include/python3.8 -c src/buffer.cpp -o build/temp.linux-x86_64-3.8/src/buffer.o -Wno-write-strings
       In file included from src/buffer.cpp:12:
       src/pyodbc.h:56:10: fatal error: sql.h: No such file or directory
        #include <sql.h>
                 ^~~~~~~
       compilation terminated.
       error: command 'gcc' failed with exit status 1
       `


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #10617: ensure a zipped DAG can locate the modules it provides

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10617:
URL: https://github.com/apache/airflow/pull/10617#issuecomment-708502704


   [The Workflow run](https://github.com/apache/airflow/actions/runs/306843555) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks$,^Build docs$,^Spell check docs$,^Backport packages$,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org