Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/27 13:31:09 UTC

[GitHub] [airflow] ashb commented on pull request #11875: Don't create duplicate dag file processors

ashb commented on pull request #11875:
URL: https://github.com/apache/airflow/pull/11875#issuecomment-717244661


   Are you sure this is still a problem on mainline? Looking at `start_new_processes`:
   
   ```python
       def start_new_processes(self):
           """Start more processors if we have enough slots and files to process"""
           while self._parallelism - len(self._processors) > 0 and self._file_path_queue:
               file_path = self._file_path_queue.pop(0)
               callback_to_execute_for_file = self._callback_to_execute[file_path]
               processor = self._processor_factory(
                   file_path,
                   callback_to_execute_for_file,
                   self._dag_ids,
                   self._pickle_dags)
   
               del self._callback_to_execute[file_path]
               Stats.incr('dag_processing.processes')
   
               processor.start()
               self.log.debug(
                   "Started a process (PID: %s) to generate tasks for %s",
                   processor.pid, file_path
               )
               self._processors[file_path] = processor
               self.waitables[processor.waitable_handle] = processor
   ```
   
   I can't see at first glance how `self._parallelism - len(self._processors) > 0` would ever lead to too many processes.
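
   Just to convince myself, here is a minimal standalone sketch of that guard in isolation (the `FakeProcessor` and `simulate_start_new_processes` names are stubs for illustration, not the real `DagFileProcessorManager`); the loop stops as soon as the available slots are used up, so the processor count never exceeds the configured parallelism:

    ```python
    class FakeProcessor:
        """Stand-in for a DAG file processor; it only records its file path."""

        def __init__(self, file_path):
            self.file_path = file_path


    def simulate_start_new_processes(parallelism, processors, file_path_queue):
        """Mirror the while-loop condition from start_new_processes above."""
        while parallelism - len(processors) > 0 and file_path_queue:
            file_path = file_path_queue.pop(0)
            # Keyed by file path, exactly like self._processors in the real code.
            processors[file_path] = FakeProcessor(file_path)
        return processors


    if __name__ == "__main__":
        running = {}  # file_path -> processor, as in self._processors
        queue = [f"/dags/dag_{i}.py" for i in range(10)]
        simulate_start_new_processes(parallelism=2, processors=running, file_path_queue=queue)
        # The guard stops the loop once the two slots are used up.
        assert len(running) <= 2, running
        print(f"started {len(running)} processors, {len(queue)} files still queued")
    ```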


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org