Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/07 14:26:49 UTC

[GitHub] [airflow] ltutar opened a new issue #21392: .airflowignore does not use 'glob' as .gitignore for example

ltutar opened a new issue #21392:
URL: https://github.com/apache/airflow/issues/21392


   ### Description
   
   Hi,
   
   It would be nice if the `.airflowignore` file used glob patterns, as `.gitignore` does. In my case, when I used `*_test.py` instead of `.*_test.py`, I could not see the DAGs and also got errors when I ran `airflow dags list`. The error message was not easy to understand. Without the help of @potiuk, I would have been lost.
   
   An example is shown below:
   
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
       self.run()
     File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
       self._target(*self._args, **self._kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 370, in _run_processor_manager
       processor_manager.start()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 610, in start
       return self._run_parsing_loop()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 620, in _run_parsing_loop
       self._refresh_dag_dir()
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 755, in _refresh_dag_dir
       self._file_paths = list_py_file_paths(self._dag_directory)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 169, in list_py_file_paths
       file_paths.extend(find_dag_file_paths(directory, safe_mode))
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 187, in find_dag_file_paths
       for file_path in find_path_from_directory(str(directory), ".airflowignore"):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 115, in find_path_from_directory
       patterns += [re.compile(line) for line in lines_no_comments if line]
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 115, in <listcomp>
       patterns += [re.compile(line) for line in lines_no_comments if line]
     File "/usr/local/lib/python3.8/re.py", line 252, in compile
       return _compile(pattern, flags)
     File "/usr/local/lib/python3.8/re.py", line 304, in _compile
       p = sre_compile.compile(pattern, flags)
     File "/usr/local/lib/python3.8/sre_compile.py", line 764, in compile
       p = sre_parse.parse(p, flags)
     File "/usr/local/lib/python3.8/sre_parse.py", line 948, in parse
       p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
     File "/usr/local/lib/python3.8/sre_parse.py", line 443, in _parse_sub
       itemsappend(_parse(source, state, verbose, nested + 1,
     File "/usr/local/lib/python3.8/sre_parse.py", line 668, in _parse
       raise source.error("nothing to repeat",
   re.error: nothing to repeat at position 0
   ```
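
The failure in the traceback can be reproduced with Python's standard `re` module alone. This is a minimal sketch, not Airflow code: the helper name `try_compile` is made up for illustration. It shows that a pattern beginning with `*` is rejected by `re.compile`, while the regex form `.*_test.py` is accepted.

```python
import re


def try_compile(pattern):
    """Return None if `pattern` is a valid regex, otherwise the error text."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


# Glob-style pattern: '*' is a quantifier with nothing before it to repeat.
glob_error = try_compile("*_test.py")

# Regex-style pattern: '.*' means "any run of characters", so this compiles.
regex_error = try_compile(".*_test.py")

print(glob_error)
print(regex_error)
```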
   
   ### Use case/motivation
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1031526637


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] ianbuss commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
ianbuss commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1040729582


   @potiuk thoughts on the configuration suggestion? I think supporting Unix glob-style syntax is reasonable, since it would match user expectations from other tools much more closely:
   
   * https://git-scm.com/docs/gitignore
   * https://docs.docker.com/engine/reference/builder/#dockerignore-file
   * https://v3-1-0.helm.sh/docs/chart_template_guide/helm_ignore_file/
   
   These all support a similar (identical?) syntax, so we'd be in good company.
   
   If we think it's worth pursuing, it's probably something I can take on.





[GitHub] [airflow] ianbuss edited a comment on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
ianbuss edited a comment on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1039504738


   One option might be to add a config option to specify the file syntax, which would default to `regexp` to maintain backwards compat. Something like:
   ```
   [core]
   dag_ignorefile_syntax = regexp|glob
   ```
   There may be something I haven't considered, though.
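
A rough sketch of how such a switch could drive pattern handling, assuming the two values `regexp` and `glob` from the snippet above. The function names `build_matchers` and `is_ignored` are hypothetical, and `fnmatch.translate` from the standard library stands in for real `.gitignore` semantics, which it does not fully implement (for example, it has no special handling for directories or negation).

```python
import fnmatch
import re


def build_matchers(lines, syntax="regexp"):
    """Turn ignore-file lines into callables that test a file name.

    `syntax` mirrors the proposed option values; both the option and this
    function are illustrative assumptions, not an existing API.
    """
    matchers = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        if syntax == "regexp":
            # Regex patterns may match anywhere in the name.
            matchers.append(re.compile(line).search)
        elif syntax == "glob":
            # fnmatch.translate converts a shell wildcard to a regex string
            # that is anchored at the end, so match from the start.
            matchers.append(re.compile(fnmatch.translate(line)).match)
        else:
            raise ValueError(f"unknown ignore-file syntax: {syntax!r}")
    return matchers


def is_ignored(file_name, matchers):
    """Return True if any matcher accepts the file name."""
    return any(matcher(file_name) for matcher in matchers)


glob_matchers = build_matchers(["*_test.py"], syntax="glob")
regexp_matchers = build_matchers([".*_test.py"], syntax="regexp")
print(is_ignored("etl_test.py", glob_matchers))
print(is_ignored("etl_test.py", regexp_matchers))
```

Defaulting `syntax` to `"regexp"` keeps existing files working unchanged, which is the backwards-compatibility point made above.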
   





[GitHub] [airflow] potiuk commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1031851652


   > Yeah, if we wanted to change the format of this file to more closely match what people expect (which I agree is confusing) -- i.e. `.gitignore` -- we'd either have to change the filename, or add a `# airflow-ignore-v2` comment as the first line or something.
   
   The comment would also not be good, because people would not know they have to add it :( . Same story as now.
   
   I think we are a bit stuck with regexp until 2.3.0 (then we could change it and add a migration check with smart detection and even conversion suggestions - that would be kinda doable, I think).
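
One way the "smart detection and conversion suggestions" idea could look is sketched below. Everything here is an assumption for illustration: the token list is a heuristic, the names `looks_like_regex` and `suggest_glob` are hypothetical, and only the two simplest regex idioms (`.*` and `\.`) are rewritten. Anything more complex is left for a human to convert.

```python
import re

# Tokens that are common in regexes but unusual in glob patterns (heuristic).
REGEX_HINTS = re.compile(r"\.\*|\.\+|\\\.|[\^$()|]")


def looks_like_regex(pattern):
    """Guess whether an ignore-file line was written as a regex."""
    return REGEX_HINTS.search(pattern) is not None


def suggest_glob(pattern):
    """Best-effort glob suggestion; returns None when unsure."""
    candidate = pattern.replace(".*", "*").replace("\\.", ".")
    # Give up if regex-only syntax is still present after the rewrite.
    if re.search(r"[\^$()|+\\]", candidate):
        return None
    return candidate


for line in [".*_test.py", "*_test.py", "^(a|b)$"]:
    print(line, looks_like_regex(line), suggest_glob(line))
```

A suggestion produced this way is only a hint: an unescaped `.` in a regex matches any character, so the glob it maps to can be slightly stricter than the original pattern.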
   
   





[GitHub] [airflow] ianbuss commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
ianbuss commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1042741680


   @ashb @potiuk can you put me down for this one please, thanks.





[GitHub] [airflow] ianbuss commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
ianbuss commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1039504738


   One option might be to add a config option to specify the file syntax, which would default to `regexp` to maintain backwards compat. Something like:
   ```
   [core]
   dag_ignorefile_syntax = regexp|glob
   ```
   There may be something I haven't considered.
   





[GitHub] [airflow] potiuk commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1040745937


   Yeah. Adding a configuration option looks like a good idea. That's a clear path for deprecation as well - we could add a deprecation for regexp now and swap the default in Airflow 3.





[GitHub] [airflow] potiuk edited a comment on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1031851652


   > Yeah, if we wanted to change the format of this file to more closely match what people expect (which I agree is confusing) -- i.e. `.gitignore` -- we'd either have to change the filename, or add a `# airflow-ignore-v2` comment as the first line or something.
   
   The comment would also not be good, because people would not know they have to add it :( . Same story as now.
   
   I think we are a bit stuck with regexp until ~2.3.0~ 3.0.0 (then we could change it and add a migration check with smart detection and even conversion suggestions - that would be kinda doable, I think).
   
   








[GitHub] [airflow] potiuk edited a comment on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1031851652


   > Yeah, if we wanted to change the format of this file to more closely match what people expect (which I agree is confusing) -- i.e. `.gitignore` -- we'd either have to change the filename, or add a `# airflow-ignore-v2` comment as the first line or something.
   
   The comment would also not be good, because people would not know they have to add it :( . Same story as now.
   
   I think we are a bit stuck with regexp until 3.0.0 (then we could change it and add a migration check with smart detection and even conversion suggestions - that would be kinda doable, I think).
   
   





[GitHub] [airflow] ashb commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1031836603


   Yeah, if we wanted to change the format of this file to more closely match what people expect (which I agree is confusing) -- i.e. `.gitignore` -- we'd either have to change the filename, or add a `# airflow-ignore-v2` comment as the first line or something.
   
   (Of the two I'd prefer a new name, but I don't know what I'd call the file)





[GitHub] [airflow] potiuk commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1031585411


   Another thing to add could be the list of ignored files in the logs (with some reasonable frequency), with information that they were matched by the regexp. That would cover the case where the regexp does not fail, but you end up not ignoring the files that you want to ignore.
   
   I marked it as "good first issue". I think this one might be a good one for some "fresh" contribution.
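
The logging idea could be sketched roughly as follows. This is an illustration under assumptions, not Airflow's implementation: `split_ignored` is a hypothetical helper, the paths are made up, and the "reasonable frequency" throttling is left out.

```python
import logging
import re

log = logging.getLogger(__name__)


def split_ignored(paths, patterns):
    """Separate kept paths from ignored ones, recording the matching pattern."""
    kept = []
    ignored = {}
    for path in paths:
        # First compiled pattern that matches anywhere in the path, if any.
        matched = next((p for p in patterns if p.search(path)), None)
        if matched is None:
            kept.append(path)
        else:
            ignored[path] = matched.pattern
    for path, pattern in ignored.items():
        log.info("Ignoring %s (matched ignore pattern %r)", path, pattern)
    return kept, ignored


patterns = [re.compile(".*_test.py")]
kept, ignored = split_ignored(["dags/etl.py", "dags/etl_test.py"], patterns)
print(kept)
print(ignored)
```

Reporting the pattern next to each path makes it visible when a regexp compiles fine but matches more or fewer files than intended.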





[GitHub] [airflow] potiuk commented on issue #21392: .airflowignore does not use 'glob' as .gitignore for example

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21392:
URL: https://github.com/apache/airflow/issues/21392#issuecomment-1031581595


   I think changing to glob would not be feasible, as glob and regexp are not compatible. But a better error message would indeed be nice.
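
A friendlier error could, for instance, name the file, the line number, and the offending pattern instead of surfacing a bare `re.error`. The sketch below is hypothetical: `IgnoreFileSyntaxError` and `compile_ignore_lines` are made-up names, and the comment handling is simplified to whole-line comments.

```python
import re


class IgnoreFileSyntaxError(ValueError):
    """Raised when an ignore-file line is not a valid regular expression."""


def compile_ignore_lines(lines, source=".airflowignore"):
    """Compile each non-empty, non-comment line, failing with context."""
    patterns = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        try:
            patterns.append(re.compile(line))
        except re.error as exc:
            raise IgnoreFileSyntaxError(
                f"{source}, line {number}: {line!r} is not a valid regular "
                f"expression ({exc}). Patterns are regexps, not globs; "
                "a leading '*' usually needs to be written as '.*'."
            ) from exc
    return patterns


try:
    compile_ignore_lines(["# tests", "*_test.py"])
except IgnoreFileSyntaxError as err:
    print(err)
```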




