Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/12 12:52:41 UTC

[GitHub] [airflow] julwin opened a new issue, #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

julwin opened a new issue, #28304:
URL: https://github.com/apache/airflow/issues/28304

   ### Apache Airflow version
   
   2.5.0
   
   ### What happened
   
   When using dynamic task mapping, I tried to update an Airflow dataset for every file in an S3 location.
   The URI for the dataset should be generated inside a TaskFlow task and contain the basename of the processed file.
   
   The task returning the dataset looks like this:
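   ```python
   from airflow import Dataset
   from airflow.decorators import task
   
   @task()
   def extract_s3_destination(path):
       # NOTE: dest_key is not defined in the snippet as reported.
       dataset = Dataset("s3://dataset-bucket/example.csv")
       return {"filename": path, "dest_key": dest_key, "outlets": [dataset]}
   ```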
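
As an illustration of the intent described above, a per-file dataset URI could be built from the basename of each processed file. This is a minimal sketch using only the standard library; the helper name `dataset_uri_for` and the default bucket are assumptions for illustration, not part of the reported DAG.

```python
import posixpath


def dataset_uri_for(path: str, bucket: str = "dataset-bucket") -> str:
    """Build an s3:// URI from the basename of a processed file.

    Hypothetical helper: the default bucket is borrowed from the example
    URI in this report, not from a real deployment.
    """
    basename = posixpath.basename(path.rstrip("/"))
    if not basename:
        raise ValueError(f"cannot derive a basename from {path!r}")
    return f"s3://{bucket}/{basename}"


# Inside the task, the result would be wrapped as Dataset(uri).
uri = dataset_uri_for("unprocessed/customfilename20221212.zip")
```

Whether a dataset created this way at run time is actually picked up by Airflow is exactly the open question of this issue; the sketch only covers how the URI string might be derived.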
   
   When the task's output is handed over to `expand_kwargs` (even with `strict=False`), the warning shown after the following snippet is logged:
   
   ```python
   filenames = extract_s3_destination.expand(path=XComArg(list_modified_files))
   
   copy_to_s3 = LocalFilesystemToS3Operator.partial(
           task_id="copy_file_to_s3",
           aws_conn_id="s3",
           gzip=True,
           replace=True,
           dest_bucket="{{ conn.s3.login }}",
       ).expand_kwargs(filenames, strict=False)
   ```
   
   
   ```
   {manager.py:56} WARNING - DatasetModel Dataset(uri='unprocessed/customfilename20221212.zip', extra=None) not found
   ```
   
   The same happens when 
   
   ### What you think should happen instead
   
   The Dataset should be created if it does not exist and updated otherwise, the way you would expect after reading these examples from the documentation:
   
   ```
   from airflow import Dataset
   
   with DAG(...):
       MyOperator(
           # this task updates example.csv
           outlets=[Dataset("s3://dataset-bucket/example.csv")],
           ...,
       )
   
   
   with DAG(
       # this DAG should be run when example.csv is updated (by dag1)
       schedule=[Dataset("s3://dataset-bucket/example.csv")],
       ...,
   ):
       ...
   ```
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   kubernetes / airflow executor / default image
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #28304:
URL: https://github.com/apache/airflow/issues/28304#issuecomment-1348167478

   Is this specific to `expand_kwargs`? I am under the impression that passing `outlets` as an XComArg would not work in general, since the value needs to be accessed in the scheduler.
   
   Say
   
   ```python
   from airflow import Dataset
   from airflow.decorators import task
   
   @task()
   def get_outlet():
       return Dataset("s3://dataset-bucket/example.csv")
   
   MyOperator(outlets=[get_outlet()])  # Does this work?
   ```
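
The concern raised here can be illustrated with a toy model. It assumes that datasets are registered only from outlets that are plain values when the DAG file is parsed, and that a run-time update for an unregistered URI is merely logged. `ToyRegistry`, `LazyValue`, and their methods are hypothetical names for illustration; this is not Airflow code and does not claim to reproduce its internals.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LazyValue:
    """Stand-in for a value that only exists once an upstream task has run."""
    source_task: str


@dataclass
class ToyRegistry:
    known: set = field(default_factory=set)
    warnings: list = field(default_factory=list)

    def register_at_parse_time(self, outlets):
        # Only concrete URIs can be recorded; lazy values are still unresolved.
        for outlet in outlets:
            if isinstance(outlet, str):
                self.known.add(outlet)

    def record_update(self, uri):
        # Loosely mirrors the shape of the warning in the report.
        if uri not in self.known:
            self.warnings.append(f"Dataset(uri={uri!r}) not found")
            return False
        return True


registry = ToyRegistry()
registry.register_at_parse_time(
    ["s3://dataset-bucket/example.csv", LazyValue("get_outlet")]
)

# A statically declared outlet is known; one produced at run time is not.
static_ok = registry.record_update("s3://dataset-bucket/example.csv")
dynamic_ok = registry.record_update("unprocessed/customfilename20221212.zip")
```

Under these assumptions the statically declared URI is updated, while the URI that only appears at run time produces a "not found" entry, which is consistent with the warning quoted in the issue.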


[GitHub] [airflow] boring-cyborg[bot] commented on issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #28304:
URL: https://github.com/apache/airflow/issues/28304#issuecomment-1346443875

   Thanks for opening your first issue here! Be sure to follow the issue template!
   


[GitHub] [airflow] potiuk commented on issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #28304:
URL: https://github.com/apache/airflow/issues/28304#issuecomment-1496021382

   Looking at the history: because the author did not respond to the question, the issue was closed automatically after that. If you want, you can respond in a new issue @DaanRademaker; there is nothing stopping you from creating a similar issue (and becoming an author who will provide more insights when asked). Feel absolutely free.
   


[GitHub] [airflow] DaanRademaker commented on issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by "DaanRademaker (via GitHub)" <gi...@apache.org>.
DaanRademaker commented on issue #28304:
URL: https://github.com/apache/airflow/issues/28304#issuecomment-1495561016

   Why is this issue closed? This does not seem possible currently. 


[GitHub] [airflow] github-actions[bot] commented on issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #28304:
URL: https://github.com/apache/airflow/issues/28304#issuecomment-1471014896

   This issue has been closed because it has not received a response from the issue author.


[GitHub] [airflow] github-actions[bot] closed issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs
URL: https://github.com/apache/airflow/issues/28304


[GitHub] [airflow] github-actions[bot] commented on issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #28304:
URL: https://github.com/apache/airflow/issues/28304#issuecomment-1461065698

   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.


[GitHub] [airflow] uranusjr commented on issue #28304: expand_kwargs does not handle update of dataset correctly when "outlets" parameter is used as kwargs

Posted by "uranusjr (via GitHub)" <gi...@apache.org>.
uranusjr commented on issue #28304:
URL: https://github.com/apache/airflow/issues/28304#issuecomment-1496037138

   If you do reactivate the issue, please make sure to first investigate the question asked above.

