Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/18 20:59:08 UTC

[GitHub] [airflow] gfelot opened a new issue #16529: Cannot split Jinja str and render_template_as_native_obj doesn't exist

gfelot opened a new issue #16529:
URL: https://github.com/apache/airflow/issues/16529


   
   
   **Apache Airflow version**: 2.0.1
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release): Ubuntu 20.04
   
   
   **What happened**:
   
   I trigger my DAG through Airflow's REST API from an AWS Lambda function that fires on a file upload. I get the file path from the Lambda context,
   e.g.: `ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML`
   
   I pass this value in the `conf` of the API call and read it back in the DAG as `"{{ dag_run.conf['file_path'] }}"`.
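For context, passing a value through `conf` with the Airflow 2 stable REST API can be sketched with the standard library alone. This is a minimal sketch that assumes the `POST /api/v1/dags/{dag_id}/dagRuns` endpoint and a hypothetical webserver URL (`AIRFLOW_URL`). Authentication is left out because it depends on the configured auth backend, and the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical webserver address; replace with your deployment's URL.
AIRFLOW_URL = "http://localhost:8080"


def build_trigger_request(dag_id: str, file_path: str) -> urllib.request.Request:
    """Build (but do not send) a request that starts a run of `dag_id`.

    Everything under "conf" becomes `dag_run.conf` inside the DAG run,
    which is why the DAG can read `dag_run.conf['file_path']`.
    Authentication headers are omitted (deployment-specific).
    """
    url = f"{AIRFLOW_URL}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {"file_path": file_path}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_trigger_request(
    "test_s3_transfer",
    "ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML",
)
print(req.get_method(), req.full_url)
```

Sending it would be a call such as `urllib.request.urlopen(req)` once the appropriate credentials are added.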
   
   At some point inside the DAG, I need to extract information from this string by splitting it on `/`, so that I can pass the parts to the `S3CopyObjectOperator`.
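For reference, the split the DAG needs can be sketched in plain Python. `split_s3_path()` and `S3Location` are hypothetical names, and the sketch assumes the path always has the form `<bucket>/<key>`. It only works when it is given the actual path string.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    bucket: str     # first path segment, e.g. "ingestion.archive.dev"
    key: str        # everything after the bucket, e.g. "yolo/PMS_....XML"
    file_name: str  # last path segment


def split_s3_path(file_path: str) -> S3Location:
    """Split '<bucket>/<key>' into bucket, key and file name (hypothetical helper)."""
    bucket, _, key = file_path.partition("/")   # split once, at the first "/"
    file_name = key.rsplit("/", 1)[-1]          # last segment of the key
    return S3Location(bucket, key, file_name)


path = "ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"
print(split_s3_path(path))
```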
   
   Here is my first approach:
   
   ```python
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.amazon.aws.operators.s3_copy_object import S3CopyObjectOperator
   from airflow.operators.python import PythonOperator
   
   
   default_args = {
       'owner': 'me',
   }
   
   s3_final_destination = {
       "bucket_name": "ingestion.archive.dev",
       "verification_failed": "validation_failed",
       "processing_failed": "processing_failed",
       "processing_success": "processing_success"
   }
   
   
   def print_var(file_path,
                 file_split,
                 source_bucket,
                 source_path,
                 file_name):
       data = {
           "file_path": file_path,
           "file_split": file_split,
           "source_bucket": source_bucket,
           "source_path": source_path,
           "file_name": file_name
       }
   
       print(data)
   
   
   with DAG(
           "test_s3_transfer",
           default_args=default_args,
           description='Test',
           schedule_interval=None,
           start_date=datetime(2021, 4, 24),
           tags=['ingestion', "test", "context"],
   
   ) as dag:
       # {"file_path": "ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"}
       file_path = "{{ dag_run.conf['file_path'] }}"
       file_split = file_path.split('/')
       source_bucket = file_split[0]
       source_path = "/".join(file_split[1:])
       file_name = file_split[-1]
   
       test_var = PythonOperator(
           task_id="test_var",
           python_callable=print_var,
           op_kwargs={
               "file_path": file_path,
               "file_split": file_split,
               "source_bucket": source_bucket,
               "source_path": source_path,
               "file_name": file_name
           }
       )
   
       file_verification_fail_to_s3 = S3CopyObjectOperator(
           task_id="file_verification_fail_to_s3",
           # *_bucket_name is the bucket; *_bucket_key is the object key within it
           source_bucket_name=source_bucket,
           source_bucket_key=source_path,
           dest_bucket_name=s3_final_destination["bucket_name"],
           dest_bucket_key=f'{s3_final_destination["verification_failed"]}/{file_name}'
       )
   
       test_var >> file_verification_fail_to_s3
   
   ```
   
   I use the `PythonOperator` to print the values for debugging.
   `file_path` has the right value, but `file_split` is `['ingestion.archive.dev/yolo/PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML']`.
   That's my whole string in a one-element list, not the individual parts I expected: `["ingestion.archive.dev", "yolo", "PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"]`.
   
   What's wrong here?
   I read more about Jinja templating and found this section of the Airflow docs: https://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html#rendering-fields-as-native-python-objects
   
   I then tried `render_template_as_native_obj=True` to solve the issue, but when the scheduler picked up my DAG, it raised an error saying that the `DAG` object has no such argument. Indeed, the argument is not in the `DAG` documentation either:
   https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/dag/index.html?highlight=dag#module-airflow.models.dag
   
   I also tried passing it through the `jinja_environment_kwargs` argument, but it is not accepted there either. So there seems to be a regression, and an error in the documentation.
   
   But my real question is: how do I split my Jinja string?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vikramkoka commented on issue #16529: Cannot split Jinja str and render_template_as_native_obj doesn't exist

vikramkoka commented on issue #16529:
URL: https://github.com/apache/airflow/issues/16529#issuecomment-864188566


   > @vikramkoka Would you mind if I took this one?
   
   @josh-fell Go for it!





[GitHub] [airflow] gfelot closed issue #16529: Cannot split Jinja str and render_template_as_native_obj doesn't exist

gfelot closed issue #16529:
URL: https://github.com/apache/airflow/issues/16529


   





[GitHub] [airflow] josh-fell commented on issue #16529: Cannot split Jinja str and render_template_as_native_obj doesn't exist

josh-fell commented on issue #16529:
URL: https://github.com/apache/airflow/issues/16529#issuecomment-864186705


   @vikramkoka Would you mind if I took this one?





[GitHub] [airflow] josh-fell commented on issue #16529: Cannot split Jinja str and render_template_as_native_obj doesn't exist

josh-fell commented on issue #16529:
URL: https://github.com/apache/airflow/issues/16529#issuecomment-864280192


   Hi @gfelot,
   
   Addressing the errors you are observing:
   - The documentation you are referring to is for version 2.1 (current stable), while you are running 2.0.1 (whose documentation can be found [here](https://airflow.apache.org/docs/apache-airflow/2.0.1/_api/airflow/models/dag/index.html?highlight=dag#module-airflow.models.dag)). The `render_template_as_native_obj` parameter wasn't introduced until 2.1, which is why it's missing from the `DAG` object in your environment. Additionally, `jinja_environment_kwargs` holds initialization parameters for a [jinja2.Environment](https://jinja.palletsprojects.com/en/3.0.x/api/#jinja2.Environment), and `render_template_as_native_obj` is not one of them.
   - The [documentation for 2.1](https://airflow.apache.org/docs/apache-airflow/stable/_modules/airflow/models/dag.html#DAG) _is_ missing the `render_template_as_native_obj` parameter, and we'll get that added. Thank you for catching this!
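A quick way to confirm whether the installed `DAG` class accepts a parameter is to inspect its signature. This is a minimal sketch: `accepts_kwarg()` is a hypothetical helper, and the two `dag_init_*` functions are trimmed stand-ins for `DAG.__init__` before and after 2.1 so that the example runs without Airflow.

```python
import inspect
from typing import Callable


def accepts_kwarg(func: Callable, name: str) -> bool:
    """Return True if `func` explicitly declares a parameter called `name`.

    A catch-all **kwargs does not count, since it would accept any name
    without the callee actually supporting it.
    """
    return name in inspect.signature(func).parameters


# Hypothetical, trimmed stand-ins for DAG.__init__ in 2.0.x and 2.1.
def dag_init_2_0(self, dag_id, jinja_environment_kwargs=None):
    pass


def dag_init_2_1(self, dag_id, jinja_environment_kwargs=None,
                 render_template_as_native_obj=False):
    pass


for init in (dag_init_2_0, dag_init_2_1):
    print(init.__name__, accepts_kwarg(init, "render_template_as_native_obj"))
```

Against a real installation, `accepts_kwarg(DAG.__init__, "render_template_as_native_obj")` (after `from airflow import DAG`) would be expected to return `False` on 2.0.x and `True` on 2.1 or later.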
   
   Pertaining to splitting Jinja strings:
   - Jinja expressions are not evaluated/rendered in Airflow until runtime (i.e., when the task runs, right before the operator's `execute()` method is called). As written, the parsing/splitting of the string happens when the DAG file is parsed, outside of the operator; only after the task begins executing is the Jinja expression rendered. Therefore the value of `file_split` is `["{{ dag_run.conf['file_path'] }}"]`, since there is no `/` character in the literal string `{{ dag_run.conf['file_path'] }}`.
   - A couple options:
     - Assuming the "test_var" task is not only for demonstration purposes, move the splitting of the `file_path` value into the `print_var()` function and use XComs or the TaskFlow API to pass the values to the "file_verification_fail_to_s3" task.
     - Since `source_bucket_key`, `source_bucket_name`, `dest_bucket_key`, and `dest_bucket_name` are templated fields of the `S3CopyObjectOperator`, you could access the `conf` object directly, as you do now, and split the value as needed within the Jinja expression itself.
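The difference between parse time and run time can be reproduced without Airflow. This is a minimal sketch in which `fake_render()` is a hypothetical stand-in for Jinja rendering that only knows this one expression. It relies on the assumption, consistent with what the issue observed, that templated lists are rendered element by element.

```python
# The template string exactly as written in the DAG file.
TEMPLATE = "{{ dag_run.conf['file_path'] }}"
# Sample conf, as sent in the API call from the issue.
CONF = {
    "file_path": "ingestion.archive.dev/yolo/"
                 "PMS_2_DXBTD_RTBD_2021032800000020210328000000SD_20210329052822.XML"
}


def fake_render(value: str, conf: dict) -> str:
    """Crude stand-in for Jinja: substitutes only TEMPLATE.

    Airflow renders with a real Jinja environment; this just mimics the
    result for this one expression so the example runs on its own.
    """
    return value.replace(TEMPLATE, conf["file_path"])


# DAG-parse time: the *literal* template is split. It contains no "/",
# so the result is a one-element list.
parsed = TEMPLATE.split("/")

# Task run time: each list element is rendered on its own, so the list
# keeps one element, which now holds the whole path.
rendered = [fake_render(item, CONF) for item in parsed]

# Rendering first and splitting afterwards gives the intended parts.
parts = fake_render(TEMPLATE, CONF).split("/")

print(parsed)
print(rendered)
print(parts)
```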
   
   
   For similar questions in the future, please use GitHub discussions, not issues.





[GitHub] [airflow] gfelot commented on issue #16529: Cannot split Jinja str and render_template_as_native_obj doesn't exist

gfelot commented on issue #16529:
URL: https://github.com/apache/airflow/issues/16529#issuecomment-864287465


   For the doc issue: my bad, I didn't check the version; you are right. Glad I could help out with something. When I tried to fix the doc with the built-in "Suggest a change on this page" button, I got a 404: https://github.com/apache/airflow/edit/devel/docs/apache-airflow/concepts/operators.rst
   
   For the splitting issue, I knew that Jinja is evaluated at run time, but I thought the splitting would still work. I created several variables like that because I need those values about 5 times in the same DAG (the one I shared is for demonstration purposes). That's why I didn't want to write the logic multiple times and have to fix the same bug in different places (a kind of factorization).
   The `PythonOperator` is only there for demonstration, and I know how to use XCom. For the `S3CopyObjectOperator`, I will use the templated values.
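One way to get that factorization with templated fields is to build the template strings once, as module-level constants, and reuse them in every operator. This sketch assumes all the values derive from `dag_run.conf['file_path']`. The constant names and `dest_key()` are hypothetical, and nothing is split in Python: the `.split()` calls stay inside the Jinja expressions and run at render time. The constants only avoid repeating the expression.

```python
# Jinja expression for the value sent in the API call's conf.
FILE_PATH = "dag_run.conf['file_path']"

# Each constant is a complete template string. Jinja evaluates the
# .split() calls at run time, on the real path.
SOURCE_BUCKET = "{{ " + FILE_PATH + ".split('/', 1)[0] }}"
SOURCE_KEY = "{{ " + FILE_PATH + ".split('/', 1)[1] }}"
FILE_NAME = "{{ " + FILE_PATH + ".split('/')[-1] }}"


def dest_key(prefix: str) -> str:
    """Template for '<prefix>/<file name>' (hypothetical helper)."""
    return prefix + "/" + FILE_NAME


# Hypothetical use inside the DAG from the issue (needs Airflow, so it is
# left as a comment; s3_final_destination is the dict defined there):
# S3CopyObjectOperator(
#     task_id="file_verification_fail_to_s3",
#     source_bucket_name=SOURCE_BUCKET,
#     source_bucket_key=SOURCE_KEY,
#     dest_bucket_name=s3_final_destination["bucket_name"],
#     dest_bucket_key=dest_key(s3_final_destination["verification_failed"]),
# )
print(SOURCE_BUCKET)
print(dest_key("validation_failed"))
```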
   
   Thank you for the quick answer.
   
   
   
   

