Posted to commits@airflow.apache.org by "Joel Croteau (JIRA)" <ji...@apache.org> on 2019/08/03 17:01:00 UTC

[jira] [Comment Edited] (AIRFLOW-5046) Allow GoogleCloudStorageToBigQueryOperator to accept source_objects as a string or otherwise take input from XCom

    [ https://issues.apache.org/jira/browse/AIRFLOW-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894725#comment-16894725 ] 

Joel Croteau edited comment on AIRFLOW-5046 at 8/3/19 5:00 PM:
---------------------------------------------------------------

[~m1racoli], I agree that the third solution sounds best, and I would suggest doing it at template expansion time. An XCom value can be any pickleable object, and since template expansion simply replaces the templated value with whatever Jinja returns, there is no inherent reason the expanded value has to be a string. Changing that behavior would require modifying or extending Jinja, though. Perhaps there could be a second phase of template expansion, or a special template syntax that checks specifically for XCom lookups and changes the templated value to whatever was actually passed to XCom.
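
To illustrate the "special template syntax" idea, here is a minimal sketch. Everything in it is an assumption made for illustration: the `{{ xcom:KEY }}` marker is not real Jinja or Airflow syntax, `resolve_templated_value` is a hypothetical helper, and a plain dict stands in for XCom. It only shows the intended behavior: when the whole templated value is a single XCom marker, the raw object is substituted instead of its string rendering.

```python
import re

# Hypothetical marker syntax, e.g. "{{ xcom:object_names }}".
# This is NOT real Jinja or Airflow syntax; it only illustrates the idea.
XCOM_ONLY = re.compile(r"^\s*\{\{\s*xcom:(\w+)\s*\}\}\s*$")


def resolve_templated_value(value, xcom_store):
    """Return the raw XCom object when `value` is exactly one XCom marker.

    `xcom_store` is a plain dict standing in for XCom. Any other value is
    returned unchanged, so it could still go through normal string rendering.
    """
    if isinstance(value, str):
        match = XCOM_ONLY.match(value)
        if match:
            return xcom_store[match.group(1)]
    return value


# Stand-in for a list pushed to XCom by an upstream task.
xcom_store = {"object_names": ["a.csv", "b.csv"]}
resolved = resolve_templated_value("{{ xcom:object_names }}", xcom_store)
```

With this approach the operator would receive an actual list, while values that are not a lone marker keep their existing string behavior.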



> Allow GoogleCloudStorageToBigQueryOperator to accept source_objects as a string or otherwise take input from XCom
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5046
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5046
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib, gcp
>    Affects Versions: 1.10.2
>            Reporter: Joel Croteau
>            Priority: Minor
>
> `GoogleCloudStorageToBigQueryOperator` should be able to have its `source_objects` determined dynamically from the results of an earlier step in the workflow. This is hard to do because the operator expects a list, while any template expansion renders as a string. It could be implemented either by checking whether `source_objects` is a string and, if so, trying to parse it as a list, or by adding a separate argument for a list encoded as a string.
> My particular use case for this is as follows:
>  # A daily DAG scans a GCS bucket for all objects created in the last day and loads them into BigQuery.
>  # To find these objects, a `PythonOperator` scans the bucket and returns a list of object names.
>  # A `GoogleCloudStorageToBigQueryOperator` is used to load these objects into BigQuery.
> The operator should be able to take its list of objects from XCom, but there is no functionality for this. A template expansion along the lines of `source_objects='{{ task_instance.xcom_pull(key="KEY") }}'` doesn't work because it is rendered as a string, which `GoogleCloudStorageToBigQueryOperator` then treats as a list in which each character is a separate item.
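
The failure described in the quoted issue, and the first proposed fix (check for a string and parse it as a list), can be sketched roughly as follows. This is not the operator's actual code: `normalize_source_objects` is a hypothetical helper, the object names are made up, and the sketch assumes the rendered template looks like Python's string form of a list.

```python
import ast


def normalize_source_objects(source_objects):
    """Hypothetical helper: accept a list, or a string that encodes a list."""
    if isinstance(source_objects, str):
        # literal_eval only parses Python literals; it does not run code.
        parsed = ast.literal_eval(source_objects)
        if not isinstance(parsed, (list, tuple)):
            raise ValueError("source_objects string must encode a list")
        return list(parsed)
    return list(source_objects)


# Assumed shape of what a rendered template hands to the operator.
rendered = str(["data/a.csv", "data/b.csv"])

# The reported problem: iterating the string yields single characters.
as_characters = list(rendered)

# The proposed check recovers the intended list of object names.
as_objects = normalize_source_objects(rendered)
```

Here `as_characters` starts with `'['`, a quote character, and `'d'`, which is the behavior the issue complains about, while `as_objects` contains the two object names.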


