Posted to commits@airflow.apache.org by "Cedrik Neumann (JIRA)" <ji...@apache.org> on 2019/07/28 08:36:00 UTC

[jira] [Comment Edited] (AIRFLOW-5046) Allow GoogleCloudStorageToBigQueryOperator to accept source_objects as a string or otherwise take input from XCom

    [ https://issues.apache.org/jira/browse/AIRFLOW-5046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894652#comment-16894652 ] 

Cedrik Neumann edited comment on AIRFLOW-5046 at 7/28/19 8:35 AM:
------------------------------------------------------------------

Yeah, this problem (lists from XCom not being usable in arguments that accept lists) arises in various use cases. It's worth debating:
 # whether operators that accept list parameters should also allow a comma-delimited list in a string;
 # whether operators should have a separate argument for string-encoded lists;
 # whether operators should recognise XCom keys in arguments (e.g. prefixed with "xcom:" => "xcom:KEY").

The first might suit most use cases, but it has limitations, as it doesn't apply to all operators (SQL ones, for example). The second might bloat operator interfaces and is probably the least generic solution. The third could be implemented as an Airflow-wide feature, which would enable this functionality for all operators, potentially limited to templated fields.
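
A minimal sketch of how the third option could look, assuming a hypothetical "xcom:" prefix and a hypothetical helper name (`resolve_xcom_reference`); neither exists in Airflow. The `xcom_pull` argument stands in for something like `task_instance.xcom_pull(key=...)` and is replaced by a plain dict lookup here so the example runs on its own:

```python
from typing import Any, Callable

XCOM_PREFIX = "xcom:"  # hypothetical marker, taken from the "xcom:KEY" idea above


def resolve_xcom_reference(value: Any, xcom_pull: Callable[[str], Any]) -> Any:
    """Return the XCom value for "xcom:KEY" strings; pass everything else through."""
    if isinstance(value, str) and value.startswith(XCOM_PREFIX):
        key = value[len(XCOM_PREFIX):]
        return xcom_pull(key)
    return value


# Stand-in for a real XCom backend: a plain dict lookup (assumption for the demo).
fake_xcom = {"new_objects": ["a.csv", "b.csv"]}

resolved = resolve_xcom_reference("xcom:new_objects", fake_xcom.__getitem__)
untouched = resolve_xcom_reference(["c.csv"], fake_xcom.__getitem__)
```

Because the pulled value is returned as-is, a list stored in XCom would reach the operator as a list rather than as a rendered string. Where such a hook would live (for example, around templated fields only) is left open.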



> Allow GoogleCloudStorageToBigQueryOperator to accept source_objects as a string or otherwise take input from XCom
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5046
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5046
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: contrib, gcp
>    Affects Versions: 1.10.2
>            Reporter: Joel Croteau
>            Priority: Minor
>
> `GoogleCloudStorageToBigQueryOperator` should be able to have its `source_objects` determined dynamically from the results of an earlier step in the workflow. This is hard to do while it expects a list, because any template expansion renders as a string. It could be implemented either as a check for whether `source_objects` is a string, parsing it as a list if so, or as a separate argument for a list encoded as a string.
> My particular use case for this is as follows:
>  # A daily DAG scans a GCS bucket for all objects created in the last day and loads them into BigQuery.
>  # To find these objects, a `PythonOperator` scans the bucket and returns a list of object names.
>  # A `GoogleCloudStorageToBigQueryOperator` is used to load these objects into BigQuery.
> The operator should be able to have its list of objects provided by XCom, but there is no functionality to do this. A template expansion along the lines of `source_objects='{{ task_instance.xcom_pull(key="KEY") }}'` doesn't work, because it is rendered as a string, which `GoogleCloudStorageToBigQueryOperator` then treats as a list, with each character being a single item.
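
To illustrate the behaviour described in the issue, here is a rough sketch of the "check whether it is a string" approach. The helper name `normalize_source_objects` is hypothetical, and the assumption that a rendered list looks like a Python list literal (or else a comma-delimited string) is for illustration only, not a statement about what the operator does:

```python
import ast
from typing import List, Union


def normalize_source_objects(value: Union[str, List[str]]) -> List[str]:
    """Accept a real list, a rendered list literal, or a comma-delimited string."""
    if not isinstance(value, str):
        return list(value)
    text = value.strip()
    if text.startswith("["):
        # Safely parse a rendered literal such as "['a.csv', 'b.csv']".
        parsed = ast.literal_eval(text)
        return [str(item) for item in parsed]
    # Otherwise fall back to splitting on commas and dropping empty parts.
    return [part.strip() for part in text.split(",") if part.strip()]


rendered = "['a.csv', 'b.csv']"  # assumed shape of a rendered template value

# Iterating the string directly yields single characters, which is the
# failure mode reported above.
chars = list(rendered)[:3]

objects = normalize_source_objects(rendered)
```

Note that comma splitting would break for object names that themselves contain commas, which is one of the limitations of the string-based approach.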


