You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Eugene Brown (JIRA)" <ji...@apache.org> on 2018/05/22 16:01:00 UTC

[jira] [Updated] (AIRFLOW-2508) Handle non string types in render_template_from_field

     [ https://issues.apache.org/jira/browse/AIRFLOW-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Brown updated AIRFLOW-2508:
----------------------------------
    Description: 
The render_template_from_field method of the BaseOperator class raises an exception when it encounters content that is not a string_type, list, tuple or dict.

Example exception:
{noformat}
airflow.exceptions.AirflowException: Type '<type 'int'>' used for parameter 'job_flow_overrides[Instances][InstanceGroups][InstanceCount]' is not supported for templating{noformat}
I propose instead that when it encounters content of other types it returns the content unchanged, rather than raising an exception.

Consider this case: I extended the EmrCreateJobFlowOperator to make the job_flow_overrides argument a templatable field. job_flow_overrides is a dictionary with a mix of strings, integers and booleans for values.

When I extended the class as such:
{code:java}
class EmrCreateJobFlowOperatorTemplateOverrides(EmrCreateJobFlowOperator):
    template_fields = ['job_flow_overrides']{code}
And added a task to my dag with this format:
{code:java}
step_create_cluster = EmrCreateJobFlowOperatorTemplateOverrides(
    task_id="create_cluster",
    job_flow_overrides={
        "Name": "my-cluster {{ dag_run.conf['run_date'] }}",
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Master nodes",
                    "InstanceType": "c3.4xlarge",
                    "InstanceCount": 1
                },
                {
                    "Name": "Slave nodes",
                    "InstanceType": "c3.4xlarge",
                    "InstanceCount": 4
                },
                "TerminationProtected": False
            ]
        },
        "BootstrapActions": [{
             "Name": "Custom action",
             "ScriptBootstrapAction": {
                 "Path": "s3://repo/{{ dag_run.conf['branch'] }}/requirements.txt"
             }
        }],
   },
   aws_conn_id='aws_default',
   emr_conn_id='aws_default',
   dag=dag
)
{code}
The exception I gave above was raised and the step failed. I think it would be preferable for the method to instead pass over numeric and boolean values as users may want to use template_fields in the way I have to template string values in dictionaries or lists of mixed types.

Here is the render_template_from_field method from the BaseOperator:
{code:java}
def render_template_from_field(self, attr, content, context, jinja_env):
    """
    Renders a template from a field. If the field is a string, it will
    simply render the string and return the result. If it is a collection or
    nested set of collections, it will traverse the structure and render
    all strings in it.
    """
    rt = self.render_template
    if isinstance(content, six.string_types):
        result = jinja_env.from_string(content).render(**context)
    elif isinstance(content, (list, tuple)):
        result = [rt(attr, e, context) for e in content]
    elif isinstance(content, dict):
        result = {
            k: rt("{}[{}]".format(attr, k), v, context)
            for k, v in list(content.items())}
    else:
        param_type = type(content)
        msg = (
            "Type '{param_type}' used for parameter '{attr}' is "
            "not supported for templating").format(**locals())
        raise AirflowException(msg)
    return result{code}
 I propose that the method returns content unchanged if the content is of one of (int, float, complex, bool) types. So my solution would include an extra elif in the form:
{code}
elif isinstance(content, (int, float, complex, bool)):
    result = content
{code}
 Are there any reasons this would be a bad idea?

  was:
The render_template_from_field method of the BaseOperator class raises an exception when it encounters content that is not a string_type, list, tuple or dict.

Example exception:
{noformat}
airflow.exceptions.AirflowException: Type '<type 'int'>' used for parameter 'job_flow_overrides[Instances][InstanceGroups][InstanceCount]' is not supported for templating{noformat}
I propose instead that when it encounters content of other types it returns the content unchanged, rather than raising an exception.

Consider this case: I extended the EmrCreateJobFlowOperator to make the job_flow_overrides argument a templatable field. job_flow_overrides is a dictionary with a mix of strings, integers and booleans for values.

When I extended the class as such:
{code:java}
class EmrCreateJobFlowOperatorTemplateOverrides(EmrCreateJobFlowOperator):
    template_fields = ['job_flow_overrides']{code}
And added a task to my dag with this format:
{code:java}
step_create_cluster = EmrCreateJobFlowOperatorTemplateOverrides(
    task_id="create_cluster",
    job_flow_overrides={
        "Name": "my-cluster {{ dag_run.conf['run_date'] }}",
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Master nodes",
                    "InstanceType": "c3.4xlarge",
                    "InstanceCount": 1
                },
                {
                    "Name": "Slave nodes",
                    "InstanceType": "c3.4xlarge",
                    "InstanceCount": 4
                },
                "TerminationProtected": False
            ]
        },
        "BootstrapActions": [{
             "Name": "Custom action",
             "ScriptBootstrapAction": {
                 "Path": "s3://repo/{{ dag_run.conf['branch'] }}/requirements.txt"
             }
        }],
   },
   aws_conn_id='aws_default',
   emr_conn_id='aws_default',
   dag=dag
)
{code}
The exception I gave above was raised and the step failed. I think it would be preferable for the method to instead pass over numeric and boolean values as users may want to use template_fields in the way I have to template string values in dictionaries or lists of mixed types.

Here is the render_template_from_field method from the BaseOperator:
{code:java}
def render_template_from_field(self, attr, content, context, jinja_env):
    """
    Renders a template from a field. If the field is a string, it will
    simply render the string and return the result. If it is a collection or
    nested set of collections, it will traverse the structure and render
    all strings in it.
    """
    rt = self.render_template
    if isinstance(content, six.string_types):
        result = jinja_env.from_string(content).render(**context)
    elif isinstance(content, (list, tuple)):
        result = [rt(attr, e, context) for e in content]
    elif isinstance(content, dict):
        result = {
            k: rt("{}[{}]".format(attr, k), v, context)
            for k, v in list(content.items())}
    else:
        param_type = type(content)
        msg = (
            "Type '{param_type}' used for parameter '{attr}' is "
            "not supported for templating").format(**locals())
        raise AirflowException(msg)
    return result{code}
 I propose that the method returns content unchanged if the content is of one of (int, long, float, complex, bool) types. So my solution would include an extra elif in the form:
{code:python}
elif isinstance(content, (int, long, float, complex, bool)):
    result = content
{code}
 Are there any reasons this would be a bad idea?


> Handle non string types in render_template_from_field
> -----------------------------------------------------
>
>                 Key: AIRFLOW-2508
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2508
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: models
>    Affects Versions: Airflow 2.0
>            Reporter: Eugene Brown
>            Assignee: Eugene Brown
>            Priority: Minor
>              Labels: easyfix, newbie
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The render_template_from_field method of the BaseOperator class raises an exception when it encounters content that is not a string_type, list, tuple or dict.
> Example exception:
> {noformat}
> airflow.exceptions.AirflowException: Type '<type 'int'>' used for parameter 'job_flow_overrides[Instances][InstanceGroups][InstanceCount]' is not supported for templating{noformat}
> I propose instead that when it encounters content of other types it returns the content unchanged, rather than raising an exception.
> Consider this case: I extended the EmrCreateJobFlowOperator to make the job_flow_overrides argument a templatable field. job_flow_overrides is a dictionary with a mix of strings, integers and booleans for values.
> When I extended the class as such:
> {code:java}
> class EmrCreateJobFlowOperatorTemplateOverrides(EmrCreateJobFlowOperator):
>     template_fields = ['job_flow_overrides']{code}
> And added a task to my dag with this format:
> {code:java}
> step_create_cluster = EmrCreateJobFlowOperatorTemplateOverrides(
>     task_id="create_cluster",
>     job_flow_overrides={
>         "Name": "my-cluster {{ dag_run.conf['run_date'] }}",
>         "Instances": {
>             "InstanceGroups": [
>                 {
>                     "Name": "Master nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 1
>                 },
>                 {
>                     "Name": "Slave nodes",
>                     "InstanceType": "c3.4xlarge",
>                     "InstanceCount": 4
>                 },
>                 "TerminationProtected": False
>             ]
>         },
>         "BootstrapActions": [{
>              "Name": "Custom action",
>              "ScriptBootstrapAction": {
>                  "Path": "s3://repo/{{ dag_run.conf['branch'] }}/requirements.txt"
>              }
>         }],
>    },
>    aws_conn_id='aws_default',
>    emr_conn_id='aws_default',
>    dag=dag
> )
> {code}
> The exception I gave above was raised and the step failed. I think it would be preferable for the method to instead pass over numeric and boolean values as users may want to use template_fields in the way I have to template string values in dictionaries or lists of mixed types.
> Here is the render_template_from_field method from the BaseOperator:
> {code:java}
> def render_template_from_field(self, attr, content, context, jinja_env):
>     """
>     Renders a template from a field. If the field is a string, it will
>     simply render the string and return the result. If it is a collection or
>     nested set of collections, it will traverse the structure and render
>     all strings in it.
>     """
>     rt = self.render_template
>     if isinstance(content, six.string_types):
>         result = jinja_env.from_string(content).render(**context)
>     elif isinstance(content, (list, tuple)):
>         result = [rt(attr, e, context) for e in content]
>     elif isinstance(content, dict):
>         result = {
>             k: rt("{}[{}]".format(attr, k), v, context)
>             for k, v in list(content.items())}
>     else:
>         param_type = type(content)
>         msg = (
>             "Type '{param_type}' used for parameter '{attr}' is "
>             "not supported for templating").format(**locals())
>         raise AirflowException(msg)
>     return result{code}
>  I propose that the method returns content unchanged if the content is of one of (int, float, complex, bool) types. So my solution would include an extra elif in the form:
> {code}
> elif isinstance(content, (int, float, complex, bool)):
>     result = content
> {code}
>  Are there any reasons this would be a bad idea?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)