Posted to commits@airflow.apache.org by "Jeffrey Payne (JIRA)" <ji...@apache.org> on 2018/09/04 16:19:00 UTC

[jira] [Updated] (AIRFLOW-3002) ValueError in dataflow operators when using GCS jar or py_file

     [ https://issues.apache.org/jira/browse/AIRFLOW-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Payne updated AIRFLOW-3002:
-----------------------------------
    Description: 
The {{GoogleCloudBucketHelper.google_cloud_to_local}} function now fails with a ValueError in the following code:
{noformat}
...
file_size = self._gcs_hook.download(bucket_id, object_id, local_file)

if os.stat(file_size).st_size > 0:
    return local_file
...
{noformat}
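The {{os.stat()}} function expects a _path_, but the {{file_size}} variable passed to it actually holds the downloaded bytes returned by {{GoogleCloudStorageHook.download()}}. Those bytes are interpreted as a path, and because they contain a null byte, {{os.stat()}} raises a ValueError.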

The error looks like this:
{noformat}
[2018-09-04 14:46:49,840] {base_task_runner.py:107} INFO - Job 59: Subtask surge_export   File "/opt/conda/envs/bairflow-gke/lib/python3.5/site-packages/airflow/contrib/operators/dataflow_operator.py", line 372, in google_cloud_to_local
[2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask surge_export     if os.stat(file_size).st_size > 0:
[2018-09-04 14:46:49,841] {base_task_runner.py:107} INFO - Job 59: Subtask surge_export ValueError: stat: embedded null character in path
{noformat}
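A minimal sketch of one possible fix, assuming the hook behaves as described above (it writes the object to {{local_file}} and returns the raw bytes): ignore the return value and call {{os.stat()}} on the local path instead. {{fake_download}} and {{google_cloud_to_local_fixed}} are hypothetical stand-ins, not Airflow code, and raising {{RuntimeError}} for an empty download is an assumption rather than what the operator necessarily does.

```python
import os
import tempfile


def fake_download(bucket, obj, filename=None):
    """Hypothetical stand-in for GoogleCloudStorageHook.download().

    Assumption (per this issue): it returns the object's raw bytes and,
    when `filename` is given, also writes those bytes to that path.
    """
    data = b"PK\x03\x04\x14\x00\x00\x00"  # jar/zip-like header; note the NUL bytes
    if filename:
        with open(filename, "wb") as fd:
            fd.write(data)
    return data


def google_cloud_to_local_fixed(bucket_id, object_id, local_file,
                                download=fake_download):
    """Sketch of the corrected check: stat the local *file*, not the bytes."""
    # The return value is the file content, not a path, so it is ignored here.
    download(bucket_id, object_id, local_file)
    if os.stat(local_file).st_size > 0:
        return local_file
    # Assumption: an empty download is treated as a failure.
    raise RuntimeError("Downloaded GCS object is empty: gs://{}/{}".format(
        bucket_id, object_id))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "job.jar")
        print(google_cloud_to_local_fixed("my-bucket", "job.jar", path))
        try:
            # The buggy pattern: passing the returned bytes to os.stat().
            os.stat(fake_download("my-bucket", "job.jar"))
        except ValueError as exc:
            print("ValueError:", exc)
```

The {{download}} parameter exists only so the sketch can be exercised without GCS; in the operator the hook call would take its place.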


  was:
The {{GoogleCloudBucketHelper.google_cloud_to_local}} function compares a list to an int, which raises a TypeError, in:
{noformat}
...
path_components = file_name[self.GCS_PREFIX_LENGTH:].split('/')
if path_components < 2:
...
{noformat}
This should be {{if len(path_components) < 2:}}.

Also, fix {{if file_size > 0:}} in the same function.
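For reference, a sketch of the suggested path check as a standalone function. {{split_gcs_path}} is a hypothetical helper, and {{GCS_PREFIX_LENGTH = len('gs://')}} is an assumption about the helper's constant; the point is only that the number of components, not the list itself, is compared to 2.

```python
GCS_PREFIX = "gs://"
GCS_PREFIX_LENGTH = len(GCS_PREFIX)  # assumption: mirrors the helper's constant (5)


def split_gcs_path(file_name):
    """Hypothetical helper: split 'gs://bucket/path/obj' into (bucket, object)."""
    if not file_name.startswith(GCS_PREFIX):
        raise ValueError("Not a GCS path: {}".format(file_name))
    path_components = file_name[GCS_PREFIX_LENGTH:].split("/")
    # Compare the *number* of components; in Python 3, `path_components < 2`
    # raises TypeError because a list cannot be ordered against an int.
    if len(path_components) < 2:
        raise ValueError(
            "Invalid GCS object path, expected gs://<bucket>/<object>: {}".format(
                file_name))
    bucket_id = path_components[0]
    object_id = "/".join(path_components[1:])
    return bucket_id, object_id
```

For example, {{split_gcs_path("gs://my-bucket/jars/job.jar")}} returns {{("my-bucket", "jars/job.jar")}}, while {{"gs://my-bucket"}} is rejected with a clear error instead of failing on the comparison.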


> ValueError in dataflow operators when using GCS jar or py_file
> --------------------------------------------------------------
>
>                 Key: AIRFLOW-3002
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3002
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: contrib, Dataflow
>    Affects Versions: 1.9.0, 2.0.0
>            Reporter: Jeffrey Payne
>            Assignee: Kaxil Naik
>            Priority: Major
>             Fix For: 1.10.1
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)