Posted to issues@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/24 18:19:00 UTC

[jira] [Work logged] (BEAM-6154) Gcsio batch delete broken in Python 3

     [ https://issues.apache.org/jira/browse/BEAM-6154?focusedWorklogId=189600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-189600 ]

ASF GitHub Bot logged work on BEAM-6154:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jan/19 18:18
            Start Date: 24/Jan/19 18:18
    Worklog Time Spent: 10m 
      Work Description: markflyhigh commented on pull request #7617: [BEAM-6154] Update google-apitools to 0.5.26 and fix gcsio in python 3
URL: https://github.com/apache/beam/pull/7617
 
 
   google-apitools 0.5.26 contains a critical Python 3 fix that helps unblock DataflowRunner in Python 3. The problem is described in https://issues.apache.org/jira/browse/BEAM-6154. This PR fixes the problem and upgrades google-apitools to 0.5.26.
   
   **Note: this fix touches `base_image_requirements.txt`, which is used to build the Python SDK harness container image.**
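
As a rough illustration of the version requirement above, the sketch below checks whether a given version string is at least 0.5.26. The helper names `version_tuple` and `is_at_least` are hypothetical and not part of Beam or apitools, and the comparison assumes plain dotted numeric versions such as `0.5.26`.

```python
def version_tuple(version):
    """Turn a dotted numeric version such as '0.5.26' into (0, 5, 26).

    Assumption: the version has no pre-release or local suffixes.
    """
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed, required="0.5.26"):
    """Return True if `installed` is the same as or newer than `required`.

    Hypothetical helper for illustration only.
    """
    return version_tuple(installed) >= version_tuple(required)
```

On Python 3.8 or later, the installed version string could be obtained with `importlib.metadata.version("google-apitools")` and passed to `is_at_least`.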
   
 


Issue Time Tracking
-------------------

            Worklog Id:     (was: 189600)
            Time Spent: 10m
    Remaining Estimate: 0h

> Gcsio batch delete broken in Python 3
> -------------------------------------
>
>                 Key: BEAM-6154
>                 URL: https://issues.apache.org/jira/browse/BEAM-6154
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Mark Liu
>            Assignee: Mark Liu
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I'm running the Python SDK against GCP in Python 3.5 and got the following gcsio error while deleting files:
> {code}
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", line 1077, in <genexpr>
>     window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 315, in finalize_write
>     num_threads)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", line 145, in run_using_threadpool
>     return pool.map(fn_to_execute, inputs)
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
>     return self._map_async(func, iterable, mapstar, chunksize).get()
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
>     raise self._value
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
>     result = (True, func(*args, **kwds))
>   File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
>     return list(map(*args))
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 299, in _rename_batch
>     FileSystems.rename(source_files, destination_files)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 252, in rename
>     return filesystem.rename(source_file_names, destination_file_names)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 229, in rename
>     copy_statuses = gcsio.GcsIO().copy_batch(batch)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", line 322, in copy_batch
>     api_calls = batch_request.Execute(self.client._http)  # pylint: disable=protected-access
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", line 222, in Execute
>     batch_http_request.Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", line 480, in Execute
>     self._Execute(http)
>   File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", line 450, in _Execute
>     mime_response = parser.parsestr(header + response.content)
> TypeError: Can't convert 'bytes' object to str implicitly
> {code} 
> After looking into the related code in the apitools library, I found that response.content, which is returned from the HTTP request to GCS, is bytes, and apitools doesn't handle this case. This can block any pipeline that depends on gcsio, and it apparently blocks all Dataflow jobs in Python 3.
> This could be another case for moving off the apitools dependency in [BEAM-4850|https://issues.apache.org/jira/browse/BEAM-4850].
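
The failure in the traceback can be reproduced without GCS. The sketch below is a minimal illustration, not the actual apitools 0.5.26 change: it shows that concatenating a `str` header with a `bytes` body raises `TypeError` in Python 3, and that decoding the body first lets `email.parser.Parser.parsestr` parse the multipart payload. The helper name `parse_batch_payload`, the sample header and body, and the choice of UTF-8 are all assumptions made for illustration.

```python
from email.parser import Parser

# Hypothetical sample data standing in for a batch HTTP response.
HEADER = 'content-type: multipart/mixed; boundary="batch_boundary"\n\n'
BODY_BYTES = (
    b"--batch_boundary\n"
    b"Content-Type: application/http\n"
    b"\n"
    b"HTTP/1.1 204 No Content\n"
    b"\n"
    b"--batch_boundary--\n"
)


def parse_batch_payload(header, content):
    """Parse a MIME batch payload whose body may be bytes or str.

    Hypothetical helper: in Python 3 an HTTP response body is bytes,
    so it is decoded before being joined to the str header.
    Assumption: the body is UTF-8 encoded.
    """
    if isinstance(content, bytes):
        content = content.decode("utf-8")
    return Parser().parsestr(header + content)


def concat_fails(header, content):
    """Return True if naive header + content raises TypeError."""
    try:
        header + content
    except TypeError:
        return True
    return False
```

With these definitions, `concat_fails(HEADER, BODY_BYTES)` reports the same kind of failure seen in `batch.py`, while `parse_batch_payload(HEADER, BODY_BYTES)` returns a multipart message with one `application/http` part.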


