You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Jiaxin Shan (Jira)" <ji...@apache.org> on 2020/08/21 20:24:00 UTC

[jira] [Created] (BEAM-10790) tar.gz artifacts written into fat jar is no longer gzip file

Jiaxin Shan created BEAM-10790:
----------------------------------

             Summary: tar.gz artifacts written into fat jar is no longer gzip file
                 Key: BEAM-10790
                 URL: https://issues.apache.org/jira/browse/BEAM-10790
             Project: Beam
          Issue Type: Bug
          Components: runner-flink
         Environment: Flink 1.10.1
Beam worker pool: apache/beam_python3.6_sdk:2.22.0
SDK: apache beam 2.22.0 
Python 3.6
            Reporter: Jiaxin Shan


I am using Flink Runner on Kubernetes. The problem I meet is zip file is kind of broken after going through the artifact server.  


 # application persist a dependency dist file tfx_ephemeral-0.22.0.tar.gz and pass `--extra_package=/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz` to beam args
 # artifact service will retrieve it from artifact server and append into uber jar. 

FROM: 'ref_Environment_default_environment_1', dependencies '[type_urn: "beam:artifact:type:file:v1"
27type_payload: "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz" 
TO: `/BEAM-PIPLINE/pipeline/artifacts/job-${uuid}/${hash}-tfx_ephemeral-0.22.0.tar.gz`
 # beam python worker pool will get dependency from uber jar and put it under /tmp/staged/ 



I originally find this problem in step 3. I tried to trace the problem and notice it is from step 2. the code snippet is here. 

[https://github.com/apache/beam/blob/3d0f7dc011bb7bbee4eea9525b82f53855de85f1/sdks/python/apache_beam/runners/portability/artifact_service.py#L306-L310]

 

Logs from Application

```
INFO:root:Using Python SDK docker image: apache/beam_python3.6_sdk:2.22.0. If the image is not available at local, we will try to pull from hub.docker.com
19INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function lift_combiners at 0x7fe51595dea0> ====================
20INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol scheme to flink_master parameter: [http://beam-flink-cluster-jobmanager:8081|http://beam-flink-cluster-jobmanager:8081/]
21INFO:apache_beam.runners.portability.abstract_job_service:Got Prepare request.
22INFO:apache_beam.utils.subprocess_server:Downloading job server jar from [https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.10-job-server/2.22.0/beam-runners-flink-1.10-job-server-2.22.0.jar]
23INFO:apache_beam.runners.portability.abstract_job_service:Artifact server started on port 44337
24INFO:apache_beam.runners.portability.abstract_job_service:Prepared job 'job' as 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
25INFO:apache_beam.runners.portability.artifact_service:staging token: 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
26INFO:apache_beam.runners.portability.artifact_service:artifact service key: 'ref_Environment_default_environment_1', dependencies '[type_urn: "beam:artifact:type:file:v1"
27type_payload: "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz"
 
INFO:apache_beam.runners.portability.abstract_job_service:Running job 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
51INFO:apache_beam.runners.portability.flink_uber_jar_job_server:Started Flink job as bedeef0503bf0a17e3461169e2c9b5bc
52INFO:apache_beam.runners.portability.portable_runner:Job state changed to STOPPED
53INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
```

The job never finish because exception in beam worker pool

 

Logs from beam worker pool. 

```

2020/08/21 18:29:46 Installing extra package: tfx_ephemeral-0.22.0.tar.gz
ERROR: Exception:
Traceback (most recent call last):
 File "/usr/local/lib/python3.6/tarfile.py", line 1643, in gzopen
 t = cls.taropen(name, mode, fileobj, **kwargs)
 File "/usr/local/lib/python3.6/tarfile.py", line 1619, in taropen
 return cls(name, mode, fileobj, **kwargs)
 File "/usr/local/lib/python3.6/tarfile.py", line 1482, in __init__
 self.firstmember = self.next()
 File "/usr/local/lib/python3.6/tarfile.py", line 2297, in next
 tarinfo = self.tarinfo.fromtarfile(self)
 File "/usr/local/lib/python3.6/tarfile.py", line 1092, in fromtarfile
 buf = tarfile.fileobj.read(BLOCKSIZE)
 File "/usr/local/lib/python3.6/gzip.py", line 276, in read
 return self._buffer.read(size)
 File "/usr/local/lib/python3.6/_compression.py", line 68, in readinto
 data = self.read(len(byte_view))
 File "/usr/local/lib/python3.6/gzip.py", line 463, in read
 if not self._read_gzip_header():
 File "/usr/local/lib/python3.6/gzip.py", line 411, in _read_gzip_header
 raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'tf')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
 status = self.run(options, args)
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
 return func(self, options, args)
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 333, in run
 reqs, check_supported_wheels=not options.target_dir
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolution/legacy/resolver.py", line 179, in resolve
 discovered_reqs.extend(self._resolve_one(requirement_set, req))
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolution/legacy/resolver.py", line 362, in _resolve_one
 abstract_dist = self._get_abstract_dist_for(req_to_install)
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolution/legacy/resolver.py", line 314, in _get_abstract_dist_for
 abstract_dist = self.preparer.prepare_linked_requirement(req)
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 469, in prepare_linked_requirement
 hashes=hashes,
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 264, in unpack_url
 unpack_file(file.path, location, file.content_type)
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 261, in unpack_file
 untar_file(filename, location)
 File "/usr/local/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 177, in untar_file
 tar = tarfile.open(filename, mode)
 File "/usr/local/lib/python3.6/tarfile.py", line 1589, in open
 return func(name, filemode, fileobj, **kwargs)
 File "/usr/local/lib/python3.6/tarfile.py", line 1647, in gzopen
 raise ReadError("not a gzip file")

```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)