Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2022/03/20 17:26:00 UTC

[jira] [Updated] (BEAM-10790) tar.gz artifacts written into fat jar is no longer gzip file

     [ https://issues.apache.org/jira/browse/BEAM-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-10790:
---------------------------------
    Labels: stale-P2  (was: )

> tar.gz artifacts written into fat jar is no longer gzip file
> ------------------------------------------------------------
>
>                 Key: BEAM-10790
>                 URL: https://issues.apache.org/jira/browse/BEAM-10790
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>         Environment: Flink 1.10.1
> Beam worker pool: apache/beam_python3.6_sdk:2.22.0
> SDK: apache beam 2.22.0 
> Python 3.6
>            Reporter: Jiaxin Shan
>            Priority: P2
>              Labels: stale-P2
>
> I am using the Flink runner on Kubernetes. The problem I hit is that a tar.gz file is no longer a valid gzip file after going through the artifact server.
>  # The application persists a dependency dist file, tfx_ephemeral-0.22.0.tar.gz, and passes `--extra_package=/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz` in the Beam args.
>  # The artifact service retrieves it from the artifact server and appends it to the uber jar.
>  FROM: 'ref_Environment_default_environment_1', dependencies '[type_urn: "beam:artifact:type:file:v1"
>  type_payload: "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz"
>  TO: `/BEAM-PIPLINE/pipeline/artifacts/job-${uuid}/${hash}-tfx_ephemeral-0.22.0.tar.gz`
>  # The Beam Python worker pool gets the dependency from the uber jar and puts it under /tmp/staged/.
> I originally found this problem in step 3. I traced it and noticed that it comes from step 2. The code snippet is here:
> [https://github.com/apache/beam/blob/3d0f7dc011bb7bbee4eea9525b82f53855de85f1/sdks/python/apache_beam/runners/portability/artifact_service.py#L306-L310]
>  
> Logs from Application
> {code:java}
> INFO:root:Using Python SDK docker image: apache/beam_python3.6_sdk:2.22.0. If the image is not available at local, we will try to pull from hub.docker.com
> INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function lift_combiners at 0x7fe51595dea0> ====================
> INFO:apache_beam.runners.portability.flink_runner:Adding HTTP protocol scheme to flink_master parameter: http://beam-flink-cluster-jobmanager:8081
> INFO:apache_beam.runners.portability.abstract_job_service:Got Prepare request.
> INFO:apache_beam.utils.subprocess_server:Downloading job server jar from https://repo.maven.apache.org/maven2/org/apache/beam/beam-runners-flink-1.10-job-server/2.22.0/beam-runners-flink-1.10-job-server-2.22.0.jar
> INFO:apache_beam.runners.portability.abstract_job_service:Artifact server started on port 44337
> INFO:apache_beam.runners.portability.abstract_job_service:Prepared job 'job' as 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
> INFO:apache_beam.runners.portability.artifact_service:staging token: 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
> INFO:apache_beam.runners.portability.artifact_service:artifact service key: 'ref_Environment_default_environment_1', dependencies '[type_urn: "beam:artifact:type:file:v1"
> type_payload: "\n;/tmp/tmpsm0_ll8e/build/tfx/dist/tfx_ephemeral-0.22.0.tar.gz"
>   
> INFO:apache_beam.runners.portability.abstract_job_service:Running job 'job-880904cc-ec5e-490a-ba93-8e18d87fbd89'
> INFO:apache_beam.runners.portability.flink_uber_jar_job_server:Started Flink job as bedeef0503bf0a17e3461169e2c9b5bc
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to STOPPED
> INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
> {code}
> The job never finishes because of an exception in the Beam worker pool.
>  
> Logs from beam worker pool. 
>  
> {code:java}
> 2020/08/21 18:29:46 Installing extra package: tfx_ephemeral-0.22.0.tar.gz
>  ERROR: Exception:
>  Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/tarfile.py", line 1643, in gzopen
>  t = cls.taropen(name, mode, fileobj, **kwargs)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1619, in taropen
>  return cls(name, mode, fileobj, **kwargs)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1482, in __init__
>  self.firstmember = self.next()
>  File "/usr/local/lib/python3.6/tarfile.py", line 2297, in next
>  tarinfo = self.tarinfo.fromtarfile(self)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1092, in fromtarfile
>  buf = tarfile.fileobj.read(BLOCKSIZE)
>  File "/usr/local/lib/python3.6/gzip.py", line 276, in read
>  return self._buffer.read(size)
>  File "/usr/local/lib/python3.6/_compression.py", line 68, in readinto
>  data = self.read(len(byte_view))
>  File "/usr/local/lib/python3.6/gzip.py", line 463, in read
>  if not self._read_gzip_header():
>  File "/usr/local/lib/python3.6/gzip.py", line 411, in _read_gzip_header
>  raise OSError('Not a gzipped file (%r)' % magic)
>  OSError: Not a gzipped file (b'tf')
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 188, in _main
>  status = self.run(options, args)
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 185, in wrapper
>  return func(self, options, args)
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 333, in run
>  reqs, check_supported_wheels=not options.target_dir
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolution/legacy/resolver.py", line 179, in resolve
>  discovered_reqs.extend(self._resolve_one(requirement_set, req))
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolution/legacy/resolver.py", line 362, in _resolve_one
>  abstract_dist = self._get_abstract_dist_for(req_to_install)
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/resolution/legacy/resolver.py", line 314, in _get_abstract_dist_for
>  abstract_dist = self.preparer.prepare_linked_requirement(req)
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 469, in prepare_linked_requirement
>  hashes=hashes,
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/operations/prepare.py", line 264, in unpack_url
>  unpack_file(file.path, location, file.content_type)
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 261, in unpack_file
>  untar_file(filename, location)
>  File "/usr/local/lib/python3.6/site-packages/pip/_internal/utils/unpacking.py", line 177, in untar_file
>  tar = tarfile.open(filename, mode)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1589, in open
>  return func(name, filemode, fileobj, **kwargs)
>  File "/usr/local/lib/python3.6/tarfile.py", line 1647, in gzopen
>  raise ReadError("not a gzip file")
> {code}
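
The traceback above reports the magic bytes b'tf' where a gzip stream should start with 0x1f 0x8b. One possible reading, which this issue does not confirm, is that the staged file arrived as a plain (already decompressed) tar, since a tar archive begins with the name of its first member and the package name starts with "tf". The following is a minimal sketch of that check using only the Python standard library. The helper names `classify_archive` and `build_sample_tar_gz`, and the member name `tfx_ephemeral-0.22.0/setup.py`, are hypothetical and chosen for illustration; they are not part of Beam or of the reported setup.

```python
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def classify_archive(data: bytes) -> str:
    """Return 'gzip', 'tar', or 'unknown' for the given archive bytes."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    # A ustar/GNU tar header carries the string "ustar" at offset 257.
    if data[257:262] == b"ustar":
        return "tar"
    return "unknown"


def build_sample_tar_gz(member_name: str) -> bytes:
    """Build a tiny in-memory .tar.gz with one member (illustrative only)."""
    payload = b"print('hello')\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Hypothetical member name, chosen so the archive starts with "tf".
original = build_sample_tar_gz("tfx_ephemeral-0.22.0/setup.py")

# Simulate a transfer step that transparently gunzips the stream.
decompressed = gzip.decompress(original)

print(classify_archive(original))      # gzip
print(classify_archive(decompressed))  # tar
print(decompressed[:2])                # b'tf'
```

Running `classify_archive` on the first few hundred bytes of the file under /tmp/staged/ would show whether the worker received a gzip stream or a bare tar, which narrows down where the compression was lost.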



--
This message was sent by Atlassian Jira
(v8.20.1#820001)