Posted to issues@beam.apache.org by "Robert Bradshaw (Jira)" <ji...@apache.org> on 2021/12/13 23:58:00 UTC
[jira] [Created] (BEAM-13454) Dataframe read_fwf fails reading incrementally.
Robert Bradshaw created BEAM-13454:
--------------------------------------
Summary: Dataframe read_fwf fails reading incrementally.
Key: BEAM-13454
URL: https://issues.apache.org/jira/browse/BEAM-13454
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: Robert Bradshaw
Assignee: Robert Bradshaw
Trying to use beam.dataframe.io.read_fwf fails with the following error:
{code:python}
File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 1206, in process_with_sized_restriction
return self.do_fn_invoker.invoke_process(
File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 698, in invoke_process
residual = self._invoke_process_per_window(
File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 836, in _invoke_process_per_window
self.output_processor.process_outputs(
File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 1334, in process_outputs
for result in results:
File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/io.py", line 545, in process
frames = reader(handle, *self.args, **self.kwargs)
File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 848, in read_fwf
return _read(filepath_or_buffer, kwds)
File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 942, in __init__
self.engine = self._check_file_or_buffer(f, engine)
File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 1003, in _check_file_or_buffer
raise ValueError(msg)
ValueError: The 'python' engine cannot iterate through this file buffer.
{code}
It looks like pandas expects the file handle to be iterable line by line, in addition to supporting read().
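
One possible direction, sketched below under assumptions, is to wrap the handle so that it also supports line iteration. Both class names (ReadOnlyHandle, LineIterableHandle) are hypothetical and are not part of Beam or pandas; ReadOnlyHandle only stands in for a handle that exposes read() and nothing else. This is not necessarily how the issue should be fixed in apache_beam/dataframe/io.py, only an illustration of what "line iterable as well as supporting read()" could mean.

```python
import io


class ReadOnlyHandle:
    """Hypothetical stand-in for a handle that offers read() but no iteration."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


class LineIterableHandle:
    """Hypothetical wrapper adding readline() and iteration on top of read()."""

    def __init__(self, handle, chunk_size=1 << 16):
        self._handle = handle
        self._chunk_size = chunk_size
        self._pending = b""  # bytes read from the handle but not yet returned
        self._eof = False

    def _fill(self):
        # Pull one more chunk from the underlying handle into the buffer.
        chunk = self._handle.read(self._chunk_size)
        if chunk:
            self._pending += chunk
        else:
            self._eof = True

    def read(self, size=-1):
        if size is None or size < 0:
            # Return everything that is left, including buffered bytes.
            data = self._pending + self._handle.read()
            self._pending = b""
            self._eof = True
            return data
        while len(self._pending) < size and not self._eof:
            self._fill()
        data, self._pending = self._pending[:size], self._pending[size:]
        return data

    def readline(self):
        # Buffer chunks until a newline is available or the input ends.
        while b"\n" not in self._pending and not self._eof:
            self._fill()
        idx = self._pending.find(b"\n")
        end = idx + 1 if idx >= 0 else len(self._pending)
        line, self._pending = self._pending[:end], self._pending[end:]
        return line

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line
```

With such a wrapper, the handle has both read() and __next__, so a parser that iterates over lines and one that calls read() can share it. Whether this is sufficient for pandas' python engine in every pandas version has not been verified here.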