Posted to issues@beam.apache.org by "Robert Bradshaw (Jira)" <ji...@apache.org> on 2021/12/13 23:58:00 UTC

[jira] [Created] (BEAM-13454) Dataframe read_fwf fails reading incrementally.

Robert Bradshaw created BEAM-13454:
--------------------------------------

             Summary: Dataframe read_fwf fails reading incrementally.
                 Key: BEAM-13454
                 URL: https://issues.apache.org/jira/browse/BEAM-13454
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-core
            Reporter: Robert Bradshaw
            Assignee: Robert Bradshaw


Calling beam.dataframe.io.read_fwf fails with the following error:


{code:python}
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 1206, in process_with_sized_restriction
    return self.do_fn_invoker.invoke_process(
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 698, in invoke_process
    residual = self._invoke_process_per_window(
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 836, in _invoke_process_per_window
    self.output_processor.process_outputs(
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/common.py", line 1334, in process_outputs
    for result in results:
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/dataframe/io.py", line 545, in process
    frames = reader(handle, *self.args, **self.kwargs)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 848, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 942, in __init__
    self.engine = self._check_file_or_buffer(f, engine)
  File "/Users/robertwb/Work/beam/venv-3.8/lib/python3.8/site-packages/pandas/io/parsers.py", line 1003, in _check_file_or_buffer
    raise ValueError(msg)
ValueError: The 'python' engine cannot iterate through this file buffer.
{code}


It looks like pandas expects the file handle to be line-iterable in addition to supporting read().
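One possible workaround, sketched here under assumptions and not necessarily the fix adopted in Beam, is to wrap the handle so that it also supports line iteration. The names ReadOnlyHandle, RawReadAdapter, and make_line_iterable are hypothetical and exist only for this illustration; ReadOnlyHandle stands in for whatever read()-only handle the reader is given. Only the standard library io module is used.

```python
import io


class ReadOnlyHandle:
    """Hypothetical stand-in for a handle that supports read() but not iteration."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


class RawReadAdapter(io.RawIOBase):
    """Adapts an object exposing only read(size) to the RawIOBase interface."""

    def __init__(self, handle):
        super().__init__()
        self._handle = handle

    def readable(self):
        return True

    def readinto(self, b):
        # Pull at most len(b) bytes from the underlying handle into b.
        data = self._handle.read(len(b))
        n = len(data)
        b[:n] = data
        return n  # 0 signals end of stream


def make_line_iterable(handle, encoding="utf-8"):
    """Return a text stream over handle that supports both read() and line iteration."""
    return io.TextIOWrapper(io.BufferedReader(RawReadAdapter(handle)), encoding=encoding)


if __name__ == "__main__":
    wrapped = make_line_iterable(ReadOnlyHandle(b"a  1\nbb 2\n"))
    for line in wrapped:
        print(repr(line))
```

The wrapped stream still supports read(), so it could in principle be passed to a reader that needs both behaviours. Whether this is sufficient for pandas.read_fwf with the handles Beam uses is an assumption that would need to be verified.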



--
This message was sent by Atlassian Jira
(v8.20.1#820001)