Posted to commits@beam.apache.org by "Udi Meiri (JIRA)" <ji...@apache.org> on 2018/03/30 00:30:00 UTC

[jira] [Commented] (BEAM-3965) HDFS read broken

    [ https://issues.apache.org/jira/browse/BEAM-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420007#comment-16420007 ] 

Udi Meiri commented on BEAM-3965:
---------------------------------

Our HDFS integration test only wrote to HDFS and never read from it, so it did not exercise the read path. Reading from HDFS should be added to the integration test.
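
A rough sketch of the kind of write-then-read check meant here. It does not use a real HDFS cluster or Beam's HadoopFileSystem: FakeFileSystem and round_trip are hypothetical stand-ins made up for illustration, and the real test would go through the Beam pipeline instead.

```python
# Hypothetical in-memory stand-in for a filesystem client.
# This is NOT a Beam or HDFS API; it only illustrates the test shape.
class FakeFileSystem(object):
    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = data

    def match(self, path):
        # Read-side lookup: fails if nothing was written at this path.
        if path not in self._files:
            raise IOError('Match operation failed for %s' % path)
        return [path]

    def read(self, path):
        # Reading goes through match(), so a broken match() breaks reads.
        return self._files[self.match(path)[0]]


def round_trip(fs, path, data):
    """Writes data and then reads it back, exercising both code paths."""
    fs.write(path, data)
    return fs.read(path)


fs = FakeFileSystem()
result = round_trip(fs, 'hdfs://kinglear.txt', 'some text')
assert result == 'some text'
```

A write-only test would pass even if match() or read() were broken; reading back what was written is what catches that class of failure.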

> HDFS read broken
> ----------------
>
>                 Key: BEAM-3965
>                 URL: https://issues.apache.org/jira/browse/BEAM-3965
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Udi Meiri
>            Assignee: Udi Meiri
>            Priority: Major
>
> When running a command like:
> {noformat}
> python setup.py sdist > /dev/null && python -m apache_beam.examples.wordcount --output gs://.../py-wordcount-output \
>   --hdfs_host ... --hdfs_port 50070 --hdfs_user ehudm --runner DataflowRunner --project ... \
>   --temp_location gs://.../temp-hdfs-int --staging_location gs://.../staging-hdfs-int \
>   --sdk_location dist/apache-beam-2.5.0.dev0.tar.gz --input hdfs://kinglear.txt
> {noformat}
> I get:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/examples/wordcount.py", line 136, in <module>
>     run()
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/examples/wordcount.py", line 90, in run
>     lines = p | 'read' >> ReadFromText(known_args.input)
>   File "apache_beam/io/textio.py", line 522, in __init__
>     skip_header_lines=skip_header_lines)
>   File "apache_beam/io/textio.py", line 117, in __init__
>     validate=validate)
>   File "apache_beam/io/filebasedsource.py", line 119, in __init__
>     self._validate()
>   File "apache_beam/options/value_provider.py", line 124, in _f
>     return fnc(self, *args, **kwargs)
>   File "apache_beam/io/filebasedsource.py", line 176, in _validate
>     match_result = FileSystems.match([pattern], limits=[1])[0]
>   File "apache_beam/io/filesystems.py", line 159, in match
>     return filesystem.match(patterns, limits)
>   File "apache_beam/io/hadoopfilesystem.py", line 221, in match
>     raise BeamIOError('Match operation failed', exceptions)
> apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'hdfs://kinglear.txt': KeyError('name',)}
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)