Posted to commits@beam.apache.org by "Udi Meiri (JIRA)" <ji...@apache.org> on 2018/03/30 00:30:00 UTC
[jira] [Commented] (BEAM-3965) HDFS read broken
[ https://issues.apache.org/jira/browse/BEAM-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420007#comment-16420007 ]
Udi Meiri commented on BEAM-3965:
---------------------------------
Our HDFS integration test wrote to HDFS but never read from it, so a read from HDFS should be added to the integration test.
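As a rough illustration of the missing coverage, here is a minimal round-trip sketch: write some lines, read them back through the same path, and compare. It is not the actual integration test. The helpers write_lines, read_lines, and round_trip are hypothetical names, and a local temp file stands in for an hdfs:// path; in the real test the two halves would presumably be the Beam text sink and source running against HDFS.

```python
import os
import tempfile


def write_lines(path, lines):
    # Stand-in for the write half, which the existing test already covers
    # (hypothetical helper; a local file replaces HDFS here).
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")


def read_lines(path):
    # Stand-in for the read half, which the existing test does not exercise.
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def round_trip(path, lines):
    # Write, then read back through the same path and compare the contents.
    write_lines(path, lines)
    return read_lines(path) == lines


sample_lines = ["first line", "second line"]
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
round_trip_ok = round_trip(sample_path, sample_lines)
```

A write-only test would pass even if read_lines raised, which is the kind of gap described above.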
> HDFS read broken
> ----------------
>
> Key: BEAM-3965
> URL: https://issues.apache.org/jira/browse/BEAM-3965
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
>
> When running a command like:
> {noformat}
> python setup.py sdist > /dev/null && python -m apache_beam.examples.wordcount --output gs://.../py-wordcount-output \
> --hdfs_host ... --hdfs_port 50070 --hdfs_user ehudm --runner DataflowRunner --project ... \
> --temp_location gs://.../temp-hdfs-int --staging_location gs://.../staging-hdfs-int \
> --sdk_location dist/apache-beam-2.5.0.dev0.tar.gz --input hdfs://kinglear.txt
> {noformat}
> I get:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/examples/wordcount.py", line 136, in <module>
>     run()
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/examples/wordcount.py", line 90, in run
>     lines = p | 'read' >> ReadFromText(known_args.input)
>   File "apache_beam/io/textio.py", line 522, in __init__
>     skip_header_lines=skip_header_lines)
>   File "apache_beam/io/textio.py", line 117, in __init__
>     validate=validate)
>   File "apache_beam/io/filebasedsource.py", line 119, in __init__
>     self._validate()
>   File "apache_beam/options/value_provider.py", line 124, in _f
>     return fnc(self, *args, **kwargs)
>   File "apache_beam/io/filebasedsource.py", line 176, in _validate
>     match_result = FileSystems.match([pattern], limits=[1])[0]
>   File "apache_beam/io/filesystems.py", line 159, in match
>     return filesystem.match(patterns, limits)
>   File "apache_beam/io/hadoopfilesystem.py", line 221, in match
>     raise BeamIOError('Match operation failed', exceptions)
> apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'hdfs://kinglear.txt': KeyError('name',)}
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)