Posted to commits@beam.apache.org by "Aljoscha Krettek (JIRA)" <ji...@apache.org> on 2017/08/28 14:49:00 UTC

[jira] [Commented] (BEAM-2457) Error: "Unable to find registrar for hdfs" - need to prevent/improve error message

    [ https://issues.apache.org/jira/browse/BEAM-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143846#comment-16143846 ] 

Aljoscha Krettek commented on BEAM-2457:
----------------------------------------

Is there any update on this? I have a jar file, built from the quickstart, that exhibits this problem. This is the output I get:
{code}
$ java -cp word-count-beam-0.1-DIRECT.jar org.apache.beam.examples.WordCount --runner=DirectRunner  --inputFile=hdfs:///tmp/wc-in  --output=hdfs:///tmp/wc-out
Exception in thread "main" java.lang.IllegalStateException: Unable to find registrar for hdfs
	at org.apache.beam.sdk.io.FileSystems.getFileSystemInternal(FileSystems.java:447)
	at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:517)
	at org.apache.beam.sdk.io.FileBasedSink.convertToFileResourceIfPossible(FileBasedSink.java:204)
	at org.apache.beam.sdk.io.TextIO$Write.to(TextIO.java:296)
	at org.apache.beam.examples.WordCount.main(WordCount.java:182)
{code}

This is with Beam 2.1.0; the project was created from the Beam 2.1.0 examples archetype. I also tried this with the Flink Runner before and got the same results.
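
For context (as far as I understand the setup), the hdfs scheme only resolves if the Hadoop file system module is on the classpath (with its META-INF/services entry intact in the uber jar) and the pipeline options carry a Hadoop configuration. A minimal sketch of that wiring, assuming the beam-sdks-java-io-hadoop-file-system dependency; the class name and the fs.defaultFS address are only illustrative:
{code}
import java.util.Collections;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class HdfsWiringSketch {
  public static void main(String[] args) {
    // Parse --inputFile/--output as usual, but as HadoopFileSystemOptions so an
    // HDFS configuration can be attached to the pipeline options.
    HadoopFileSystemOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(HadoopFileSystemOptions.class);

    // Normally this configuration is picked up from core-site.xml/hdfs-site.xml on the
    // classpath; the namenode address here is only a placeholder.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");
    options.setHdfsConfiguration(Collections.singletonList(conf));

    Pipeline pipeline = Pipeline.create(options);
    // ... add the WordCount transforms and call pipeline.run() as in the quickstart ...
  }
}
{code}
If the registrar itself is missing from the jar, which is what the exception above suggests, no amount of configuration helps until the services entry is restored.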

On Beam 2.2.0-SNAPSHOT I get this instead:
{code}
$ java -cp word-count-beam-22-0.1-DIRECT.jar org.apache.beam.examples.WordCount --runner=DirectRunner  --inputFile=hdfs:///tmp/wc-in  --output=hdfs:///tmp/wc-out
Aug 28, 2017 2:46:34 PM org.apache.beam.sdk.io.FileBasedSource getEstimatedSizeBytes
INFO: Filepattern hdfs:///tmp/wc-in matched 0 files with total size 0
Aug 28, 2017 2:46:34 PM org.apache.beam.sdk.io.FileBasedSource split
INFO: Splitting filepattern hdfs:///tmp/wc-in into bundles of size 0 took 0 ms and produced 0 files and 0 bundles
Aug 28, 2017 2:46:35 PM org.apache.beam.sdk.io.WriteFiles finalizeForDestinationFillEmptyShards
INFO: Finalizing write operation TextWriteOperation{tempDirectory=/home/hadoop/hdfs:/tmp/.temp-beam-2017-08-240_14-46-34-1/, windowedWrites=false} for destination null num shards 0.
Aug 28, 2017 2:46:35 PM org.apache.beam.sdk.io.WriteFiles finalizeForDestinationFillEmptyShards
INFO: Creating 1 empty output shards in addition to 0 written for a total of 1 for destination null.
{code}

I.e., it's writing to my local filesystem under the path {{hdfs:}}.
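
Presumably the output string falls through to the local file system and gets resolved against the working directory, which is exactly what plain java.nio does with such a string (illustration only, not Beam code):
{code}
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalFallbackSketch {
  public static void main(String[] args) {
    // Treated as a relative local path, the repeated slashes collapse and the
    // "hdfs:" prefix becomes an ordinary directory name under the working directory.
    Path resolved = Paths.get("hdfs:///tmp/wc-out").toAbsolutePath();
    System.out.println(resolved); // e.g. /home/hadoop/hdfs:/tmp/wc-out when run from /home/hadoop
  }
}
{code}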

> Error: "Unable to find registrar for hdfs" - need to prevent/improve error message
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-2457
>                 URL: https://issues.apache.org/jira/browse/BEAM-2457
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 2.0.0
>            Reporter: Stephen Sisk
>            Assignee: Flavio Fiszman
>
> I've noticed a number of user reports where jobs are failing with the error message "Unable to find registrar for hdfs": 
> * https://stackoverflow.com/questions/44497662/apache-beamunable-to-find-registrar-for-hdfs/44508533?noredirect=1#comment76026835_44508533
> * https://lists.apache.org/thread.html/144c384e54a141646fcbe854226bb3668da091c5dc7fa2d471626e9b@%3Cuser.beam.apache.org%3E
> * https://lists.apache.org/thread.html/e4d5ac744367f9d036a1f776bba31b9c4fe377d8f11a4b530be9f829@%3Cuser.beam.apache.org%3E 
> This isn't too many reports, but it is the only time I can recall so many users reporting the same error message in such a short amount of time. 
> We believe the problem is one of two things: 
> 1) bad uber jar creation
> 2) incorrect HDFS configuration
> However, it's entirely possible that this has some other root cause. 
> It seems like it'd be useful to:
> 1) Follow up with the above reports to see if they've resolved the issue, and if so what fixed it. There may be another root cause out there.
> 2) Improve the error message to include more information about how to resolve it
> 3) See if we can improve detection of the error cases to give more specific information (specifically, if HDFS is misconfigured, can we detect that somehow and tell the user exactly that? A rough diagnostic is sketched after this list)
> 4) Update the documentation
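> As a rough starting point for (3): checking which FileSystemRegistrar implementations are actually visible on the classpath would distinguish a broken uber jar from a misconfigured HDFS. A sketch along those lines, assuming only Beam's FileSystemRegistrar interface (the class name is made up):
> {code}
> import java.util.ServiceLoader;
> 
> import org.apache.beam.sdk.io.FileSystemRegistrar;
> 
> public class RegistrarDiagnosticSketch {
>   public static void main(String[] args) {
>     // Beam discovers file systems via ServiceLoader; listing what it can see tells us
>     // whether the HDFS registrar made it into the jar at all.
>     for (FileSystemRegistrar registrar : ServiceLoader.load(FileSystemRegistrar.class)) {
>       System.out.println(registrar.getClass().getName());
>     }
>     // If the HDFS registrar is absent, the META-INF/services entries were probably lost
>     // during uber-jar creation; if it is present, the HDFS configuration is the likelier culprit.
>   }
> }
> {code}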



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)