Posted to user@beam.apache.org by Shashank Prabhakara <sh...@infoworks.io> on 2018/01/08 09:36:06 UTC
Need help running beam word count example on apex/hdfs
Hi All,
I want to test Beam on Apex using the word count example provided in the
Beam repository, but I'm having difficulty running word count as described
in the documentation.
I'm running Hadoop 2.8.2 on Debian in a multi-node environment.
I cloned the master branch of the Beam GitHub repository and executed:
cd examples/java
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount
-Dexec.args="--inputFile=/tmp/input/pom.xml --output=/tmp/output/counts
--runner=ApexRunner --embeddedExecution=false" -Papex-runner
However, the driver hangs (I waited for more than an hour) after printing
the classpath to the console. I have attached the stdout and stack trace to
this message (please let me know if they are not visible on the mailing list).
Thanks in advance for any help.
Regards,
Shashank
Re: Need help running beam word count example on apex/hdfs
Posted by Shashank Prabhakara <sh...@infoworks.io>.
For anyone who faces the same issue: I was not able to make it work with
"mvn compile exec:java ...". Instead, I ran it with the "hadoop jar ..."
command, which magically fixed this. My best guess is that Maven is picking
up an incompatible version of commons-io from the wrong side of the
dependency tree.
Regards,
Shashank
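The commons-io guess above can be checked rather than assumed. What follows is a minimal sketch, not something from the original thread: the class name WhichJar and the method locate are hypothetical, and the default class name org.apache.commons.io.IOUtils is only an assumption about which commons-io class is involved. It prints the jar a class is actually loaded from on the current classpath, so the output under "mvn exec:java" can be compared with the output under "hadoop jar".

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Beam or Apex): reports where a class is loaded from.
final class WhichJar {

    /** Returns the location a class was loaded from, or a short note if it cannot be determined. */
    static String locate(String className) {
        try {
            // Load without running static initializers; we only need the class metadata.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK itself have no code source.
            if (src == null || src.getLocation() == null) {
                return "no code source (JDK class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable from this classpath";
        }
    }

    public static void main(String[] args) {
        // Assumed default: a commonly used commons-io class; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.commons.io.IOUtils";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the two launch methods report different commons-io jars, that would support the dependency-conflict explanation. Running "mvn dependency:tree -Dincludes=commons-io -Papex-runner" in examples/java should also show which versions Maven resolves and through which dependencies they arrive.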
Re: Need help running beam word count example on apex/hdfs
Posted by Shashank Prabhakara <sh...@infoworks.io>.
Forgot to mention:
Execution works in embedded mode, and the counts are written to the local
filesystem. I need this to run on HDFS/YARN with --embeddedExecution=false.
Regards,
Shashank