Posted to user@beam.apache.org by Shashank Prabhakara <sh...@infoworks.io> on 2018/01/08 09:36:06 UTC

Need help running beam word count example on apex/hdfs

Hi All,

I want to test Beam on Apex using the WordCount example provided in the
Beam repository, but I'm having difficulty running it as described in the
documentation.

I'm running Hadoop 2.8.2 on Debian in a multi-node environment.
I cloned the master branch of the Beam GitHub repository and executed:

cd examples/java
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--inputFile=/tmp/input/pom.xml --output=/tmp/output/counts \
--runner=ApexRunner --embeddedExecution=false" -Papex-runner

However, the driver hangs (I waited for more than an hour) after printing
the classpath on the console. I have attached the stdout and stack trace to
this message (please let me know if they are not visible on the mailing list).

Thanks in advance for any help.

Regards,
Shashank

Re: Need help running beam word count example on apex/hdfs

Posted by Shashank Prabhakara <sh...@infoworks.io>.
For anyone who faces the same issue: I was not able to make this work with
"mvn compile exec:java ...". Instead, I ran it with the "hadoop jar ..."
command, which fixed the hang. My best guess is that Maven is picking up an
incompatible version of commons-io from the wrong side of the dependency tree.

Regards,
Shashank
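
The commons-io guess above was not confirmed in this thread. One way to test
it is to ask the JVM which jar a class was actually loaded from, once under
"mvn exec:java" and once under "hadoop jar". The following is a minimal
sketch only: the class name WhichJar and the helper locate() are hypothetical,
and org.apache.commons.io.IOUtils is assumed to be a representative commons-io
class to probe.

```java
import java.security.CodeSource;

class WhichJar {
    /** Returns where a class was loaded from, or a note if that cannot be determined. */
    static String locate(String className) {
        try {
            // Load without initializing, using the same loader as this class.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK's bootstrap loader have no code source.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown source)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Assumption: IOUtils is used here only as a probe for commons-io.
        String name = args.length > 0 ? args[0] : "org.apache.commons.io.IOUtils";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the two launch methods print different jar paths for the same class, that
would support the version-conflict guess. Running
"mvn dependency:tree -Dincludes=commons-io" with the same profile should also
list which commons-io versions Maven resolves.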

Re: Need help running beam word count example on apex/hdfs

Posted by Shashank Prabhakara <sh...@infoworks.io>.
Forgot to mention:

Execution works in embedded mode, and the counts are created on the local
filesystem. I need this to run on HDFS/YARN with --embeddedExecution=false.

Regards,
Shashank
