Posted to user@spark.apache.org by Andy Davidson <An...@SantaCruzIntegration.com> on 2014/09/19 20:21:20 UTC

RDD pipe example. Is this a bug or a feature?

Hi

I wrote a little Java job to try to figure out how RDD pipe works.
Below is my test shell script. If I turn on debugging in the script, I get
output in my console. If debugging is turned off in the shell script, I do
not see anything in my console. Is this a bug or a feature?

I am running the job locally on a Mac

Thanks

Andy


Here is my Java

        rdd.pipe(pwd + "/src/main/bin/RDDPipe.sh").collect();



#!/bin/sh
#
# Use this shell script to figure out how Spark RDD pipe() works
#

set -x # turns shell debugging on
#set +x # turns shell debugging off

while read x
do
    echo RDDPipe.sh $x
done



Here is the output if debugging is turned on:

$ !grep
grep RDDPipe run.sh.out
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 2
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 3
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 0
$



Re: RDD pipe example. Is this a bug or a feature?

Posted by Jey Kottalam <je...@cs.berkeley.edu>.
Your proposed use of rdd.pipe("foo") to communicate with an external
process seems fine. The "foo" program should read its input from
stdin, perform its computations, and write its results back to stdout.
Note that "foo" will be run on the workers, invoked once per
partition, and the result will be an RDD[String] containing an entry
for each line of output from your program.
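
The contract described above can be imitated in plain Java without Spark.
This is only a rough sketch of the behaviour, not Spark's implementation:
the class PipeSketch and its methods are hypothetical names, a "partition"
is modelled as a plain List<String>, and it assumes an sh binary is on the
PATH. The command is started once per partition, each element is written to
its stdin as one line, and every line it prints on stdout becomes one
element of the result.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipeSketch {

    /** Starts the command once, feeds it one partition on stdin, returns its stdout lines. */
    public static List<String> pipePartition(List<String> command, List<String> partition)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // stderr is not captured here; it goes wherever the parent's stderr goes.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Feed stdin from a separate thread so that reading stdout below
        // cannot deadlock when the child produces a lot of output.
        Thread writer = new Thread(() -> {
            try (BufferedWriter stdin = new BufferedWriter(
                    new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String element : partition) {
                    stdin.write(element);
                    stdin.write('\n');
                }
            } catch (IOException e) {
                // The child may exit before reading everything; ignored in this sketch.
            }
        });
        writer.start();

        List<String> output = new ArrayList<>();
        try (BufferedReader stdout = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = stdout.readLine()) != null) {
                output.add(line);
            }
        }
        writer.join();
        process.waitFor();
        return output;
    }

    /** Runs the command once per partition and concatenates the results in order. */
    public static List<String> pipe(List<String> command, List<List<String>> partitions)
            throws IOException, InterruptedException {
        List<String> result = new ArrayList<>();
        for (List<String> partition : partitions) {
            result.addAll(pipePartition(command, partition));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<String> command =
                Arrays.asList("sh", "-c", "while read x; do echo RDDPipe.sh $x; done");
        List<List<String>> partitions =
                Arrays.asList(Arrays.asList("0", "2"), Arrays.asList("3"));
        for (String line : pipe(command, partitions)) {
            System.out.println(line);
        }
    }
}
```

With two partitions the command is started twice, and nothing reaches the
console until the caller prints the returned list itself.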

-Jey


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: RDD pipe example. Is this a bug or a feature?

Posted by Andy Davidson <An...@SantaCruzIntegration.com>.
Hi Jey

Many thanks for the code example. Here is what I really want to do. I want
to use Spark Streaming and Python. Unfortunately, PySpark does not support
streaming yet. It was suggested that the way to work around this is to use
an RDD pipe. The example below was a little experiment.

You can think of my system as following the standard Unix shell pipeline
design:

Stream of data -> Spark -> downstream system not implemented in Spark

After seeing your example code I now understand how stdin and stdout get
configured.

It seems like pipe() does not work the way I want. I guess I could open a
socket and write to the downstream process.
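
The socket idea could look roughly like the following. This is a sketch
under assumptions, not a tested Spark integration: the class
DownstreamSocketSketch and its methods are hypothetical, the downstream
system is assumed to accept newline-terminated text records over TCP, and
receiveOnce() is only a local stand-in for that system so the example can
run on its own. In a Spark job, something like sendRecords() would
presumably be called from code that runs on the workers.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DownstreamSocketSketch {

    /** Opens one connection and writes each record as a newline-terminated line. */
    public static void sendRecords(String host, int port, List<String> records)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String record : records) {
                writer.write(record);
                writer.write('\n');
            }
        } // closing the writer flushes it and closes the connection
    }

    /** Stand-in for the downstream system: accepts one connection, returns the lines read. */
    public static List<String> receiveOnce(ServerSocket server) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Socket client = server.accept();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    /** Sends the records to a local stand-in receiver and returns what it received. */
    public static List<String> roundTrip(List<String> records) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try (ServerSocket server = new ServerSocket(0)) { // port 0: the OS picks a free port
            Callable<List<String>> receiver = () -> receiveOnce(server);
            Future<List<String>> received = executor.submit(receiver);
            sendRecords("127.0.0.1", server.getLocalPort(), records);
            return received.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(Arrays.asList("0", "2", "3")));
    }
}
```

Opening one connection per batch of records, rather than one per record,
keeps the connection overhead small; whether that suits the real downstream
system is an open question here.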

Any suggestions would be greatly appreciated.

Thanks, Andy




Re: RDD pipe example. Is this a bug or a feature?

Posted by Jey Kottalam <je...@cs.berkeley.edu>.
Hi Andy,

That's a feature -- you'll have to print out the return value from
collect() if you want the contents to show up on stdout.

Probably something like this:

for (String line : rdd.pipe(pwd + "/src/main/bin/RDDPipe.sh").collect()) {
    System.out.println(line);
}
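
One likely reason the trace lines showed up while the echoed lines did not:
"set -x" writes its trace to stderr, whereas the lines from echo go to
stdout, which is what ends up in the piped RDD. The sketch below checks
that split outside Spark. It is an illustration under assumptions: the
class TraceVersusOutput is a hypothetical name, it needs an sh binary on
the PATH, and it sends both streams to temporary files so they can be
compared afterwards.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TraceVersusOutput {

    /** The loop from RDDPipe.sh, without the debugging switch. */
    public static final String LOOP = "while read x; do echo RDDPipe.sh $x; done";

    /** What the script wrote to each stream, one list entry per line. */
    public static class Result {
        public final List<String> stdout;
        public final List<String> stderr;

        Result(List<String> stdout, List<String> stderr) {
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Runs the script under sh with the given stdin and records both output streams. */
    public static Result run(String script, String input) throws Exception {
        Path out = Files.createTempFile("pipe-stdout", ".txt");
        Path err = Files.createTempFile("pipe-stderr", ".txt");
        try {
            Process process = new ProcessBuilder("sh", "-c", script)
                    .redirectOutput(out.toFile())
                    .redirectError(err.toFile())
                    .start();
            // Closing stdin signals end of input, which ends the read loop.
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input.getBytes(StandardCharsets.UTF_8));
            }
            process.waitFor();
            return new Result(Files.readAllLines(out), Files.readAllLines(err));
        } finally {
            Files.deleteIfExists(out);
            Files.deleteIfExists(err);
        }
    }

    public static void main(String[] args) throws Exception {
        Result quiet = run(LOOP, "0\n2\n");
        Result traced = run("set -x; " + LOOP, "0\n2\n");
        System.out.println("stdout, debugging off: " + quiet.stdout);
        System.out.println("stderr, debugging off: " + quiet.stderr);
        System.out.println("stdout, debugging on:  " + traced.stdout);
        System.out.println("stderr, debugging on:  " + traced.stderr);
    }
}
```

The stdout lines are the same in both runs; only stderr changes when
debugging is switched on. The exact trace text can vary between shells.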


Hope that helps,
-Jey




Re: RDD pipe example. Is this a bug or a feature?

Posted by Sean Owen <so...@cloudera.com>.
What is in 'rdd' here, to double check? Do you mean the Spark shell when
you say console? At the end you're grepping some redirected output, but
where is that from?