Posted to user@spark.apache.org by Andy Davidson <An...@SantaCruzIntegration.com> on 2014/09/19 20:21:20 UTC
RDD pipe example. Is this a bug or a feature?
Hi
I wrote a little Java job to try to figure out how RDD pipe() works.
Below is my test shell script. If I turn on debugging in the script, I get
output in my console. If debugging is turned off in the shell script, I do
not see anything in my console. Is this a bug or a feature?
I am running the job locally on a Mac.
Thanks
Andy
Here is my Java
rdd.pipe(pwd + "/src/main/bin/RDDPipe.sh").collect();
Here is my shell script
#!/bin/sh
#
# Use this shell script to figure out how Spark RDD pipe() works
#
set -x # turns shell debugging on
#set +x # turns shell debugging off
while read x ;
do
    echo RDDPipe.sh "$x" ;
done
Here is the output if debugging is turned on
$ !grep
grep RDDPipe run.sh.out
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 2
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 3
+ echo RDDPipe.sh 0
+ echo RDDPipe.sh 0
$
Re: RDD pipe example. Is this a bug or a feature?
Posted by Jey Kottalam <je...@cs.berkeley.edu>.
Your proposed use of rdd.pipe("foo") to communicate with an external
process seems fine. The "foo" program should read its input from
stdin, perform its computations, and write its results back to stdout.
Note that "foo" will be run on the workers, invoked once per
partition, and the result will be an RDD[String] containing an entry
for each line of output from your program.
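The behaviour described above can be sketched in plain Java without Spark. This is only an illustration of the mechanism, not Spark's implementation: the class name PipeSketch and the helper pipePartition are hypothetical, and the inline sh command stands in for RDDPipe.sh. One "partition" is written to the command's stdin, one element per line, and each line the command writes to stdout becomes one output entry. stderr is left attached to the console, which is where `set -x` traces are written; that would explain why the traces are visible while the echo output is not unless the collected result is printed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipeSketch {
    // Hypothetical helper: feeds one "partition" to an external command,
    // one element per line, and returns the lines the command wrote to stdout.
    public static List<String> pipePartition(List<String> partition, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // stderr (for example `set -x` traces) is not captured; it goes to the console.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Write stdin on its own thread so a full pipe buffer cannot block the reader.
        Thread writer = new Thread(() -> {
            try (BufferedWriter stdin = new BufferedWriter(new OutputStreamWriter(
                    process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String element : partition) {
                    stdin.write(element);
                    stdin.write('\n');
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        // Every stdout line becomes one entry of the result.
        List<String> output = new ArrayList<>();
        try (BufferedReader stdout = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = stdout.readLine()) != null) {
                output.add(line);
            }
        }
        writer.join();
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an inline script equivalent to RDDPipe.sh with debugging off.
        String script = "while read x; do echo \"RDDPipe.sh $x\"; done";
        List<String> lines = pipePartition(Arrays.asList("0", "2", "3"), "sh", "-c", script);
        // Nothing appears on the console until the result is printed explicitly.
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Prefixing the inline script with `set -x;` should make the traces appear on the console while the returned list stays the same, since only stdout is collected.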
-Jey
On Fri, Sep 19, 2014 at 3:59 PM, Andy Davidson
<An...@santacruzintegration.com> wrote:
> Hi Jey
>
> Many thanks for the code example. Here is what I really want to do. I want
> to use Spark Streaming with Python. Unfortunately, PySpark does not support
> streaming yet. It was suggested that the way to work around this is to use an
> RDD pipe. The example below was a little experiment.
>
> You can think of my system as following the standard Unix shell pipeline
> design:
>
> Stream of data -> Spark -> downstream system not implemented in Spark
>
> After seeing your example code I now understand how stdin and stdout get
> configured.
>
> It seems pipe() does not work the way I want. I guess I could open a
> socket and write to the downstream process.
>
> Any suggestions would be greatly appreciated
>
> Thanks Andy
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: RDD pipe example. Is this a bug or a feature?
Posted by Andy Davidson <An...@SantaCruzIntegration.com>.
Hi Jey
Many thanks for the code example. Here is what I really want to do. I want
to use Spark Streaming with Python. Unfortunately, PySpark does not support
streaming yet. It was suggested that the way to work around this is to use an
RDD pipe. The example below was a little experiment.
You can think of my system as following the standard Unix shell pipeline
design:
Stream of data -> Spark -> downstream system not implemented in Spark
After seeing your example code I now understand how stdin and stdout get
configured.
It seems pipe() does not work the way I want. I guess I could open a
socket and write to the downstream process.
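For illustration, the socket idea might look roughly like the following plain-Java sketch. Everything here is an assumption rather than part of the thread: the class SocketSinkSketch, the helpers sendLines and receiveOnce, and the one-record-per-line protocol are all hypothetical, and the in-process ServerSocket only stands in for the downstream system. In a Spark job, a helper like sendLines would presumably be called once per partition (for example from foreachPartition) so that each partition opens a single connection.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SocketSinkSketch {
    // Hypothetical helper: opens one connection and sends each record as one line.
    public static void sendLines(String host, int port, List<String> records)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8))) {
            for (String record : records) {
                out.write(record);
                out.write('\n');
            }
            out.flush();
        }
    }

    // Stand-in for the downstream system: accepts one connection and
    // collects every line until the sender closes the socket.
    public static List<String> receiveOnce(ServerSocket server) throws IOException {
        List<String> received = new ArrayList<>();
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                received.add(line);
            }
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        // Port 0 asks the OS for any free port.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try {
                    sendLines("localhost", port, Arrays.asList("0", "2", "3"));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            List<String> received = receiveOnce(server);
            sender.join();
            System.out.println(received);
        }
    }
}
```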
Any suggestions would be greatly appreciated
Thanks Andy
From: Jey Kottalam <je...@cs.berkeley.edu>
Reply-To: <je...@cs.berkeley.edu>
Date: Friday, September 19, 2014 at 12:35 PM
To: Andrew Davidson <An...@SantaCruzIntegration.com>
Cc: "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: RDD pipe example. Is this a bug or a feature?
> Hi Andy,
>
> That's a feature -- you'll have to print out the return value from
> collect() if you want the contents to show up on stdout.
>
> Probably something like this:
>
> for(Iterator<String> iter = rdd.pipe(pwd +
> "/src/main/bin/RDDPipe.sh").collect().iterator(); iter.hasNext();)
> System.out.println(iter.next());
>
>
> Hope that helps,
> -Jey
Re: RDD pipe example. Is this a bug or a feature?
Posted by Jey Kottalam <je...@cs.berkeley.edu>.
Hi Andy,
That's a feature -- you'll have to print out the return value from
collect() if you want the contents to show up on stdout.
Probably something like this:
for (String line : rdd.pipe(pwd + "/src/main/bin/RDDPipe.sh").collect())
    System.out.println(line);
Hope that helps,
-Jey
Re: RDD pipe example. Is this a bug or a feature?
Posted by Sean Owen <so...@cloudera.com>.
What is in 'rdd' here, to double check? Do you mean the Spark shell when
you say console? At the end you're grepping some redirected output, but
where does that output come from?