Posted to user@spark.apache.org by ta...@yahoo.com.INVALID on 2015/10/05 15:16:51 UTC

Spark handling parallel requests

Hi,
I am using Scala. I have a socket program that catches multiple requests at the same time and then calls a function that uses Spark to handle each one. I have a multi-threaded server to handle the multiple requests and pass each one to Spark, but there is a bottleneck: Spark does not start a subtask for each new request. Is it even possible to do parallel processing within a single Spark job?

Best Regards,
-- Tarek Abouzeid
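
For reference: concurrent jobs inside a single Spark application are possible when actions are submitted from separate threads against a shared SparkContext, and the FAIR scheduler keeps one long job from starving the others. A minimal sketch of that pattern (the object name, pool size, and per-request logic below are illustrative, not from the original post):

import java.util.concurrent.Executors
import org.apache.spark.{SparkConf, SparkContext}

object ParallelRequests {
  def main(args: Array[String]): Unit = {
    // One shared SparkContext for the whole server; FAIR scheduling lets
    // jobs submitted from different threads share the executors instead
    // of queueing up behind one another (the default is FIFO).
    val conf = new SparkConf()
      .setAppName("parallel-requests")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    val pool = Executors.newFixedThreadPool(8)

    // Each incoming request runs on its own thread; every Spark action
    // it triggers becomes an independent, concurrently scheduled job.
    def handleRequest(request: String): Unit = {
      pool.submit(new Runnable {
        def run(): Unit = {
          val tokens = sc.parallelize(request.split("\\s+")).count()
          println(s"request '$request' -> $tokens tokens")
        }
      })
    }

    handleRequest("first request payload")
    handleRequest("second request payload")
  }
}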

Re: Spark handling parallel requests

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Yes, there are Kafka consumers/producers for almost all languages; you
can read more over here:
https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-PHP
Here's a repo for the PHP version: https://github.com/EVODelavega/phpkafka
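
Whatever the client language, the producer side does roughly the same thing. A minimal sketch in Scala using the Java producer API (the broker address, topic name, and payload are placeholders), just to illustrate what a PHP client would be doing:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object RequestProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder broker address -- point this at your Kafka cluster.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Each incoming web request becomes one message on the topic.
    producer.send(new ProducerRecord[String, String]("requests", "payload-from-web-tier"))
    producer.close()
  }
}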

Thanks
Best Regards

On Sun, Oct 18, 2015 at 12:58 PM, <ta...@yahoo.com> wrote:

> hi Akhil,
>
> It's a must to push data to a socket, as I am using PHP as a web service
> to push data to the socket; Spark then catches the data on that socket and
> processes it. Is there a way to push data from PHP to Kafka directly?
>
> -- Best Regards, -- Tarek Abouzeid
>
>
>
> On Sunday, October 18, 2015 10:26 AM, "tarek.abouzeid91@yahoo.com" <
> tarek.abouzeid91@yahoo.com> wrote:
>
>
> hi Xiao,
> 1- The requests are not similar at all, but they use Solr and sometimes
> do commits.
> 2- No caching is required.
> 3- Yes, the throughput must be very high; the requests are tiny, but the
> system may receive 100 requests/sec.
> Does Kafka support listening on a socket?
>
> --  Best Regards, -- Tarek Abouzeid
>
>
>
> On Monday, October 12, 2015 10:50 AM, Xiao Li <ga...@gmail.com>
> wrote:
>
>
> Hi, Tarek,
>
> It is hard to answer your question. Are the requests similar? Are you
> caching your results or intermediate results in your applications? Does
> that mean your throughput requirement is very high? Are you throttling the
> number of concurrent requests? ...
>
> As Akhil said, Kafka might help in your case. Otherwise, you need to read
> the designs or even the source code of Kafka and Spark Streaming.
>
> Best wishes,
>
> Xiao Li
>
>
> 2015-10-11 23:19 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:
>
> Instead of pushing your requests to the socket, why don't you push them to
> Kafka or any other message queue and use Spark Streaming to process them?
>
> Thanks
> Best Regards
>
> On Mon, Oct 5, 2015 at 6:46 PM, <ta...@yahoo.com.invalid>
> wrote:
>
> Hi,
> I am using Scala. I have a socket program that catches multiple requests
> at the same time and then calls a function that uses Spark to handle each
> one. I have a multi-threaded server to handle the multiple requests and
> pass each one to Spark, but there is a bottleneck: Spark does not start a
> subtask for each new request. Is it even possible to do parallel
> processing within a single Spark job?
>
> Best Regards,
> -- Tarek Abouzeid

Re: Spark handling parallel requests

Posted by ta...@yahoo.com.INVALID.
Thanks, guys, for your advice. I will have a look at the custom receivers. Thanks again for your efforts.

-- Best Regards, -- Tarek Abouzeid


On Monday, October 19, 2015 6:50 PM, Adrian Tanase <at...@adobe.com> wrote:

To answer your specific question, you can’t push data to Kafka through a socket – you need a smart client library, as the cluster setup is pretty advanced (it also requires ZooKeeper).

I bet there are PHP libraries for Kafka, although after a quick search it seems they’re still pretty young. Also – Kafka shines at larger deployments and throughput (tens of thousands to millions of events per second) and may be overkill for 100 events/sec.

Here are some other ideas:
   - Use a lighter-weight message broker like RabbitMQ or MQTT – both have good integrations with Spark and should be simpler to integrate with PHP
   - Instead of doing a socket call, log the event on disk – this opens up 2 strategies:
      - If you have access to shared storage, Spark could read the files directly
      - Otherwise, you could rely on something like Flume, which can poll your logs and forward them to Spark (there is a default integration in the Spark external package)
   - Lastly, why not try to build on one of the custom receivers? There are plenty of code samples in the docs and examples
      - This may not be a good choice if you can’t afford to lose any messages – in that case your life is harder, as you’ll also need to use the WAL-based implementation

Hope this helps,
-adrian
From: "tarek.abouzeid91@yahoo.com.INVALID"
Reply-To: "tarek.abouzeid91@yahoo.com"
Date: Sunday, October 18, 2015 at 10:28 AM
To: Xiao Li, Akhil Das
Cc: "user@spark.apache.org"
Subject: Re: Spark handling parallel requests

hi Akhil,

It's a must to push data to a socket, as I am using PHP as a web service to push data to the socket; Spark then catches the data on that socket and processes it. Is there a way to push data from PHP to Kafka directly?

-- Best Regards, -- Tarek Abouzeid


On Sunday, October 18, 2015 10:26 AM, "tarek.abouzeid91@yahoo.com" <ta...@yahoo.com> wrote:


hi Xiao,
1- The requests are not similar at all, but they use Solr and sometimes do commits.
2- No caching is required.
3- Yes, the throughput must be very high; the requests are tiny, but the system may receive 100 requests/sec.
Does Kafka support listening on a socket?

-- Best Regards, -- Tarek Abouzeid


On Monday, October 12, 2015 10:50 AM, Xiao Li <ga...@gmail.com> wrote:


Hi, Tarek,

It is hard to answer your question. Are the requests similar? Are you caching your results or intermediate results in your applications? Does that mean your throughput requirement is very high? Are you throttling the number of concurrent requests? ...

As Akhil said, Kafka might help in your case. Otherwise, you need to read the designs or even the source code of Kafka and Spark Streaming.

Best wishes,

Xiao Li

2015-10-11 23:19 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:

Instead of pushing your requests to the socket, why don't you push them to Kafka or any other message queue and use Spark Streaming to process them?

Thanks
Best Regards
On Mon, Oct 5, 2015 at 6:46 PM, <ta...@yahoo.com.invalid> wrote:

Hi,
I am using Scala. I have a socket program that catches multiple requests at the same time and then calls a function that uses Spark to handle each one. I have a multi-threaded server to handle the multiple requests and pass each one to Spark, but there is a bottleneck: Spark does not start a subtask for each new request. Is it even possible to do parallel processing within a single Spark job?

Best Regards,
-- Tarek Abouzeid


Re: Spark handling parallel requests

Posted by Adrian Tanase <at...@adobe.com>.
To answer your specific question, you can’t push data to Kafka through a socket – you need a smart client library, as the cluster setup is pretty advanced (it also requires ZooKeeper).

I bet there are PHP libraries for Kafka, although after a quick search it seems they’re still pretty young. Also – Kafka shines at larger deployments and throughput (tens of thousands to millions of events per second) and may be overkill for 100 events/sec.

Here are some other ideas:

  *   Use a lighter-weight message broker like RabbitMQ or MQTT – both have good integrations with Spark and should be simpler to integrate with PHP
  *   Instead of doing a socket call, log the event on disk – this opens up 2 strategies
     *   If you have access to shared storage, Spark could read the files directly
     *   Otherwise, you could rely on something like Flume<https://flume.apache.org/>, which can poll your logs and forward them to Spark (there is a default integration in the Spark external package)
  *   Lastly, why not try to build on one of the custom receivers<http://spark.apache.org/docs/latest/streaming-custom-receivers.html>? There are plenty of code samples in the docs and examples – see the sketch after this list
     *   This may not be a good choice if you can’t afford to lose any messages – in that case your life is harder, as you’ll also need to use the WAL-based implementation
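
A minimal sketch of such a custom receiver, following the pattern from the docs (the class name, host, and port are illustrative, not a tested implementation):

import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Reads newline-delimited requests from a socket and hands them to Spark Streaming.
class RequestReceiver(host: String, port: Int)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Receive on a separate thread so onStart() returns immediately.
    new Thread("request-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {} // the receiving thread closes the socket itself

  private def receive(): Unit = {
    try {
      val socket = new Socket(host, port)
      val reader = new BufferedReader(new InputStreamReader(socket.getInputStream))
      var line = reader.readLine()
      while (!isStopped && line != null) {
        store(line) // each stored line becomes part of the stream
        line = reader.readLine()
      }
      reader.close()
      socket.close()
      restart("Trying to connect again")
    } catch {
      case e: Throwable => restart("Error receiving data", e)
    }
  }
}

You would then plug it in with ssc.receiverStream(new RequestReceiver("localhost", 9999)) and process the resulting DStream as usual.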

Hope this helps,
-adrian

From: "tarek.abouzeid91@yahoo.com.INVALID<ma...@yahoo.com.INVALID>"
Reply-To: "tarek.abouzeid91@yahoo.com<ma...@yahoo.com>"
Date: Sunday, October 18, 2015 at 10:28 AM
To: Xiao Li, Akhil Das
Cc: "user@spark.apache.org<ma...@spark.apache.org>"
Subject: Re: Spark handling parallel requests

hi Akhil,

It's a must to push data to a socket, as I am using PHP as a web service to push data to the socket; Spark then catches the data on that socket and processes it. Is there a way to push data from PHP to Kafka directly?

-- Best Regards, -- Tarek Abouzeid



On Sunday, October 18, 2015 10:26 AM, "tarek.abouzeid91@yahoo.com" <ta...@yahoo.com> wrote:


hi Xiao,
1- The requests are not similar at all, but they use Solr and sometimes do commits.
2- No caching is required.
3- Yes, the throughput must be very high; the requests are tiny, but the system may receive 100 requests/sec.
Does Kafka support listening on a socket?

-- Best Regards, -- Tarek Abouzeid



On Monday, October 12, 2015 10:50 AM, Xiao Li <ga...@gmail.com> wrote:


Hi, Tarek,

It is hard to answer your question. Are the requests similar? Are you caching your results or intermediate results in your applications? Does that mean your throughput requirement is very high? Are you throttling the number of concurrent requests? ...

As Akhil said, Kafka might help in your case. Otherwise, you need to read the designs or even the source code of Kafka and Spark Streaming.

Best wishes,

Xiao Li


2015-10-11 23:19 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:
Instead of pushing your requests to the socket, why don't you push them to Kafka or any other message queue and use Spark Streaming to process them?

Thanks
Best Regards

On Mon, Oct 5, 2015 at 6:46 PM, <ta...@yahoo.com.invalid> wrote:
Hi,
I am using Scala. I have a socket program that catches multiple requests at the same time and then calls a function that uses Spark to handle each one. I have a multi-threaded server to handle the multiple requests and pass each one to Spark, but there is a bottleneck: Spark does not start a subtask for each new request. Is it even possible to do parallel processing within a single Spark job?

Best Regards,
-- Tarek Abouzeid







Re: Spark handling parallel requests

Posted by ta...@yahoo.com.INVALID.
hi Akhil,
It's a must to push data to a socket, as I am using PHP as a web service to push data to the socket; Spark then catches the data on that socket and processes it. Is there a way to push data from PHP to Kafka directly?

-- Best Regards, -- Tarek Abouzeid


On Sunday, October 18, 2015 10:26 AM, "tarek.abouzeid91@yahoo.com" <ta...@yahoo.com> wrote:

hi Xiao,
1- The requests are not similar at all, but they use Solr and sometimes do commits.
2- No caching is required.
3- Yes, the throughput must be very high; the requests are tiny, but the system may receive 100 requests/sec.
Does Kafka support listening on a socket?

-- Best Regards, -- Tarek Abouzeid


On Monday, October 12, 2015 10:50 AM, Xiao Li <ga...@gmail.com> wrote:

Hi, Tarek,

It is hard to answer your question. Are the requests similar? Are you caching your results or intermediate results in your applications? Does that mean your throughput requirement is very high? Are you throttling the number of concurrent requests? ...

As Akhil said, Kafka might help in your case. Otherwise, you need to read the designs or even the source code of Kafka and Spark Streaming.

Best wishes,

Xiao Li

2015-10-11 23:19 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:

Instead of pushing your requests to the socket, why don't you push them to Kafka or any other message queue and use Spark Streaming to process them?

Thanks
Best Regards
On Mon, Oct 5, 2015 at 6:46 PM, <ta...@yahoo.com.invalid> wrote:

Hi,
I am using Scala. I have a socket program that catches multiple requests at the same time and then calls a function that uses Spark to handle each one. I have a multi-threaded server to handle the multiple requests and pass each one to Spark, but there is a bottleneck: Spark does not start a subtask for each new request. Is it even possible to do parallel processing within a single Spark job?

Best Regards,
-- Tarek Abouzeid






Re: Spark handling parallel requests

Posted by ta...@yahoo.com.INVALID.
hi Xiao,
1- The requests are not similar at all, but they use Solr and sometimes do commits.
2- No caching is required.
3- Yes, the throughput must be very high; the requests are tiny, but the system may receive 100 requests/sec.
Does Kafka support listening on a socket?

-- Best Regards, -- Tarek Abouzeid


On Monday, October 12, 2015 10:50 AM, Xiao Li <ga...@gmail.com> wrote:

Hi, Tarek,

It is hard to answer your question. Are the requests similar? Are you caching your results or intermediate results in your applications? Does that mean your throughput requirement is very high? Are you throttling the number of concurrent requests? ...

As Akhil said, Kafka might help in your case. Otherwise, you need to read the designs or even the source code of Kafka and Spark Streaming.

Best wishes,

Xiao Li

2015-10-11 23:19 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:

Instead of pushing your requests to the socket, why don't you push them to Kafka or any other message queue and use Spark Streaming to process them?

Thanks
Best Regards
On Mon, Oct 5, 2015 at 6:46 PM, <ta...@yahoo.com.invalid> wrote:

Hi,
I am using Scala. I have a socket program that catches multiple requests at the same time and then calls a function that uses Spark to handle each one. I have a multi-threaded server to handle the multiple requests and pass each one to Spark, but there is a bottleneck: Spark does not start a subtask for each new request. Is it even possible to do parallel processing within a single Spark job?

Best Regards,
-- Tarek Abouzeid






Re: Spark handling parallel requests

Posted by Xiao Li <ga...@gmail.com>.
Hi, Tarek,

It is hard to answer your question. Are the requests similar? Are you
caching your results or intermediate results in your applications? Does
that mean your throughput requirement is very high? Are you throttling the
number of concurrent requests? ...

As Akhil said, Kafka might help in your case. Otherwise, you need to read
the designs or even the source code of Kafka and Spark Streaming.

Best wishes,

Xiao Li


2015-10-11 23:19 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:

> Instead of pushing your requests to the socket, why don't you push them to
> Kafka or any other message queue and use Spark Streaming to process them?
>
> Thanks
> Best Regards
>
> On Mon, Oct 5, 2015 at 6:46 PM, <ta...@yahoo.com.invalid>
> wrote:
>
>> Hi,
>> I am using Scala. I have a socket program that catches multiple requests
>> at the same time and then calls a function that uses Spark to handle each
>> one. I have a multi-threaded server to handle the multiple requests and
>> pass each one to Spark, but there is a bottleneck: Spark does not start a
>> subtask for each new request. Is it even possible to do parallel
>> processing within a single Spark job?
>>
>> Best Regards,
>> -- Tarek Abouzeid

Re: Spark handling parallel requests

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Instead of pushing your requests to the socket, why don't you push them to
Kafka or any other message queue and use Spark Streaming to process them?
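
For illustration, a minimal Spark Streaming consumer for such a queue, using the direct Kafka stream from the spark-streaming-kafka package (the broker address, topic name, and processing logic are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RequestConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("request-consumer")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Placeholder broker list and topic name.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("requests")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Each micro-batch of messages is processed in parallel across the cluster.
    stream.map(_._2).foreachRDD { rdd =>
      rdd.foreach(request => println(s"processing: $request"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}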

Thanks
Best Regards

On Mon, Oct 5, 2015 at 6:46 PM, <ta...@yahoo.com.invalid> wrote:

> Hi,
> I am using Scala. I have a socket program that catches multiple requests
> at the same time and then calls a function that uses Spark to handle each
> one. I have a multi-threaded server to handle the multiple requests and
> pass each one to Spark, but there is a bottleneck: Spark does not start a
> subtask for each new request. Is it even possible to do parallel
> processing within a single Spark job?
>
> Best Regards,
> -- Tarek Abouzeid