Posted to user@spark.apache.org by Noam Kfir <No...@perion.com> on 2014/12/02 16:14:12 UTC

IP to geo information in spark streaming application

Hi,

I'm new to Spark Streaming.

I'm currently writing a Spark Streaming application to standardize events coming from Kinesis.

As part of the logic, I want to use an IP-to-geo information library or service.

My questions:

1) If I use a REST service for this task, do you think it would cause a performance penalty (compared with a library-based solution)?

2) If I use a library-based solution, I will have to use a local DB file.
What mechanism should I use to transfer such a DB file? A broadcast variable?

Tx, Noam.

Re: IP to geo information in spark streaming application

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
1. If you use a custom API library, there's a chance you'll end up with
serialization errors and the like, but a normal HTTP REST API would work fine,
except that there could be some performance lag, and those APIs might limit
the number of requests.

2. I would go for the library-based approach: either I would broadcast the IP
data, or I would cache it in a normal RDD and then join it with the stream
data.
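
To make the broadcast option more concrete, here is a minimal sketch of the kind of lookup table that could be broadcast. It is plain Python with no Spark dependency, so it only shows the data structure and the lookup. The ranges, the labels, and the names `RANGES`, `build_table` and `lookup` are all hypothetical, and the ranges are assumed not to overlap. A real job would load the rows from the geo DB file on the driver.

```python
import bisect
import ipaddress

# Hypothetical range table: (first_ip, last_ip, label). The addresses are
# reserved documentation ranges and the labels are placeholders, not real
# geo data.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "region-a"),
    ("198.51.100.0", "198.51.100.255", "region-b"),
    ("203.0.113.0", "203.0.113.255", "region-c"),
]


def build_table(ranges):
    """Turn dotted-quad ranges into sorted integer rows plus a list of starts."""
    rows = sorted(
        (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), label)
        for lo, hi, label in ranges
    )
    starts = [row[0] for row in rows]
    return starts, rows


def lookup(table, ip):
    """Return the label of the range containing ip, or None if there is none."""
    starts, rows = table
    key = int(ipaddress.IPv4Address(ip))
    # Find the last range whose start is <= key (assumes non-overlapping ranges).
    i = bisect.bisect_right(starts, key) - 1
    if i >= 0 and rows[i][0] <= key <= rows[i][1]:
        return rows[i][2]
    return None


# Built once on the driver. In a PySpark job this table would be wrapped with
# sc.broadcast(table), and the function mapped over the stream would call
# lookup(geo.value, ip), where geo is the broadcast handle (an assumed name).
table = build_table(RANGES)
```

The table is just a tuple of lists and tuples, so it serializes without custom classes, which sidesteps the serialization errors mentioned in point 1. Whether it fits comfortably in executor memory depends on the size of the DB.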

Thanks
Best Regards

Re: IP to geo information in spark streaming application

Posted by qinwei <we...@dewmobile.net>.
1) I think a library-based solution is the better idea; we used one, and it works.
2) We used a broadcast variable, and it works.


qinwei