Posted to user@spark.apache.org by vinod kumar <vi...@gmail.com> on 2015/07/13 12:24:37 UTC

Spark Intro

Hi Everyone,

I am developing an application that handles bulk data, on the order of
millions of records (this may vary with the user's requirements). As of now
I am using MS SQL Server as the back end, and it works fine, but when I
perform some operations on large data I get overflow exceptions. I heard
that Spark is a faster computation engine than SQL (correct me if I am
wrong), so I thought of switching my application to Spark. Is my decision
right?
My user environment is:
#. Windows 8
#. Data in the millions of records.
#. Need to perform filtering and sorting operations with aggregations
frequently (for analytics).

Thanks in advance,

Vinod

Re: Spark Intro

Posted by vinod kumar <vi...@gmail.com>.
Thank you Hafsa

On Tue, Jul 14, 2015 at 11:09 AM, Hafsa Asif <ha...@matchinguu.com>
wrote:

> Hi,
> I was also in the same situation, as we were using MySQL. Let me give some
> clarifications:
> 1. Spark provides a great methodology for big data analysis. So, if you
> want to make your system more analytical and want deep, ready-made
> analytical methods to analyze your data, then it's a very good option.
> 2. If you want to move away from the old behavior of MS SQL and want fast
> responses from a database with huge datasets, then you can use any NoSQL
> database.
>
> In my case I selected Aerospike for data storage and apply the Spark
> analytical engine on top of it. It gives me really good response times,
> and I plan to go to real production with this combination.
>
> Best,
> Hafsa

Re: Spark Intro

Posted by vaquar khan <va...@gmail.com>.
Totally agree with Hafsa; you need to identify your requirements and
needs before choosing Spark.

If you want to handle data with fast access, go with NoSQL (MongoDB,
Aerospike, etc.); if you need data analytics, then Spark is the best choice.

Regards,
Vaquar khan
On 14 Jul 2015 20:39, "Hafsa Asif" <ha...@matchinguu.com> wrote:

> Hi,
> I was also in the same situation, as we were using MySQL. Let me give some
> clarifications:
> 1. Spark provides a great methodology for big data analysis. So, if you
> want to make your system more analytical and want deep, ready-made
> analytical methods to analyze your data, then it's a very good option.
> 2. If you want to move away from the old behavior of MS SQL and want fast
> responses from a database with huge datasets, then you can use any NoSQL
> database.
>
> In my case I selected Aerospike for data storage and apply the Spark
> analytical engine on top of it. It gives me really good response times,
> and I plan to go to real production with this combination.
>
> Best,
> Hafsa

Re: Spark Intro

Posted by Hafsa Asif <ha...@matchinguu.com>.
Hi,
I was also in the same situation, as we were using MySQL. Let me give some
clarifications:
1. Spark provides a great methodology for big data analysis. So, if you
want to make your system more analytical and want deep, ready-made
analytical methods to analyze your data, then it's a very good option.
2. If you want to move away from the old behavior of MS SQL and want fast
responses from a database with huge datasets, then you can use any NoSQL
database.

In my case I selected Aerospike for data storage and apply the Spark
analytical engine on top of it. It gives me really good response times,
and I plan to go to real production with this combination.

Best,
Hafsa

2015-07-14 11:49 GMT+02:00 Akhil Das <ak...@sigmoidanalytics.com>:

> It might take some time to understand the ecosystem. I'm not sure what
> kind of environment you have (number of cores, memory, etc.). To start
> with, you can use a JDBC connector, or dump your data as CSV, load it into
> Spark, and query it. You get the advantage of caching if you have more
> memory; also, if you have enough cores, 40000 records are nothing.
>
> Thanks
> Best Regards

Re: Spark Intro

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
It might take some time to understand the ecosystem. I'm not sure what
kind of environment you have (number of cores, memory, etc.). To start
with, you can use a JDBC connector, or dump your data as CSV, load it into
Spark, and query it. You get the advantage of caching if you have more
memory; also, if you have enough cores, 40000 records are nothing.

Thanks
Best Regards
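
A minimal sketch of the "dump as CSV, load it, and query it" route, with a
filter, an aggregation, and a sort as asked for in the original post. This is
an illustration under assumptions, not anything posted in the thread: the
column names and values are made up, and Python's built-in sqlite3 module
stands in for Spark so the example runs without a Spark installation. The
comments note the PySpark (2.x or later) calls that would play the same role.

```python
import csv
import io
import sqlite3

# Hypothetical CSV dump of a table; the columns and values are made up.
CSV_DUMP = """region,amount
north,120
south,80
north,40
east,300
south,15
"""

# Filter (WHERE), aggregate (SUM ... GROUP BY), then sort (ORDER BY).
# A query of this shape could also be passed to spark.sql(...) in PySpark.
QUERY = """
    SELECT region, SUM(amount) AS total
    FROM records
    WHERE amount >= 40
    GROUP BY region
    ORDER BY total DESC
"""


def load_csv(conn, text):
    """Load CSV text into a table named `records`.

    Rough PySpark counterpart (assumes a SparkSession named `spark`):
        df = spark.read.csv(path, header=True, inferSchema=True)
        df.createOrReplaceTempView("records")
        df.cache()  # keep the data in memory for repeated queries
    """
    conn.execute("CREATE TABLE records (region TEXT, amount INTEGER)")
    reader = csv.DictReader(io.StringIO(text))
    rows = [(row["region"], int(row["amount"])) for row in reader]
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)


def run_query(text=CSV_DUMP, query=QUERY):
    """Load the dump into an in-memory database and return the result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        load_csv(conn, text)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for region, total in run_query():
        print(region, total)
```

Keeping the query as a plain SQL string makes it easy to try the same
statement against both back ends when comparing response times.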

On Tue, Jul 14, 2015 at 3:09 PM, vinod kumar <vi...@gmail.com>
wrote:

> Hi Akhil
>
> Is my choice to switch to Spark a good one? I ask because I don't have
> enough information about the limitations and working environment of Spark.
> I tried Spark SQL, but it seems to return data more slowly than MS SQL.
> (I tested with data containing 40000 records.)

Re: Spark Intro

Posted by vinod kumar <vi...@gmail.com>.
Hi Akhil

Is my choice to switch to Spark a good one? I ask because I don't have
enough information about the limitations and working environment of Spark.
I tried Spark SQL, but it seems to return data more slowly than MS SQL.
(I tested with data containing 40000 records.)



On Tue, Jul 14, 2015 at 3:50 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> This is where you can get started
> https://spark.apache.org/docs/latest/sql-programming-guide.html
>
> Thanks
> Best Regards

Re: Spark Intro

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
This is where you can get started
https://spark.apache.org/docs/latest/sql-programming-guide.html

Thanks
Best Regards

On Mon, Jul 13, 2015 at 3:54 PM, vinod kumar <vi...@gmail.com>
wrote:

>
> Hi Everyone,
>
> I am developing an application that handles bulk data, on the order of
> millions of records (this may vary with the user's requirements). As of
> now I am using MS SQL Server as the back end, and it works fine, but when
> I perform some operations on large data I get overflow exceptions. I heard
> that Spark is a faster computation engine than SQL (correct me if I am
> wrong), so I thought of switching my application to Spark. Is my decision
> right?
> My user environment is:
> #. Windows 8
> #. Data in the millions of records.
> #. Need to perform filtering and sorting operations with aggregations
> frequently (for analytics).
>
> Thanks in advance,
>
> Vinod
>