Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2018/05/23 11:46:59 UTC

Bulk / Fast Read and Write with MSSQL Server and Spark

All,

I am looking for an approach to do bulk reads / writes with MSSQL Server and
Apache Spark 2.2. Please let me know if there is any library / driver for this.

Thank you.
Chetan

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by kedarsdixit <ke...@persistent.com>.
Hi,

I came across this a while ago:
<https://stephanefrechette.com/connect-sql-server-using-apache-spark/#.WwVVosThXIU>
Check if it is helpful.
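
For reference, the approach in that post boils down to Spark's built-in
JDBC data source with Microsoft's JDBC driver on the classpath. A rough,
untested sketch; the server, tables and credentials are placeholders and
"spark" is an existing SparkSession:

// Placeholder connection details; assumes the mssql-jdbc driver jar is on the classpath.
val jdbcUrl = "jdbc:sqlserver://myhost:1433;databaseName=mydb"

// Read a table into a DataFrame over JDBC.
val df = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.source_table")
  .option("user", "myuser")
  .option("password", "mypassword")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .load()

// Write it back out to another table.
df.write
  .format("jdbc")
  .mode("append")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.target_table")
  .option("user", "myuser")
  .option("password", "mypassword")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .save()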

Regards,
~Kedar Dixit
Data Science @ Persistent Systems Ltd.





Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by Chetan Khatri <ch...@gmail.com>.
Ajay, you can use Sqoop if you want to ingest data to HDFS. This is a POC where
the customer wants to prove that Spark ETL would be faster than the C#-based raw
SQL statements. That's all. There are no timestamp-based columns in the source
tables to make it an incremental load.
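
For the full (non-incremental) pull I am thinking of a partitioned JDBC read
so the extract is spread across executors. A rough sketch; the partition
column, bounds and connection details are only hypothetical examples:

// Hypothetical parallel full-table read; "id" stands in for any numeric key column.
val source = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://myhost:1433;databaseName=sourcedb")
  .option("dbtable", "dbo.source_table")
  .option("user", "myuser")
  .option("password", "mypassword")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")   // 8 parallel reads / connections
  .load()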

On Thu, May 24, 2018 at 1:08 AM, ayan guha <gu...@gmail.com> wrote:

> Curious question: what is the reason for using Spark here? Why not simple
> SQL-based ETL?
>
> On Thu, May 24, 2018 at 5:09 AM, Ajay <aj...@gmail.com> wrote:
>
>> Do you worry about Spark overloading the SQL Server? We have had this
>> issue in the past, where all Spark workers tend to send lots of data at once
>> to SQL Server, which hurts the latency of the rest of the system. We
>> overcame this by using Sqoop and running it in a controlled environment.
>>
>> On Wed, May 23, 2018 at 7:32 AM Chetan Khatri <
>> chetan.opensource@gmail.com> wrote:
>>
>>> Super, just giving a high-level idea of what I want to do. I have one source
>>> schema, which is MS SQL Server 2008, and the target is also MS SQL Server 2008.
>>> Currently there is a C#-based ETL application which does extract, transform,
>>> and load into a customer-specific schema, including indexing etc.
>>>
>>>
>>> Thanks
>>>
>>> On Wed, May 23, 2018 at 7:11 PM, kedarsdixit <
>>> kedarnath_dixit@persistent.com> wrote:
>>>
>>>> Yes.
>>>>
>>>> Regards,
>>>> Kedar Dixit
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> Thanks,
>> Ajay
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by ayan guha <gu...@gmail.com>.
Curious question: what is the reason for using Spark here? Why not simple
SQL-based ETL?

On Thu, May 24, 2018 at 5:09 AM, Ajay <aj...@gmail.com> wrote:

> Do you worry about Spark overloading the SQL Server? We have had this
> issue in the past, where all Spark workers tend to send lots of data at once
> to SQL Server, which hurts the latency of the rest of the system. We
> overcame this by using Sqoop and running it in a controlled environment.
>
> On Wed, May 23, 2018 at 7:32 AM Chetan Khatri <ch...@gmail.com>
> wrote:
>
>> Super, just giving a high-level idea of what I want to do. I have one source
>> schema, which is MS SQL Server 2008, and the target is also MS SQL Server 2008.
>> Currently there is a C#-based ETL application which does extract, transform,
>> and load into a customer-specific schema, including indexing etc.
>>
>>
>> Thanks
>>
>> On Wed, May 23, 2018 at 7:11 PM, kedarsdixit <
>> kedarnath_dixit@persistent.com> wrote:
>>
>>> Yes.
>>>
>>> Regards,
>>> Kedar Dixit
>>>
>>>
>>>
>>>
>>
>
> --
> Thanks,
> Ajay
>



-- 
Best Regards,
Ayan Guha

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by Ajay <aj...@gmail.com>.
Do you worry about Spark overloading the SQL Server? We have had this
issue in the past, where all Spark workers tend to send lots of data at once
to SQL Server, which hurts the latency of the rest of the system. We
overcame this by using Sqoop and running it in a controlled environment.
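
If you do stay on Spark, one way to keep the write pressure in check is to
cap the number of concurrent JDBC connections and the batch size on the
writer. A rough sketch, assuming "df" is the DataFrame being written; the
partition count and batch size are example values, not tuned recommendations:

// Limit concurrency so at most 4 connections hit SQL Server at once.
val throttled = df.coalesce(4)

throttled.write
  .format("jdbc")
  .mode("append")
  .option("url", "jdbc:sqlserver://myhost:1433;databaseName=targetdb")
  .option("dbtable", "dbo.target_table")
  .option("user", "myuser")
  .option("password", "mypassword")
  .option("batchsize", "10000")   // rows per JDBC batch insert
  .save()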

On Wed, May 23, 2018 at 7:32 AM Chetan Khatri <ch...@gmail.com>
wrote:

> Super, just giving a high-level idea of what I want to do. I have one source
> schema, which is MS SQL Server 2008, and the target is also MS SQL Server 2008.
> Currently there is a C#-based ETL application which does extract, transform,
> and load into a customer-specific schema, including indexing etc.
>
>
> Thanks
>
> On Wed, May 23, 2018 at 7:11 PM, kedarsdixit <
> kedarnath_dixit@persistent.com> wrote:
>
>> Yes.
>>
>> Regards,
>> Kedar Dixit
>>
>>
>>
>>
>

-- 
Thanks,
Ajay

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by Chetan Khatri <ch...@gmail.com>.
Super, just giving a high-level idea of what I want to do. I have one source
schema, which is MS SQL Server 2008, and the target is also MS SQL Server 2008.
Currently there is a C#-based ETL application which does extract, transform,
and load into a customer-specific schema, including indexing etc.


Thanks

On Wed, May 23, 2018 at 7:11 PM, kedarsdixit <kedarnath_dixit@persistent.com
> wrote:

> Yes.
>
> Regards,
> Kedar Dixit
>
>
>
>

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by kedarsdixit <ke...@persistent.com>.
Yes.

Regards,
Kedar Dixit





Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by Chetan Khatri <ch...@gmail.com>.
Thank you, Kedar Dixit and Silvio Fiorito.

Just one question: even if it's not an Azure cloud MS-SQL Server, it
should still support MS-SQL Server installed on a local machine, right?

Thank you.

On Wed, May 23, 2018 at 6:18 PM, Silvio Fiorito <
silvio.fiorito@granturing.com> wrote:

> Try this
> https://docs.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector
>
>
>
>
>
> *From: *Chetan Khatri <ch...@gmail.com>
> *Date: *Wednesday, May 23, 2018 at 7:47 AM
> *To: *user <us...@spark.apache.org>
> *Subject: *Bulk / Fast Read and Write with MSSQL Server and Spark
>
>
>
> All,
>
>
>
> I am looking for an approach to do bulk reads / writes with MSSQL Server and
> Apache Spark 2.2. Please let me know if there is any library / driver for this.
>
>
>
> Thank you.
>
> Chetan
>

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

Posted by Silvio Fiorito <si...@granturing.com>.
Try this https://docs.microsoft.com/en-us/azure/sql-database/sql-database-spark-connector
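
From that page, usage looks roughly like the sketch below. I have not run
this exact snippet, so treat it as a starting point: the library is (I
believe) com.microsoft.azure:azure-sqldb-spark, "df" is the DataFrame you
want to load, and the option names and connection details are my reading
of the docs:

import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Placeholder connection details and example bulk-copy settings.
val writeConfig = Config(Map(
  "url"               -> "myserver",
  "databaseName"      -> "mydb",
  "dbTable"           -> "dbo.target_table",
  "user"              -> "myuser",
  "password"          -> "mypassword",
  "bulkCopyBatchSize" -> "100000",
  "bulkCopyTableLock" -> "true",
  "bulkCopyTimeout"   -> "600"
))

// Bulk-insert the DataFrame into the target table (per the connector docs).
df.bulkCopyToSqlDB(writeConfig)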


From: Chetan Khatri <ch...@gmail.com>
Date: Wednesday, May 23, 2018 at 7:47 AM
To: user <us...@spark.apache.org>
Subject: Bulk / Fast Read and Write with MSSQL Server and Spark

All,

I am looking for an approach to do bulk reads / writes with MSSQL Server and Apache Spark 2.2. Please let me know if there is any library / driver for this.

Thank you.
Chetan