Posted to user@spark.apache.org by Avadhut Narayan Joshi <AJ...@slb.com.INVALID> on 2020/05/21 14:10:54 UTC

ETL Using Spark

Hello Team

I am working on ETL using Spark.


  *   I am fetching streaming data from Confluent Kafka
  *   I want to do aggregations by combining the streaming data with data from SQL Server

To achieve the above use case:


  1.  Can I fetch data from SQL Server into Spark based on WHERE conditions?
  2.  Can the data fetched from SQL Server be combined with the streaming data and then streamed back into SQL Server?

Is the above use case valid? Are there any examples of it?

Regards
Avadhut



Re: ETL Using Spark

Posted by "vijay.bvp" <bv...@gmail.com>.
Hi Avadhut Narayan Joshi,

The use case is achievable using Spark. Connecting to SQL Server is possible,
as Mich mentioned below, as long as there is a JDBC driver that can connect to
SQL Server.

For production workloads, these are important points to consider:

  *   What are the QoS requirements for your case: at-least-once,
      at-most-once, or exactly-once?
  *   How will you handle Spark Streaming job restarts (because of an error,
      or because you have to deploy a new version of the application)?
  *   What are your error handling strategies?
  *   How do you deal with late-arriving data, since you are doing
      aggregations?

It is best to make downstream systems idempotent; that is a much less
troublesome way to keep production workloads maintainable.

Best Regards
VP

thanks
Vijay
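
To illustrate the idempotency point above, here is a minimal sketch of a sink that can safely receive the same micro-batch twice, for example after a job restart. It is an assumption-laden stand-in, not a production recipe: it uses Python's built-in sqlite3 in place of SQL Server so it runs anywhere, and the table `site_totals`, its columns, and the helper `upsert_batch` are hypothetical names. On SQL Server the analogous statement would typically be a MERGE keyed on the same columns.

```python
import sqlite3


def upsert_batch(conn, rows):
    """Write aggregate rows keyed by (window_start, site).

    Replaying the same rows overwrites the existing values instead of
    adding duplicates, which is what makes the sink idempotent.
    """
    conn.executemany(
        "INSERT OR REPLACE INTO site_totals (window_start, site, total) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# In-memory database standing in for the downstream SQL Server table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_totals ("
    "window_start TEXT, site TEXT, total REAL, "
    "PRIMARY KEY (window_start, site))"
)

# Hypothetical aggregated output of one micro-batch.
batch = [
    ("2020-05-21T14:00", "A", 10.0),
    ("2020-05-21T14:00", "B", 4.5),
]

upsert_batch(conn, batch)
upsert_batch(conn, batch)  # simulated replay after a restart

rows = conn.execute(
    "SELECT window_start, site, total FROM site_totals ORDER BY site"
).fetchall()
print(rows)
```

Because the primary key is the aggregation key, at-least-once delivery from Spark ends up looking like exactly-once in the table, which is usually easier than trying to guarantee exactly-once end to end.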




Re: ETL Using Spark

Posted by Mich Talebzadeh <mi...@gmail.com>.
Ok

   1. What information are you fetching from MSSQL? Is this reference data?
   2. What information are you processing through Spark via topics?
   3. Assuming you are combining data from MSSQL and Spark and enriching it,
   are you posting back to another table in the same database?


Specifically, you can fetch data from MSSQL through a JDBC connection. The
enriched data can also be written back to MSSQL through JDBC.
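
As a rough PySpark sketch of that flow, under several assumptions: the host, database, table, topic, and column names are all hypothetical; the Kafka messages are assumed to be JSON (Confluent setups often use Avro with Schema Registry, which needs different decoding); and the Microsoft JDBC driver and the spark-sql-kafka package are assumed to be on the classpath. The WHERE condition is pushed down to SQL Server by passing a subquery as `dbtable`.

```python
from typing import Dict

# Hypothetical connection details; replace with real values.
JDBC_URL = "jdbc:sqlserver://sqlhost:1433;databaseName=etl"
JDBC_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def jdbc_read_options(table: str, where: str, user: str, password: str) -> Dict[str, str]:
    """Build Spark JDBC options that push the WHERE clause down to SQL Server."""
    return {
        "url": JDBC_URL,
        # An aliased subquery is accepted in place of a table name, so only
        # matching rows are transferred to Spark.
        "dbtable": f"(SELECT * FROM {table} WHERE {where}) AS src",
        "user": user,
        "password": password,
        "driver": JDBC_DRIVER,
    }


def start_pipeline(spark, user: str, password: str):
    """Join a Kafka stream with SQL Server reference data and write aggregates back."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import functions as F

    # Static reference data, filtered on the SQL Server side.
    reference = (
        spark.read.format("jdbc")
        .options(**jdbc_read_options("dbo.devices", "active = 1", user, password))
        .load()
    )

    # Streaming events; the Kafka value column is binary, so cast and parse it.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "readings")
        .load()
        .select(
            F.from_json(
                F.col("value").cast("string"), "device_id INT, reading DOUBLE"
            ).alias("e")
        )
        .select("e.*")
    )

    # Stream-static join followed by an aggregation on a reference column.
    aggregated = (
        events.join(reference, "device_id")
        .groupBy("site")
        .agg(F.avg("reading").alias("avg_reading"))
    )

    def write_batch(batch_df, batch_id):
        # Append is used for brevity. In update mode the same key can appear
        # in several batches, so a real sink would upsert instead.
        (
            batch_df.write.format("jdbc")
            .option("url", JDBC_URL)
            .option("dbtable", "dbo.site_averages")
            .option("user", user)
            .option("password", password)
            .option("driver", JDBC_DRIVER)
            .mode("append")
            .save()
        )

    return (
        aggregated.writeStream.outputMode("update")
        .foreachBatch(write_batch)
        .option("checkpointLocation", "/tmp/checkpoints/site_averages")
        .start()
    )
```

Note that the reference DataFrame here is read lazily through JDBC; how often it is re-read during the stream depends on caching and the Spark version, so slowly changing reference data may need an explicit refresh strategy.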


HTH












