Posted to user@spark.apache.org by Somasundaram Sekar <so...@tigeranalytics.com> on 2018/01/18 08:13:42 UTC

Writing to Redshift from Kafka Streaming source

Hi,



Is it possible to write a DataFrame backed by a Kafka streaming source into
AWS Redshift? In the past we have used
https://github.com/databricks/spark-redshift to write into Redshift, but I
presume it will not work with DataFrame#writeStream(). Writing through the
JDBC connector with a ForeachWriter may also not be a good idea, given the
way Redshift works.
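
For context, a minimal sketch of what the ForeachWriter + JDBC path would
look like is below (the table, columns and connection details are made-up
placeholders). Each task opens its own connection and issues row-level
INSERTs, which is exactly the access pattern Redshift handles poorly:

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical sink: one JDBC connection per partition per epoch, one INSERT per record.
class RedshiftJdbcWriter(url: String, user: String, password: String)
    extends ForeachWriter[Row] {

  private var conn: Connection = _
  private var stmt: PreparedStatement = _

  override def open(partitionId: Long, epochId: Long): Boolean = {
    conn = DriverManager.getConnection(url, user, password)
    stmt = conn.prepareStatement("INSERT INTO events (key, value) VALUES (?, ?)")
    true
  }

  override def process(row: Row): Unit = {
    stmt.setString(1, row.getAs[String]("key"))
    stmt.setString(2, row.getAs[String]("value"))
    stmt.executeUpdate()  // one INSERT per record -- the pattern Redshift is slow at
  }

  override def close(errorOrNull: Throwable): Unit = {
    if (stmt != null) stmt.close()
    if (conn != null) conn.close()
  }
}

// Usage, where kafkaDf is the Kafka-backed streaming DataFrame:
// kafkaDf.writeStream.foreach(new RedshiftJdbcWriter(jdbcUrl, user, password)).start()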



One possible approach I have come across, from a Yelp engineering blog post
(https://engineeringblog.yelp.com/2016/10/redshift-connector.html), is to
write the files into S3 and then invoke the Redshift COPY command
(https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) with a manifest
file listing the S3 object paths. In the case of Structured Streaming, how
can I control the files I write to S3, and how can I have a separate trigger
that creates a manifest file after writing, say, 5 files to S3?
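
For what it's worth, below is a rough sketch of one way to get that kind of
per-batch control, assuming a Spark version where DataStreamWriter.foreachBatch
is available (2.4+). Each micro-batch writes its files under its own S3 prefix,
a manifest is built listing exactly those files, and COPY ... MANIFEST is issued
over JDBC. The bucket, table, IAM role, JDBC settings and the writeManifestFor
helper are all placeholders rather than a tested implementation:

import java.nio.charset.StandardCharsets
import java.sql.DriverManager
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{Dataset, Row, SparkSession}

val spark = SparkSession.builder.appName("kafka-to-redshift").getOrCreate()

// Kafka-backed streaming DataFrame, as in the question (broker/topic are placeholders).
val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

val jdbcUrl  = "jdbc:redshift://example.redshift.amazonaws.com:5439/dev"  // placeholder
val user     = "redshift_user"                                            // placeholder
val password = "redshift_password"                                        // placeholder

// List the part files a batch just wrote under `prefix`, write a Redshift manifest
// JSON next to them, and return the manifest's S3 URL. Redshift wants s3:// URLs
// even when Spark writes through the s3a:// connector.
def writeManifestFor(prefix: String): String = {
  val dir = new Path(prefix)
  val fs  = dir.getFileSystem(spark.sparkContext.hadoopConfiguration)
  val entries = fs.listStatus(dir)
    .filter(f => f.isFile && !f.getPath.getName.startsWith("_"))
    .map(f => s"""{"url": "${f.getPath.toString.replaceFirst("^s3a://", "s3://")}", "mandatory": true}""")
    .mkString(",")
  val manifest = new Path(s"$prefix.manifest")
  val out = fs.create(manifest, true)
  out.write(s"""{"entries": [$entries]}""".getBytes(StandardCharsets.UTF_8))
  out.close()
  manifest.toString.replaceFirst("^s3a://", "s3://")
}

// Load one micro-batch: COPY the files listed in the manifest into Redshift.
def copyIntoRedshift(manifestUrl: String): Unit = {
  val conn = DriverManager.getConnection(jdbcUrl, user, password)
  try {
    conn.createStatement().execute(
      s"""COPY events
         |FROM '$manifestUrl'
         |IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
         |CSV GZIP
         |MANIFEST""".stripMargin)
  } finally conn.close()
}

// Each micro-batch lands under its own prefix, then COPY picks up exactly those files.
def loadBatch(batch: Dataset[Row], batchId: Long): Unit = {
  val prefix = s"s3a://my-bucket/staging/batch=$batchId"
  batch.write.mode("overwrite").option("compression", "gzip").csv(prefix)
  copyIntoRedshift(writeManifestFor(prefix))
}

kafkaDf.writeStream
  .foreachBatch(loadBatch _)
  .option("checkpointLocation", "s3a://my-bucket/checkpoints/redshift-load")
  .start()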



Any other possible solutions are also appreciated. Thanks in advance.



Regards,

Somasundaram S
