Posted to user@spark.apache.org by Mike Trienis <mi...@orcsol.com> on 2015/03/06 00:33:21 UTC

Writing to S3 and retrieving folder names

Hi All,

I am receiving data from AWS Kinesis using Spark Streaming and am writing
the data collected in the DStream to S3 using the output function:

dstreamData.saveAsTextFiles("s3n://XXX:XXX@XXXX/")

After running the application for several seconds, I end up with a sequence
of directories in S3 that look like [PREFIX]-1425597204000.
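
For context, here is a trimmed-down sketch of what I am doing (the Kinesis
source is elided and the bucket/prefix names are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("KinesisToS3")
val ssc = new StreamingContext(conf, Seconds(10))

// In the real job dstreamData comes from KinesisUtils.createStream(...);
// a socket stream stands in here so the sketch is self-contained.
val dstreamData = ssc.socketTextStream("localhost", 9999)

// Each batch interval is written to its own directory named
// <prefix>-<batchTimeInMillis>, e.g. s3n://my-bucket/events-1425597204000
dstreamData.saveAsTextFiles("s3n://my-bucket/events")

ssc.start()
ssc.awaitTermination()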

At the same time I'd like to run a COPY command on Redshift that pulls in
the exported data. The problem is that I am not sure how to extract the
folder names from the DStream object in order to construct the appropriate
COPY command.

https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.streaming.dstream.DStream
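
One idea was to use the foreachRDD variant that also receives the batch
Time and rebuild the directory name from it, roughly like this (the prefix
is a placeholder), but I am not sure it is the right approach:

import org.apache.spark.streaming.Time

// Must match the prefix passed to saveAsTextFiles
val prefix = "s3n://my-bucket/events"

dstreamData.foreachRDD { (rdd, time: Time) =>
  // saveAsTextFiles names each batch directory <prefix>-<time in ms>
  val batchDir = s"$prefix-${time.milliseconds}"
  // ... build and issue the Redshift COPY against batchDir here ...
  println(s"Batch written under: $batchDir")
}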

Anyone have any ideas?

Thanks, Mike.

Re: Writing to S3 and retrieving folder names

Posted by Mike Trienis <mi...@orcsol.com>.
Please ignore my question; you can simply specify the root directory and it
looks like Redshift takes care of the rest.

copy mobile
from 's3://BUCKET_NAME/'
credentials ....
json 's3://BUCKET_NAME/jsonpaths.json'
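
For completeness, since Redshift speaks the PostgreSQL wire protocol, the
COPY can also be kicked off from the Spark driver over plain JDBC; the
connection details and credentials below are placeholders:

import java.sql.DriverManager

// Placeholder cluster endpoint, database and login
val jdbcUrl = "jdbc:postgresql://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/mydb"
val conn = DriverManager.getConnection(jdbcUrl, "dbuser", "dbpassword")
try {
  val copySql =
    """COPY mobile
      |FROM 's3://BUCKET_NAME/'
      |CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
      |JSON 's3://BUCKET_NAME/jsonpaths.json'""".stripMargin
  conn.createStatement().execute(copySql)
} finally {
  conn.close()
}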
