Posted to dev@spark.apache.org by Jörn Franke <jo...@gmail.com> on 2015/09/12 09:32:57 UTC

Re: SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver

I am not sure what you are trying to achieve here. Have you thought about
using Flume? Or perhaps something like rsync?

On Sat, Sep 12, 2015 at 0:02, Varadhan, Jawahar <va...@yahoo.com.invalid>
wrote:

> Hi all,
>    I have coded a custom receiver which receives Kafka messages. These
> Kafka messages contain FTP server credentials. The receiver opens each
> message and uses the credentials in it to connect to the FTP server. It
> then streams a huge text file (3.3 GB), reads the stream line by line with
> a buffered reader, and pushes each line to Spark Streaming via the
> receiver's "store" method. The Spark Streaming process receives all these
> lines and stores them in HDFS.
>
> With this process I can ingest small files (50 MB), but not this 3.3 GB
> file: the Spark Streaming process is killed by YARN with SIGTERM 15. When
> I instead read the same 3.3 GB file directly (without the custom receiver)
> using ssc.textFileStream, everything works fine and the file ends up in
> HDFS.
>
> Please let me know what I need to do to get this working with the
> receiver. I know there are better ways to ingest the file, but we need to
> use Spark Streaming in our case.
>
> Thanks.
>
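The custom-receiver approach described in the quoted message might look
roughly like the minimal sketch below. It is an illustration only: the class
name FtpLineReceiver, the host/user/password/path parameters, and the use of
the JDK's built-in ftp:// URL handler are assumptions rather than details
from the original post; only the idea of reading the stream line by line with
a buffered reader and handing each line to store() comes from the message.

    import java.io.{BufferedReader, InputStreamReader}
    import java.net.URL

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    // Hypothetical receiver; the constructor parameters are placeholders.
    class FtpLineReceiver(host: String, user: String, password: String, path: String)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

      def onStart(): Unit = {
        // Do the blocking read on a separate thread so onStart() returns
        // quickly, as the Receiver contract requires.
        new Thread("FTP Line Receiver") {
          override def run(): Unit = receive()
        }.start()
      }

      def onStop(): Unit = {
        // Nothing to do; the reading thread exits when the stream is
        // exhausted or isStopped() becomes true.
      }

      private def receive(): Unit = {
        var reader: BufferedReader = null
        try {
          // ftp:// URLs are handled by the JDK's FTP protocol handler.
          val url = new URL(s"ftp://$user:$password@$host/$path")
          reader = new BufferedReader(new InputStreamReader(url.openStream()))
          var line = reader.readLine()
          while (!isStopped() && line != null) {
            store(line)              // hand each line to Spark Streaming
            line = reader.readLine()
          }
        } catch {
          case t: Throwable => restart("Error reading from FTP stream", t)
        } finally {
          if (reader != null) reader.close()
        }
      }
    }

The Receiver API also accepts an iterator in store(), so lines can be pushed
in batches rather than one at a time, and the StorageLevel passed to the
constructor controls whether received blocks may spill to disk; for a
multi-gigabyte file those are the usual knobs to examine first.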
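For comparison, the receiver-less variant the sender reports working,
ssc.textFileStream, could be wired up roughly as follows. The directory
paths, application name, and 30-second batch interval are placeholders, not
values from the original message.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object TextFileStreamIngest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("TextFileStreamIngest")
        val ssc = new StreamingContext(conf, Seconds(30))

        // Monitors the directory for newly arrived files and reads them as
        // text, with no custom receiver involved.
        val lines = ssc.textFileStream("hdfs:///landing/incoming")
        lines.saveAsTextFiles("hdfs:///data/ingested/part")

        ssc.start()
        ssc.awaitTermination()
      }
    }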