Posted to dev@spark.apache.org by "Varadhan, Jawahar" <va...@yahoo.com.INVALID> on 2015/09/12 00:02:21 UTC

SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver

Hi all,
   I have coded a custom receiver that consumes Kafka messages. Each Kafka message carries FTP server credentials; the receiver opens the message and uses those credentials to connect to the FTP server. It then streams a huge text file (3.3 GB), reads the stream line by line with a BufferedReader, and pushes each line into Spark Streaming via the receiver's "store" method. The Spark Streaming process receives all these lines and writes them to HDFS.
With this setup I can ingest small files (50 MB), but I cannot ingest the 3.3 GB file: the Spark Streaming process is killed by YARN with SIGTERM 15. However, when I read that same 3.3 GB file directly (without the custom receiver) using ssc.textFileStream, everything works fine and the file lands in HDFS.
Please let me know what I need to do to get this working with the receiver. I know there are better ways to ingest the file, but we need to use Spark Streaming in our case.
Thanks.
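
[Editor's note] One common cause of a YARN SIGTERM 15 in this scenario is the container exceeding its memory limit, which can happen when a receiver pushes every line individually and data piles up faster than it is drained. A frequently suggested mitigation is to hand lines to store() in fixed-size batches so the in-flight buffer stays bounded. The sketch below is a minimal, Spark-free illustration of that batching idea only; stream_in_batches, push, and batch_size are hypothetical names, and a real receiver would call store(batch) where this sketch calls push(batch):

```python
def stream_in_batches(stream, push, batch_size=1000):
    """Read a text stream line by line and hand lines to `push`
    in fixed-size batches, so memory use stays bounded regardless
    of how large the file is. `push` stands in for the receiver's
    store() call; `batch_size` is an illustrative knob to tune.
    """
    batch = []
    for line in stream:
        batch.append(line.rstrip("\n"))
        if len(batch) >= batch_size:
            push(batch)   # in a real receiver: store(batch)
            batch = []
    if batch:              # flush the final partial batch
        push(batch)

# usage: simulate a 5,500-line "file" with a 1,000-line batch size
received = []
stream_in_batches((f"line{i}\n" for i in range(5500)),
                  received.append, batch_size=1000)
# received now holds 6 batches: five of 1,000 lines and one of 500
```

With batching, the receiver holds at most batch_size lines at a time instead of letting unbounded data accumulate between store() calls.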

Re: SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver

Posted by Jörn Franke <jo...@gmail.com>.
I am not sure what you are trying to achieve here. Have you thought about
using Flume? Or perhaps something like rsync?

On Sat, Sep 12, 2015 at 0:02, Varadhan, Jawahar <va...@yahoo.com.invalid>
wrote:

> Hi all,
>    I have a coded a custom receiver which receives kafka messages. These
> Kafka messages have FTP server credentials in them. The receiver then opens
> the message and uses the ftp credentials in it  to connect to the ftp
> server. It then streams this huge text file (3.3G) . Finally this stream it
> read line by line using buffered reader and pushed to the spark streaming
> via the receiver's "store" method. Spark streaming process receives all
> these lines and stores it in hdfs.
>
> With this process I could ingest small files (50 mb) but cant ingest this
> 3.3gb file.  I get a YARN exception of SIGTERM 15 in spark streaming
> process. Also, I tried going to that 3.3GB file directly (without custom
> receiver) in spark streaming using ssc.textFileStream  and everything works
> fine and that file ends in HDFS
>
> Please let me know what I might have to do to get this working with
> receiver. I know there are better ways to ingest the file but we need to
> use Spark streaming in our case.
>
> Thanks.
>
