Posted to dev@spark.apache.org by "Varadhan, Jawahar" <va...@yahoo.com.INVALID> on 2015/09/12 00:02:21 UTC
SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver
Hi all, I have coded a custom receiver which receives Kafka messages. These Kafka messages contain FTP server credentials. The receiver opens each message and uses the FTP credentials in it to connect to the FTP server. It then streams a huge text file (3.3 GB). The stream is read line by line with a buffered reader, and each line is pushed to Spark Streaming via the receiver's "store" method. The Spark Streaming process receives all these lines and stores them in HDFS.
With this process I can ingest small files (50 MB), but I can't ingest the 3.3 GB file: the Spark Streaming process is killed by YARN with SIGTERM 15. Also, I tried reading that 3.3 GB file directly (without the custom receiver) in Spark Streaming using ssc.textFileStream, and everything works fine; the file ends up in HDFS.
Please let me know what I might have to do to get this working with the receiver. I know there are better ways to ingest the file, but we need to use Spark Streaming in our case.
Thanks.
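[Editor's note: the post above does not include the receiver's code, so the following is a minimal, hypothetical sketch of the described pattern, using only the Java/Scala standard library. `store` is a stand-in for Spark's `Receiver.store`; the name `BatchedIngestSketch` and the `batchSize` parameter are invented for illustration. One plausible mitigation for a receiver that dies on a 3.3 GB file is to call `store` once per bounded batch of lines rather than once per line, which keeps per-call overhead and buffered state predictable.]

```scala
import java.io.BufferedReader
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: read a text stream line by line and hand the lines
// onward in bounded batches instead of one store() call per line.
// In a real Spark Streaming Receiver, `store` would be Receiver.store.
object BatchedIngestSketch {
  def ingest(reader: BufferedReader, batchSize: Int)(store: Seq[String] => Unit): Int = {
    val batch = new ArrayBuffer[String](batchSize)
    var total = 0
    var line  = reader.readLine()
    while (line != null) {
      batch += line
      if (batch.size >= batchSize) {
        store(batch.toSeq)   // one store() call per batch, not per line
        total += batch.size
        batch.clear()
      }
      line = reader.readLine()
    }
    if (batch.nonEmpty) {    // flush the final partial batch
      store(batch.toSeq)
      total += batch.size
    }
    total
  }
}
```

The same idea would apply to an FTP input stream wrapped in a `BufferedReader`: the reader never holds more than `batchSize` lines in memory at once, regardless of the total file size.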
Re: SIGTERM 15 Issue : Spark Streaming for ingesting huge text files using custom Receiver
Posted by Jörn Franke <jo...@gmail.com>.
I am not sure what you are trying to achieve here. Have you thought about
using Flume? Additionally, maybe something like rsync?
On Sat, Sep 12, 2015 at 0:02, Varadhan, Jawahar <va...@yahoo.com.invalid>
wrote:
> Hi all,
> I have a coded a custom receiver which receives kafka messages. These
> Kafka messages have FTP server credentials in them. The receiver then opens
> the message and uses the ftp credentials in it to connect to the ftp
> server. It then streams this huge text file (3.3G) . Finally this stream it
> read line by line using buffered reader and pushed to the spark streaming
> via the receiver's "store" method. Spark streaming process receives all
> these lines and stores it in hdfs.
>
> With this process I could ingest small files (50 mb) but cant ingest this
> 3.3gb file. I get a YARN exception of SIGTERM 15 in spark streaming
> process. Also, I tried going to that 3.3GB file directly (without custom
> receiver) in spark streaming using ssc.textFileStream and everything works
> fine and that file ends in HDFS
>
> Please let me know what I might have to do to get this working with
> receiver. I know there are better ways to ingest the file but we need to
> use Spark streaming in our case.
>
> Thanks.
>