Posted to issues@spark.apache.org by "vijayant soni (JIRA)" <ji...@apache.org> on 2018/11/16 07:35:00 UTC
[jira] [Updated] (SPARK-26086) Spark streaming max records per batch interval
[ https://issues.apache.org/jira/browse/SPARK-26086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
vijayant soni updated SPARK-26086:
----------------------------------
Description:
We have a Spark Streaming application that reads from Kinesis and writes to Redshift.
*Configuration*:
Number of receivers = 5
Batch interval = 10 mins
spark.streaming.receiver.maxRate = 2000 (records per second)
According to this configuration, the maximum number of records that can be read in a single batch is:
{{Max records per batch = batch_interval (converted to seconds) * number of receivers * max records per second per receiver = (10 * 60) * 5 * 2000 = 6,000,000}}
But the actual number of records per batch is more than this maximum:
Batch I - 6,005,886 records
Batch II - 6,001,623 records
Batch III - 6,010,148 records
Please note that the receivers are not even reading at the max rate; the actual rate per receiver is near 1,900 records per second.
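The arithmetic above can be sketched as follows (variable names are illustrative, not Spark configuration keys):

```python
# Theoretical cap implied by the reported configuration.
batch_interval_min = 10
num_receivers = 5
max_rate = 2000  # spark.streaming.receiver.maxRate, records/sec per receiver

max_records_per_batch = batch_interval_min * 60 * num_receivers * max_rate
print(max_records_per_batch)  # 6000000

# Observed batch sizes from the report, each exceeding the cap:
observed = [6_005_886, 6_001_623, 6_010_148]
overage = [n - max_records_per_batch for n in observed]
print(overage)  # [5886, 1623, 10148]
```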
was:
We have a Spark Streaming application that reads from Kinesis and writes to Redshift.
*Configuration*:
Number of receivers = 5
Batch interval = 10 mins
spark.streaming.receiver.maxRate = 2000 (records per second)
According to this configuration, the maximum number of records that can be read in a single batch is:
{{Max records per batch = batch_interval (converted to seconds) * number of receivers * max records per second per receiver = (10 * 60) * 5 * 2000 = 6,000,000}}
But the actual number of records per batch is more than this maximum:
Batch I - 6,005,886 records
Batch II - 6,001,623 records
Batch III - 6,010,148 records
Please note that the receivers are not even reading at the max rate; the actual rate per receiver is near 1,900 records per second.
> Spark streaming max records per batch interval
> ----------------------------------------------
>
> Key: SPARK-26086
> URL: https://issues.apache.org/jira/browse/SPARK-26086
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 2.3.1
> Reporter: vijayant soni
> Priority: Major
>
> We have a Spark Streaming application that reads from Kinesis and writes to Redshift.
> *Configuration*:
> Number of receivers = 5
> Batch interval = 10 mins
> spark.streaming.receiver.maxRate = 2000 (records per second)
> According to this configuration, the maximum number of records that can be read in a single batch is:
> {{Max records per batch = batch_interval (converted to seconds) * number of receivers * max records per second per receiver = (10 * 60) * 5 * 2000 = 6,000,000}}
> But the actual number of records per batch is more than this maximum:
> Batch I - 6,005,886 records
> Batch II - 6,001,623 records
> Batch III - 6,010,148 records
> Please note that the receivers are not even reading at the max rate; the actual rate per receiver is near 1,900 records per second.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org