Posted to user@spark.apache.org by diplomatic Guru <di...@gmail.com> on 2015/10/21 19:00:25 UTC

How to check whether the RDD is empty or not

Hello All,

I have a Spark Streaming job that should do some action only if the RDD is
not empty. This can be done easily with a Spark batch RDD, as I could call
.take(1) and check whether it is empty or not. But this cannot be done
on a Spark Streaming DStream.


JavaPairInputDStream<LongWritable, Text> input =
    ssc.fileStream(iFolder, LongWritable.class, Text.class,
        TextInputFormat.class);

if (input != null) {
    // do some action if it is not empty
}

Any ideas please?

Re: How to check whether the RDD is empty or not

Posted by diplomatic Guru <di...@gmail.com>.
Tathagata, thank you for the response.

I have two receivers in my Spark Streaming job: one reads an endless stream of
data from Flume and the other reads data from an HDFS directory. However,
files do not get moved into HDFS frequently (let's say they get moved every
10 minutes). This is where I need to check if there are any events in
HDFS before doing any action on them.

RDD.isEmpty() is available in JavaRDD and JavaPairRDD but
not in JavaDStream and JavaPairDStream. I could use foreach and then check
the RDD, but it's long-winded.

Re: How to check whether the RDD is empty or not

Posted by Gerard Maas <ge...@gmail.com>.
As TD mentions, there's no such thing as an 'empty DStream'. Some intervals
of a DStream could be empty, in which case the related RDD will be empty.
This means that you should express such a condition based on the RDDs of the
DStream. Translated into code:

dstream.foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    // ... do stuff ...
  }
}
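
Since the original question uses the Java API, here is a rough plain-Java model of the same idea. It deliberately does not use Spark: MicroBatchStream, foreachBatch and processNonEmpty are hypothetical names made up for illustration, and each List<String> merely stands in for the RDD of one batch interval. It only sketches why the check belongs on each batch rather than on the stream object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model, NOT the Spark API. MicroBatchStream is a hypothetical
// stand-in for a DStream; each List<String> stands in for one interval's RDD.
class EmptyBatchSketch {

    static class MicroBatchStream {
        private final List<List<String>> batches;

        MicroBatchStream(List<List<String>> batches) {
            this.batches = batches;
        }

        // Same shape as foreachRDD: the callback runs once per interval,
        // including intervals in which no data arrived.
        void foreachBatch(Consumer<List<String>> action) {
            for (List<String> batch : batches) {
                action.accept(batch);
            }
        }
    }

    // Collects the batches that pass the per-batch emptiness check.
    static List<List<String>> processNonEmpty(MicroBatchStream stream) {
        List<List<String>> processed = new ArrayList<>();
        stream.foreachBatch(batch -> {
            if (!batch.isEmpty()) {
                processed.add(batch); // "do stuff" would go here
            }
        });
        return processed;
    }

    public static void main(String[] args) {
        MicroBatchStream stream = new MicroBatchStream(Arrays.asList(
                Arrays.asList("event-1", "event-2"),
                Collections.<String>emptyList(), // interval with no new files
                Arrays.asList("event-3")));

        // The stream object is never null, so a null check on it says nothing
        // about whether a given interval carried data. Only the batch can tell.
        System.out.println(processNonEmpty(stream));
        // prints [[event-1, event-2], [event-3]]
    }
}
```

With the real Java API, the corresponding shape would presumably be a foreachRDD callback on the JavaPairInputDStream that calls rdd.isEmpty() on the JavaPairRDD it receives; the exact callback type depends on the Spark version in use.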


Re: How to check whether the RDD is empty or not

Posted by Tathagata Das <td...@databricks.com>.
What do you mean by checking when a "DStream is empty"? A DStream represents
an endless stream of data, and checking whether it is empty at any given
point in time does not make sense.

FYI, there is RDD.isEmpty().



Re: How to check whether the RDD is empty or not

Posted by diplomatic Guru <di...@gmail.com>.
I tried the code below, but it still carries out the action even when there
is no new data.

JavaPairInputDStream<LongWritable, Text> input =
    ssc.fileStream(iFolder, LongWritable.class, Text.class,
        TextInputFormat.class);

if (input != null) {
    // do some action if it is not empty
}

