Posted to user@spark.apache.org by Sai Prasanna <an...@gmail.com> on 2014/04/24 06:51:13 UTC

Access Last Element of RDD

Hi all, some help please!
RDD.first or RDD.take(1) gives the first item. Is there a straightforward
way to access the last element in a similar way?

I couldn't find a tail/last method for RDD.

Re: Access Last Element of RDD

Posted by Sai Prasanna <an...@gmail.com>.
What I observe is that computing it this way, with
RDD.take(RDD.count()).last, is very inefficient. It first returns all the
elements of the RDD to the driver as a local collection, which takes a
considerable amount of time, and only then picks the last element.

I have a 3 GB file on which I ran many aggregate operations, and none of
them took as long as this take(RDD.count) did.

Is there an efficient way? My guess is that there should be one, since it's
a basic operation.



Re: Access Last Element of RDD

Posted by Sai Prasanna <an...@gmail.com>.
Thanks Cheng !!



Re: Access Last Element of RDD

Posted by Cheng Lian <li...@gmail.com>.
You may try this:

val lastOption = sc.textFile("input").mapPartitions { iterator =>
  if (iterator.isEmpty) {
    iterator
  } else {
    // Pair each element with a flag saying whether more elements follow,
    // then keep only the element that nothing follows: the partition's last.
    Iterator
      .continually((iterator.next(), iterator.hasNext))
      .collect { case (value, false) => value }
      .take(1)
  }
}.collect().lastOption

Iterator-based access keeps space usage at O(1) per partition, and it runs
faster than collecting everything because the partitions are processed in
parallel and each one sends at most one element to the driver. lastOption is
used instead of last to deal with an empty file.
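
The per-partition idea above can also be tried without a cluster. What follows is only a minimal sketch on plain Scala collections: the names LastPerPartition and lastOnly are made up for illustration and are not part of Spark, and the nested Seq merely stands in for the partitions of an RDD.

```scala
object LastPerPartition {
  // Hypothetical helper: keeps only the final element of an iterator.
  // It holds a single element at a time, so memory use stays constant.
  def lastOnly[T](it: Iterator[T]): Iterator[T] =
    if (!it.hasNext) Iterator.empty
    else {
      var last = it.next()
      while (it.hasNext) last = it.next()
      Iterator.single(last)
    }

  def main(args: Array[String]): Unit = {
    // Each inner Seq stands in for one partition; the middle one is empty.
    val partitions = Seq(Seq("a", "b"), Seq.empty[String], Seq("c", "d"))
    // Local analogue of mapPartitions(...).collect().lastOption.
    val last = partitions.flatMap(p => lastOnly(p.iterator)).lastOption
    println(last) // prints Some(d)
  }
}
```

On a real RDD the same helper could presumably be passed as rdd.mapPartitions(it => LastPerPartition.lastOnly(it)).collect().lastOption, which should behave like the snippet above it.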



Re: Access Last Element of RDD

Posted by Sai Prasanna <an...@gmail.com>.
Hi all, I finally wrote the following code, which I feel works well even if
it is not the most optimal approach.
It uses a file pointer and scans backwards from the end of the file to find
the byte just after the \n that precedes the last line.
This is memory efficient, and I expect the Unix tail implementation does
something similar.

import java.io.RandomAccessFile
import java.nio.charset.StandardCharsets

val filePath = "/home/sparkcluster/hadoop-2.3.0/temp"
val file = new RandomAccessFile(filePath, "r")

// Read the single byte stored at the given offset.
def byteAt(pos: Long): Byte = { file.seek(pos); file.readByte() }

val lastLine =
  try {
    var end = file.length()
    // Skip one trailing line terminator ("\n" or "\r\n"), if present.
    if (end > 0 && byteAt(end - 1) == 0xA) end -= 1
    if (end > 0 && byteAt(end - 1) == 0xD) end -= 1
    // Walk backwards to the byte just after the previous "\n",
    // or to offset 0 when the file holds a single line.
    var start = end
    while (start > 0 && byteAt(start - 1) != 0xA) start -= 1
    val bytes = new Array[Byte]((end - start).toInt)
    file.seek(start)
    file.readFully(bytes)
    // Assumes the file is UTF-8 encoded.
    new String(bytes, StandardCharsets.UTF_8) /* lastLine contains the last line */
  } finally {
    file.close()
  }





Re: Access Last Element of RDD

Posted by Sai Prasanna <an...@gmail.com>.
Thanks Guys !



Re: Access Last Element of RDD

Posted by Sourav Chandra <so...@livestream.com>.
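Also, the same thing can be done using rdd.top(1)(ordering), where ordering
is the plain (non-reversed) Ordering[T]: top returns the largest elements,
so it already applies the reversal that takeOrdered needs spelled out.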






-- 

Sourav Chandra

Senior Software Engineer

· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·

sourav.chandra@livestream.com

o: +91 80 4121 8723

m: +91 988 699 3746

skype: sourav.chandra

Livestream

"Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd
Block, Koramangala Industrial Area,

Bangalore 560034

www.livestream.com

Re: Access Last Element of RDD

Posted by Sourav Chandra <so...@livestream.com>.
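You can use rdd.takeOrdered(1)(reverseOrdering)

reverseOrdering is your Ordering[T] instance, in which you define the
ordering logic (reversed, so that the element you want sorts first). You
have to pass it to the method explicitly. Note that this returns the last
element in sort order, which is the last element by position only if the
RDD is sorted that way.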
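
To make the role of the Ordering[T] concrete, here is a small sketch on a local collection rather than an RDD. The Event type, its fields, and the object name are hypothetical and exist only for this illustration; sorting locally and taking one element is used as a stand-in for what takeOrdered(1)(reverseOrdering) would return.

```scala
object ReverseOrderingSketch {
  // Hypothetical record type, used only for this illustration.
  final case class Event(id: Int, timestamp: Long)

  // Ordering by timestamp, and its reverse (largest timestamp first).
  val byTimestamp: Ordering[Event] = Ordering.by((e: Event) => e.timestamp)
  val reverseOrdering: Ordering[Event] = byTimestamp.reverse

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(1, 30L), Event(2, 10L), Event(3, 20L))
    // Local analogue of rdd.takeOrdered(1)(reverseOrdering).
    val latest = events.sorted(reverseOrdering).take(1)
    println(latest) // prints List(Event(1,30))
  }
}
```

The result here is the first element by position, which shows why this approach answers "largest by ordering" rather than "last in the file" unless the data is already sorted.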






Re: Access Last Element of RDD

Posted by Frank Austin Nothaft <fn...@berkeley.edu>.
If you do this, you could simplify to:

RDD.collect().last

However, this has the problem of collecting all data to the driver.

Is your data sorted? If so, you could reverse the sort and take the first element. Alternatively, a hacky implementation might involve a mapPartitionsWithIndex that returns an empty iterator for all partitions except the last. For the last partition, you would filter out all elements except the last one in your iterator. This should leave one element, which is your last element.
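
A rough sketch of that mapPartitionsWithIndex idea, written against plain Scala collections so it runs without Spark. The object and method names (LastPartitionOnly, keepLast) are assumptions made for illustration, and the Vector of Seqs only imitates an RDD's partitions.

```scala
object LastPartitionOnly {
  // Hypothetical helper shaped like the function mapPartitionsWithIndex
  // expects: (partition index, partition iterator) => iterator.
  def keepLast[T](lastIndex: Int)(index: Int, it: Iterator[T]): Iterator[T] =
    if (index != lastIndex || !it.hasNext) Iterator.empty
    else {
      // Only the last partition is scanned; keep its final element.
      var last = it.next()
      while (it.hasNext) last = it.next()
      Iterator.single(last)
    }

  def main(args: Array[String]): Unit = {
    // Local stand-in for an RDD with three partitions.
    val partitions = Vector(Seq(1, 2), Seq(3, 4), Seq(5, 6))
    val lastIndex = partitions.size - 1
    val result = partitions.zipWithIndex.flatMap {
      case (p, i) => keepLast(lastIndex)(i, p.iterator)
    }
    println(result) // prints Vector(6)
  }
}
```

With an RDD, lastIndex would be computed on the driver first (for example from rdd.partitions.length - 1) and then used inside rdd.mapPartitionsWithIndex. One caveat of this sketch: if the last partition happens to be empty, nothing is returned even though earlier partitions hold data.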

Frank Austin Nothaft
fnothaft@berkeley.edu
fnothaft@eecs.berkeley.edu
202-340-0466



Re: Access Last Element of RDD

Posted by Adnan Yaqoob <ns...@gmail.com>.
This function returns a local Scala collection (an Array in the Scala API),
so you can use its last method to get the last element.

For example:

RDD.take(RDD.count().toInt).last



Re: Access Last Element of RDD

Posted by Sai Prasanna <an...@gmail.com>.
Adnan, but RDD.take(RDD.count()) returns all the elements of the RDD.

I only want to access the last element.



Re: Access Last Element of RDD

Posted by Sai Prasanna <an...@gmail.com>.
Oh ya, Thanks Adnan.



Re: Access Last Element of RDD

Posted by Adnan Yaqoob <ns...@gmail.com>.
You can use the following code (count() returns a Long while take expects
an Int, hence the conversion):

RDD.take(RDD.count().toInt)

