Posted to user@spark.apache.org by Raghav <ra...@gmail.com> on 2016/11/22 05:45:03 UTC
newbie question about RDD
Hi
I am extremely new to Spark. I have to read a file from HDFS and load it
into memory as an RDD.
I have a Java class as follows:
class Person {
    private long UUID;
    private String FirstName;
    private String LastName;
    private String zip;

    // public methods
}
The file in HDFS is as follows:
UUID. FirstName LastName Zip
7462 John Doll 06903
5231 Brad Finley 32820
Can someone show me how to get a JavaRDD<Person> object by reading the
file from HDFS?
Thanks.
--
Raghav
Re: newbie question about RDD
Posted by Mohit Durgapal <du...@gmail.com>.
Hi Raghav,
Please refer to the following code. Note that your sample file has a header
row and looks whitespace-separated, so the parsing below skips the header and
splits on whitespace; adjust the delimiter if your file is actually
tab-separated:

SparkConf sparkConf = new SparkConf()
        .setMaster("local[2]")
        .setAppName("PersonApp");

// creating the Java Spark context
JavaSparkContext sc = new JavaSparkContext(sparkConf);

// reading the file from HDFS into a Spark RDD; the name node here is localhost
JavaRDD<String> personStringRDD =
        sc.textFile("hdfs://localhost:9000/custom/inputPersonFile.txt");

// converting from a String RDD to a Person RDD; this is just an example,
// you can replace the parsing with better exception-handled code
String header = personStringRDD.first();
JavaRDD<Person> personObjectRDD = personStringRDD
        .filter(row -> !row.equals(header))
        .map(personRow -> {
            String[] personValues = personRow.trim().split("\\s+");
            return new Person(Long.parseLong(personValues[0]),
                    personValues[1], personValues[2], personValues[3]);
        });

// finally, just printing the count of objects
System.out.println("Person count = " + personObjectRDD.count());
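One thing to note: the map above assumes Person has a four-argument
constructor, which the class sketch in the original post omits. A minimal
sketch of such a constructor (the getters are illustrative assumptions; the
original post only says "public methods"):

```java
public class Person {
    private long UUID;
    private String FirstName;
    private String LastName;
    private String zip;

    // constructor matching the new Person(long, String, String, String)
    // call used in the map above
    public Person(long uuid, String firstName, String lastName, String zip) {
        this.UUID = uuid;
        this.FirstName = firstName;
        this.LastName = lastName;
        this.zip = zip;
    }

    // illustrative accessors (assumptions, not part of the original class)
    public long getUUID() { return UUID; }
    public String getFirstName() { return FirstName; }
    public String getLastName() { return LastName; }
    public String getZip() { return zip; }
}
```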
Regards
Mohit
On Tue, Nov 22, 2016 at 11:17 AM, Raghav <ra...@gmail.com> wrote:
> Sorry, I forgot to ask: how can I use the Spark context here? I have the
> HDFS directory path of the files, as well as the name node of the HDFS cluster.
>
> Thanks for your help.
>
> On Mon, Nov 21, 2016 at 9:45 PM, Raghav <ra...@gmail.com> wrote:
>
>> Hi
>>
>> I am extremely new to Spark. I have to read a file from HDFS and load it
>> into memory as an RDD.
>>
>> I have a Java class as follows:
>>
>> class Person {
>> private long UUID;
>> private String FirstName;
>> private String LastName;
>> private String zip;
>>
>> // public methods
>> }
>>
>> The file in HDFS is as follows:
>>
>> UUID. FirstName LastName Zip
>> 7462 John Doll 06903
>> 5231 Brad Finley 32820
>>
>>
>> Can someone show me how to get a JavaRDD<Person> object by reading the
>> file from HDFS?
>>
>> Thanks.
>>
>> --
>> Raghav
>>
>
>
>
> --
> Raghav
>
Re: newbie question about RDD
Posted by Raghav <ra...@gmail.com>.
Sorry, I forgot to ask: how can I use the Spark context here? I have the
HDFS directory path of the files, as well as the name node of the HDFS cluster.
Thanks for your help.
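For reference, a full HDFS URI can be built from the name node host/port and
the directory path, and passed straight to sc.textFile. A minimal sketch,
where the host, port, and path below are illustrative placeholders, not
values from this thread:

```java
public class HdfsPathExample {
    public static void main(String[] args) {
        // placeholder assumptions: replace with your name node host/port and directory
        String nameNodeHost = "namenode.example.com";
        int nameNodePort = 9000;
        String dirPath = "/custom";

        // sc.textFile(hdfsUri) would read every file under that directory
        // into a JavaRDD<String>
        String hdfsUri = "hdfs://" + nameNodeHost + ":" + nameNodePort + dirPath;
        System.out.println(hdfsUri); // prints hdfs://namenode.example.com:9000/custom
    }
}
```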
On Mon, Nov 21, 2016 at 9:45 PM, Raghav <ra...@gmail.com> wrote:
> Hi
>
> I am extremely new to Spark. I have to read a file from HDFS and load it
> into memory as an RDD.
>
> I have a Java class as follows:
>
> class Person {
> private long UUID;
> private String FirstName;
> private String LastName;
> private String zip;
>
> // public methods
> }
>
> The file in HDFS is as follows:
>
> UUID. FirstName LastName Zip
> 7462 John Doll 06903
> 5231 Brad Finley 32820
>
>
> Can someone show me how to get a JavaRDD<Person> object by reading the
> file from HDFS?
>
> Thanks.
>
> --
> Raghav
>
--
Raghav