Posted to user@spark.apache.org by Arun Kumar <ar...@gmail.com> on 2013/10/28 09:52:25 UTC

Reading custom inputformat from hadoop dfs

Hi

I am trying to read a custom sequence file from the Hadoop file system. The CustomInputFormat class implements InputFormat<WritableComparable, Writable>. I am able to read the file into a JavaPairRDD as follows:

JobConf job = new JobConf();
FileInputFormat.setInputPaths(job, new Path(input));

JavaPairRDD<WritableComparable, Writable> rdd =
    spark.hadoopRDD(job, CustomInputFormat.class,
        WritableComparable.class, Writable.class);

But I want to read it directly from the Scala API. I am trying the following:


val job = new JobConf()
FileInputFormat.setInputPaths(job, new Path(input))

spark.hadoopRDD(job, classOf[CustomInputFormat],
  classOf[WritableComparable[Object]], classOf[Writable])


I am getting the following error:

[error] argument expression's type is not compatible with formal parameter type;
[error]  found   : java.lang.Class[CustomInputFormat]
[error]  required: java.lang.Class[_ <: org.apache.hadoop.mapred.InputFormat[?K,?V]]


But my CustomInputFormat class implements InputFormat<WritableComparable, Writable>. Are the generics causing the compilation problem? WritableComparable expects a type parameter.
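
One possible workaround, shown below as a minimal, untested sketch: because CustomInputFormat implements the raw InputFormat<WritableComparable, Writable>, Scala sees its key type as an existential, and the inferred K from classOf[WritableComparable[Object]] does not match. Casting the class objects to a common existential alias sidesteps the inference. This assumes the SparkContext.hadoopRDD(conf, inputFormatClass, keyClass, valueClass) signature of that era; the alias WC is introduced here purely for illustration.

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{Writable, WritableComparable}
    import org.apache.hadoop.mapred.{FileInputFormat, InputFormat, JobConf}

    // Raw Java generics surface in Scala as existential types, so give the
    // key an explicit existential alias and cast both class objects to it.
    type WC = WritableComparable[_]

    val job = new JobConf()
    FileInputFormat.setInputPaths(job, new Path(input))

    val rdd = spark.hadoopRDD(
      job,
      classOf[CustomInputFormat].asInstanceOf[Class[InputFormat[WC, Writable]]],
      classOf[WritableComparable[Object]].asInstanceOf[Class[WC]],
      classOf[Writable])

The casts are unchecked but safe here: they only pin the static type parameters so the compiler accepts the call; at runtime the class objects are unchanged due to erasure.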

Re: Reading custom inputformat from hadoop dfs

Posted by Silvio Fiorito <si...@granturing.com>.
I was having the same problems trying to read from HCatalog with the Scala API. The way I got around it was to create a wrapper InputFormat in Java that uses Spark's SerializableWritable.

I hacked this up Friday afternoon, tested it a few times, and it seemed to work well.

Here's an example: https://gist.github.com/granturing/7201912

I'm new to Spark and Scala so this may not be the "right way", but it worked for me! :)
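
The gist above has the full Java wrapper. As a rough, untested sketch of the same idea in Scala (assuming the org.apache.spark.SerializableWritable class from that release, whose wrapped object is exposed via .value), each pair coming out of the hadoopRDD (called rdd here) can be wrapped so the records become Java-serializable:

    import org.apache.spark.SerializableWritable

    // Untested sketch: wrap keys and values so operations that serialize
    // records (shuffles, collect) can handle the Writable types; unwrap
    // downstream with .value.
    val serializableRdd = rdd.map { case (k, v) =>
      (new SerializableWritable(k: Writable), new SerializableWritable(v: Writable))
    }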
