Posted to user@spark.apache.org by unk1102 <um...@gmail.com> on 2015/07/31 20:24:51 UTC

How to create Spark DataFrame using custom Hadoop InputFormat?

Hi, I have my own custom Hadoop InputFormat which I need to use to create a
DataFrame. I tried the following:

JavaPairRDD<Void, MyRecordWritable> myFormatAsPairRdd =
    jsc.hadoopFile("hdfs://tmp/data/myformat.xyz",
                   MyInputFormat.class, Void.class, MyRecordWritable.class);
JavaRDD<MyRecordWritable> myformatRdd = myFormatAsPairRdd.values();
DataFrame myFormatAsDataframe =
    sqlContext.createDataFrame(myformatRdd, MyFormatSchema.class);
myFormatAsDataframe.show();

The code above does not work and throws:

java.lang.IllegalArgumentException: object is not an instance of declaring class

My custom Hadoop InputFormat works very well with Hive, MapReduce, etc. How do
I make it work with Spark? Please guide me, I am new to Spark. Thanks in advance.
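
A likely cause, assuming MyFormatSchema is a separate JavaBean class rather than
the class of the RDD's elements: createDataFrame(rdd, beanClass) reflects over
beanClass's getters and invokes them on each RDD element, and
java.lang.reflect.Method.invoke throws exactly this IllegalArgumentException when
the element is not an instance of the getter's declaring class. A minimal sketch
of the expected contract (MyBean, its field, and the mapping are hypothetical):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;

// Hypothetical bean: createDataFrame(rdd, beanClass) expects the RDD's elements
// to be instances of this exact class.
public class MyBean implements java.io.Serializable {
  private String name;
  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
}

// Convert the Writable records to beans first, then pass the matching class.
JavaRDD<MyBean> beans = myformatRdd.map(new Function<MyRecordWritable, MyBean>() {
  @Override
  public MyBean call(MyRecordWritable record) {
    MyBean bean = new MyBean();
    bean.setName(record.toString());  // placeholder mapping; the real accessors are unknown
    return bean;
  }
});
DataFrame beansAsDataframe = sqlContext.createDataFrame(beans, MyBean.class);
beansAsDataframe.show();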




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-Spark-DataFrame-using-custom-Hadoop-InputFormat-tp24101.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: How to create Spark DataFrame using custom Hadoop InputFormat?

Posted by Umesh Kacha <um...@gmail.com>.
Hi Ted, thanks much for the reply. I can't share the code on a public forum. I
have created the custom format by extending the Hadoop mapred InputFormat class,
and likewise a RecordReader class. If you can help me use it to create a
DataFrame, that would be very helpful.
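
For reference, the general shape of that pattern in the old mapred API is
sketched below. It is a generic stand-in (NullWritable keys, Text values, a
whole-file reader), not the actual MyInputFormat or MyRecordWritable, which
cannot be shared:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Generic whole-file input format for the old mapred API: one record per file,
// NullWritable key, file contents as a Text value.
public class WholeFileInputFormat extends FileInputFormat<NullWritable, Text> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // hand each file to a single record reader
  }

  @Override
  public RecordReader<NullWritable, Text> getRecordReader(InputSplit split, JobConf job,
                                                          Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  // Reads the entire file as one record.
  public static class WholeFileRecordReader implements RecordReader<NullWritable, Text> {
    private final FileSplit split;
    private final JobConf conf;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit split, JobConf conf) {
      this.split = split;
      this.conf = conf;
    }

    @Override
    public boolean next(NullWritable key, Text value) throws IOException {
      if (processed) {
        return false;
      }
      Path path = split.getPath();
      FileSystem fs = path.getFileSystem(conf);
      byte[] contents = new byte[(int) split.getLength()];
      FSDataInputStream in = fs.open(path);
      try {
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      value.set(contents, 0, contents.length);
      processed = true;
      return true;
    }

    @Override public NullWritable createKey() { return NullWritable.get(); }
    @Override public Text createValue() { return new Text(); }
    @Override public long getPos() throws IOException { return processed ? split.getLength() : 0L; }
    @Override public void close() { }
    @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
  }
}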

On Sat, Aug 1, 2015 at 12:12 AM, Ted Yu <yu...@gmail.com> wrote:

> Can you pastebin the complete stack trace ?
>
> If you can show skeleton of MyInputFormat and MyRecordWritable, that
> would provide additional information as well.
>
> Cheers

Re: How to create Spark DataFrame using custom Hadoop InputFormat?

Posted by Ted Yu <yu...@gmail.com>.
Can you pastebin the complete stack trace?

If you can show a skeleton of MyInputFormat and MyRecordWritable, that would
provide additional information as well.

Cheers


Re: How to create Spark DataFrame using custom Hadoop InputFormat?

Posted by Umesh Kacha <um...@gmail.com>.
Hi, thanks. Void works; I use the same custom format in Hive and it works with
Void as the key. Please share an example, if you have one, of creating a
DataFrame using a custom Hadoop format.
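
For reference, one common pattern in Spark 1.x is to map each Writable to a Row
and supply an explicit StructType, since createDataFrame(rdd, beanClass) only
works when the RDD's elements are instances of beanClass. The column names and
accessors below (name/count, getName()/getCount()) are placeholders, as
MyRecordWritable's real fields were not shared:

import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Explicit schema describing the DataFrame columns (placeholder columns).
StructType schema = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("name", DataTypes.StringType, true),
    DataTypes.createStructField("count", DataTypes.IntegerType, true)));

// Turn each custom Writable into a Row with values in the same order as the schema.
JavaRDD<Row> rows = myformatRdd.map(new Function<MyRecordWritable, Row>() {
  @Override
  public Row call(MyRecordWritable record) {
    // getName()/getCount() stand in for whatever accessors MyRecordWritable exposes.
    return RowFactory.create(record.getName(), record.getCount());
  }
});

DataFrame myFormatAsDataframe = sqlContext.createDataFrame(rows, schema);
myFormatAsDataframe.show();

The bean-class overload of createDataFrame works as well, but only if the RDD's
elements really are instances of that bean class.
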
On Aug 1, 2015 2:07 AM, "Ted Yu" <yu...@gmail.com> wrote:

> I don't think using Void class is the right choice - it is not even a
> Writable.
>
> BTW in the future, capture text output instead of image.
>
> Thanks

Re: How to create Spark DataFrame using custom Hadoop InputFormat?

Posted by Ted Yu <yu...@gmail.com>.
I don't think using the Void class is the right choice - it is not even a
Writable.
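
If the format really produces no key, NullWritable is the usual stand-in. A
rough sketch of the call, assuming MyInputFormat declares NullWritable as its
key type (I don't know whether it does):

import org.apache.hadoop.io.NullWritable;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

JavaPairRDD<NullWritable, MyRecordWritable> pairs =
    jsc.hadoopFile("hdfs://tmp/data/myformat.xyz",  // path from the original post
                   MyInputFormat.class,
                   NullWritable.class,              // a real Writable, unlike Void
                   MyRecordWritable.class);
JavaRDD<MyRecordWritable> records = pairs.values();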

BTW, in the future, please capture text output instead of an image.

Thanks

On Fri, Jul 31, 2015 at 12:35 PM, Umesh Kacha <um...@gmail.com> wrote:

> Hi Ted thanks My key is always Void because my custom format file is non
> splittable so key is Void and values is  MyRecordWritable which extends
> Hadoop Writable. I am sharing my log as snap please dont mind as I cant
> paste code outside.
>
> Regards,
> Umesh

Re: How to create Spark DataFrame using custom Hadoop InputFormat?

Posted by Umesh Kacha <um...@gmail.com>.
Hi Ted, thanks. My key is always Void because my custom format's files are
non-splittable, so the key is Void and the value is MyRecordWritable, which
extends Hadoop Writable. I am sharing my log as a snapshot; please don't mind,
as I can't paste the code outside.
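
For context, the rough shape of such a value class is sketched below. The
fields are placeholders, not the real MyRecordWritable, which cannot be shared:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyRecordWritable implements Writable {
  private String name;   // placeholder fields
  private int count;

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(name);
    out.writeInt(count);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    name = in.readUTF();
    count = in.readInt();
  }

  public String getName() { return name; }
  public int getCount() { return count; }
}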

Regards,
Umesh

On Sat, Aug 1, 2015 at 12:59 AM, Ted Yu <yu...@gmail.com> wrote:

> Looking closer at the code you posted, the error likely was caused by the
> 3rd parameter: Void.class
>
> It is supposed to be the class of key.
>
> FYI

Re: How to create Spark DataFrame using custom Hadoop InputFormat?

Posted by Ted Yu <yu...@gmail.com>.
Looking closer at the code you posted, the error was likely caused by the 3rd
parameter: Void.class

It is supposed to be the class of the key.
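
For comparison, here is the parameter order with the built-in TextInputFormat
(old mapred API), where the key and value classes match what the format's
RecordReader actually produces:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.spark.api.java.JavaPairRDD;

JavaPairRDD<LongWritable, Text> lines =
    jsc.hadoopFile("hdfs:///tmp/data/sample.txt",  // illustrative path
                   TextInputFormat.class,          // InputFormat<LongWritable, Text>
                   LongWritable.class,             // 3rd argument: the key class
                   Text.class);                    // 4th argument: the value class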

FYI
