Posted to user@spark.apache.org by Naresh Peshwe <na...@gmail.com> on 2022/11/06 06:07:15 UTC

ClassCastException while reading parquet data via Hive metastore

Hi all,
I am trying to read data (using Spark SQL) through a Hive metastore table that
has a column of type bigint. The underlying Parquet data has int as the
datatype for the same column. I am getting the following error while trying to
read the data with Spark SQL:

java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot
be cast to org.apache.hadoop.io.LongWritable
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableLongObjectInspector.get(WritableLongObjectInspector.java:36)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$6.apply(TableReader.scala:418)
...
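For context, a rough sketch of the setup that leads to this (all table names
and paths below are made up, and whether this exact error appears can depend on
how the table is read, e.g. the spark.sql.hive.convertMetastoreParquet setting):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("int-vs-bigint-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Parquet files whose column `id` is physically a 32-bit int.
    spark.range(10).selectExpr("cast(id as int) as id")
      .write.mode("overwrite").parquet("/tmp/int_parquet")

    // External Hive table that declares the same column as BIGINT.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS int_vs_bigint (id BIGINT)
      STORED AS PARQUET
      LOCATION '/tmp/int_parquet'
    """)

    // Reading through the metastore can then hit the IntWritable ->
    // LongWritable cast when the Hive SerDe read path is used.
    spark.table("int_vs_bigint").show()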

I believe it is related to
https://issues.apache.org/jira/browse/SPARK-17477. Any suggestions on
how I can work around this issue?

Spark version: 2.4.5

Regards,

Naresh

Re: ClassCastException while reading parquet data via Hive metastore

Posted by Naresh Peshwe <na...@gmail.com>.
Understood, thanks Evyatar.

Re: ClassCastException while reading parquet data via Hive metastore

Posted by Evy M <ev...@gmail.com>.
TBH I'm not sure why there is an issue casting the int to BigInt, and I'm
also not sure about the Jira ticket; I hope someone else can help here.
Regarding the solution - IMO the more correct fix would be to modify the
Hive table to use INT, since it seems there is no need for BigInt (Long).
This approach is also far simpler because it doesn't require rewriting the
data, which could be a costly operation - changing the table in the
metastore is a pretty effortless operation.
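
If it helps, one way to double-check the mismatch from Spark before touching
anything (the table name and path below are placeholders) is to compare the
schema in the Parquet files with what the metastore reports. The column change
itself would then be a metadata-only statement on the Hive side, something like
ALTER TABLE my_db.my_table CHANGE COLUMN id id INT run through beeline or the
Hive CLI (Spark's own SQL parser may not accept a type change there, so the
Hive side is the safer place to run it):

    // Assumes an active SparkSession named `spark`, e.g. in spark-shell.
    // `my_db.my_table` and its location are placeholder names.
    val fileSchema  = spark.read.parquet("/warehouse/my_db.db/my_table").schema
    val tableSchema = spark.table("my_db.my_table").schema

    // Report columns whose declared metastore type differs from the file type.
    val fileTypes = fileSchema.fields.map(f => f.name -> f.dataType).toMap
    tableSchema.fields.foreach { t =>
      fileTypes.get(t.name).filter(_ != t.dataType).foreach { p =>
        println(s"${t.name}: parquet=${p.simpleString}, metastore=${t.dataType.simpleString}")
      }
    }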

Best,
Evyatar

Re: ClassCastException while reading parquet data via Hive metastore

Posted by Naresh Peshwe <na...@gmail.com>.
Hi Evyatar,
Yes, directly reading the Parquet data works. Since we use the Hive metastore
to abstract away the underlying datastore details, we want to avoid accessing
the files directly.
I guess the only option then is to either change the data or change the
schema in the Hive metastore, as you suggested, right?
But int to long / bigint seems like a reasonable evolution (correct me if
I'm wrong). Is it possible to reopen the Jira I mentioned earlier? Any
reason why it was closed?
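
For reference, a rough sketch of the "change the data" option (the column name
and paths below are made up), which is the costlier path:

    import org.apache.spark.sql.functions.col

    // Assumes an active SparkSession named `spark`, e.g. in spark-shell.
    // Rewrite the files with the column cast up to long, into a new location.
    spark.read.parquet("/warehouse/my_db.db/my_table")
      .withColumn("id", col("id").cast("long"))
      .write.mode("overwrite")
      .parquet("/warehouse/my_db.db/my_table_bigint")

    // The existing BIGINT table could then be repointed at the rewritten
    // files, e.g. with ALTER TABLE my_db.my_table SET LOCATION '...'
    // on the Hive side.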


Regards,
Naresh


Re: ClassCastException while reading parquet data via Hive metastore

Posted by Evy M <ev...@gmail.com>.
Hi Naresh,

Have you tried any of the following to resolve your issue?

   1. Reading the Parquet files directly rather than via Hive (i.e.,
   spark.read.parquet(<path>)), casting the column to LongType, and creating
   the Hive table from that DataFrame (see the sketch after this list)?
   Hive's BIGINT and Spark's LongType cover the same value range; see Hive
   Types
   <https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-IntegralTypes(TINYINT,SMALLINT,INT/INTEGER,BIGINT)>
   and Spark Types
   <https://spark.apache.org/docs/latest/sql-ref-datatypes.html>.
   2. Modifying the Hive table to declare the column as INT? If the
   underlying data is an INT, there is arguably no reason to have a BIGINT
   definition for that column.
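
A rough sketch of option 1 (the path, column, and table names are made up;
assumes an active SparkSession named `spark`, e.g. in spark-shell):

    import org.apache.spark.sql.functions.col

    // Read the Parquet files directly, cast the int column up to long,
    // and create/replace a Hive table from the result.
    val fixed = spark.read.parquet("/path/to/parquet")
      .withColumn("id", col("id").cast("long"))

    fixed.write.mode("overwrite").saveAsTable("my_db.my_table_long")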

I hope this might help.

Best,
Evyatar
