Posted to dev@hive.apache.org by "adhumal@yume.com" <ad...@yume.com> on 2016/03/04 19:27:27 UTC

ClassCastException while deserializing (loading into Hive table) decimals written in Avro-schema-backed Parquet format

Hi,

I am trying to serialize CSV data into Parquet format using an Avro schema (Avro-backed), and then read it back into Hive tables.

But when I run a query on the decimal field, I get the following error message:

> Failed with exception
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable
> cannot be cast to org.apache.hadoop.hive.serde2.io.HiveDecimalWritable

The data is serialized successfully using the following sample code snippet (it writes a single record):

    import java.io.File;
    import java.io.IOException;
    import java.math.BigDecimal;
    import java.math.BigInteger;
    import java.math.RoundingMode;
    import java.nio.ByteBuffer;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroSchemaConverter;
    import org.apache.parquet.avro.AvroWriteSupport;
    import org.apache.parquet.hadoop.ParquetWriter;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;
    import org.apache.parquet.schema.MessageType;

    public class AvroParquetConverter {

        public static void main(String[] args) throws IOException {
            Schema avroSchema = new Schema.Parser().parse(new File("schema.avsc"));
            GenericRecord myrecord = new GenericData.Record(avroSchema);
            String outputFilename = "/home/jai/sample1000-snappy.parquet";
            Path outputPath = new Path(outputFilename);

            // Convert the Avro schema to a Parquet message type and set up
            // a Snappy-compressed writer backed by Avro write support.
            MessageType parquetSchema = new AvroSchemaConverter()
                    .convert(avroSchema);
            AvroWriteSupport writeSupport = new AvroWriteSupport(parquetSchema,
                    avroSchema);
            CompressionCodecName compressionCodecSnappy = CompressionCodecName.SNAPPY;
            int blockSize = 256 * 1024 * 1024;
            int pageSize = 64 * 1024;
            ParquetWriter<GenericRecord> parquetWriterSnappy = new ParquetWriter<GenericRecord>(
                    outputPath, writeSupport, compressionCodecSnappy, blockSize,
                    pageSize);

            myrecord.put("name", "Abhijeet1");
            myrecord.put("age", 20);
            myrecord.put("favorite_number", 22);

            // Encode the decimal as its two's-complement unscaled value,
            // adjusted to the scale declared in the schema (6). BigDecimal is
            // immutable, so the result of setScale must be assigned.
            BigDecimal bdecimal = new BigDecimal("13.5")
                    .setScale(6, RoundingMode.HALF_UP);
            BigInteger bi = bdecimal.unscaledValue();
            byte[] barray = bi.toByteArray();
            ByteBuffer byteBuffer = ByteBuffer.allocate(barray.length);
            byteBuffer.put(barray);
            byteBuffer.rewind();
            myrecord.put("price", byteBuffer);

            parquetWriterSnappy.write(myrecord);
            parquetWriterSnappy.close();
        }
    }

I also tried doing the decimal-to-ByteBuffer conversion with the following statement:

    ByteBuffer.wrap(bdecimal.unscaledValue().toByteArray());
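
Avro 1.8 also ships a Conversions.DecimalConversion helper that should produce the same encoding. A minimal sketch of using it for this field (illustrative only; the value has to be rescaled to the schema's scale first, otherwise toBytes throws):

    import java.math.BigDecimal;
    import java.math.RoundingMode;
    import java.nio.ByteBuffer;

    import org.apache.avro.Conversions;
    import org.apache.avro.LogicalTypes;
    import org.apache.avro.Schema;

    public class DecimalToBytes {
        public static void main(String[] args) {
            // The logical type declared in the schema: decimal(15,6).
            LogicalTypes.Decimal decimalType = LogicalTypes.decimal(15, 6);
            Schema bytesSchema = Schema.create(Schema.Type.BYTES);

            // Rescale to the declared scale; setScale returns a new object.
            BigDecimal price = new BigDecimal("13.5")
                    .setScale(6, RoundingMode.HALF_UP);

            // Encodes the two's-complement unscaled value, like the manual code.
            ByteBuffer buffer = new Conversions.DecimalConversion()
                    .toBytes(price, bytesSchema, decimalType);
            System.out.println("encoded " + buffer.remaining() + " bytes");
        }
    }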

Following is the Avro schema file:

    {
        "namespace": "avropoc",
        "type": "record",
        "name": "User",
        "fields": [
                 {"name": "name", "type": "string", "default" : "null"},
                 {"name": "favorite_number",  "type": "int", "default": 0 },
                 {"name": "pid",  "type":"int", "default" : 0 },
                 {"name": "price", "type": {"type" : "bytes","logicalType":"decimal","precision":15,"scale":6}, "default" : 0 }
         ]
    }

I also tried the following modification to the schema:


    {"name": "price", "type": "bytes","logicalType":"decimal","precision":15,"scale":6, "default" : 0 }


And I am creating the Hive table as follows:

    create external table avroparquet1
    ( name string, favorite_number int,
    age int, price DECIMAL(15,6))
    STORED AS PARQUET;


This looks like a Parquet/Avro/Hive issue where Hive is not able to deserialize decimals, which in the case of Avro need to be written as a ByteBuffer.
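
One way to narrow this down would be to dump the Parquet schema from the file footer and see whether the price column carries a DECIMAL annotation or is written as plain binary; plain binary would explain why Hive hands the SerDe a BytesWritable. A rough sketch, assuming parquet-hadoop 1.8.x on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.hadoop.ParquetFileReader;
    import org.apache.parquet.hadoop.metadata.ParquetMetadata;

    public class PrintParquetSchema {
        public static void main(String[] args) throws Exception {
            // Read only the footer and print the message type; this shows
            // the physical type and any logical annotation for each column.
            ParquetMetadata footer = ParquetFileReader.readFooter(
                    new Configuration(),
                    new Path("/home/jai/sample1000-snappy.parquet"));
            System.out.println(footer.getFileMetaData().getSchema());
        }
    }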

I have tried this with Avro 1.8.0, Parquet 1.8.1, and Hive 1.1.0.

Any help would be appreciated.

Thanks,
Abhijeet



RE: ClassCastException while deserializing (loading into Hive table) decimals written in Avro-schema-backed Parquet format

Posted by "adhumal@yume.com" <ad...@yume.com>.
Any updates on the same?

I am stuck with this and there is no other help available; I have tried most of the possible combinations so far.

I am not sure whether I am missing something or whether there is a bug.

