Posted to dev@avro.apache.org by Salim Memon <sa...@capitalone.com.INVALID> on 2023/12/11 18:27:40 UTC

Converting from Parquet -> Decimal Type Failure

Hi Devs,

We are running into an issue with a Parquet schema read from the footer of
a file. A field carries the decimal logical type, with a precision and
scale, on an optional primitive type of int32 or int64. When we pass the
schema through the Avro converter, the field comes back as a plain Long,
apparently because the converter checks the primitive type first, and so
the decimal value is lost.

e.g. 8.25 -> 825

Attached are screenshots of the MessageType (parquet schema) and the output
of the Avro converter. Is there anything I can do to retain the precision?

Parquet-Avro version: 1.12.0
Language: Java
AvroReadSupport.READ_INT96_AS_FIXED, true

Best,

Salim Memon
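
A note on the 8.25 -> 825 example: for a decimal stored on int32 or int64,
Parquet keeps the unscaled integer in the column, and the scale lives in the
schema annotation. If the converted Avro schema only reports a long, the
value can still be rebuilt, provided the scale is read from the Parquet
schema separately. A minimal sketch, assuming the scale (2 here) has been
obtained that way; the class and method names are hypothetical and not part
of parquet-avro:

```java
import java.math.BigDecimal;

public class UnscaledDecimal {
    // Rebuilds a decimal from the raw integer read out of an int32/int64 column.
    // Assumption: the caller supplies the scale taken from the Parquet schema.
    static BigDecimal fromUnscaled(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        System.out.println(fromUnscaled(825L, 2)); // prints 8.25
    }
}
```

This only restores the value on the reading side; it does not change what
the converter emits.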

______________________________________________________________________



The information contained in this e-mail may be confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.




Re: [External Sender] Re: Converting from Parquet -> Decimal Type Failure

Posted by Salim Memon <sa...@capitalone.com.INVALID>.
Got it, thanks Martin.

Best,

Salim Memon
Cell: (832) 314 5518




Re: [External Sender] Re: Converting from Parquet -> Decimal Type Failure

Posted by Martin Grigorov <mg...@apache.org>.
Hi Salim,

You have to contact dev@parquet.apache.org instead.
https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java

Martin
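
For background on the converter linked above: the Avro specification defines
the decimal logical type only on top of bytes or fixed, holding the unscaled
value as a big-endian two's-complement integer. That may be why a decimal
backed by int32 or int64 has no direct Avro counterpart. A minimal sketch of
that byte encoding using only the JDK; the class and method names are
hypothetical, and this is not how parquet-avro itself is implemented:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalBytes {
    // Encodes the unscaled value as big-endian two's-complement bytes.
    // setScale without a rounding mode throws ArithmeticException if digits would be lost.
    static byte[] toBytes(BigDecimal value, int scale) {
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // Decodes bytes produced by toBytes, given the same scale.
    static BigDecimal fromBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        byte[] encoded = toBytes(new BigDecimal("8.25"), 2);
        System.out.println(fromBytes(encoded, 2)); // prints 8.25
    }
}
```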

On Tue, Dec 12, 2023 at 6:30 PM Salim Memon <sa...@capitalone.com>
wrote:

> Morning Martin,
>
> The library we use to read the Parquet file is
> org.apache.parquet:parquet-hadoop:1.12.0 (ParquetFileReader.java), and to
> convert the Parquet schema to an Avro schema we use
> org.apache.parquet:parquet-avro:1.12.0 (AvroSchemaConverter.java). Here is
> the code snippet doing the work.
>
> Path schemaPath =
>     HadoopParquetUtils.getFirstFullParquetPath(hadoopFilePath, configuration);
> Schema schema;
> // ParquetFileReader is Closeable, so close it once the footer has been read.
> try (ParquetFileReader r =
>     ParquetFileReader.open(HadoopInputFile.fromPath(schemaPath, configuration))) {
>   // The Parquet schema (MessageType) comes from the file footer metadata.
>   MessageType messageType = r.getFooter().getFileMetaData().getSchema();
>   schema = new AvroSchemaConverter(configuration).convert(messageType);
> }
>
> Best,
>
> Salim Memon
> Cell: (832) 314 5518

Re: Converting from Parquet -> Decimal Type Failure

Posted by Martin Grigorov <mg...@apache.org>.
Hi Salim,

Could you please give more details about the Avro tool/library you use?
I have the feeling you are using a third-party library that is not
supported by the Apache Avro team.

Martin
