Posted to users@nifi.apache.org by "Ravi Papisetti (rpapiset)" <rp...@cisco.com> on 2019/03/07 19:48:44 UTC

Convert Avro to ORC or JSON processor - retaining the data type

Hi,

NiFi version 1.7

We have a dataflow that gets data from an Oracle database and loads it into Hive tables.

Data flow is something like below:
GenerateTableFetch -> ExecuteSQL -> ConvertAvroToJSON/ConvertAvroToORC (we tried both) -> PutHDFS -> ListHDFS -> ReplaceText (to build the load data query from the file) -> PutHiveQL.

A numeric source column (e.g. column "cpyKey" NUMBER) is being written with the following Avro schema:
{"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields":[{"name":"cpyKey","type":["null",{"type":"bytes","logicalType":"decimal","precision":10,"scale":0}]}

When this is inserted into the Hive table, whether the data is loaded from the ORC (ConvertAvroToORC) file or the JSON (ConvertAvroToJSON) file, querying it from Hive throws a parsing exception about incompatible data types.


Error: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.IllegalArgumentException: ORC does not support type conversion from file type binary (1) to reader type bigint (1) (state=,code=0)
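For background (this detail is not spelled out in the thread): Avro's decimal logical type stores the unscaled value as a two's-complement big-endian byte array, which is why the file-level type is binary and the ORC reader refuses to convert it to bigint. A minimal sketch of the decoding:

```python
from decimal import Decimal

def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode an Avro decimal logical-type value: the bytes hold the
    unscaled integer in two's-complement big-endian form."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# With scale 0 (as in the schema above), 12345 is stored as b"\x30\x39"
print(decode_avro_decimal(b"\x30\x39", 0))  # -> 12345
```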

Appreciate any help on this.

Thanks,
Ravi Papisetti

Re: Convert Avro to ORC or JSON processor - retaining the data type

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Ravi,

How about storing those as strings, and casting them to a numeric type
(int/bigint) when you query them?
https://stackoverflow.com/questions/28867438/hive-converting-a-string-to-bigint
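With the column stored as a string, the query side would look something like `SELECT CAST(cpyKey AS BIGINT) FROM some_table` (the column name is from the thread; the table name is hypothetical). A rough Python model of Hive's cast semantics, where a failed cast yields NULL rather than raising an error:

```python
def hive_cast_bigint(s):
    """Approximate Hive's CAST(col AS BIGINT) on a string column:
    invalid or missing input yields NULL (None) instead of raising."""
    try:
        return int(s)
    except (TypeError, ValueError):
        return None

print(hive_cast_bigint("12345"))  # -> 12345
print(hive_cast_bigint("oops"))   # -> None
```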

Thanks,
Koji


Re: Convert Avro to ORC or JSON processor - retaining the data type

Posted by "Ravi Papisetti (rpapiset)" <rp...@cisco.com>.
Thanks Koji for the response. Our users want to run HiveQL queries with comparison operators, so a string type won't work for numeric data.

Any other options?

Thanks,
Ravi Papisetti



Re: Convert Avro to ORC or JSON processor - retaining the data type

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Ravi,

I looked at the following links; Hive does support some Avro logical types,
such as timestamp-millis, but I'm not sure whether decimal is supported.
https://issues.apache.org/jira/browse/HIVE-8131
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-AvrotoHivetypeconversion

If treating the number as String works in your use-case, then I'd
recommend disabling "Use Avro Logical Types" at ExecuteSQL.
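To illustrate the difference that setting makes (the logical-type union below is copied from the schema quoted earlier in the thread; the plain-string shape with the option disabled is an assumption based on this reply, not verified against NiFi 1.7):

```python
# Field shape with "Use Avro Logical Types" enabled, as quoted in the thread.
# Hive's AvroSerDe maps the "bytes" branch to BINARY, hence the ORC error.
with_logical_types = {
    "name": "cpyKey",
    "type": ["null", {"type": "bytes", "logicalType": "decimal",
                      "precision": 10, "scale": 0}],
}

# Assumed field shape with the option disabled: the value travels as text,
# which maps to Hive STRING and can then be CAST in queries.
without_logical_types = {"name": "cpyKey", "type": ["null", "string"]}

print(without_logical_types["type"][1])  # -> string
```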

Thanks,
Koji
