Posted to users@nifi.apache.org by "Ravi Papisetti (rpapiset)" <rp...@cisco.com> on 2019/03/07 19:48:44 UTC
Convert Avro to ORC or JSON processor - retaining the data type
Hi,
NiFi version 1.7
We have a data flow that fetches data from an Oracle database and loads it into Hive tables.
The data flow looks roughly like this:
GenerateTableFetch -> ExecuteSQL -> ConvertAvroToJSON/ConvertAvroToORC (we tried both) -> PutHDFS -> ListHDFS -> ReplaceText (to build the load-data query from the file) -> PutHiveQL.
A numeric column at the source (e.g. column "cpyKey", Oracle NUMBER) is being written with the following Avro schema:
{"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields":[{"name":"cpyKey","type":["null",{"type":"bytes","logicalType":"decimal","precision":10,"scale":0}]}]}
Whether the data is loaded into the Hive table from the ORC (ConvertAvroToORC) file or the JSON (ConvertAvroToJSON) file, querying it from Hive throws a parsing exception about incompatible data types:
Error: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.IllegalArgumentException: ORC does not support type conversion from file type binary (1) to reader type bigint (1) (state=,code=0)
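For context (a sketch, not part of the original thread): Avro's decimal logical type stores the unscaled value as a big-endian two's-complement byte string inside a plain "bytes" field, which is why the ORC reader sees binary where the Hive table expects bigint. Decoding such a value looks roughly like this:

```python
from decimal import Decimal

def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    # Avro decimals carry the unscaled integer as big-endian two's-complement bytes.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)

# A NUMBER(10,0) value such as 1234 travels as the two bytes 0x04 0xD2:
print(decode_avro_decimal(b"\x04\xd2", scale=0))  # 1234
```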
Appreciate any help on this.
Thanks,
Ravi Papisetti
Re: Convert Avro to ORC or JSON processor - retaining the data type
Posted by Koji Kawamura <ij...@gmail.com>.
Hi Ravi,
How about storing those values as strings, and casting them to a numeric
type (int/bigint) when you query them?
https://stackoverflow.com/questions/28867438/hive-converting-a-string-to-bigint
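A sketch of that approach in HiveQL (table name hypothetical), assuming cpyKey is stored as a string:

```sql
-- Cast at query time so numeric comparators still work:
SELECT *
FROM   my_table
WHERE  CAST(cpyKey AS BIGINT) > 1000;
```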
Thanks,
Koji
Re: Convert Avro to ORC or JSON processor - retaining the data type
Posted by "Ravi Papisetti (rpapiset)" <rp...@cisco.com>.
Thanks, Koji, for the response. Our users want to run HiveQL queries with comparison operators, and a string type does not work for numeric data.
Any other options?
Thanks,
Ravi Papisetti
Re: Convert Avro to ORC or JSON processor - retaining the data type
Posted by Koji Kawamura <ij...@gmail.com>.
Hi Ravi,
I looked at the following links; Hive does support some logical types such as
timestamp-millis, but I am not sure whether decimal is supported.
https://issues.apache.org/jira/browse/HIVE-8131
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-AvrotoHivetypeconversion
If treating the number as a string works in your use case, then I'd
recommend disabling "Use Avro Logical Types" on ExecuteSQL.
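With that property disabled, ExecuteSQL should emit the column as a plain string rather than a decimal-typed bytes field, so the cpyKey field would be written roughly as (an assumption about NiFi 1.7 behaviour, not verified in this thread):

```json
{"name": "cpyKey", "type": ["null", "string"]}
```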
Thanks,
Koji