Posted to users@nifi.apache.org by Mohit <mo...@open-insights.co.in> on 2018/07/20 13:12:19 UTC
GenerateTableFetch -> RPG -> ExecuteSQL fetching duplicate records from Netezza but the count is the same.
Hi all,
I am fetching data from Netezza using GenerateTableFetch -> RPG (Remote
Process Group) -> ExecuteSQL -> PutHDFS. It works fine most of the time,
but for some tables with more than a million rows, it fetches duplicate rows.
The Partition Size varies from 3 million to 30 million depending on table
size; for a table with ~300 million rows it is 30 million, and so on.
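As a rough sketch of how the Partition Size drives the work: GenerateTableFetch emits one query (one FlowFile) per partition, so the number of queries is the table's row count divided by the Partition Size, rounded up, with the last query covering the remainder. The helper `page_ranges` below is hypothetical, and the SQL it prints only approximates what the processor generates; the real statement depends on the configured Database Type, and the `id` ordering column and the 3 million size for table abc are assumptions.

```python
import math


def page_ranges(row_count: int, partition_size: int) -> list[tuple[int, int]]:
    """Hypothetical helper: return one (offset, limit) pair per generated query."""
    pages = math.ceil(row_count / partition_size)  # last page holds the remainder
    return [(i * partition_size, partition_size) for i in range(pages)]


if __name__ == "__main__":
    # Assumption: table abc was fetched with a 3 million Partition Size and
    # ordered by a hypothetical "id" column.
    for offset, limit in page_ranges(3_265_421, 3_000_000):
        print(f"SELECT * FROM abc ORDER BY id LIMIT {limit} OFFSET {offset}")
```

With these assumed values the table splits into two queries; a ~300 million row table at 30 million per partition would split into ten.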
For example:
Table: abc
Netezza count - 3265421
Hive count - 3265421
Duplicate rows in Hive - 97070
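One way matching counts can hide duplicates is offset-based paging over an ordering that is not stable between queries (for example, ordering by a non-unique column): each page query may see the rows in a different order, so some rows land in two pages and others in none. The simulation below is a hypothetical model of that effect, not a diagnosis of this table, and `paged_fetch` is an illustrative name. Because every surplus copy displaces a source row, the total stays equal; if the 97070 figure counts surplus copies and the source has no duplicates of its own, roughly as many source rows would be missing from Hive.

```python
import random


def paged_fetch(rows, partition_size, seed):
    """Hypothetical model: offset paging where each page query sees the
    table in a different order, as an unstable ORDER BY might produce."""
    rng = random.Random(seed)
    fetched = []
    for offset in range(0, len(rows), partition_size):
        snapshot = rows[:]      # each query re-reads the whole table
        rng.shuffle(snapshot)   # ordering differs between queries
        fetched.extend(snapshot[offset:offset + partition_size])
    return fetched


source = list(range(10_000))
result = paged_fetch(source, 3_000, seed=1)
duplicates = len(result) - len(set(result))   # surplus copies
missing = len(set(source) - set(result))      # source rows never fetched
print(len(result), duplicates, missing)
```

The total always equals the source size, and the surplus copies always equal the missing rows, which matches the pattern of equal counts with duplicates.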
Is this the expected behaviour while fetching from Netezza?
Regards,
Mohit
Re: GenerateTableFetch -> RPG -> ExecuteSQL fetching duplicate records from Netezza but the count is the same.
Posted by Pierre Villard <pi...@gmail.com>.
Hi,
What's the configuration of the GTF (GenerateTableFetch) processor? Is data
being written to the source table while the workflow is running?
How do you check for duplicate rows in Hive?
Thanks
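How duplicates are counted matters when comparing a figure like 97070: "rows that appear more than once" and "surplus copies" differ whenever a row appears three or more times. A minimal sketch, assuming the Hive rows have been exported as hashable tuples; `duplicate_stats` is a hypothetical helper, not part of NiFi or Hive.

```python
from collections import Counter


def duplicate_stats(rows):
    """Hypothetical helper: return (distinct rows occurring more than once,
    surplus copies beyond the first occurrence)."""
    counts = Counter(rows)
    repeated_rows = sum(1 for c in counts.values() if c > 1)
    surplus_copies = sum(c - 1 for c in counts.values() if c > 1)
    return repeated_rows, surplus_copies


# Hypothetical sample: row 2 appears twice and row 3 three times.
print(duplicate_stats([1, 2, 2, 3, 3, 3]))  # (2, 3)
```

Only the surplus-copies figure can be subtracted from the total count to get the number of distinct rows.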