
Posted to users@nifi.apache.org by Mohit <mo...@open-insights.co.in> on 2018/07/20 13:12:19 UTC

GenerateTableFetch -> RPG -> ExecuteSQL fetching duplicate records from Netezza but the count is same.

Hi all,

I am fetching data from Netezza using GenerateTableFetch -> RPG ->
ExecuteSQL -> PutHDFS. It works fine most of the time, but for
some tables with more than a million rows, it fetches duplicate rows.

The Partition Size varies from 3 million to 30 million depending on the
table size. For a table with ~300 million rows, the Partition Size is
30 million, and so on.

For example:

Table: abc
Netezza count: 3265421
Hive count: 3265421
Duplicate rows in Hive: 97070

Is this the expected behaviour while fetching from Netezza?

Regards,

Mohit
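
If the source rows are unique, a matching total count together with duplicate rows implies that the same number of distinct rows is missing from the target. One way this can happen is paged fetching over an ordering that is not stable between queries. The sketch below is a pure-Python simulation, not NiFi or Netezza code. The names `fetch_page` and `fetch_all`, the shuffling, and the row counts are assumptions made for illustration, and nothing in this thread confirms that this is the cause here.

```python
import random

def fetch_page(rows, offset, limit, rng):
    """Simulate one paged query whose row order is not guaranteed.

    Each call reorders the table arbitrarily before slicing, which
    mimics offset/limit paging without a deterministic ordering.
    """
    shuffled = rows[:]
    rng.shuffle(shuffled)
    return shuffled[offset:offset + limit]

def fetch_all(rows, page_size, seed=0):
    """Fetch every page in turn and concatenate the results."""
    rng = random.Random(seed)
    fetched = []
    for offset in range(0, len(rows), page_size):
        fetched.extend(fetch_page(rows, offset, page_size, rng))
    return fetched

source = list(range(1000))            # hypothetical table of unique keys
result = fetch_all(source, page_size=100)

total = len(result)                   # rows landed in the target
distinct = len(set(result))           # distinct rows in the target
duplicates = total - distinct         # extra copies of already-seen rows
missing = len(set(source) - set(result))

print(total, distinct, duplicates, missing)
```

In this simulation the total always equals the source count, and every duplicate is matched by one missing row, so comparing counts alone would not reveal the problem.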

Re: GenerateTableFetch -> RPG -> ExecuteSQL fetching duplicate records from Netezza but the count is same.

Posted by Pierre Villard <pi...@gmail.com>.

Hi,

What's the configuration of the GTF processor? Is data written to the
source table while executing the workflow?
How do you check for duplicate rows in Hive?

Thanks
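
How duplicates are counted affects the number reported. As a minimal sketch, assuming the rows have been exported from Hive as hashable tuples (the helper name `duplicate_summary` and the sample rows are hypothetical), the following counts every extra copy of a row beyond its first occurrence:

```python
from collections import Counter

def duplicate_summary(rows):
    """Return total, distinct, and extra-copy counts for a list of rows."""
    counts = Counter(rows)
    # For a row seen n times, n - 1 of those occurrences are extra copies.
    extra = sum(n - 1 for n in counts.values() if n > 1)
    return {"total": len(rows), "distinct": len(counts), "extra_copies": extra}

# Hypothetical sample: row 2 appears twice, row 3 appears three times.
rows = [(1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c"), (3, "c")]
summary = duplicate_summary(rows)
print(summary)
```

Comparing whole rows, as done here, can give a different figure than comparing only a key column, so it helps to state which comparison was used.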

2018-07-20 15:12 GMT+02:00 Mohit <mo...@open-insights.co.in>:

> Hi all,
>
> I am fetching data from Netezza using GenerateTableFetch -> RPG ->
> ExecuteSQL -> PutHDFS . It is working fine for most of the time, but for
> some tables with more than a million rows, it fetches duplicate rows.
>
>
>
> Partition Size  varies from 3 million to 30 million with respect to table
> size. For table with ~300 million rows, size is 30 million and likewise.
>
>
>
> For Example –
>
>
>
> Table : abc
>
> Netezza count -  3265421
>
> Hive Count - 3265421
>
> Duplicate rows in Hive -  97070
>
>
>
> Is this the expected behaviour while fetching from Netezza?
>
>
>
> Regards,
>
> Mohit
>