Posted to issues@nifi.apache.org by "Matt Burgess (JIRA)" <ji...@apache.org> on 2019/03/26 15:43:00 UTC

[jira] [Commented] (NIFI-6134) InferAvroSchema does not honour Analysis sampling.

    [ https://issues.apache.org/jira/browse/NIFI-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801861#comment-16801861 ] 

Matt Burgess commented on NIFI-6134:
------------------------------------

The documentation for the "Number Of Records To Analyze" property states that it "only applies to JSON content type". Strangely enough though, looking at the Kite code, the sample size for CSV schema inference should be hard-coded to 25 rows (whether the property is set to 10 or not). I don't think the Kite SDK is being maintained, so unfortunately I think this has to be considered "works as designed". You may have better luck with the CSVReader schema inference available as of NiFi 1.9.0.
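
To illustrate why the sample size matters for CSV inference, here is a minimal Python sketch. It is not the Kite or NiFi code; the names infer_types, classify and RANK, the type labels, and the sample data are all made up for illustration. It only shows how a schema inferred from the first N data rows can miss values that appear later in the file.

```python
import csv
import io
from itertools import islice

# Widening order used by this sketch only: a column's type can only move right.
RANK = {"null": 0, "long": 1, "double": 2, "string": 3}


def classify(value):
    """Return the narrowest type label (hypothetical names) for one CSV cell."""
    if value == "":
        return "null"
    for name, parse in (("long", int), ("double", float)):
        try:
            parse(value)
            return name
        except ValueError:
            pass
    return "string"


def infer_types(text, sample_rows):
    """Infer one type per column from at most `sample_rows` data rows.

    The first line is treated as the header; rows past the sample limit
    are never looked at, which is the behaviour under discussion.
    """
    reader = csv.reader(io.StringIO(text))
    names = next(reader)
    types = ["null"] * len(names)
    for row in islice(reader, sample_rows):
        for i, cell in enumerate(row[:len(names)]):
            kind = classify(cell)
            if RANK[kind] > RANK[types[i]]:
                types[i] = kind  # widen the column type
    return dict(zip(names, types))


# Ten integer rows, then a decimal value and a non-numeric value.
DATA = "id,amount\n" + "".join(f"{i},{i}\n" for i in range(1, 11)) + "11,12.5\n12,n/a\n"

print(infer_types(DATA, 10))  # only the first 10 data rows are sampled
print(infer_types(DATA, 25))  # all 12 data rows fall inside the sample
```

With a sample of 10 rows the amount column looks like an integer column; with a larger sample the later rows widen it to a string. A hard-coded sample size therefore fixes which of these results you get, regardless of what the property says.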

> InferAvroSchema does not honour Analysis sampling.
> --------------------------------------------------
>
>                 Key: NIFI-6134
>                 URL: https://issues.apache.org/jira/browse/NIFI-6134
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.9.0
>         Environment: Windows
>            Reporter: Steven Fister
>            Priority: Critical
>         Attachments: image-2019-03-20-18-04-50-595.png
>
>
> When using InferAvroSchema and setting the inferred.avro.schema setting, it still samples only 10 records, even when the setting is 25 or 250. In my sample I skip the first line, which is a header that has blanks in it. The module samples only the remaining 9 records after the first line is skipped in the CSV file.
> !image-2019-03-20-18-04-50-595.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)