Posted to issues@nifi.apache.org by "David Handermann (Jira)" <ji...@apache.org> on 2021/12/08 17:27:00 UTC

[jira] [Resolved] (NIFI-8292) ParquetReader can't read FlowFile, which was written by ParquetRecordSetWriter

     [ https://issues.apache.org/jira/browse/NIFI-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Handermann resolved NIFI-8292.
------------------------------------
    Fix Version/s: 1.14.0
         Assignee: David Handermann
       Resolution: Fixed

NIFI-8439 incorporated an update of the parquet-hadoop library from 1.10.0 to 1.12.0, which resolves the issue with JSON schema serialization.
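
As background to the fix: Avro requires each named type (record, enum, or fixed) to be defined only once per schema. The stack trace in the quoted report shows `Schema$Names.put` rejecting a second definition of a record named `list` while the schema was being written out (`NamedSchema.writeNameRef`). The Python sketch below models only that rule on a simplified, dict-based schema. `to_json`, `SchemaParseError`, and the example schemas are hypothetical illustrations, not Avro's API. The idea that the failing file's schema contained two different `list` records is inferred from the error message, not confirmed by the report.

```python
import json
from typing import Any, Dict, Optional


class SchemaParseError(Exception):
    """Hypothetical stand-in for org.apache.avro.SchemaParseException."""


def to_json(schema: Any, names: Optional[Dict[str, Any]] = None) -> Any:
    """Serialize a simplified, dict-based Avro-like schema.

    Models the Avro rule: a named record may be defined once per schema.
    A later occurrence with an identical definition is written as a bare
    name reference; a different definition with the same name fails.
    """
    if names is None:
        names = {}
    if isinstance(schema, str):  # primitive type, e.g. "string"
        return schema
    if schema["type"] == "array":
        return {"type": "array", "items": to_json(schema["items"], names)}
    if schema["type"] == "record":
        name = schema["name"]
        if name in names:
            if names[name] == schema:
                return name  # same definition seen before: emit a reference
            raise SchemaParseError(f"Can't redefine: {name}")
        names[name] = schema  # register before descending into fields
        return {
            "type": "record",
            "name": name,
            "fields": [
                {"name": f["name"], "type": to_json(f["type"], names)}
                for f in schema["fields"]
            ],
        }
    raise ValueError(f"unsupported schema: {schema!r}")


# Hypothetical schemas, loosely shaped like the reported "2nd JSON" case.
inner = {"type": "record", "name": "list",
         "fields": [{"name": "element", "type": "string"}]}
element = {"type": "record", "name": "element",
           "fields": [{"name": "array2", "type": {"type": "array", "items": inner}}]}
outer = {"type": "record", "name": "list",
         "fields": [{"name": "element", "type": element}]}

# Reusing the *same* "list" record twice is legal.
two_refs = {"type": "record", "name": "root", "fields": [
    {"name": "a", "type": {"type": "array", "items": inner}},
    {"name": "b", "type": {"type": "array", "items": inner}},
]}
# Nesting two *different* records named "list" is not.
nested = {"type": "record", "name": "root", "fields": [
    {"name": "array1", "type": {"type": "array", "items": outer}},
]}

print(json.dumps(to_json(two_refs)))
try:
    to_json(nested)
except SchemaParseError as exc:
    print(exc)  # Can't redefine: list
```

Reusing an identical `list` definition is accepted and written as a name reference; only a second, different definition under the same name triggers the error.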

> ParquetReader can't read FlowFile, which was written by ParquetRecordSetWriter
> ------------------------------------------------------------------------------
>
>                 Key: NIFI-8292
>                 URL: https://issues.apache.org/jira/browse/NIFI-8292
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.11.4, 1.13.0
>         Environment: docker
>            Reporter: Nikolay Nikolaev
>            Assignee: David Handermann
>            Priority: Major
>             Fix For: 1.14.0
>
>         Attachments: Test_Parquet_Reader_Writer.xml, cut_from_nifi-app.log
>
>
> h1. Steps to reproduce the bug
> # Start NiFi in Docker:
> {code}docker pull apache/nifi:latest
> docker run -p 8083:8080 --name nifi_container_latest -v <your path to logs-folder>:/opt/nifi/nifi-current/logs -v <your path to file-folder>:/file_folder apache/nifi:latest{code}
> # Upload the template [^Test_Parquet_Reader_Writer.xml] (see attachment).
> # Create a flow from the uploaded template *Test_Parquet_Reader_Writer.xml*.
> # Enable all four controller services in the NiFi Flow Configuration.
> # Start the flow.
> # An error appears in the "ConvertRecord(JSON_to_Parquet)" processor.
> # Stop the flow.
> # Check the *logs-folder* (see nifi-app.log) and the *file_folder* (which contains the Parquet and JSON files). nifi-app.log will contain an error like the following (for the full message, see [^cut_from_nifi-app.log]):
> {quote}2021-03-04 07:26:39,448 ERROR [Timer-Driven Process Thread-8] o.a.n.processors.standard.ConvertRecord ConvertRecord[id=35a86417-bd7c-31c2-ae9e-bf808e428b03] Failed to process StandardFlowFileRecord[uuid=eef69d98-1b2a-4b89-8267-0b4598e53d05,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1614842777315-1, container=default, section=1], offset=128, length=1007],offset=0,name=eef69d98-1b2a-4b89-8267-0b4598e53d05,size=1007]; will route to failure: org.apache.avro.SchemaParseException: Can't redefine: list
> org.apache.avro.SchemaParseException: Can't redefine: list
> 	at org.apache.avro.Schema$Names.put(Schema.java:1128)
> 	at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
> 	at ...{quote}
> h1. Description
> This test flow generates three JSON documents via the GenerateFlowFile processor:
> Simple JSON:
> {code}
> 	{ 
> 	  "field1": "value_field",
> 	  "feild2": "value_field2"
> 	}
> {code}
> 1st JSON:
> {code}
> 	{ 
> 	  "field1": "value_field",
> 	  "array1": [
> 		{
> 		  "feild2": "value_field2"
> 		}
> 	  ]
> 	}
> {code}
> 2nd JSON:
> {code}
> 	{ 
> 	  "field": "value_field",
> 	  "array1": [
> 		{
> 		  "array2": ["a_value_array2", "b_value_array2"]
> 		}
> 	  ]
> 	}
> {code}
> The flow then converts each JSON document to Parquet (via ConvertRecord(JSON_to_Parquet)) and back to JSON (via ConvertRecord(Parquet_to_JSON)). To facilitate analysis, the JSON and Parquet files are saved to the *file_folder*.
> The *file_folder* shows that all JSON documents were successfully converted to Parquet files, but only "Simple JSON" and "1st JSON" were converted back to JSON. The "2nd JSON" causes an error in ConvertRecord.
> So, in certain cases ParquetReader can't read a file created by ParquetRecordSetWriter, for example the "2nd JSON" (which has a more complex nested structure).
> This bug is reproducible in versions 1.11.4 and 1.13.0.
> I couldn't reproduce it in version 1.12.1 because of NIFI-7817.
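
The three test documents differ in how their arrays nest: only the "2nd JSON" has an array (`array2`) inside the elements of another array (`array1`). The Python sketch below lists the distinct array paths in each document; `array_paths` is a hypothetical helper, not part of NiFi. Linking this to the error is an assumption: if each array level is read back as an Avro record named `list` (as the `Can't redefine: list` message suggests), a document with two or more array paths yields two or more `list` records, which clash when their definitions differ.

```python
import json
from typing import Any, List


def array_paths(value: Any, path: str = "") -> List[str]:
    """Return the distinct schema paths of every array in a parsed JSON value.

    Array elements are marked with "[]", so an array nested inside an array
    element gets its own path (e.g. "array1[].array2").
    """
    paths: List[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            paths.extend(array_paths(child, child_path))
    elif isinstance(value, list):
        paths.append(path)
        for item in value:
            for nested in array_paths(item, path + "[]"):
                if nested not in paths:  # same path found in several elements
                    paths.append(nested)
    return paths


# The three documents from the report, compacted onto one line each.
documents = {
    "Simple JSON": '{"field1": "value_field", "feild2": "value_field2"}',
    "1st JSON": '{"field1": "value_field", "array1": [{"feild2": "value_field2"}]}',
    "2nd JSON": '{"field": "value_field", "array1": '
                '[{"array2": ["a_value_array2", "b_value_array2"]}]}',
}

for label, text in documents.items():
    paths = array_paths(json.loads(text))
    # Assumption (not confirmed by the report): each array level maps to an
    # Avro record named "list", so more than one path means several such records.
    verdict = "possible 'list' clash" if len(paths) > 1 else "ok"
    print(f"{label}: {paths} -> {verdict}")
# Simple JSON: [] -> ok
# 1st JSON: ['array1'] -> ok
# 2nd JSON: ['array1', 'array1[].array2'] -> possible 'list' clash
```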



--
This message was sent by Atlassian Jira
(v8.20.1#820001)