Posted to issues@nifi.apache.org by "Nikolay Nikolaev (Jira)" <ji...@apache.org> on 2021/03/04 08:59:00 UTC

[jira] [Created] (NIFI-8292) ParquetReader can't read a FlowFile written by ParquetRecordSetWriter

Nikolay Nikolaev created NIFI-8292:
--------------------------------------

             Summary: ParquetReader can't read a FlowFile written by ParquetRecordSetWriter
                 Key: NIFI-8292
                 URL: https://issues.apache.org/jira/browse/NIFI-8292
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.13.0, 1.11.4
         Environment: docker
            Reporter: Nikolay Nikolaev
         Attachments: Test_Parquet_Reader_Writer.xml, cut_from_nifi-app.log

h1. Steps to reproduce the bug
# Start NiFi in Docker:
{code}docker pull apache/nifi:latest
docker run -p 8083:8080 --name nifi_container_latest -v <*your path to logs-folder*>:/opt/nifi/nifi-current/logs -v <*your path to file-folder*>:/file_folder apache/nifi:latest{code}
# Upload the template [^Test_Parquet_Reader_Writer.xml] (see the attachment).
# Create a flow from the uploaded template *Test_Parquet_Reader_Writer.xml*.
# Enable all 4 controller services in the NiFi flow configuration.
# Start the flow.
# Observe the error in the "ConvertRecord(JSON_to_Parquet)" processor.
# Stop the flow.
# Check the *logs-folder* (see nifi-app.log) and the *file_folder* (it contains the Parquet and JSON files). nifi-app.log will contain an error like this (for the full message, see [^cut_from_nifi-app.log]):
{quote}2021-03-04 07:26:39,448 ERROR [Timer-Driven Process Thread-8] o.a.n.processors.standard.ConvertRecord ConvertRecord[id=35a86417-bd7c-31c2-ae9e-bf808e428b03] Failed to process StandardFlowFileRecord[uuid=eef69d98-1b2a-4b89-8267-0b4598e53d05,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1614842777315-1, container=default, section=1], offset=128, length=1007],offset=0,name=eef69d98-1b2a-4b89-8267-0b4598e53d05,size=1007]; will route to failure: org.apache.avro.SchemaParseException: Can't redefine: list
org.apache.avro.SchemaParseException: Can't redefine: list
	at org.apache.avro.Schema$Names.put(Schema.java:1128)
	at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
	at ...{quote}
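The "Can't redefine: list" message comes from Avro's schema parser, which forbids declaring two named types with the same full name. As a hedged illustration (this is not necessarily the exact schema NiFi generates), when nested Parquet list groups are mapped back to Avro, both element groups can end up as records named "list" in the same (empty) namespace, for example:

```json
{
  "type": "record",
  "name": "root",
  "fields": [
    {"name": "array1", "type": {
      "type": "array",
      "items": {
        "type": "record",
        "name": "list",
        "fields": [
          {"name": "array2", "type": {
            "type": "array",
            "items": {
              "type": "record",
              "name": "list",
              "fields": [{"name": "element", "type": "string"}]
            }
          }}
        ]
      }
    }}
  ]
}
```

Avro registers the outer record under the name "list" and then throws SchemaParseException when the inner record tries to declare the same name again.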

h1. Description
This test flow generates three JSON documents via the GenerateFlowFile processor:
Simple JSON:
{code}
	{ 
	  "field1": "value_field",
      "feild2": "value_field2"
	}
{code}
1st JSON:
{code}
	{ 
	  "field1": "value_field",
	  "array1": [
		{
		  "feild2": "value_field2"
		}
	  ]
	}
{code}
2nd JSON:
{code}
	{ 
	  "field": "value_field",
	  "array1": [
		{
		  "array2": ["a_value_array2","b_value_array2"
		  ]
		}
	  ]
	}
{code}
The flow then converts each JSON into Parquet (via ConvertRecord(JSON_to_Parquet)) and back to JSON (via ConvertRecord(Parquet_to_JSON)). To facilitate analysis, the JSON and Parquet files are saved to the file_folder.
In the file_folder we can see that all three JSONs were successfully converted into Parquet files. But only the "Simple JSON" and the "1st JSON" were converted back to JSON; the "2nd JSON" causes an error in ConvertRecord.
So, in certain cases the ParquetReader can't read a file that was created by the ParquetRecordSetWriter, for example in the case of the "2nd JSON", which has a more deeply nested structure.

This bug is reproducible in versions 1.11.4 and 1.13.0.
In version 1.12.1 I couldn't reproduce it because of NIFI-7817.
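As a minimal sketch of the failure mechanism (plain Python, not NiFi or Avro code; the schema below is a hypothetical reconstruction of what the reader might derive from the Parquet file), Avro's duplicate-name bookkeeping can be imitated like this:

```python
# Sketch only: imitate Avro's named-type registry to show why two nested
# records that are both called "list" trigger "Can't redefine: list".
# The schema layout is an assumed reconstruction, not NiFi's actual output.

def find_redefined_names(schema, seen=None):
    """Walk an Avro schema (given as plain dicts) and return the full
    names of named types that are declared more than once."""
    if seen is None:
        seen = set()
    clashes = []
    if isinstance(schema, dict):
        if schema.get("type") in ("record", "enum", "fixed"):
            full = ".".join(filter(None, [schema.get("namespace"), schema["name"]]))
            if full in seen:
                clashes.append(full)  # Avro would raise SchemaParseException here
            seen.add(full)
        for field in schema.get("fields", []):
            clashes += find_redefined_names(field["type"], seen)
        if schema.get("type") == "array":
            clashes += find_redefined_names(schema["items"], seen)
    return clashes

# Hypothetical schema for the "2nd JSON": both nested array-element
# records carry the default name "list".
nested = {
    "type": "record", "name": "root",
    "fields": [
        {"name": "array1", "type": {
            "type": "array",
            "items": {"type": "record", "name": "list", "fields": [
                {"name": "array2", "type": {
                    "type": "array",
                    "items": {"type": "record", "name": "list", "fields": [
                        {"name": "element", "type": "string"}]}}}]}}}]}

print(find_redefined_names(nested))  # → ['list']
```

The "Simple JSON" and "1st JSON" cases have at most one array level, so at most one record named "list" is declared and no clash occurs; the doubly nested "2nd JSON" is the first input where the name is reused.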



--
This message was sent by Atlassian Jira
(v8.3.4#803005)