Posted to dev@nifi.apache.org by DAVID SMITH <da...@btinternet.com.INVALID> on 2020/04/21 13:01:53 UTC

Record Readers and Writers

Hi,
I want to use the ConvertRecord processor with its underlying Record Readers and Writers to convert files from XML or JSON to a bespoke format, and probably vice versa. I have looked at the Readers/Writers currently provided and decided that I can use the XML/JSON ones, but I will need to write something for the bespoke format. So I started looking at the current source code for the Readers/Writers to see how they work and what I would need to do. When running the unit tests on the XMLReader, I noticed that the console output is in JSON format. My question is: is JSON the common format that all records are converted to and from?
Also, is there any specific documentation on writing Readers/Writers? I have only found the Developer's Guide.
Many thanks,
Dave


Re: Record Readers and Writers

Posted by Matt Burgess <ma...@apache.org>.
Dave,

That JSON is actually the schema of the data, not the data itself.
Avro schemas are indeed stored in JSON format. Under the hood we have
utilities for converting back and forth between Avro schemas and the
internal object representations of NiFi Record schemas. We don't
serialize the Record schemas to text; instead we just convert to an
Avro schema (in JSON format) and send that along with the flow file
(if the Schema Write Strategy indicates to do so). That way other
tools that know about Avro schemas don't also have to know about
some NiFi-specific schema text format.
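
To make the schema-versus-data distinction concrete, here is a minimal sketch in plain Python (standard library only, not NiFi or Avro code). It parses the schema text from Dave's console output and lists what it contains: field names and declared types, with no record values anywhere. The helper name describe_fields is hypothetical and exists only for this illustration.

```python
import json

# Schema text copied from the DEBUG log output quoted in this thread.
SCHEMA_TEXT = """
{
  "namespace": "nifi",
  "name": "test",
  "type": "record",
  "fields": [
    { "name": "ID", "type": "string" },
    { "name": "NAME", "type": "string" },
    { "name": "AGE", "type": "int" },
    { "name": "COUNTRY", "type": "string" }
  ]
}
"""


def describe_fields(schema_text):
    """Return (field name, declared type) pairs from an Avro record schema."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    # Only names and types are present; the schema carries no record values.
    return [(field["name"], field["type"]) for field in schema["fields"]]


if __name__ == "__main__":
    for name, avro_type in describe_fields(SCHEMA_TEXT):
        print(f"{name}: {avro_type}")
```

The JSON here only describes the shape of each record; the records themselves would come from whatever the Reader parses.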

Regards,
Matt




Re: Record Readers and Writers

Posted by DAVID SMITH <da...@btinternet.com.INVALID>.
Hi Matt
Thanks for your reply. I will certainly take on board everything you and Andy advise, look at the classes you mentioned, and read the links provided.
I ran TestXMLReader as a JUnit test in Eclipse; a sample of the console output is:
20:15:01.675 [pool-1-thread-1] DEBUG org.apache.nifi.schema.access.AvroSchemaTextStrategy - For {path=target, filename=253762304418.mockFlowFile, xml.stream.is.array=true, uuid=34fb0980-8fc3-4c41-b4f5-3078d26b6f67} found schema text {
  "namespace": "nifi",
  "name": "test",
  "type": "record",
  "fields": [
    { "name": "ID", "type": "string" },
    { "name": "NAME", "type": "string" },
    { "name": "AGE", "type": "int" },
    { "name": "COUNTRY", "type": "string" }
  ]
}


Anyway, thanks again. I have something to go on now.
Dave

Re: Record Readers and Writers

Posted by Andy LoPresto <al...@apache.org>.
Hi Dave,

The underlying internal “record format” is not JSON. Avro [1] is used to describe schemas across all record formats, but the internal data storage is NiFi specific. You may be interested in these articles by Mark Payne and Bryan Bende [2][3][4] and the potential use of the ScriptedReader [5] or ScriptedRecordSetWriter [6] to prototype your needed conversions. 

[1] https://avro.apache.org/
[2] https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi
[3] https://blogs.apache.org/nifi/entry/real-time-sql-on-event
[4] https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries
[5] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.11.4/org.apache.nifi.record.script.ScriptedReader/index.html
[6] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-scripting-nar/1.11.4/org.apache.nifi.record.script.ScriptedRecordSetWriter/index.html
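
As a rough illustration of the idea that records live in a format-neutral in-memory form between a Reader and a Writer, here is a small sketch in plain Python. It is not NiFi's actual Record class or API: the Record dataclass, both writer functions, the pipe-delimited "bespoke" format, and the sample values are all assumptions invented for this example. Only the field names come from the schema quoted in this thread.

```python
import json
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a format-neutral in-memory record."""
    field_names: list  # ordered field names, as a schema would declare them
    values: dict       # field name -> value


def write_json(record):
    """Serialize one record as a JSON object."""
    return json.dumps({name: record.values[name] for name in record.field_names})


def write_bespoke(record, delimiter="|"):
    """Serialize one record in an invented delimiter-separated format."""
    return delimiter.join(str(record.values[name]) for name in record.field_names)


# Made-up sample values; only the field names come from the schema in this thread.
record = Record(
    ["ID", "NAME", "AGE", "COUNTRY"],
    {"ID": "1", "NAME": "Alice", "AGE": 30, "COUNTRY": "UK"},
)

if __name__ == "__main__":
    print(write_json(record))
    print(write_bespoke(record))
```

The same in-memory record feeds both writers, so neither output format is the "common" one; each is just one serialization of the neutral form.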

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69



Re: Record Readers and Writers

Posted by Matt Burgess <ma...@apache.org>.
Dave,

In which unit test file(s) for XMLReader are you seeing JSON output?
I ran TestXMLReader and TestXMLRecordReader and didn't see
any console output. Most unit tests for Readers only test that the
internal Record objects have been parsed correctly; they don't usually
go an extra step and pass them through a Writer, since that's not the
unit under test.

For stripped-down examples of Reader/Writer implementations, take a
look at the unit tests for ScriptedReader and ScriptedRecordSetWriter.
They use simple scripts that have the bare-bones config needed
to implement a Reader/Writer, such as implementing a
RecordReaderFactory, using its method(s) to create RecordReader(s),
and using the Reader to iterate over records as they are parsed. An
important point is to avoid (at all costs!) reading the entire input
into memory and then parsing/iterating. The Readers/Writers are
designed specifically to parse/write only one record at a time, so
memory usage does not become an issue and very large files can be
read/written.
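
The one-record-at-a-time point can be sketched as follows, in plain Python rather than against the NiFi interfaces. The function names read_records and write_records, the pipe-delimited input format, and the sample rows are all hypothetical; the sketch only shows the streaming pattern, where each record is parsed and written before the next line is read.

```python
import io

# Field names taken from the schema quoted in this thread.
FIELD_NAMES = ["ID", "NAME", "AGE", "COUNTRY"]


def read_records(stream, delimiter="|"):
    """Yield one record (a dict) per input line without loading the whole stream."""
    for line in stream:  # file-like objects are iterated lazily, line by line
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        yield dict(zip(FIELD_NAMES, line.split(delimiter)))


def write_records(records, out, delimiter=","):
    """Write each record as soon as it is produced; return the number written."""
    count = 0
    for record in records:
        out.write(delimiter.join(record[name] for name in FIELD_NAMES) + "\n")
        count += 1
    return count


# Made-up sample input in an invented pipe-delimited format.
source = io.StringIO("1|Alice|30|UK\n2|Bob|41|FR\n")
sink = io.StringIO()
written = write_records(read_records(source), sink)

if __name__ == "__main__":
    print(written)
    print(sink.getvalue(), end="")
```

Because read_records is a generator, only one record is held in memory at a time regardless of input size. Calling stream.read() and splitting the result would be the pattern to avoid.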

Regards,
Matt

