Posted to users@nifi.apache.org by "Vincent, Mike" <mv...@mitre.org> on 2019/01/24 18:09:33 UTC

Convert JSON to single line

I'm ingesting Windows Event logs with ConsumeWindowsEventLog and then using TransformXML to convert them to JSON, following:

https://community.hortonworks.com/articles/29474/nifi-converting-xml-to-json.html

The flow continues to MergeContent, CompressContent and then PutS3Object.

The issue I'm having is that when I examine the content of the uploaded files (i.e., after downloading and unzipping them), they are pretty-printed JSON structures rather than single-line JSON "records".

For example:

{
    Field: 1,
    Field: 2
}

I don't want that; I *want*:

{Field:1,Field:2}

What Convert / Transform / other processor can I use, and how should I configure it, to squish the JSON structure into a single-line record *before* MergeContent?

Cheers,

Michael J. Vincent
Lead Network Systems Engineer | The MITRE Corporation | Network Technology & Security (T864) | +1 (781) 271-8381


Re: Convert JSON to single line

Posted by Matt Burgess <ma...@apache.org>.
Michael,

As of NiFi 1.7.0, if you use MergeRecord instead of MergeContent, you
can choose a JsonRecordSetWriter with "Pretty Print JSON" set to false
and "Output Grouping" set to "One Line Per Object". That should output
one JSON object per line (as well as merge the individual flow
files/records together). Any record-based processor would work in that
case, so if MergeRecord isn't an option, ConvertRecord will work just
as well. In either case, the JsonRecordSetWriter should be set to
inherit the schema from the reader.
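
As a rough summary of the settings described above, the configuration
might look like the following. "Pretty Print JSON" and "Output Grouping"
come from this thread; the "Record Reader", "Record Writer" and "Schema
Access Strategy" labels are written from memory of the NiFi UI, so treat
them as assumptions and check them against your NiFi version.

```
MergeRecord (or ConvertRecord)
    Record Reader = JsonTreeReader
    Record Writer = JsonRecordSetWriter

JsonRecordSetWriter
    Schema Access Strategy = Inherit Record Schema
    Pretty Print JSON      = false
    Output Grouping        = One Line Per Object
```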

Regards,
Matt


Re: [EXT] Re: Convert JSON to single line

Posted by Matt Burgess <ma...@apache.org>.
If you have fairly straightforward JSON, you can use InferAvroSchema
first. It will write an attribute called "avro.schema" to the flow
file; then in your JsonTreeReader you can specify the strategy as
"Use Schema Text" and keep the default value of the Schema Text
property (which is "${avro.schema}").
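
Sketched as a flow, that might look like the following. The "Schema
Output Destination" property and its "flowfile-attribute" value are
assumptions about InferAvroSchema's configuration that are not stated
in this thread, and the exact label of the schema strategy may differ
between NiFi versions, so verify both in the processor and reader
documentation.

```
TransformXML -> InferAvroSchema -> MergeRecord -> CompressContent -> PutS3Object

InferAvroSchema
    Schema Output Destination = flowfile-attribute   (assumed; writes avro.schema)

JsonTreeReader
    Schema Access Strategy = Use 'Schema Text' Property
    Schema Text            = ${avro.schema}
```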

Another (likely slower) way is to use a scripting processor. Groovy
has a JsonOutput class that writes JSON on a single line by default.
The script is only a few lines in ExecuteScript (which can be slow),
but here's a full script you can use in InvokeScriptedProcessor (the
faster scripting processor). It has a lot of boilerplate in it; the
two main methods are JsonSlurper.parse() and JsonOutput.toJson():

import groovy.json.*

class GroovyProcessor implements Processor {
    def REL_SUCCESS = new Relationship.Builder().name("success")
        .description('FlowFiles that were successfully processed are routed here').build()
    def REL_FAILURE = new Relationship.Builder().name("failure")
        .description('FlowFiles that were not successfully processed are routed here').build()
    def ComponentLog log
    void initialize(ProcessorInitializationContext context) { log = context.logger }
    Set<Relationship> getRelationships() { return [REL_FAILURE, REL_SUCCESS] as Set }
    Collection<ValidationResult> validate(ValidationContext context) { null }
    PropertyDescriptor getPropertyDescriptor(String name) { null }
    void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }
    List<PropertyDescriptor> getPropertyDescriptors() { null }
    String getIdentifier() { null }
    void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
        def session = sessionFactory.createSession()
        try {
            def flowFile = session.get()
            if(!flowFile) return
            try {
                def inStream = session.read(flowFile)
                def jsonObj = new groovy.json.JsonSlurper().parse(inStream)
                inStream.close()
                flowFile = session.write(flowFile, { outStream ->
                    outStream.write(JsonOutput.toJson(jsonObj).bytes)
                } as OutputStreamCallback)
                session.transfer(flowFile, REL_SUCCESS)
            } catch(e) {
                log.error('Couldn\'t process', e)
                session.transfer(flowFile, REL_FAILURE)
            }
            session.commit()
        } catch (final Throwable t) {
            log.error('{} failed to process due to {}; rolling back session', [this, t] as Object[])
            session.rollback(true)
            throw t
        }
    }
}
processor = new GroovyProcessor()
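
To see what the two core calls do without the NiFi boilerplate, here is
a minimal standalone Groovy sketch that can be run outside NiFi. The
compactJson helper and the sample input are hypothetical names
introduced only for illustration; note that the sample uses quoted,
distinct keys so that it is valid JSON.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical helper: parse JSON from a stream and re-serialize it.
// JsonOutput.toJson() emits no indentation or line breaks.
String compactJson(InputStream input) {
    def parsed = new JsonSlurper().parse(input)
    JsonOutput.toJson(parsed)
}

// Illustrative pretty-printed input, similar in shape to the example in this thread
def pretty = '''{
    "Field1": 1,
    "Field2": 2
}'''

def compact = compactJson(new ByteArrayInputStream(pretty.getBytes('UTF-8')))
println compact   // prints the whole object on a single line
```

Because the content is parsed and re-serialized, this approach needs no
schema, at the cost of holding each parsed document in memory.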


Regards,
Matt

On Thu, Jan 24, 2019 at 3:02 PM Vincent, Mike <mv...@mitre.org> wrote:
>
> Running 1.8 so I see MergeRecord.  I think I need to use JsonTreeReader as the reader, but it requires a schema.  I don't have a schema for the JSON; I'm wondering why I can't just take the JSON I'm already receiving and, instead of pretty-printing it, squash it to one line?  I'm new to NiFi so please pardon what may be an elementary request.
>
> Cheers,
>
> Michael J. Vincent
> Lead Network Systems Engineer | The MITRE Corporation | Network Technology & Security (T864) | +1 (781) 271-8381

RE: [EXT] Re: Convert JSON to single line

Posted by "Vincent, Mike" <mv...@mitre.org>.
Running 1.8 so I see MergeRecord.  I think I need to use JsonTreeReader as the reader, but it requires a schema.  I don't have a schema for the JSON; I'm wondering why I can't just take the JSON I'm already receiving and, instead of pretty-printing it, squash it to one line?  I'm new to NiFi so please pardon what may be an elementary request.

Cheers,

Michael J. Vincent
Lead Network Systems Engineer | The MITRE Corporation | Network Technology & Security (T864) | +1 (781) 271-8381
