Posted to users@nifi.apache.org by Ryan Hendrickson <ry...@gmail.com> on 2020/09/10 14:27:35 UTC

Data performance with FlowFile Repo's RocksDB

Hi all,
   I've got a NiFi instance handling a lot of small JSON files, and I'm
trying to squeeze the most performance out of it.

   I recently saw the new RocksDB FlowFile Repo (
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#rocksdb-flowfile-repository)
and was wondering what kind of performance gains, if any, we could expect
from it.

Thanks,
Ryan

Re: Data performance with FlowFile Repo's RocksDB

Posted by Matt Burgess <ma...@gmail.com>.
You can use a JsonTreeReader set to Infer Schema and use that in JoltTransformRecord. But if your payload is one big JSON object (rather than a top-level array of JSON objects), then you only have one record and should stick to JoltTransformJson. If you do have an array, JoltTransformJson will still read the whole thing into memory, whereas JoltTransformRecord will process each element individually.
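
To illustrate the difference described above, here is a rough Python sketch (not NiFi code). It assumes the payload is a top-level JSON array. The names iter_array_elements, transform_whole and transform_streaming are hypothetical, made up for this example, and the transform argument is only a stand-in for a Jolt spec. The first function parses the entire array before transforming anything; the second decodes and transforms one element at a time as text is read from the stream.

```python
import io
import json
from typing import Any, Callable, Iterator, List, TextIO


def iter_array_elements(stream: TextIO, chunk_size: int = 4096) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time.

    Only the text of the element currently being decoded is buffered,
    rather than the whole document.
    """
    decoder = json.JSONDecoder()
    buf = ""
    eof = False

    def fill() -> None:
        # Append the next chunk of text to the buffer, or note end of input.
        nonlocal buf, eof
        chunk = stream.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True

    # Consume the opening '[' of the array.
    while not buf.lstrip():
        if eof:
            raise ValueError("empty input")
        fill()
    buf = buf.lstrip()
    if buf[0] != "[":
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]

    while True:
        buf = buf.lstrip()
        if buf.startswith("]"):
            return
        try:
            value, end = decoder.raw_decode(buf)
            rest = buf[end:].lstrip()
            # Accept the value only once its trailing ',' or ']' has been
            # seen, so a number split across two chunks is not cut short.
            complete = rest[:1] in (",", "]")
        except json.JSONDecodeError:
            complete = False
        if not complete:
            if eof:
                raise ValueError("truncated or malformed JSON array")
            fill()
            continue
        yield value
        buf = rest[1:] if rest[0] == "," else rest


def transform_whole(stream: TextIO, transform: Callable[[Any], Any]) -> List[Any]:
    """Whole-document style: parse everything, then transform."""
    return [transform(record) for record in json.load(stream)]


def transform_streaming(
    stream: TextIO, transform: Callable[[Any], Any], chunk_size: int = 4096
) -> Iterator[Any]:
    """Record style: transform each element as soon as it is decoded."""
    for record in iter_array_elements(stream, chunk_size):
        yield transform(record)


if __name__ == "__main__":
    payload = '[{"user": "a", "n": 1}, {"user": "b", "n": 2}]'
    rename = lambda r: {"user_name": r["user"], "n": r["n"]}
    for out in transform_streaming(io.StringIO(payload), rename, chunk_size=8):
        print(out)
```

Both functions produce the same records; the difference is how much of the input has to be held at once. If the payload is a single JSON object rather than an array, there is only one element to iterate over and the streaming version gains nothing.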

You may be able to use a Jolt transform to do the flattening, but you’d need to know the structure of the JSON in order to match the various levels correctly.
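
For cases where the structure is not known ahead of time, a generic recursive flatten is one option outside of Jolt. The Python sketch below is only an illustration under assumptions: the thread does not describe what the custom FlattenJsonArray processor actually does, and the flatten function, the dotted key format, and the sample document are all made up here.

```python
import json
from typing import Any, Dict


def flatten(value: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict.

    Keys are built by joining the path to each leaf with `sep`; list
    positions are used as path segments. No schema is needed up front.
    """
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        items = list(value.items())
    elif isinstance(value, list):
        items = [(str(i), v) for i, v in enumerate(value)]
    else:
        # Leaf value: store it under the path accumulated so far.
        flat[prefix] = value
        return flat

    if not items and prefix:
        # Keep empty containers so the field is not silently dropped.
        flat[prefix] = value
        return flat

    for key, child in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, path, sep))
    return flat


if __name__ == "__main__":
    doc = json.loads('{"user": {"name": "ryan", "tags": ["a", "b"]}, "ok": true}')
    print(json.dumps(flatten(doc), indent=2))
```

One trade-off with this approach is that list indexes become part of the field names, so arrays of varying length produce a varying set of keys downstream.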

Regards,
Matt

Re: Data performance with FlowFile Repo's RocksDB

Posted by Ryan Hendrickson <ry...@gmail.com>.
Hey Joe,
   Right now I'm using an InputPort -> JoltTransformJSON -> Custom
FlattenJsonArray -> DistributeLoad -> PutElasticHTTP on an 8-core box with
64 GB of RAM.

   I did see there is a JoltTransformRecord, but my rudimentary
understanding of Record processing is that you need a predefined,
well-known schema for the records.  What happens if you don't know the
whole schema?

Thanks,
Ryan

Re: Data performance with FlowFile Repo's RocksDB

Posted by Joe Witt <jo...@gmail.com>.
Ryan

By far the biggest factor in performance is the flow design itself.  I'd
look at repo changes only as a last resort.

Are you using the record processors?  Does your data arrive in batches?

Thanks
