Posted to dev@nifi.apache.org by DAVID SMITH <da...@btinternet.com.INVALID> on 2018/11/30 12:08:44 UTC

Help with loading a file into a cache

Hi Devs
I am running a NiFi 1.8 cluster; each node has 128 GB of RAM. I need to load the contents of a file, which is around 5 GB in size, into a Key/Value cache.
The file I want to load is produced by another company so the format it comes in is not negotiable. The file contains thousands of lines in the following format:-
<index value1>:{<property1 name>: <property1 value>, <property2 name>:<property2 value>}<index value2>:{<property1 name>: <property1 value>, <property2 name>:<property2 value>}
<index value3>:{<property1 name>: <property1 value>, <property2 name>:<property2 value>}

I want the index value to become the Key and everything beyond the first colon to become the value.
What would be the most efficient way of reading the file and parsing it to load into a cache? I thought of reading in the file, splitting the content on CR/LF, and then splitting each line on the first colon. I have noticed that in 1.8 there are some CSV and JSON Readers (controller services). Would these be a better way of doing this? The problem I can see is that the file isn't quite a CSV and it isn't quite a JSON array file.
Many thanks
Dave
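
A minimal sketch of the approach described above (read line by line, then split each line at its first colon), written as standalone Python rather than as a NiFi flow. It assumes one record per line and uses a plain dict as a stand-in for the Key/Value cache. The function names and the sample data are hypothetical, not taken from the real file.

```python
import io
from typing import Iterable, Iterator, MutableMapping, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs, splitting each line at its FIRST colon only."""
    for raw in lines:
        line = raw.rstrip("\r\n")  # tolerate both LF and CR/LF line endings
        if not line.strip():
            continue  # skip blank lines
        # partition() splits once, so colons inside the {...} part are preserved
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon on this line: not a record, skip it
        yield key.strip(), value.strip()


def load_into_cache(lines: Iterable[str], cache: MutableMapping[str, str]) -> int:
    """Store every parsed record in `cache` and return the number stored."""
    count = 0
    for key, value in iter_records(lines):
        cache[key] = value
        count += 1
    return count


# Made-up sample data in the shape described in the message (assumption).
sample = io.StringIO(
    "idx1:{name: alice, city: leeds}\r\n"
    "\r\n"
    "idx2:{name: bob, city: york}\r\n"
)
cache = {}  # stand-in for the real Key/Value cache
loaded = load_into_cache(sample, cache)
```

For a real file, passing an open file object (for example `open(path, encoding="utf-8")`, where `path` is a placeholder) in place of `sample` iterates one line at a time, so the 5 GB file never has to be held in memory as a whole.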

Re: Help with loading a file into a cache

Posted by Mike Thomsen <mi...@gmail.com>.
Dave,

Can you post a redacted example with dummy data?

Thanks,

Mike
