You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nifi.apache.org by idioma <co...@gmail.com> on 2016/05/03 10:26:34 UTC

Re: Using the template cvs to json to convert dsv files into json format: a few questions

I am comment on my own question, just to clarify one aspect of the existing
template. I am not sure why we need ExtractText between the first
ReplaceText and the one that in the flow comes before UpdateAttribute? I am
not sure I have fully understood the flow, what is the actual value for that
step? 

Thank you so much for your help. 



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9782.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by idioma <co...@gmail.com>.
Just to add on my previous comment. I have just used ConvertCharacterSet from
ASCII to UTF-8 and I am getting the same result. The issue seems to be how ¦
are interpreted, I am not sure why. 



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9922.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by idioma <co...@gmail.com>.
Bryan, thank you so much. I will then try a combination of ReplaceText and
MergeContent and work out whether I need any additional manipulation after
that. I am kinda struggling with ReplaceText, once again, I am using in the
search value (\r\n) and the replacement value empty. Furthermore, my
replacement strategy is Literal Replace with Evaluation Mode Line by line (I
have modified these values and tried different combinations, but no joy)

In case of the following value: 

"This is adescription",

I would like to be returned with: 

"This is a description"

However, the regex seems to replace everything before and after ¦ with ??,
without touching the values/sentences that needs the space in. Is that
because my file is probably still in ASCII? Should I convert it into UTF-8
before using ReplaceText? Can you help? 

Thank you, 



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9921.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by idioma <co...@gmail.com>.
Hi, I have made some progress and specifically modified the regex in
ExtractText, as follows:

(.*?);(.*?);(\d+.*)

The same value is used as SearchValue in my ReplaceText and the Replacement
Value is as before = 

{ 
 "testField" : "${dsv.1}" 
}

This time, I am outputting the right value, but I am capturing the headers
too, as follows:

"HeadingA";"HeadingB";"HeadingC";"HeadingD";"HeadingE" 
 { 
 "testField" : "4787" 
} 

I have tried to use UpdateAttribute just before the ReplaceText, but the
headers are still in the output format. 

What is the best approach for removing the first line from my flowfile so
that only the Json format will be generated? Alternative, how can I modify
the SearchValue to skip the first line? 

Thanks and looking forward to sharing this with the community!



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9992.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by idioma <co...@gmail.com>.
With the view of ultimately sharing a template for converting dsv into json
(this is the least I can do with all the help received from this amazing
community), I have been working my way trough the steps for the dataflow and
still running into some troubles and not sure why. Here is my ExtractText ->
ReplaceText strategy:

I have replaced my initial delimiter symbol eventually with a semi-column,
so my ExtractText looks like this (I have added the attribute dsv:

dsv = (.+);(.+);(.+);(.+)  

This should parse the content and pipe it in the dataflow. Then, the
ReplaceText will do the following:

Search Value = (?s:^.*$) 
ReplacementValue = 

{ 
 "testField" : "${dsv.1}"
}


My sample data looks like something like this (full version is around 30
delimited values): 

"HeadingA";"HeadingB";"HeadingC";"HeadingD";"HeadingE"
4787; "788769"; "67869"; "SomeText"; 12-07-200 13.45.22

The initial value is not enclosed within "" and some values such as date,
datetime and some numerical ones too are randomly not wrapped up with the
speech marks.  

My final Json output looks rather weird and I am not sure why this is
happening (is this because my search value is the default one? Can I search
for ${dsv} ?

{ 
 "testField" : "4787; "788769"; "67869"; "SomeText"; 12-07-200 13.45.22"
}

I am expecting to have the testField matched to 4787, is that correct. I
though the issue was the lack of speech marks and I have tweaked the data
sample to look like the other fields, but no joy. 

Can you help?



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9965.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by Bryan Bende <bb...@gmail.com>.
Ok I think I understand now...

Generally NiFi is geared more towards operating on a single piece of data
at a time (one FlowFile), so taking one DSV and converting it to JSON is
straight forward, but trying to use several DSVs as input to produce one
JSON is more tricky.

What you might be able to do is first convert each DSV to JSON so in your
example you would first produce:

{ a = "valueA" }

{ b = "valueB" }

Then use MergeContent to merge all these JSON documents together, and then
possibly do more extracting/replacing after that turn it into your desired
output.

Maybe someone else has another idea of how to handle this.

-Bryan




On Wed, May 4, 2016 at 7:25 AM, idioma <co...@gmail.com> wrote:

> Hi Bryan,
> thank you so much, I am glad I am on the right track. Regarding how the
> resulting json file is created, I should clarify that there are 5 different
> .dsv files from which the json will be created. Therefore, there are a
> number of denormalisation rules employed to produce what will be the
> output.
> For example, the following json is an example of a subset of the mappings:
>
> {
> a = "valueA",
> b = "valueB
>
> }
>
> From .dsv(s): DSV1, DSV2, DSV3, DSV4 and DSV5.
>
> The key/value "a" is from DSV1 where there is a column called "HeadingA"
> with value "a", however the value "valueA" is the actual description of "a"
> as per DSV4 (here is its structure):
>
> HeadingCode ¦ HeadingText
> a                   "valueA"
>
> There are other cases, like b where the value is taken straight from one
> .dsv file/column
>
> Does it clarify a bit? What is your recommendation in my scenario?
>
> Thank you so much.
>
>
>
>
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9899.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by idioma <co...@gmail.com>.
Hi Bryan,
thank you so much, I am glad I am on the right track. Regarding how the
resulting json file is created, I should clarify that there are 5 different
.dsv files from which the json will be created. Therefore, there are a
number of denormalisation rules employed to produce what will be the output.
For example, the following json is an example of a subset of the mappings:

{
a = "valueA",
b = "valueB

}

From .dsv(s): DSV1, DSV2, DSV3, DSV4 and DSV5. 

The key/value "a" is from DSV1 where there is a column called "HeadingA"
with value "a", however the value "valueA" is the actual description of "a"
as per DSV4 (here is its structure):

HeadingCode ¦ HeadingText
a                   "valueA"

There are other cases, like b where the value is taken straight from one
.dsv file/column

Does it clarify a bit? What is your recommendation in my scenario? 

Thank you so much. 







--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9899.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by Bryan Bende <bb...@gmail.com>.
1) Replacing with GetFile sounds good

2) You are correct, there is nothing to set since you are getting it from
GetFile

3) You could probably replace all those characters with one ReplaceText
processor where the Search Value is something like (\n|\r|\t) and the
Replacement Value is empty.
I haven't tried this, but I think it should work.

4) That modified expression looks correct assuming ¦ is your delimiter.

5) Not totally sure I understand the question... In the example template,
field1 and field2 are just example field names to show how you could take
column 1 and column 2 from the CSV and put them into a JSON document.

Are you saying that you need to produce a single JSON document from
multiple dsv files? can you give a more concrete example of what the final
JSON needs to look like?

-Bryan


On Tue, May 3, 2016 at 11:23 AM, idioma <co...@gmail.com> wrote:

> Hi Bryan, as always you are a star!
> Thank you so much for your help. It does start making much more sense now,
> I
> cannot thank you enough, this flow is really something challenging
> considering my current knowledge of NiFi. So, in my case:
>
> 1 ) I have replaced GenerateFlowFile with GetFile (which looks into a
> subset
> of the content from the .dsv files).
>
> 2) I believe I do not need to set any content (correct me if I am wrong)
> This step is, therefore, skipped.
>
> 3) There might be some need for removing special characters, \r \n and \t
> with " ", so I am thinking to use ReplaceText and then double check with
> ConvertCharacterSet so that ASCII has been converted into UTF-8. Can you
> have more than one replacement value? So, in the first case you would need
> to have a search value \r then your replacement value would be something
> like replaceAll("\r", " "), then I would need to do a similar thing with \n
> and \t.
>
> 4) Here my ExtractText would be slightly different from csv. Something
> like:
> (.+)¦(.+)¦(.+)¦(.+) with attribute dsv. Does it make sense?
>
> 5) ReplaceText, then, is a working in progress as I have a number of dsv
> files, which will then constitute the final json, so I will need to clarify
> the process in depth. I have a question, though. Are field1 as much as
> field2 and a new attribute matching to the desider Json field?
>
> Thank you again, I hope very soon to give back to the community! :)
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9815.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by idioma <co...@gmail.com>.
Hi Bryan, as always you are a star!
Thank you so much for your help. It does start making much more sense now, I
cannot thank you enough, this flow is really something challenging
considering my current knowledge of NiFi. So, in my case: 

1 ) I have replaced GenerateFlowFile with GetFile (which looks into a subset
of the content from the .dsv files).

2) I believe I do not need to set any content (correct me if I am wrong)
This step is, therefore, skipped. 

3) There might be some need for removing special characters, \r \n and \t
with " ", so I am thinking to use ReplaceText and then double check with
ConvertCharacterSet so that ASCII has been converted into UTF-8. Can you
have more than one replacement value? So, in the first case you would need
to have a search value \r then your replacement value would be something
like replaceAll("\r", " "), then I would need to do a similar thing with \n
and \t. 

4) Here my ExtractText would be slightly different from csv. Something like:
(.+)¦(.+)¦(.+)¦(.+) with attribute dsv. Does it make sense? 

5) ReplaceText, then, is a working in progress as I have a number of dsv
files, which will then constitute the final json, so I will need to clarify
the process in depth. I have a question, though. Are field1 as much as
field2 and a new attribute matching to the desider Json field? 

Thank you again, I hope very soon to give back to the community! :)



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9815.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Using the template cvs to json to convert dsv files into json format: a few questions

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

Regarding the CsvToJson template, here is what is happening...

1) GenerateFlowFile produces a FlowFile to simulate some data

2) ReplaceText sets the content of that flow file to "a,b,c,d" to represent
and example CSV line

Steps 1 and 2 are just simulating data that would really come from
somewhere else.

3) ExtractText applies a regular expression to the FlowFile to extract the
columns of the CSV...

In this case there is an attribute added to the processor called "csv" with
a value of (.+),(.+),(.+),(.+)

After the FlowFile passes through this processor it will have added
attributes:

csv.1 = a
csv.2 = b
csv.3 = c
csv.4 = d

4) ReplaceText uses those new attributes to build a new JSON document by
setting the replacement value to:

{
 "field1" : "${csv.1}",
 "field2" : "${csv.2}",
 "field3" : "${csv.3}",
 "field4" : "${csv.4}"
}

5) UpdateAttribute is not doing anything in this example, it is just a
place holder to send the results of step 4 somewhere and end the flow, you
would replace this with whatever you want to do with your JSON.

-Bryan


On Tue, May 3, 2016 at 4:26 AM, idioma <co...@gmail.com> wrote:

> I am comment on my own question, just to clarify one aspect of the existing
> template. I am not sure why we need ExtractText between the first
> ReplaceText and the one that in the flow comes before UpdateAttribute? I am
> not sure I have fully understood the flow, what is the actual value for
> that
> step?
>
> Thank you so much for your help.
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/Using-the-template-cvs-to-json-to-convert-dsv-files-into-json-format-a-few-questions-tp9738p9782.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>