Posted to dev@nifi.apache.org by idioma <co...@gmail.com> on 2016/07/18 12:26:09 UTC

SplitText Usage - how to output my individual files?

Hi,
I am not sure whether I am terribly missing the point, but I have a simple
dataflow (CSV2JSON) that does the following:

GetFile (1000-line CSV file)
SplitText (one line per file)
ExtractText and ReplaceText, to extract the content and construct the
JSON structure
UpdateAttribute, in which the filename attribute is set to myOutput.json
in order to have visibility of the output
PutFile

My understanding is that, when dealing with large files (probably not my
case right now, but I wanted to test it for future reference), it is
recommended to use SplitText, so that NiFi creates a JSON file for each
line. My question is: how do I actually prove that SplitText is doing its
job? How do you test that the file has been successfully split into
multiple JSON files?

Thank you for your help; I am rather stuck with this.

I.



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/SplitText-Usage-how-to-output-my-individual-files-tp12845.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: SplitText Usage - how to output my individual files?

Posted by idioma <co...@gmail.com>.
Mark, thank you so much for your reply. It is very clear now. As I had
suspected, the management toolbar, and in particular the Data Provenance
area, is very informative regarding the issue.

Thank you again!


Re: SplitText Usage - how to output my individual files?

Posted by Mark Payne <ma...@hotmail.com>.
Hi Idioma,

There are two ways I would recommend to verify this. The first is to right-click on a connection
and choose "List Queue". From there, you can see the FlowFiles that are queued up in the connection.
You can then click the 'info' icon on the left of a FlowFile to see its attributes and content, and download
the content to inspect it.
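Whether the files come from PutFile's target directory or were downloaded one by one from the List Queue dialog, a short script can check them in bulk. This is a minimal Python sketch, assuming the source CSV has no header line, that the outputs end in `.json` and sit alone in one directory, and that each should hold a single JSON object; the function names, script name and paths are hypothetical.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path


def count_data_lines(csv_path: Path) -> int:
    """Count non-blank lines in the source CSV (assumed to have no header line)."""
    with csv_path.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def check_split_output(csv_path: Path, out_dir: Path) -> list[str]:
    """Compare the source CSV with the *.json files in out_dir.

    Returns a list of problems; an empty list means the number of files
    matches the number of CSV lines and each file holds one JSON object.
    """
    problems: list[str] = []
    files = sorted(out_dir.glob("*.json"))
    expected = count_data_lines(csv_path)
    if len(files) != expected:
        problems.append(f"expected {expected} JSON files, found {len(files)}")
    for path in files:
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc})")
            continue
        if not isinstance(doc, dict):
            problems.append(f"{path.name}: expected a JSON object, got {type(doc).__name__}")
    return problems


if __name__ == "__main__":
    # Hypothetical invocation: python check_split.py input.csv /path/to/putfile/dir
    if len(sys.argv) != 3:
        print("usage: check_split.py INPUT_CSV OUTPUT_DIR")
    else:
        issues = check_split_output(Path(sys.argv[1]), Path(sys.argv[2]))
        print("\n".join(issues) if issues else "OK: one valid JSON file per CSV line")
```

A count mismatch with far fewer files than lines is what a constant output filename would typically produce, since the splits then overwrite or collide with one another in the target directory.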

The second, and more important, is NiFi's notion of Data Provenance. The User Guide explains how to use this
feature [1]. As data traverses NiFi, every event that happens to every piece of data is recorded in the
Provenance Repository. This lets you see exactly what happened to each piece of data as it flowed through
your system, and it can be used (among many other things) to debug dataflows. You can see the
attributes as well as the content at every step, so you can understand exactly what the data looked like
at each point, and replay the data from any step if it wasn't handled correctly.

Does all of this make sense and give you what you need? Let us know if you have any further questions!

Thanks
-Mark

[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance

