Posted to users@nifi.apache.org by St...@thomsonreuters.com on 2016/08/31 07:52:57 UTC

working with HTML table

Hi All,

I'm trying to extract values from an HTML table and perform a calculation on them with NiFi.
The purpose of the test is to add up the TD values within each TR and write the result to a file.
For this sample the result should be 23 and 43.

My table looks like this:

<table>
     <tr>
          <td>11</td>
          <td>12</td>
     </tr>
     <tr>
          <td>21</td>
          <td>22</td>
     </tr>
</table>

My NiFi workflow is:

InvokeHTTP > Response > GetHTMLElement > Success > PutFile

The CSS Selector for GetHTMLElement is "table td".
I know that GetHTMLElement produces 0-N elements, but I don't know how to perform a calculation on them.
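
As an illustration of the intended calculation outside NiFi, here is a minimal sketch in plain JavaScript. It assumes simple, well-formed markup like the sample above; sumRows is a hypothetical helper name, and the regular expressions are only adequate for this kind of input, not a general HTML parser.

```javascript
// Sample table from the question, collapsed onto one string.
var html =
  "<table>" +
  "<tr><td>11</td><td>12</td></tr>" +
  "<tr><td>21</td><td>22</td></tr>" +
  "</table>";

// Hypothetical helper: returns one sum per <tr>, adding the integer
// content of every <td> found inside that row.
function sumRows(markup) {
  var sums = [];
  // Lazily match each <tr>...</tr> block.
  var rows = markup.match(/<tr[\s\S]*?<\/tr>/gi) || [];
  for (var r = 0; r < rows.length; r++) {
    // A fresh pattern per row so that lastIndex starts at 0.
    var cellPattern = /<td[^>]*>\s*(-?\d+)\s*<\/td>/gi;
    var total = 0;
    var cell;
    while ((cell = cellPattern.exec(rows[r])) !== null) {
      total += parseInt(cell[1], 10);
    }
    sums.push(total);
  }
  return sums;
}

console.log(sumRows(html).join(" and ")); // prints "23 and 43"
```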

Any help would be appreciated.

Thanks
Regards
Stephane

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Stephane Tinseau

Thomson Reuters
stephane.tinseau@thomsonreuters.com
thomsonreuters.com


________________________________

This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website.<http://site.thomsonreuters.com/site/disclosures/>

RE: working with HTML table

Posted by St...@thomsonreuters.com.
Hi Yolanda, Jeremy,

Thanks for your useful samples.
I will use them to tackle a slightly more complex case.

Regards
Stephane

Re: working with HTML table

Posted by Jeremy Dyer <jd...@gmail.com>.
Stephane - Here is another option using just GetHTMLElement, without any
ExecuteScript processor. It uses a CSS selector to pull the elements and
then NiFi Expression Language to split and add the values. It isn't much
different from what you had. You were very close.

Re: working with HTML table

Posted by Yolanda Davis <yo...@gmail.com>.
Hi Stephane,

Here's something I hope can help. In GetHTMLElement, instead of using the
selector "table td", try "table tr" with an output type of "Text" and a
destination type of flowfile-content. This should create one flow file per
row, each containing the numeric text from the td elements in that row.
From there you can use the ExecuteScript processor to trim the whitespace,
convert the text values into numbers and sum them. I was able to get this
to work with the JavaScript (ECMAScript) below, using the example HTML you
provided:

var flowFile = session.get();
if (flowFile != null) {

  var StreamCallback = Java.type("org.apache.nifi.processor.io.StreamCallback")
  var IOUtils = Java.type("org.apache.commons.io.IOUtils")
  var StandardCharsets = Java.type("java.nio.charset.StandardCharsets")

  flowFile = session.write(flowFile,
    new StreamCallback(function(inputStream, outputStream) {
        // Read the row text produced by GetHTMLElement, e.g. "11 12"
        var text = String(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
        // Split on any run of whitespace, not only on single spaces
        var res = text.trim().split(/\s+/);
        var count = 0;
        for (var i = 0; i < res.length; i++) {
            var value = parseInt(res[i], 10);
            // NaN never compares equal to anything, so test with isNaN()
            if (!isNaN(value)) {
                count += value;
            }
        }
        outputStream.write(count.toString().getBytes(StandardCharsets.UTF_8))
    }))
  flowFile = session.putAttribute(flowFile, "filename", flowFile.getId() + '_count.txt');
  session.transfer(flowFile, REL_SUCCESS)
}
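
One detail of the summing loop is worth isolating. A comparison such as parseInt(x) != NaN is always true, because NaN is not equal to any value, including itself, so non-numeric tokens are only skipped when they are tested with isNaN(). Below is a minimal sketch of the summing step on its own, runnable outside NiFi. sumRowText is a hypothetical helper name, and the whitespace-separated input is an assumption about the text GetHTMLElement emits for each row.

```javascript
// Hypothetical helper: sums the integers found in one row's text,
// ignoring any token that does not parse as a number.
function sumRowText(text) {
  // Split on any run of whitespace (spaces, tabs, newlines).
  var tokens = String(text).trim().split(/\s+/);
  var total = 0;
  for (var i = 0; i < tokens.length; i++) {
    var value = parseInt(tokens[i], 10);
    // isNaN() is the reliable check; "value != NaN" would always be true.
    if (!isNaN(value)) {
      total += value;
    }
  }
  return total;
}

console.log(sumRowText("11 12"));      // 23
console.log(sumRowText(" 21\n 22 "));  // 43
console.log(sumRowText("11 n/a 12"));  // 23, the non-numeric token is skipped
```

Without the isNaN() guard, a single non-numeric token would turn the running total into NaN for the rest of the row.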

I've attached the template I used to do this which hopefully can help as
well.  Please let me know if you have any questions.

Yolanda


-- 
--
yolanda.m.davis@gmail.com
@YolandaMDavis