Posted to user@jmeter.apache.org by thanh nguyen <ma...@gmail.com> on 2011/01/24 16:27:27 UTC

Regulation Expression alternative

Hi everyone,

I have a big HTML table from which I need to extract data. The table has
several columns. The regulation expression required to do the extraction job
is very long and complex. The code is hard to debug and to maintain. I'd
like to know what the alternatives are. Is there an HTML parser that creates
DOM objects? I could program a postprocessor in BeanShell...

Thanks a lot

Thanh

Re: Regulation Expression alternative

Posted by Deepak Shetty <sh...@gmail.com>.
a. Two regular expressions might work better in some cases (the second works
on the result of the first). It's hard to say without an example.
b. With the XPath Extractor, memory usage might be an issue.
Please give an example of what you want to do.

Re: Regulation Expression alternative

Posted by Felix Frank <ff...@mpexnet.de>.

On 01/25/2011 10:59 PM, thanh nguyen wrote:
> The final solution I found is to break down my regulation expression and

It's "regular expression", actually.

Congratulations on solving this one.

Regards,
Felix

---------------------------------------------------------------------
To unsubscribe, e-mail: jmeter-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: jmeter-user-help@jakarta.apache.org


Re: Regulation Expression alternative

Posted by thanh nguyen <ma...@gmail.com>.
The final solution I found is to break down my regulation expression and
work with partial results. I applied the 'Divide and Conquer' strategy. It
takes more lines of code, but it's now more readable.
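
A minimal sketch of that divide-and-conquer idea, written in Python rather than
BeanShell so it can be run standalone. The sample markup, the two patterns, and
the function name are assumptions made for illustration; they are not the
poster's actual code. A first expression isolates each row, a second one works
only on that row's text, and the results are stored under the MyVar_X_Y naming
used elsewhere in this thread.

```python
import re

# Hypothetical, heavily simplified stand-in for the real response body.
HTML = (
    '<table>'
    '<tr><th>Title1</th><th>Title2</th></tr>'
    '<tr><td><span class="sbListText">data1</span></td>'
    '<td><span class="sbListText">data2</span></td></tr>'
    '<tr><td><span class="sbListText">data3</span></td>'
    '<td><span class="sbListText">data4</span></td></tr>'
    '</table>'
)

# Step 1: a short pattern that only isolates each row.
ROW_RE = re.compile(r'<tr\b[^>]*>(.*?)</tr>', re.DOTALL)
# Step 2: a second short pattern applied to one row at a time.
CELL_RE = re.compile(r'<span class="sbListText">(.*?)</span>', re.DOTALL)


def extract_cells(html):
    """Return {'MyVar_X_Y': text}, X = data row and Y = column (both 1-based)."""
    result = {}
    row_number = 0
    for row_html in ROW_RE.findall(html):
        cells = CELL_RE.findall(row_html)
        if not cells:
            # Rows without sbListText spans (the header row here) are skipped.
            continue
        row_number += 1
        for col_number, text in enumerate(cells, start=1):
            result[f"MyVar_{row_number}_{col_number}"] = text
    return result
```

Each pattern stays short enough to test on its own, which is the readability
gain described above.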


On Tue, Jan 25, 2011 at 1:11 PM, thanh nguyen <ma...@gmail.com>wrote:

> Hi all,
>
> Thanks for your help! I have the regular expression to select data in 2
> columns.
>
> <table>
> <tr>
> <td>(.*?)</td><td>(.*?)</td>
> </tr>
> </table>
>
> The selection is saved in a variable called MyVar. Then I loop through
> MyVar_X_Y to access the data, where X is the row and Y the column.
>
> What's the equivalent XPath query?
>
> Thanks
> Thanh
>
>
> On Mon, Jan 24, 2011 at 4:31 PM, Deepak Shetty <sh...@gmail.com> wrote:
>
>> What is it that you want to select? All the columns that are not titles?
>> That would be something like
>> //tbody/tr/td/span (but this will flatten out the structure).
>>
>> regards
>> deepak
>>
>> On Mon, Jan 24, 2011 at 10:08 AM, thanh nguyen <mailinglistfan@gmail.com
>> >wrote:
>>
>> > Felix,
>> >
>> > I'll have a look at XPath. It looks interesting, but I can't find any
>> > example code for XPath.
>> > Thank you
>> > Thanh
>> >
>> > PS: This is the table I'm working on. The 1st row is the title. The 2nd
>> > row contains data. I want to extract data1, data2, ... The regular
>> > expression reads row by row. In the BeanShell I do 2 loops: one for each
>> > row and one for each column. There are odd-numbered rows and
>> > even-numbered rows.
>> >
>> >
>> > <table>
>> > <tr><th class="sbListHeaderCellEnd" scope="col" valign="top"
>> width="5"><img
>> > alt="" height="5" src="/assets/common/img/cnr_t_tl.gif"
>> width="5"></th><th
>> > class="sbListHeaderCell" nowrap="true" scope="col"><img alt=""
>> height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_name')"
>> onclick="submitForm1023(event);return
>> > false;" title="Sort by column Title">Title1</a></span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText">Title2</span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_deliveryType')"
>> > onclick="submitForm1024(event);return false;" title="Sort by column
>> > Delivery
>> > Type">Title3</a></span></th><td class="sbListColumnSpacer"><img alt=""
>> > border="0" height="1" src="/assets/common/img/1x1.gif"
>> width="1"></td><th
>> > class="sbListHeaderCell" nowrap="true" scope="col"><img alt=""
>> height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_regStartDate')"
>> > onclick="submitForm1025(event);return false;" title="Sort by column
>> > Registration Date">Title4</a></span></th><td
>> > class="sbListColumnSpacer"><img
>> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
>> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
>> scope="col"><img
>> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_completionStatus')"
>> > onclick="submitForm1026(event);return false;" title="Sort by column
>> > Completion Status">Title5</a></span></th><td
>> > class="sbListColumnSpacer"><img
>> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
>> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
>> scope="col"><img
>> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_completionDate')"
>> > onclick="submitForm1027(event);return false;" title="Sort by column Date
>> > Marked Complete">Title6</a></span></th><td
>> class="sbListColumnSpacer"><img
>> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
>> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
>> scope="col"><img
>> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText">Title7</span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_score')"
>> onclick="submitForm1028(event);return
>> > false;" title="Sort by column Score">Title8</a></span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_grade')"
>> onclick="submitForm1029(event);return
>> > false;" title="Sort by column Grade">Title9</a></span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText">Title10</span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText">Title11</span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText">Title12</span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText">Title13</span></th><td
>> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
>> > src="/assets/common/img/1x1.gif" width="1"></td><th
>> > class="sbListHeaderCell"
>> > nowrap="true" scope="col"><img alt="" height="1"
>> > src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText"><a class="sbListHeaderText"
>> > href="javascript:void('sort_startDate')"
>> > onclick="submitForm1030(event);return false;" title="Sort by column
>> > Offering
>> > Start Date">Title14</a></span></th><td class="sbListColumnSpacer"><img
>> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
>> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
>> scope="col"><img
>> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
>> > class="sbListHeaderText">Title15</span></th><th align="right"
>> > class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img
>> alt=""
>> > height="5" src="/assets/common/img/cnr_t_tr.gif" width="5"></th></tr>
>> >
>> > <tr><td class="sbListOddCellEnd"></td><td class="sbListOddCell"><span
>> > class="sbListText"><a class="sbLinkTableDisplay" doTruncate="false"
>> > href="javascript:void('titleLink')"
>> onclick="submitForm1031(event);return
>> > false;" title="data1">data1</a></span></td><td
>> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
>> > class="sbListText">&nbsp;</span></td><td
>> > class="sbListColumnSpacer"></td><td
>> > class="sbListOddCell"><span class="sbListText">data2</span></td><td
>> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
>> > class="sbListText">data3</span></td><td
>> class="sbListColumnSpacer"></td><td
>> > class="sbListOddCell"><span class="sbListText" nowrap="nowrap"><span
>> > class="sbListText">data4</span><br><a class="sbLinkTableDisplay"
>> > doTruncate="false" href="javascript:void('blah')"
>> > onclick="submitForm1033(event);return false;" title="blah
>> > blah">blah</a></span></td><td class="sbListColumnSpacer"></td><td
>> > class="sbListOddCell"><span class="sbListText">data5</span></td><td
>> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
>> > class="sbListText">&nbsp;</span></td><td
>> > class="sbListColumnSpacer"></td><td
>> > class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
>> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
>> > class="sbListText">&nbsp;</span></td><td
>> > class="sbListColumnSpacer"></td><td
>> > class="sbListOddCell"><span class="sbListText">data6</span></td><td
>> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
>> > class="sbListText">data7</span></td><td
>> class="sbListColumnSpacer"></td><td
>> > class="sbListOddCell"><span class="sbListText">data8</span></td><td
>> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
>> > class="sbListText">data8</span></td><td
>> class="sbListColumnSpacer"></td><td
>> > class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
>> > class="sbListColumnSpacer"></td><td class="sbListOddCell" nowrap><a
>> > class="sbLinkTableDisplay" doTruncate="false"
>> > href="javascript:void('editLink')" onclick="submitForm1035(event);return
>> > false;" title="Edit">Edit</a><br><a class="sbLinkTableDisplay"
>> > doTruncate="false" href="javascript:void('deleteLink')"
>> > onclick="submitForm1036(event);return false;"
>> > title="Delete">Delete</a><br><br></td><td
>> > class="sbListOddCellEnd"></td></tr><tr>
>> >
>> > </table>
>> >
>> >
>> >
>> > On Mon, Jan 24, 2011 at 10:34 AM, Felix Frank <ff...@mpexnet.de> wrote:
>> >
>> > > On 01/24/2011 04:27 PM, thanh nguyen wrote:
>> > > > Hi everyone,
>> > > >
>> > > > I have a big HTML table from which I need to extract data. The table
>> > > > has several columns. The regulation expression required to do the
>> > > > extraction job is very long and complex. The code is hard to debug and
>> > > > to maintain. I'd like to know what are the alternatives? Is there HTML
>> > > > parser that create DOM objects? I could program a postprocessor in
>> > > > beanshell...
>> > > >
>> > > > Thanks a lot
>> > >
>> > > That would be the XPath Extractor, but maybe someone can help you build
>> > > a simpler regex instead (you need to share more details for this to
>> > > happen).
>> > >
>> > > Regards,
>> > > Felix

Re: Regulation Expression alternative

Posted by thanh nguyen <ma...@gmail.com>.
Hi all,

Thanks for your help! I have the regular expression to select data in 2
columns.

<table>
<tr>
<td>(.*?)</td><td>(.*?)</td>
</tr>
</table>

The selection is saved in a variable called MyVar. Then I loop through
MyVar_X_Y to access the data, where X is the row and Y the column.

What's the equivalent XPath query?

Thanks
Thanh
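
One possible answer to the XPath question, sketched in Python with the standard
library's ElementTree, which supports only a subset of XPath and needs
well-formed XML. The markup below is a trimmed, well-formed stand-in for the
table in this thread; the class name sbListEvenCell is an assumption (only
sbListOddCell appears in the posted sample), and the function names are made up
for illustration. The real page is not well-formed (unclosed img and br tags, a
bare nowrap attribute), so it would have to be tidied before an XML parser
accepts it. This is not the JMeter XPath Extractor itself, only a way to try the
expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed excerpt; 'sbListEvenCell' is an assumed class name.
XHTML = """
<table>
  <tr><th>Title1</th><th>Title2</th></tr>
  <tr>
    <td class="sbListOddCell"><span class="sbListText">data1</span></td>
    <td class="sbListColumnSpacer"></td>
    <td class="sbListOddCell"><span class="sbListText">data2</span></td>
  </tr>
  <tr>
    <td class="sbListEvenCell"><span class="sbListText">data3</span></td>
    <td class="sbListColumnSpacer"></td>
    <td class="sbListEvenCell"><span class="sbListText">data4</span></td>
  </tr>
</table>
"""

# Relative path evaluated once per row, so the row/column structure survives.
CELL_XPATH = "td/span[@class='sbListText']"
# Single query over the whole table; this one flattens rows and columns.
FLAT_XPATH = ".//td/span[@class='sbListText']"


def rows_and_columns(xhtml):
    """Return a list of rows, each row being a list of cell texts."""
    root = ET.fromstring(xhtml)
    table = []
    for tr in root.iter("tr"):
        cells = ["".join(span.itertext()).strip() for span in tr.findall(CELL_XPATH)]
        if cells:  # the header row has no sbListText spans
            table.append(cells)
    return table


def flat_cells(xhtml):
    """Return every cell text in document order, without row boundaries."""
    root = ET.fromstring(xhtml)
    return ["".join(span.itertext()).strip() for span in root.findall(FLAT_XPATH)]
```

Running one query per row keeps the X (row) and Y (column) indexes, whereas the
single flat query loses the row boundaries.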

Re: Regulation Expression alternative

Posted by Duc Chau <ch...@gmail.com>.
Your sample HTML table is not well structured, but this may help.

Try <span class="sbListText">(.+?)</span>. I just tested it using
http://gskinner.com/RegExr/. It returns data1, data2, etc.
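
The same check can be reproduced outside the browser. The sketch below is an
illustration in Python, not part of the original test: the sample string is a
shortened excerpt of the posted row (the attributes on the link are trimmed),
and TAG_RE and cell_values are hypothetical helpers. It shows two things to
watch for with this pattern: the first cell's match includes the surrounding
link markup, and empty cells come back as &nbsp;.

```python
import re

# The pattern suggested above, unchanged.
SPAN_RE = re.compile(r'<span class="sbListText">(.+?)</span>')
# Hypothetical helper pattern: strips any tags left inside a captured cell.
TAG_RE = re.compile(r'<[^>]+>')

# Shortened excerpt of the posted data row (link attributes trimmed).
SAMPLE = (
    '<td class="sbListOddCell"><span class="sbListText">'
    '<a class="sbLinkTableDisplay" title="data1">data1</a></span></td>'
    '<td class="sbListOddCell"><span class="sbListText">&nbsp;</span></td>'
    '<td class="sbListOddCell"><span class="sbListText">data2</span></td>'
    '<td class="sbListOddCell"><span class="sbListText">data3</span></td>'
)


def cell_values(html):
    """Return the visible text of each non-empty sbListText span."""
    values = []
    for raw in SPAN_RE.findall(html):
        text = TAG_RE.sub("", raw).strip()
        if text and text != "&nbsp;":
            values.append(text)
    return values
```

Without the clean-up step, the raw matches for this excerpt are the whole link
element, then &nbsp;, then data2 and data3.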

On Tue, Jan 25, 2011 at 8:31 AM, Deepak Shetty <sh...@gmail.com> wrote:

> what is it that you want to select? all the columns? that are not titles
> would be something like
> //tbody/tr/td/span (but this will flatten out the structure)?
>
> regards
> deepak
>
> On Mon, Jan 24, 2011 at 10:08 AM, thanh nguyen <mailinglistfan@gmail.com
> >wrote:
>
> > Felix,
> >
> > I'll have look at the xpath. it looks interesting. But I can't find any
> > example of code for xpath?
> > Thank you
> > Thanh
> >
> > ps: this is the table I'm working on. 1st row is the title. 2nd row
> > contains
> > data. I want to extract data1, data2....the regular expression reads row
> by
> > row. In the beanshell I do 2 loop: for each row and for each column.
> There
> > are rows number odd and rows number even.
> >
> >
> > <table>
> > <tr><th class="sbListHeaderCellEnd" scope="col" valign="top"
> width="5"><img
> > alt="" height="5" src="/assets/common/img/cnr_t_tl.gif"
> width="5"></th><th
> > class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_name')" onclick="submitForm1023(event);return
> > false;" title="Sort by column Title">Title1</a></span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title2</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_deliveryType')"
> > onclick="submitForm1024(event);return false;" title="Sort by column
> > Delivery
> > Type">Title3</a></span></th><td class="sbListColumnSpacer"><img alt=""
> > border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_regStartDate')"
> > onclick="submitForm1025(event);return false;" title="Sort by column
> > Registration Date">Title4</a></span></th><td
> > class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_completionStatus')"
> > onclick="submitForm1026(event);return false;" title="Sort by column
> > Completion Status">Title5</a></span></th><td
> > class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_completionDate')"
> > onclick="submitForm1027(event);return false;" title="Sort by column Date
> > Marked Complete">Title6</a></span></th><td
> class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title7</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_score')"
> onclick="submitForm1028(event);return
> > false;" title="Sort by column Score">Title8</a></span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_grade')"
> onclick="submitForm1029(event);return
> > false;" title="Sort by column Grade">Title9</a></span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title10</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title11</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title12</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title13</span></th><td
> > class="sbListColumnSpacer"><img alt="" border="0" height="1"
> > src="/assets/common/img/1x1.gif" width="1"></td><th
> > class="sbListHeaderCell"
> > nowrap="true" scope="col"><img alt="" height="1"
> > src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText"><a class="sbListHeaderText"
> > href="javascript:void('sort_startDate')"
> > onclick="submitForm1030(event);return false;" title="Sort by column
> > Offering
> > Start Date">Title14</a></span></th><td class="sbListColumnSpacer"><img
> > alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
> > width="1"></td><th class="sbListHeaderCell" nowrap="true"
> scope="col"><img
> > alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
> > class="sbListHeaderText">Title15</span></th><th align="right"
> > class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img
> alt=""
> > height="5" src="/assets/common/img/cnr_t_tr.gif" width="5"></th></tr>
> >
> > <tr><td class="sbListOddCellEnd"></td><td class="sbListOddCell"><span
> > class="sbListText"><a class="sbLinkTableDisplay" doTruncate="false"
> > href="javascript:void('titleLink')" onclick="submitForm1031(event);return
> > false;" title="data1">data1</a></span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data2</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">data3</span></td><td
> class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText" nowrap="nowrap"><span
> > class="sbListText">data4</span><br><a class="sbLinkTableDisplay"
> > doTruncate="false" href="javascript:void('blah')"
> > onclick="submitForm1033(event);return false;" title="blah
> > blah">blah</a></span></td><td class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data5</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data6</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">data7</span></td><td
> class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">data8</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
> > class="sbListText">data8</span></td><td
> class="sbListColumnSpacer"></td><td
> > class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
> > class="sbListColumnSpacer"></td><td class="sbListOddCell" nowrap><a
> > class="sbLinkTableDisplay" doTruncate="false"
> > href="javascript:void('editLink')" onclick="submitForm1035(event);return
> > false;" title="Edit">Edit</a><br><a class="sbLinkTableDisplay"
> > doTruncate="false" href="javascript:void('deleteLink')"
> > onclick="submitForm1036(event);return false;"
> > title="Delete">Delete</a><br><br></td><td
> > class="sbListOddCellEnd"></td></tr><tr>
> >
> > </table>
> >
> >
> >
> > On Mon, Jan 24, 2011 at 10:34 AM, Felix Frank <ff...@mpexnet.de> wrote:
> >
> > > On 01/24/2011 04:27 PM, thanh nguyen wrote:
> > > > Hi everyone,
> > > >
> > > > I have a big HTML table from which I need to extract data. The table
> > > > has several columns. The regulation expression required to do the
> > > > extraction job is very long and complex. The code is hard to debug
> > > > and to maintain. I'd like to know what are the alternatives? Is there
> > > > HTML parser that create DOM objects? I could program a postprocessor
> > > > in beanshell...
> > > >
> > > > Thanks a lot
> > >
> > > That would be the XPath Extractor, but maybe someone can help you build
> > > a simpler regex instead (you need to share more details for this to
> > > happen).
> > >
> > > Regards,
> > > Felix
> > >
> >
>

Re: Regulation Expression alternative

Posted by Deepak Shetty <sh...@gmail.com>.
What is it that you want to select? All the columns that are not titles?
That would be something like
//tbody/tr/td/span
(though this will flatten out the structure).
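
To illustrate what "flatten out the structure" means, here is a minimal sketch in Python (standard library only), not a JMeter configuration. It assumes a hand-cleaned, well-formed stand-in for the posted table; the real page would need tidying first, since it contains unclosed <br> and <img> tags and &nbsp; entities. The class name sbListEvenCell is a guess for the even rows, and because the posted markup shows no <tbody>, the sketch selects //tr/td/span instead.

```python
import xml.etree.ElementTree as ET

# Hand-cleaned, well-formed stand-in for the posted table (an assumption;
# "sbListEvenCell" is a hypothetical class name for even rows).
SAMPLE = """
<table>
  <tr>
    <th class="sbListHeaderCell"><span class="sbListHeaderText">Title1</span></th>
    <th class="sbListHeaderCell"><span class="sbListHeaderText">Title2</span></th>
  </tr>
  <tr>
    <td class="sbListOddCell"><span class="sbListText"><a title="data1">data1</a></span></td>
    <td class="sbListColumnSpacer"></td>
    <td class="sbListOddCell"><span class="sbListText">data2</span></td>
  </tr>
  <tr>
    <td class="sbListEvenCell"><span class="sbListText">data3</span></td>
    <td class="sbListColumnSpacer"></td>
    <td class="sbListEvenCell"><span class="sbListText">data4</span></td>
  </tr>
</table>
"""

root = ET.fromstring(SAMPLE)


def span_text(span):
    """Return all text inside a span, including text of nested elements."""
    return "".join(span.itertext()).strip()


# Flat selection, analogous to //tr/td/span: row boundaries are lost.
flat = [span_text(s) for s in root.findall(".//tr/td/span")]

# Per-row selection keeps the structure: one list of cell texts per <tr>.
rows = []
for tr in root.findall(".//tr"):
    cells = [span_text(s) for s in tr.findall("./td/span")]
    if cells:  # the header row uses <th>, not <td>, so it yields no cells
        rows.append(cells)

print(flat)
print(rows)
```

The flat list is enough if only the values matter; iterating over each row first is the variant to use when the row/column position of a value has to be kept.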

regards
deepak

On Mon, Jan 24, 2011 at 10:08 AM, thanh nguyen <ma...@gmail.com> wrote:

> Felix,
>
> I'll have a look at XPath; it looks interesting. But I can't find any
> example code for XPath.
> Thank you
> Thanh
>

RE: Regulation Expression alternative

Posted by Federico Ferrara <ff...@pragmaconsultores.com>.
Try these:
https://addons.mozilla.org/es-ES/firefox/addon/xpath-finder/?src=api
http://sourceforge.net/projects/xpe/

Federico Ferrara• Pragma Consultores
fferrara@pragmaconsultores.com
San Martín 575 2º Piso | (C1004AAK) Buenos Aires - Argentina
Tel: (+5411) 4327-1999 • Fax: (+5411) 4327-1997
www.pragmaconsultores.com
________________________________________
From: thanh nguyen [mailinglistfan@gmail.com]
Sent: Monday, January 24, 2011 03:08 p.m.
To: JMeter Users List
Subject: Re: Regulation Expression alternative

Felix,

I'll have a look at XPath; it looks interesting. But I can't find any
example code for XPath.
Thank you
Thanh





Re: Regulation Expression alternative

Posted by thanh nguyen <ma...@gmail.com>.
Felix,

I'll have a look at XPath; it looks interesting. But I can't find any
example code for XPath.
Thank you
Thanh

PS: This is the table I'm working on. The 1st row holds the titles; the 2nd row
contains the data. I want to extract data1, data2, ... The regular expression
reads row by row. In the BeanShell script I use two loops: one over the rows and
one over the columns. There are odd-numbered rows and even-numbered rows.


<table>
<tr><th class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img
alt="" height="5" src="/assets/common/img/cnr_t_tl.gif" width="5"></th><th
class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_name')" onclick="submitForm1023(event);return
false;" title="Sort by column Title">Title1</a></span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText">Title2</span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_deliveryType')"
onclick="submitForm1024(event);return false;" title="Sort by column Delivery
Type">Title3</a></span></th><td class="sbListColumnSpacer"><img alt=""
border="0" height="1" src="/assets/common/img/1x1.gif" width="1"></td><th
class="sbListHeaderCell" nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_regStartDate')"
onclick="submitForm1025(event);return false;" title="Sort by column
Registration Date">Title4</a></span></th><td class="sbListColumnSpacer"><img
alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_completionStatus')"
onclick="submitForm1026(event);return false;" title="Sort by column
Completion Status">Title5</a></span></th><td class="sbListColumnSpacer"><img
alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_completionDate')"
onclick="submitForm1027(event);return false;" title="Sort by column Date
Marked Complete">Title6</a></span></th><td class="sbListColumnSpacer"><img
alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText">Title7</span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_score')" onclick="submitForm1028(event);return
false;" title="Sort by column Score">Title8</a></span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_grade')" onclick="submitForm1029(event);return
false;" title="Sort by column Grade">Title9</a></span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText">Title10</span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText">Title11</span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText">Title12</span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText">Title13</span></th><td
class="sbListColumnSpacer"><img alt="" border="0" height="1"
src="/assets/common/img/1x1.gif" width="1"></td><th class="sbListHeaderCell"
nowrap="true" scope="col"><img alt="" height="1"
src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText"><a class="sbListHeaderText"
href="javascript:void('sort_startDate')"
onclick="submitForm1030(event);return false;" title="Sort by column Offering
Start Date">Title14</a></span></th><td class="sbListColumnSpacer"><img
alt="" border="0" height="1" src="/assets/common/img/1x1.gif"
width="1"></td><th class="sbListHeaderCell" nowrap="true" scope="col"><img
alt="" height="1" src="/assets/common/img/1x1.gif" width="30"><br><span
class="sbListHeaderText">Title15</span></th><th align="right"
class="sbListHeaderCellEnd" scope="col" valign="top" width="5"><img alt=""
height="5" src="/assets/common/img/cnr_t_tr.gif" width="5"></th></tr>

<tr><td class="sbListOddCellEnd"></td><td class="sbListOddCell"><span
class="sbListText"><a class="sbLinkTableDisplay" doTruncate="false"
href="javascript:void('titleLink')" onclick="submitForm1031(event);return
false;" title="data1">data1</a></span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
class="sbListText">&nbsp;</span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText">data2</span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
class="sbListText">data3</span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText" nowrap="nowrap"><span
class="sbListText">data4</span><br><a class="sbLinkTableDisplay"
doTruncate="false" href="javascript:void('blah')"
onclick="submitForm1033(event);return false;" title="blah
blah">blah</a></span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText">data5</span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
class="sbListText">&nbsp;</span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
class="sbListText">&nbsp;</span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText">data6</span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
class="sbListText">data7</span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText">data8</span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
class="sbListText">data8</span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell" nowrap><a
class="sbLinkTableDisplay" doTruncate="false"
href="javascript:void('editLink')" onclick="submitForm1035(event);return
false;" title="Edit">Edit</a><br><a class="sbLinkTableDisplay"
doTruncate="false" href="javascript:void('deleteLink')"
onclick="submitForm1036(event);return false;"
title="Delete">Delete</a><br><br></td><td
class="sbListOddCellEnd"></td></tr><tr>

</table>



On Mon, Jan 24, 2011 at 10:34 AM, Felix Frank <ff...@mpexnet.de> wrote:

> On 01/24/2011 04:27 PM, thanh nguyen wrote:
> > Hi everyone,
> >
> > I have a big HTML table from which I need to extract data. The table has
> > several columns. The regulation expression required to do the extraction job
> > is very long and complex. The code is hard to debug and to maintain. I'd
> > like to know what are the alternatives? Is there HTML parser that create DOM
> > objects? I could program a postprocessor in beanshell...
> >
> > Thanks a lot
>
> That would be the XPath Extractor, but maybe someone can help you build
> a simpler regex instead (you need to share more details for this to
> happen).
>
> Regards,
> Felix

Re: Regulation Expression alternative

Posted by Felix Frank <ff...@mpexnet.de>.
On 01/24/2011 04:27 PM, thanh nguyen wrote:
> Hi everyone,
> 
> I have a big HTML table from which I need to extract data. The table has
> several columns. The regulation expression required to do the extraction job
> is very long and complex. The code is hard to debug and to maintain. I'd
> like to know what are the alternatives? Is there HTML parser that create DOM
> objects? I could program a postprocessor in beanshell...
> 
> Thanks a lot

That would be the XPath Extractor, but maybe someone can help you build
a simpler regex instead (you need to share more details for this to happen).

Regards,
Felix
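
For readers weighing the parser route against a long regex, here is a minimal
sketch of the idea using a tolerant HTML parser. It is written in Python with
the standard library html.parser module purely for illustration; it is not the
JMeter XPath Extractor and not BeanShell. The names CellTextParser,
extract_rows and SAMPLE are hypothetical, and the assumption that the data
cells are exactly the <td class="sbListOddCell"> elements is taken from the
sample table posted in this thread.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> with a given class, grouped by <tr>."""

    def __init__(self, cell_class="sbListOddCell"):
        # convert_charrefs=True turns entities such as &nbsp; into characters.
        super().__init__(convert_charrefs=True)
        self.cell_class = cell_class  # assumption: data cells carry this class
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            if dict(attrs).get("class") == self.cell_class:
                self._cell = []
        elif tag == "br" and self._cell is not None:
            # Treat a line break inside a cell as a space.
            self._cell.append(" ")

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join fragments, map non-breaking spaces to plain spaces,
            # and collapse runs of whitespace.
            text = "".join(self._cell).replace("\xa0", " ")
            self._row.append(" ".join(text.split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows and rows without data cells
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return a list of rows, each a list of cell texts."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Shortened stand-in for the table in this thread (not the full original).
SAMPLE = """
<table>
<tr><th class="sbListHeaderCell"><span class="sbListHeaderText">Title1</span></th></tr>
<tr><td class="sbListOddCellEnd"></td><td class="sbListOddCell"><span
class="sbListText"><a class="sbLinkTableDisplay" href="javascript:void('titleLink')"
title="data1">data1</a></span></td><td class="sbListColumnSpacer"></td><td
class="sbListOddCell"><span class="sbListText">&nbsp;</span></td><td
class="sbListColumnSpacer"></td><td class="sbListOddCell"><span
class="sbListText">data2</span></td><td class="sbListOddCellEnd"></td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
print(rows)
```

Each data row comes back as a list with one string per data cell, and cells
that hold only &nbsp; come back as empty strings, so a column can be picked by
index instead of by counting groups in one long pattern. Inside JMeter the
same selection would more likely be expressed as an XPath query in the XPath
Extractor, as suggested above.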