Posted to dev@any23.apache.org by "McBennett, Pat" <Mc...@DNB.com.INVALID> on 2015/04/29 18:11:31 UTC

How to configure Any23 programmatically in Java?

Hi,

I've just started trying to use Any23 programmatically from Java, and it looks great.
The documentation has sample code [1], but that code seems out of date: the webpage it attempts to extract from (http://www.rentalinrome.com/semanticloft/semanticloft.htm) has changed, I think. It also has a syntax error: the word 'Apache' appears twice on line 1, which doesn't make any sense.

My questions are simply:

1. How do I configure the 'Any23' instance in this code? I know the constructor takes a Properties instance, but where are the currently supported properties documented? For instance, how do I set the timeout for the connection attempt?

2. This code sample doesn't seem to crawl outward from the webpage I provide; it just scans that one page. Is there a code sample for crawling a whole website, including how to configure MaxPages and MaxDepth?
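
To make question 2 concrete, here is a minimal sketch of what page and depth limits would mean for a crawl, written in plain Java with no Any23 classes at all. The class name BoundedCrawlSketch, the crawl method, and the in-memory link map are all hypothetical and only illustrate the intended behaviour; none of this is Any23's API, and a real crawler would fetch pages and run extraction where the comment indicates.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BoundedCrawlSketch {

    // Hypothetical helper (not part of Any23): breadth-first walk over a link
    // graph that visits at most maxPages pages and follows links no further
    // than maxDepth hops from the seed page.
    static List<String> crawl(Map<String, List<String>> links, String seed,
                              int maxPages, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();

        queue.add(seed);
        depths.add(0);
        seen.add(seed);

        while (!queue.isEmpty() && visited.size() < maxPages) {
            String page = queue.remove();
            int depth = depths.remove();
            visited.add(page); // a real crawler would extract from this page here

            if (depth >= maxDepth) {
                continue; // depth limit reached: do not follow this page's links
            }
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) { // add() returns false for already-seen pages
                    queue.add(next);
                    depths.add(depth + 1);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Assumed toy site: each key is a page, each value the pages it links to.
        Map<String, List<String>> links = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/c"),
                "/b", List.of("/"),
                "/c", List.of("/d"));

        System.out.println(crawl(links, "/", 10, 1)); // depth limit applies first
        System.out.println(crawl(links, "/", 2, 5));  // page limit applies first
    }
}
```

With the toy site above, a depth limit of 1 stops after the seed page and the pages it links to directly, while a page limit of 2 stops after two pages regardless of depth. That is the kind of control I am hoping Any23 exposes.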

Thanks,

Pat.

[1] - http://any23.apache.org/dev-data-extraction.html



Pat McBennett
Architect
The Chase Building, 5th Floor
Carmanhall Road, Sandyford,
Dublin 18, Ireland
Direct +353 1
Mobile +353 8

http://www.dnb.co.uk/

