Posted to user@ofbiz.apache.org by Adrian Crum <ad...@hlmksw.com> on 2008/07/08 18:53:35 UTC

Gathering Data From External Websites

I need OFBiz to gather data from external websites - so that data can be 
extracted from the HTML. Is there anything like that in OFBiz? Has 
anyone else done something similar?

-Adrian

Re: Gathering Data From External Websites

Posted by Adrian Crum <ad...@hlmksw.com>.
Thank you Al and Abhishake!

The JTidy library is what I was looking for. I wanted to convert an 
external HTML page to an org.w3c.dom.Document object, and JTidy does 
exactly that.

Now I want to put some kind of wrapper around the Document object so I 
can work with it in minilang.

-Adrian
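
A rough sketch of what such a wrapper might look like, for anyone following the thread. Everything here is an assumption rather than existing OFBiz code: the class name DocumentView and its methods text, texts, and parseXhtml are hypothetical. The idea is to hide the DOM behind XPath lookups that return plain String and List values, which are easier to handle in a simple-method (for example through call-object-method) than Node objects. The Document would normally come from JTidy (Tidy.parseDOM); the sketch parses already well-formed XHTML with the JDK parser instead so that it runs without extra jars, and it ignores XML namespaces.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: exposes a DOM Document through String/List results. */
final class DocumentView {
    private final Document document;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    DocumentView(Document document) {
        this.document = document;
    }

    /** Returns the trimmed text of the first node matching the XPath, or "" if none. */
    String text(String expression) throws XPathExpressionException {
        return xpath.evaluate(expression, document).trim();
    }

    /** Returns the trimmed text content of every node matching the XPath. */
    List<String> texts(String expression) throws XPathExpressionException {
        NodeList nodes =
                (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    /** Stand-in for the JTidy step (assumption): parses already well-formed XHTML. */
    static Document parseXhtml(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page, used only to exercise the wrapper.
        String page = "<html><head><title>Price List</title></head><body>"
                + "<ul><li>Widget</li><li>Gadget</li></ul></body></html>";
        DocumentView view = new DocumentView(parseXhtml(page));
        System.out.println(view.text("/html/head/title"));
        System.out.println(view.texts("//li"));
    }
}
```

Returning plain strings and lists keeps the DOM API out of the calling script. A real version would probably also need to deal with namespaces and with whatever markup JTidy actually produces for a given page.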

Abhishake Agarwal wrote:
> Hello Adrian,
> 
> I don't know whether OFBiz has this, but I have done a similar thing using
> an API called HTML Parser. You can search for it on Google.
> 
> Regards,
> Abhishake
> 
> On Tue, Jul 8, 2008 at 10:50 PM, Al Byers <by...@automationgroups.com>
> wrote:
> 
>> Adrian,
>>
>> In the past I have used JTidy to make sure it is in XHTML and then wrote
>> Freemarker scripts to process the markup. I find FM to be easier to use than
>> XSLT because it has a loop index var and it is easier to connect it with
>> Java classes that you may wish to write to help in the processing.
>>
>> -Al
>>
>> On Tue, Jul 8, 2008 at 10:53 AM, Adrian Crum <ad...@hlmksw.com> wrote:
>>
>>> I need OFBiz to gather data from external websites - so that data can be
>>> extracted from the HTML. Is there anything like that in OFBiz? Has anyone
>>> else done something similar?
>>>
>>> -Adrian
>>>
> 

Re: Gathering Data From External Websites

Posted by Abhishake Agarwal <ab...@gmail.com>.
Hello Adrian,

I don't know whether OFBiz has this, but I have done a similar thing using
an API called HTML Parser. You can search for it on Google.

Regards,
Abhishake

On Tue, Jul 8, 2008 at 10:50 PM, Al Byers <by...@automationgroups.com>
wrote:

> Adrian,
>
> In the past I have used JTidy to make sure it is in XHTML and then wrote
> Freemarker scripts to process the markup. I find FM to be easier to use than
> XSLT because it has a loop index var and it is easier to connect it with
> Java classes that you may wish to write to help in the processing.
>
> -Al
>
> On Tue, Jul 8, 2008 at 10:53 AM, Adrian Crum <ad...@hlmksw.com> wrote:
>
> > I need OFBiz to gather data from external websites - so that data can be
> > extracted from the HTML. Is there anything like that in OFBiz? Has anyone
> > else done something similar?
> >
> > -Adrian
> >
>

Re: Gathering Data From External Websites

Posted by Al Byers <by...@automationgroups.com>.
Adrian,

In the past I have used JTidy to make sure the page is in XHTML and then
written FreeMarker scripts to process the markup. I find FM easier to use than
XSLT because it has a loop index variable and it is easier to connect with
Java classes that you may wish to write to help with the processing.

-Al

On Tue, Jul 8, 2008 at 10:53 AM, Adrian Crum <ad...@hlmksw.com> wrote:

> I need OFBiz to gather data from external websites - so that data can be
> extracted from the HTML. Is there anything like that in OFBiz? Has anyone
> else done something similar?
>
> -Adrian
>