Posted to dev@forrest.apache.org by Juan Jose Pablos <ch...@apache.org> on 2005/06/01 12:09:37 UTC

Re: Forrest as an XML repository

FYI:

Ricardo Beltran wrote:
> Hello Forrest developers team:
> First of all I would like to thank you for your efforts to
> make this great project a reality!
> 
> I'm planning to use Forrest in a project for a Social
> Sciences University in Mexico. We have about 10,000
> public opinion polls from the last 18 years of
> Mexican history. Those polls were
> written (and executed) in Clipper (prg). I have a DTD
> that describes the content and structure of these
> polls; my plan is to transform those Clipper files to
> XML and use Forrest to make this information publicly
> available.
> As you can imagine, there's a lot of information (about
> 5 GB), and it is very important to have a mechanism to
> search all of it using keywords. As you can see,
> I'm planning to use Forrest as an XML repository and
> use Lucene or Google as my search engine.
> My questions are:
> Do you think that Forrest is an appropriate framework
> for this purpose? And do you think that Lucene or
> Google will do the job of indexing about 5 GB of XML
> files?
> If not, do you know of some other project that could be
> suitable for this purpose?
> 
> Thank you very much for your attention to this e-mail.
> Best Regards
> 
> Ricardo Beltran
> ricardobeltran@ieee.org

Re: Forrest as an XML repository

Posted by Ricardo Beltrán <fa...@gmail.com>.

Thanks for your help!
I will seriously consider your opinions. For now we aren't sure we will
start this project, because we do not have the resources yet, but if I
use Forrest to build it I will stay in contact with you through the
mailing list.

Best regards
Ricardo Beltran
ricardobeltran@ieee.org

On 03/06/05, Ross Gardler <rg...@apache.org> wrote:
> Juan Jose Pablos wrote:
> > FYI:
> >
> > Ricardo Beltran wrote:
> 
> I've CC'd Ricardo on this reply - please reply all.
> 
> ...
> 
> >> My questions are: Do you think that Forrest is an appropriate framework
> >> for this purpose? and Do you think that Lucene or
> >> Google will do the job of indexing about (5 GB) of XML
> >> files?
> 
> I can't comment with authority on the suitability of Google or Lucene
> for this as I have no experience. My gut is telling me that this is not
> the optimal solution.
> 
> I do have a project that has around 8GB of dynamic data being published
> via the Forrest webapp.
> 
> The solution I employed, and one that appears to be working well, was to
> have the data in an XML-enabled database. In this case we used Oracle,
> but we have successfully used Xindice and eXist in similar, smaller
> projects in the past. I wrote a custom generator to retrieve the data
> from the DBMS.
> 
> It should be noted that Cocoon has some database components that can be
> utilised (the results of some early experiments I did with
> these components are in the whiteboard plugin
> org.apache.forrest.plugin.Database). The reason I never completed work
> on this plugin was not a problem with it, but additional requirements
> that made it easier to build a custom generator (our requests were also
> dependent on live data from sensor readings over an RS232 port).
> 
> The system has now been running for about 3 months and we are very happy
> with it. Because we are using a Database server as the repository we
> have all the indexing and optimisation provided by that server. We also
> have the benefit of a very expressive and mature search language.
> 
> Of course, this solution requires that you run the system dynamically.
> Using Google to index your site would allow you to run statically.
> Trying to build a static site from 5GB of data would be a wonderful
> stress test; if you do this, please report your findings to us.
> 
> Ross
>

Re: Forrest as an XML repository

Posted by Ross Gardler <rg...@apache.org>.

Juan Jose Pablos wrote:
> FYI:
> 
> Ricardo Beltran wrote:

I've CC'd Ricardo on this reply - please reply all.

...

>> My questions are: Do you think that Forrest is an appropriate framework
>> for this purpose? and Do you think that Lucene or
>> Google will do the job of indexing about (5 GB) of XML
>> files?

I can't comment with authority on the suitability of Google or Lucene 
for this as I have no experience. My gut is telling me that this is not 
the optimal solution.

I do have a project that has around 8GB of dynamic data being published
via the Forrest webapp.

The solution I employed, and one that appears to be working well, was to
have the data in an XML-enabled database. In this case we used Oracle,
but we have successfully used Xindice and eXist in similar, smaller
projects in the past. I wrote a custom generator to retrieve the data
from the DBMS.

It should be noted that Cocoon has some database components that can be
utilised (the results of some early experiments I did with
these components are in the whiteboard plugin
org.apache.forrest.plugin.Database). The reason I never completed work
on this plugin was not a problem with it, but additional requirements
that made it easier to build a custom generator (our requests were also
dependent on live data from sensor readings over an RS232 port).

The system has now been running for about 3 months and we are very happy 
with it. Because we are using a Database server as the repository we 
have all the indexing and optimisation provided by that server. We also 
have the benefit of a very expressive and mature search language.

Of course, this solution requires that you run the system dynamically. 
Using Google to index your site would allow you to run statically. 
Trying to build a static site from 5GB of data would be a wonderful
stress test; if you do this, please report your findings to us.

Ross
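
To make the database-backed approach described above a little more concrete, here is a minimal sketch in Python. It is not the Oracle setup or the custom Cocoon generator mentioned in the thread; it only illustrates the general idea of keeping XML documents in a database and letting a database index answer keyword queries. SQLite stands in for the XML-enabled database, and the poll element names, the table names and the helpers `build_repository`, `search` and `fetch` are all hypothetical, invented for this example.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical poll documents. The element names are invented for this
# illustration and do not come from the DTD mentioned in the thread.
POLLS = {
    "poll-001": "<poll><title>Election survey</title>"
                "<question>Which party do you trust?</question></poll>",
    "poll-002": "<poll><title>Economy survey</title>"
                "<question>Do you trust the central bank?</question></poll>",
}


def build_repository(polls):
    """Store each XML document plus a simple keyword index in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE doc (id TEXT PRIMARY KEY, xml TEXT NOT NULL)")
    db.execute("CREATE TABLE term (word TEXT NOT NULL, doc_id TEXT NOT NULL)")
    # The database index on `word` is what makes keyword lookups cheap.
    db.execute("CREATE INDEX term_word ON term (word)")
    for doc_id, xml in polls.items():
        root = ET.fromstring(xml)
        text = " ".join(root.itertext())
        words = set(re.findall(r"\w+", text.lower()))
        db.execute("INSERT INTO doc VALUES (?, ?)", (doc_id, xml))
        db.executemany(
            "INSERT INTO term VALUES (?, ?)", [(w, doc_id) for w in words]
        )
    db.commit()
    return db


def search(db, keyword):
    """Return the ids of all documents containing the keyword, sorted."""
    rows = db.execute(
        "SELECT DISTINCT doc_id FROM term WHERE word = ? ORDER BY doc_id",
        (keyword.lower(),),
    )
    return [row[0] for row in rows]


def fetch(db, doc_id):
    """Return the stored XML for one document, or None if it is unknown."""
    row = db.execute("SELECT xml FROM doc WHERE id = ?", (doc_id,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    repo = build_repository(POLLS)
    for hit in search(repo, "trust"):
        print(hit, fetch(repo, hit))
```

In a dynamic setup of the kind described in the thread, something playing the role of `fetch` would hand the stored XML to the publishing pipeline on each request, while `search` would be replaced by whatever query language the chosen database provides.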