Posted to dev@forrest.apache.org by Juan Jose Pablos <ch...@che-che.com> on 2003/09/17 08:39:30 UTC

Lucent and Xindice (Re: about lucent and exist)

Ramon Prades wrote:
 > Hi Juan Jose
 >
 > Do you think we should drop Lucene and use Xindice instead?

I think that we should not drop anything until we have a replacement that 
improves on the current situation. Lucene works, and there is room for both 
Lucene and Xindice.


 > - Populate the database using a crawler and cocoon's xml-views.

Doing this will allow you to populate your indices from various sources, 
not only files. But this implementation is independent of whether you 
use Xindice or Lucene.
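
To illustrate that independence, here is a minimal sketch in which the crawler only talks to a small interface. The names SearchBackend, InMemoryBackend and populate are hypothetical, and the in-memory class merely stands in for a real Lucene or Xindice adapter; neither project's API is used here.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class SearchBackend(Protocol):
    """Hypothetical interface the crawler writes to, whatever the engine is."""

    def add(self, link: str, xml_text: str) -> None: ...

    def links(self) -> List[str]: ...


class InMemoryBackend:
    """Stand-in for a real Lucene or Xindice adapter (assumption, not real API)."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add(self, link: str, xml_text: str) -> None:
        # Store the XML view under the link of the rendered page.
        self.docs[link] = xml_text

    def links(self) -> List[str]:
        return sorted(self.docs)


def populate(backend: SearchBackend, crawled: Iterable[Tuple[str, str]]) -> int:
    """Feed (link, xml view) pairs from any source into the backend."""
    count = 0
    for link, xml_text in crawled:
        backend.add(link, xml_text)
        count += 1
    return count


backend = InMemoryBackend()
populate(backend, [("faq.html", "<faqs/>"), ("index.html", "<document/>")])
print(backend.links())
```

Swapping the engine would then mean writing another class with the same two methods, while the crawler code stays unchanged.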


 > - Create a search page with a number of options as in "search in content",
 > "search in title" and so on.

I have been thinking a bit about this. Not about the search page itself, 
but about the power of being able to search any XML format and get a 
link to the HTML/PDF page, which would be a big step.

But in Forrest's current situation we only have a few XML schemas:

document
howto
faq
changes/todo/contributors??
book/site
sdocbook/docbook


For these schemas I have found only a few search use cases:

Document-v*
-----------
Search for an author/person
Search for an acronym
Search for a figure.
Search for fixme notes.

Howto
-----------
Search for an author/person
Search for an audience (novice... etc)

FAQ
-----------
Search for an author/person
Search for a question.
Search for an answer.

...
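
As a rough illustration of the use cases above, the sketch below scans the XML source, collects the elements whose text or attributes match a term, and returns a new results document that links back to the rendered page. It is a plain file-scanning search, not Lucene or Xindice. The sample FAQ markup, the search function and the results/hit element names are assumptions made up for this example, only loosely modeled on a Forrest-style FAQ.

```python
import xml.etree.ElementTree as ET

# Hypothetical FAQ source, loosely modeled on a Forrest-style faq document.
FAQ_SOURCE = """\
<faqs title="Sample FAQ">
  <authors>
    <person name="Jane Doe" email="jane@example.org"/>
  </authors>
  <faq id="install">
    <question>How do I install the tool?</question>
    <answer><p>Unpack the archive and run the build script.</p></answer>
  </faq>
  <faq id="search">
    <question>Can I search the generated site?</question>
    <answer><p>Yes, once an index has been built.</p></answer>
  </faq>
</faqs>
"""


def search(sources, element, term):
    """Scan every source document and collect matching elements.

    `sources` maps the link of a rendered page (e.g. "faq.html") to the
    XML text it was generated from. Returns a new <results> element with
    one <hit> child per matching element.
    """
    results = ET.Element("results", {"element": element, "term": term})
    needle = term.lower()
    for link, xml_text in sources.items():
        root = ET.fromstring(xml_text)
        for node in root.iter(element):
            # Match against the element's text content and attribute values.
            haystack = " ".join(list(node.itertext()) + list(node.attrib.values()))
            if needle in haystack.lower():
                hit = ET.SubElement(results, "hit", {"href": link})
                hit.text = " ".join(haystack.split())
    return results


# "Search for a question" and "search for an author/person".
questions = search({"faq.html": FAQ_SOURCE}, "question", "search")
people = search({"faq.html": FAQ_SOURCE}, "person", "jane")
print(ET.tostring(questions, encoding="unicode"))
print(ET.tostring(people, encoding="unicode"))
```

Because the search runs on the source schema, elements such as person are still available, even if the rendered HTML no longer distinguishes them.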


So the work actually needed to implement this in our current release is 
not much.

What do you think?

Cheers,
Cheche


> 
> This is what I think:
> 
> - Use Xindice.
> - Populate the database using a crawler and cocoon's xml-views.
> - Create a search page with a number of options as in "search in content",
> "search in title" and so on.
> 
> Regards.
> 
> Ramón
> 
> 
>>-----Original Message-----
>>From: Juan Jose Pablos [mailto:cheche@che-che.com] 
>>Sent: Saturday, 13 September 2003 17:56
>>To: forrest-dev@xml.apache.org
>>Subject: Re: about lucent and exist
>>
>>
>>Stefano Mazzocchi wrote:
>>
>>>Lucene is based on algorithms that don't allow the above.
>>>
>>
>>Thanks for backing this up. That was my initial feeling.
>>
>>
>>>For that, you need what is called an "xml database", which could be, in
>>>the most simple case, a collection of files in a file system and a very
>>>slow incremental collector that opens all files, scans them and collects
>>>the matching elements and returns the results as a new document. In the
>>>best case, it's a semi-structured database with multidimensional
>>>indexing features (exist and xindice are much closer to that).
>>>
>>
>>I am happy to look at xindice.
>>
>>
>>>You are trying to create "virtual documents" out of XML-aware queries
>>>over a repository of hierarchical content (not necessarily XML, but
>>>XML-viewable).
>>
>>Are you saying that because we are making the request to the document-v12
>>schema? I am not sure about this. I am not thinking about making the
>>request to the document-v12 schema.
>>
>>In Forrest we are importing from another schema and in that process we
>>are losing information ( i.e. <author/> becomes <p> ). So I would like
>>to run a search on the source and get results that point to where I can
>>retrieve that document.
>>
>>
>>>Eh, if it was that easy. You are implying that:
>>>
>>> 1) a tag is used to indicate the semantics of the nodes contained
>>>therein. Although this is generally the case (and there is the ability
>>>to have RDF/XML to perform this way) this is not generalizable.
>>
>>I would like to see an example of this.
>>
>>
>>> 2) without namespaces, there is a tremendous semantic collision. With
>>>namespaces, you are assuming that the namespace refers to the 'meaning'
>>>of the tag, again not generalizable.
>>>
>>
>>OK, I have not mentioned anything about namespaces; the request that I put
>>as an example only deals with the faq schema. I had not thought about
>>multi-namespace documents or other types of XML input.
>>
>>
>>>This said, I agree that having the ability to run XQuery queries over a
>>>content repository that exposes XML views would be a tremendous help.
>>>
>>>Just don't call it "semantic searching", because that's not even close
>>>(but very few are able to explain the difference and the reason why we
>>>need the entire RDF stack in the first place, so don't worry).
>>>
>>>-- 
>>>Stefano.
>>
>>OK, I will not use that name, and I will not worry either.
>>
>>Cheers,
>>Cheche
>>
>>
> 
> 
> 



RE: Lucent and Xindice (Re: about lucent and exist)

Posted by Ramon Prades <rp...@porcelanosa.com>.
Hi Cheche

Reading the Xindice docs, I've seen that it requires a server daemon to be
started, and I'm not sure that this is what we want.

At first sight, Xindice would bring a lot of power to Forrest. Apart from the
searching tool, Xindice can be used to alter existing docs. For example (I
don't want to start a discussion about whether this is a good idea or not;
this is just an example), we could have Forrest automatically add check-boxes
next to todo items so that an administrator can mark them as completed, and
Xindice could move the item from "todo" to "changes" (again, it's just an
example). Another example could be a "What's new" page generated with the help
of Xindice.
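
To make the todo example concrete, here is a minimal sketch of the document change involved: one action element is moved from "todo" to "changes". It uses Python's in-memory ElementTree rather than Xindice, so it only shows the shape of the update, not how Xindice would perform it. The status markup, the id attribute and the complete function are assumptions invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified status document with a todo list and a change log.
STATUS = """\
<status>
  <changes>
    <action id="c1">Added a site map.</action>
  </changes>
  <todo>
    <action id="t1">Write the search page.</action>
    <action id="t2">Document the crawler.</action>
  </todo>
</status>
"""


def complete(status_xml, action_id):
    """Move the todo action with the given id into <changes>.

    Returns the updated document as a string; raises KeyError if no
    todo action has that id.
    """
    root = ET.fromstring(status_xml)
    todo = root.find("todo")
    changes = root.find("changes")
    for action in todo.findall("action"):
        if action.get("id") == action_id:
            todo.remove(action)
            changes.append(action)
            break
    else:
        raise KeyError(action_id)
    return ET.tostring(root, encoding="unicode")


print(complete(STATUS, "t1"))
```

A check-box in the generated page would only need to submit the id of the item; the move itself is a small, local edit to the source XML.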

But on second thought, using a service that needs to be started and so on
may go against the simplicity of Forrest, demanding more configuration and
maybe more administration.

I will keep thinking about these issues, but in my opinion whatever we do has
to be done with the ultimate goal of keeping Forrest as simple and easy to
use as it is now (and that includes static sites).

Any comments?

Ramon


