Posted to users@cocoon.apache.org by Peter Klotz <pk...@iconet.wn.bawue.de> on 2003/04/09 15:16:59 UTC

Help: XML Searching/Indexing with Cocoon

Hi,

(Cocoon 2.0.4)
I'm trying to generate an index such that a search presents me with useful
results. I have a pipeline that provides all the XML content that I
want to index. I found out that this content has to contain the URLs that I
later want returned as search results, rather than just the content. I can see
that the Cocoon crawler finds these URLs and retrieves them. But the search
result I get back is still the URL that I used to retrieve the starting
XML data.
This is of course not at all what I want!

Probably I'm not understanding something fundamental here. I have defined a
view with the label "content", and that label is set on all generators that
provide the XML data.

<map:view name="content" from-label="content">
  <map:serialize type="links"/>
</map:view>

I can also see that when I request some url?content-view=content
I get the correct XML content back, so that view seems to work.
Is that enough, or do I have to use a "links" view as well?
BTW, I'm using the create-index.xsp and the SearchGenerator from the
search sample in Cocoon.

Now the question is: how can I specify what the result of a search should
look like, and where do the links that I can click on in a result come from?
I thought that if I provide href="" attributes in the XML to be indexed, these
URLs would be returned whenever the string I'm searching for appears in the
element that carries the link. Is that right?
Is there a way to look at what's in the binary index files?

Please, could anybody explain this in a bit more detail?


Thanks, Peter



---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-users-unsubscribe@xml.apache.org
For additional commands, e-mail: cocoon-users-help@xml.apache.org


Re: Help: XML Searching/Indexing with Cocoon

Posted by Upayavira <uv...@upaya.co.uk>.
Peter,

> I saw this, but it only defines what should be indexed. You would then
> add the following to the lucene-xml-indexer configuration in cocoon.xconf:
> <store-fields>title</store-fields>
> <store-fields>summary</store-fields>
> 
> At least that is my interpretation of the source code of the XML
> indexer. But that piece of XML does not contain any URL that should be
> returned if the search finds some text in title or summary? So how
> should it work then?

As you can see from:

http://archives.real-time.com/pipermail/cocoon-users/2002-December/026935.html

The feature I want is only present in 2.1, i.e. to display some useful text along
with the URL.
  
> Would the search perhaps return the URL that produced this XML?
> But in my case it crawls URLs and still does not return these URLs.
> 
> > I only spotted this a few days ago (original message 18 March), but
> > have not yet got a  response to my posting of a few days ago.
> 
> I'm not sure whether you need these "well-known" view names "content"
> "links" or both?

Okay. In cocoon.xconf you need to specify a view that is used to gather content
and a view that is used to gather links for crawling.

I found that, even if I specified the content view as 'lucene-content', it still used the
'content' view. So it seems best to make sure that the content view returns exactly
what you want to have indexed.

Here's my extract from cocoon.xconf:

  <cocoon-crawler logger="core.search.crawler">
    <exclude>.*/search/.*,.*\.gif$,.*\.jpg$,.*\.css$,arts/.*,books/.*,articles/.*,/centres/.*</exclude>
    <link-view-query>cocoon-view=lucene-links</link-view-query>
  </cocoon-crawler>
  <lucene-xml-indexer logger="core.search.lucene">
    <store-fields>body</store-fields>
    <content-view-query>cocoon-view=content</content-view-query>
  </lucene-xml-indexer>
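
For the two query strings above to do anything, the sitemap needs views with
matching names. A rough sketch of that sitemap side follows. The view names
'content' and 'lucene-links' are taken from the extract above; the generator
declaration and the from-position="last" setting follow the stock Cocoon
sitemap and are assumptions about your setup, not something I have checked
against it:

```xml
<!-- Sitemap fragment (sketch). Only the parts relevant to indexing are shown. -->
<map:components>
  <map:generators default="file">
    <!-- label="content" marks the point where the "content" view branches off -->
    <map:generator name="file" label="content"
                   src="org.apache.cocoon.generation.FileGenerator"/>
  </map:generators>
  <!-- transformers, serializers, etc. omitted -->
</map:components>

<map:views>
  <!-- Requested by the indexer via ?cocoon-view=content -->
  <map:view name="content" from-label="content">
    <map:serialize type="xml"/>
  </map:view>

  <!-- Requested by the crawler via ?cocoon-view=lucene-links -->
  <map:view name="lucene-links" from-position="last">
    <map:serialize type="links"/>
  </map:view>
</map:views>
```

With something like this in place, requesting a page with
?cocoon-view=lucene-links should give back a plain list of the links found in
it, and ?cocoon-view=content should give back the XML that gets indexed for
that page.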

Hope that helps.

Upayavira



Re: Help: XML Searching/Indexing with Cocoon

Posted by Peter Klotz <pk...@iconet.wn.bawue.de>.
Hi,

> Jeremy Quinn posted a reply to one of my queries that suggested an
> answer to this question. This is his XML to be indexed:
>
>> <body>
>>  <title>title gets stored, then displayed with hit</title>
>>  <summary>summary gets stored, then displayed with hit</summary>
>>  all of my body content with tags stripped out
>> </body>

I saw this, but it only defines what should be indexed. You would then
add the following to the lucene-xml-indexer configuration in cocoon.xconf:
<store-fields>title</store-fields>
<store-fields>summary</store-fields>

At least that is my interpretation of the source code of the XML indexer.
But that piece of XML does not contain any URL that should be returned if
the search finds some text in title or summary. So how is it supposed to
work?

Would the search perhaps return the URL that produced this XML?
But in my case it crawls URLs and still does not return these URLs.

> I only spotted this a few days ago (original message 18 March), but have
> not yet got a  response to my posting of a few days ago.

I'm not sure whether you need the "well-known" view names "content",
"links", or both?


Peter





Re: Help: XML Searching/Indexing with Cocoon

Posted by Upayavira <uv...@upaya.co.uk>.
> I'm trying to generate an index such that a search would present me
> useful results. I have a pipeline that provides me all the XML
> content that I want to index on. I found out that I have to have URLs
> in there that I would later want to get as a result instead of just
> the content. So I see that the Cocoon crawler looks for these URLs and
> retrieves them. But still I get back as search result the URL that I
> used to retrieve the starting XML data. This is of course not at all
> what I want!

Jeremy Quinn posted a reply to one of my queries that suggested an answer to this 
question. This is his XML to be indexed:

> <body>
>  <title>title gets stored, then displayed with hit</title>
>  <summary>summary gets stored, then displayed with hit</summary>
>  all of my body content with tags stripped out
> </body>
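
One way to get XML of that shape out of a content view might be a small
stylesheet applied inside the view before serializing. This is only a sketch:
the input element names (article, heading, abstract, para) are made up for
illustration and would have to be replaced with whatever your documents
actually use.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <article><heading/><abstract/><para/>...</article> -->
  <xsl:template match="/article">
    <body>
      <title><xsl:value-of select="heading"/></title>
      <summary><xsl:value-of select="abstract"/></summary>
      <!-- value-of returns the text of each paragraph with inner tags stripped -->
      <xsl:for-each select="para">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
      </xsl:for-each>
    </body>
  </xsl:template>

</xsl:stylesheet>
```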

I only spotted this a few days ago (original message 18 March), but have not yet got a 
response to my posting of a few days ago.

Let's hope he, or someone else, can enlighten us.

Regards, Upayavira

