Posted to dev@lenya.apache.org by so...@gmail.com on 2005/04/15 06:43:24 UTC

BUG FIX for Lucene Indexing

File:
\apache-lenya-1.2.2\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene\index\configuration2xslt.xsl

Add the following line:
<xsl:template match="namespace"/>

Without it, adding the language field to the Lucene index returns:
"http://apache.org/cocoon/lenya/page-envelope/1.0
http://purl.org/dc/elements/1.1/ en"
With the line, it returns:
"en"

I cannot think of any reason to include namespaces in the data.
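
For anyone wondering why a single empty template changes the output: XSLT's built-in rules copy the text content of any element that no template matches, so unmatched <namespace> elements leak their URIs into the result. The sketch below is a simplified stand-in, not the real configuration2xslt.xsl. The luc:field and xpath templates and the bracketed placeholder text are assumptions made up for illustration; only the namespace template comes from the fix above.

```xml
<?xml version="1.0"?>
<!-- Illustrative only: NOT the real configuration2xslt.xsl.
     Sample input this sketch assumes:
       <luc:field name="language"
                  xmlns:luc="http://apache.org/cocoon/lenya/lucene/1.0">
         <namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
         <xpath>/*/lenya:meta/dc:language</xpath>
       </luc:field>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:luc="http://apache.org/cocoon/lenya/lucene/1.0">

  <xsl:output method="text"/>

  <!-- Process every child of a field in document order. -->
  <xsl:template match="luc:field">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Hypothetical stand-in for whatever the real stylesheet
       does with the XPath expression. -->
  <xsl:template match="xpath">
    <xsl:text>[value of </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>]</xsl:text>
  </xsl:template>

  <!-- The fix: an empty template, so <namespace> elements produce
       no output. Remove it and the built-in rules copy each
       namespace URI into the result as plain text. -->
  <xsl:template match="namespace"/>

</xsl:stylesheet>
```

Running this against the sample input with and without the last template should show the same symptom as above: the URI appears in front of the field value only when the empty template is missing.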

(I am rebuilding Publication Search and will post the instructions to
the user mailing list.  Many files need changes.  I'll post obvious
bug fixes here.)

Thanks,
solprovider

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lenya.apache.org
For additional commands, e-mail: dev-help@lenya.apache.org


Re: BUG FIX for Lucene Indexing

Posted by Andreas Hartmann <an...@apache.org>.
solprovider@gmail.com wrote:
> On 4/15/05, Andreas Hartmann <an...@apache.org> wrote:
> 
>>>Add the following line:
>>><xsl:template match="namespace"/>
>>
>>Maybe I don't understand, but shouldn't that read
>><xsl:template match="namespace()"/> ?
>>-- Andreas
> 
> 
> No.  See other response for explanation. - solprovider

Yes, sorry - I wasn't really familiar with this issue.

-- Andreas




Re: BUG FIX for Lucene Indexing

Posted by so...@gmail.com.
On 4/15/05, Andreas Hartmann <an...@apache.org> wrote:
> > Add the following line:
> > <xsl:template match="namespace"/>
> 
> Maybe I don't understand, but shouldn't that read
> <xsl:template match="namespace()"/> ?
> -- Andreas

No.  See other response for explanation. - solprovider
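
For reference, a short note on the two spellings under XSLT 1.0 rules: match="namespace" is a name test that matches elements called "namespace", which is what the configuration file contains. Parentheses belong to node tests such as text() or comment(), and there is no namespace() node test. Namespace nodes are reached through the namespace:: axis in XPath expressions, and that axis is not allowed in match patterns. The stylesheet below is only an illustration; the comment() template is an assumption added for comparison and is not part of the fix.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Name test: matches ELEMENTS named "namespace", for example
       <namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
       and outputs nothing for them. This is the line from the fix. -->
  <xsl:template match="namespace"/>

  <!-- For comparison only: a node test takes parentheses.
       XPath 1.0 defines text(), comment(), node() and
       processing-instruction(). There is no namespace() among
       them, so match="namespace()" is not a valid pattern. -->
  <xsl:template match="comment()"/>

</xsl:stylesheet>
```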



Re: BUG FIX for Lucene Indexing

Posted by Andreas Hartmann <an...@apache.org>.
solprovider@gmail.com wrote:
> File:
> \apache-lenya-1.2.2\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene\index\configuration2xslt.xsl
> 
> Add the following line:
> <xsl:template match="namespace"/>

Maybe I don't understand, but shouldn't that read

<xsl:template match="namespace()"/> ?

-- Andreas

> 
> Without it, adding the language to the lucene index returns:
> "http://apache.org/cocoon/lenya/page-envelope/1.0
> http://purl.org/dc/elements/1.1/ en"
> With the line, the request returns:
> "en"
> 
> I can not think of any reason to include namespaces in the data.
> 
> (I am rebuilding Publication Search, and will post the instructions to
> the User ML.  Many files are requiring many changes.  I'll post
> obvious bug fixes here.)
> 
> Thanks,
> solprovider




Re: BUG FIX for Lucene Indexing

Posted by so...@gmail.com.
On 4/15/05, Michael Wechner <mi...@wyona.com> wrote:
> solprovider@gmail.com wrote:
> >File:
> >\apache-lenya-1.2.2\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene\index\configuration2xslt.xsl
> >Add the following line:
> ><xsl:template match="namespace"/>
> 
> I don't fully understand to be honest. Can you give a concrete example?
> Thanks
> Michi

I am teaching Lucene to index the XML files under {pub}/content/live.
The configuration is:
FILE: {pub}\config\search\lucene-live.xconf
<?xml version="1.0"?>
<lucene>
  <update-index type="new"/>
  <index-dir src="../../work/search/lucene/index/live/index"/>
  <htdocs-dump-dir src="../../content/live"/>
  <indexer class="org.apache.lenya.lucene.index.ConfigurableIndexer">
    <configuration src="lenyadocs.xconf"/>
    <extensions src="xml"/>
  </indexer>
</lucene>

FILE: {pub}\config\search\lenyadocs.xconf
<?xml version="1.0"?>
<luc:document xmlns:luc="http://apache.org/cocoon/lenya/lucene/1.0">
  <luc:field name="title" type="Text">
    <namespace prefix="lenya">http://apache.org/cocoon/lenya/page-envelope/1.0</namespace>
    <namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
    <xpath>/*/lenya:meta/dc:subject</xpath>
  </luc:field>
  <luc:field name="htmltitle" type="Text">
    <namespace prefix="xhtml">http://www.w3.org/1999/xhtml</namespace>
    <xpath>/xhtml:html/xhtml:head/xhtml:title</xpath>
  </luc:field>
  <luc:field name="language" type="Text">
    <namespace prefix="lenya">http://apache.org/cocoon/lenya/page-envelope/1.0</namespace>
    <namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
    <xpath>/*/lenya:meta/dc:language</xpath>
  </luc:field>
  <luc:field name="description" type="Text">
    <namespace prefix="lenya">http://apache.org/cocoon/lenya/page-envelope/1.0</namespace>
    <namespace prefix="dc">http://purl.org/dc/elements/1.1/</namespace>
    <xpath>/*/lenya:meta/dc:description</xpath>
  </luc:field>
  <luc:field name="htmlbody" type="Text">
    <namespace prefix="xhtml">http://www.w3.org/1999/xhtml</namespace>
    <xpath>/xhtml:html/xhtml:body</xpath>
  </luc:field>
  <luc:field name="contents" type="UnStored" xpath="/"/>
</luc:document>

Indexing fails with an error unless every namespace used in a field's
XPath is declared inside that field's definition.  (Does anybody know
how to declare the namespaces outside the field tags?)  Without that
line in "configuration2xslt.xsl", the namespace URIs are included with
the indexed data.  They need to be removed, and it is easier (and
better design) to exclude them during data creation than to remove
them later.
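
To make the example concrete, a document that the XPaths in lenyadocs.xconf would match might look roughly like the sketch below. The element layout is inferred from those XPaths alone. Every content value except the language "en" is an invented placeholder, and real Lenya documents may carry more metadata.

```xml
<?xml version="1.0"?>
<!-- Hypothetical file under {pub}/content/live.
     Structure inferred from the XPaths in lenyadocs.xconf. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:lenya="http://apache.org/cocoon/lenya/page-envelope/1.0"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <lenya:meta>
    <!-- field "title": /*/lenya:meta/dc:subject -->
    <dc:subject>Example subject</dc:subject>
    <!-- field "description": /*/lenya:meta/dc:description -->
    <dc:description>Example description</dc:description>
    <!-- field "language": /*/lenya:meta/dc:language -->
    <dc:language>en</dc:language>
  </lenya:meta>
  <head>
    <!-- field "htmltitle": /xhtml:html/xhtml:head/xhtml:title -->
    <title>Example page</title>
  </head>
  <body>
    <!-- field "htmlbody": /xhtml:html/xhtml:body -->
    <p>Example body text.</p>
  </body>
</html>
```

For a document like this, the "language" field should hold just "en" once the empty namespace template is in place. Without it, the two namespace URIs declared in the field definition come out in front of the value.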

NOTE: The filename "lenyadocs.xconf" was arbitrary.
NOTE: I am uncertain which fields should be used in the search
results, so the configuration for the index includes extra
possibilities for title and excerpts.

solprovider



Re: BUG FIX for Lucene Indexing

Posted by Michael Wechner <mi...@wyona.com>.
solprovider@gmail.com wrote:

>File:
>\apache-lenya-1.2.2\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene\index\configuration2xslt.xsl
>
>Add the following line:
><xsl:template match="namespace"/>
>
>Without it, adding the language to the lucene index returns:
>"http://apache.org/cocoon/lenya/page-envelope/1.0
>http://purl.org/dc/elements/1.1/ en"
>With the line, the request returns:
>"en"
>  
>

I don't fully understand to be honest. Can you give a concrete example?

Thanks

Michi

>I can not think of any reason to include namespaces in the data.
>
>(I am rebuilding Publication Search, and will post the instructions to
>the User ML.  Many files are requiring many changes.  I'll post
>obvious bug fixes here.)
>
>Thanks,
>solprovider


-- 
Michael Wechner
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner@wyona.com                        michi@apache.org

