Posted to user@lenya.apache.org by "J.-Wolfgang Kaltz" <jw...@yahoo.com> on 2004/06/08 20:04:41 UTC

Search links after updating search index

Hi all,
I have been trying to use the search feature in Lenya (that is, the Lucene 
integration). First, congratulations to the developers, as it seems to be
basically working, though there remain a few (hopefully minor) issues.

I would like to know if anybody using the current version of Lenya in CVS
has managed to update the search index and have the search still working.

Here is what I am doing: I update the search index for a live publication 
(for instance, after publishing a new article) by using the
lenya/bin/crawl_and_index.xml file and by providing custom 
configuration files for crawling and indexing (see below).

After the update, I run a new search. However, the link URLs in the 
result list are wrong: the context is included twice. With a little debugging 
of the XSL stylesheet, I see that the variable uri looks like
/lenya-2004-06-08/default/live/index.html
But the other variables used to construct the link also contain this
context; for instance, contextprefix is /lenya-2004-06-08, and so on.
So the context appears twice in the link URL, and the link does not work.

What is strange is that, if you simply check out the current version of Lenya
in CVS, you will find that the index is pre-created, and the website has been
pre-crawled. When the URLs in the result list are constructed, the variable uri 
does not contain the whole context, and thus the URLs are correct.

The only difference between an updated search index and the one prefabricated 
in Lenya's CVS is that, when updating, the crawled files are placed in 
  htdocs_dump/live/lenya-2004-06-08/default/live/
which is the context of my publication being crawled,
whereas in the version in CVS, the pre-crawled file is directly in
  htdocs_dump/live/

So, my question is:
is anybody out there successfully updating the search index of a live 
publication (using the current version in CVS), with the URLs in the result 
list still correct?

Or is there something I am missing about how the search engine should be 
configured for a publication? I have tried many different options, but I am
not making progress:

Here is my lucene.xconf:
<lucene>
  <update-index type="new"/>
  <index-dir src="../../work/search/lucene/index/live/index"/>
  <htdocs-dump-dir src="../../work/search/lucene/htdocs_dump/live"/>
  <indexer class="org.apache.lenya.lucene.index.DefaultIndexer"/>
</lucene>

Here is my crawler.xconf:

<crawler>
  <user-agent>lenya</user-agent>

  <base-url href="http://kronos.informatik.uni-duisburg.de:8080/lenya-2004-06-08/default/live/index.html"/>
  <scope-url href="http://kronos.informatik.uni-duisburg.de:8080/lenya-2004-06-08/default/live/"/>

  <uri-list src="../../work/search/lucene/uris.txt"/>
  <htdocs-dump-dir src="../../work/search/lucene/htdocs_dump/live"/>

</crawler>

I would appreciate any help, because I am still unsure whether this is a 
configuration issue, or a bug.



---------------------------------------------------------------------
To unsubscribe, e-mail: lenya-user-unsubscribe@cocoon.apache.org
For additional commands, e-mail: lenya-user-help@cocoon.apache.org


Re: Search links after updating search index

Posted by Michael Wechner <mi...@wyona.com>.
J.-Wolfgang Kaltz wrote:

>
>here is my lucene.xconf
><lucene>
>  <update-index type="new"/>
>  <index-dir src="../../work/search/lucene/index/live/index"/>
>  <htdocs-dump-dir src="../../work/search/lucene/htdocs_dump/live"/>
>  <indexer class="org.apache.lenya.lucene.index.DefaultIndexer"/>
></lucene>
>  
>

Try

<htdocs-dump-dir src="../../content/live"/>

and then you might have to modify the prefix within 
pubs/default/lenya/lucene.xmap.

Then you don't necessarily have to use the crawler, because the content
files are indexed directly.
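For reference, a lucene.xconf following this suggestion might look like the sketch below. Only htdocs-dump-dir changes; the other entries are kept from the configuration quoted above, assuming the relative path is resolved from the same base directory as index-dir.

```xml
<lucene>
  <update-index type="new"/>
  <index-dir src="../../work/search/lucene/index/live/index"/>
  <!-- Index the live content directory directly instead of a crawler dump -->
  <htdocs-dump-dir src="../../content/live"/>
  <indexer class="org.apache.lenya.lucene.index.DefaultIndexer"/>
</lucene>
```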


>
>I would appreciate any help, because I am still unsure whether this is a 
>configuration issue, or a bug.
>  
>
it's a configuration issue and not a bug

As soon as the code freeze is over I will try to fix these configurations
for 1.2.1 and add a button to the Admin area where administrators can 
start the indexing.

HTH

Michi



-- 
Michael Wechner
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com              http://cocoon.apache.org/lenya/
michael.wechner@wyona.com                        michi@apache.org




Re: Using lucene in Lenya

Posted by Michael Wechner <mi...@wyona.com>.
Luis zorita wrote:

>
>
> I set classpath=D:\lenyaprueba\build\lenya\webapp
> and certainly both classes are in the right tree:
> org.apache.lenya.lucene.IndexConfiguration in: 
> d:\lenyaprueba\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene 
> and 
> d:\lenyaprueba\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene\index 
>
>

Well, actually the classpath is set within the build file 
crawl_and_index.xml, which works on Linux/Unix without any problems.

To be honest I have never tested it on Windows, but I guess it's just a 
path problem somewhere within the build file.
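To illustrate where such a path problem could sit, here is a minimal Ant sketch of a build file that reads search.properties and runs the indexer with a classpath built from webapp.dir. The project name, the path id, and the way arguments are passed to org.apache.lenya.lucene.index.Index are hypothetical, for illustration only; this is not the contents of the real crawl_and_index.xml.

```xml
<!-- Hypothetical sketch only: not the crawl_and_index.xml shipped with Lenya. -->
<project name="lucene-index-sketch" default="index" basedir=".">

  <!-- Ant loads property files with java.util.Properties rules, where a single
       backslash is an escape character. On Windows, write paths in
       search.properties as d:/lenyaprueba/build/lenya/webapp or with doubled
       backslashes, otherwise the backslashes are silently dropped. -->
  <property file="search.properties"/>

  <!-- Classes and jars of the Lenya webapp, assuming the standard WEB-INF layout -->
  <path id="lenya.classpath">
    <pathelement location="${webapp.dir}/WEB-INF/classes"/>
    <fileset dir="${webapp.dir}/WEB-INF/lib" includes="*.jar"/>
  </path>

  <target name="index">
    <!-- fork plus jvm makes the task use the JVM named by java.run;
         passing the xconf path as the only argument is an assumption -->
    <java classname="org.apache.lenya.lucene.index.Index"
          classpathref="lenya.classpath" fork="true" jvm="${java.run}">
      <arg value="${lucene.xconf}"/>
    </java>
  </target>
</project>
```

With a file like this, a property line such as webapp.dir=d:\lenyaprueba\build\lenya\webapp would be read as d:lenyapruebabuildlenyawebapp, so the classpath would point at a directory that does not exist, which is one possible source of a "could not find class" error on Windows.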

Michi

> I would appreciate any help
> Luis




Using lucene in Lenya

Posted by Luis zorita <lz...@pas.uned.es>.
Hi everybody:
My next step with Lenya is to use the search engine (Lucene) in Lenya 1.2.
I don't know why, but I am having some trouble.
1. I want to search my live area using the search utility in the 
default publication.
I'm working in a Windows 2000 environment.
After reading the documentation and some mails on the user list, I run:
ant -f d:\lenyaprueba\build\lenya\webapp\lenya\bin\crawl_and_index.xml -Dlucene.xconf=d:\lenyaprueba\build\lenya\webapp\lenya\pubs\default\config\search\lucene-live.xconf index

My publication tree is:
d:\lenyaprueba\build\lenya\webapp\lenya\pubs\default

in search.properties:
webapp.dir=d:\lenyaprueba\build\lenya\wepapp
java.run=c:\j2sdk1.4.1_02\bin\java

In lucene-live.xconf:
<lucene>
  <update-index type="new"/>
  <index-dir src="../../work/search/lucene/index/live/index"/>
  <htdocs-dump-dir src="../../content/live"/>
  <indexer class="org.apache.lenya.lucene.index.DefaultIndexer"/>
</lucene>

My first problem is the output of running:

D:\lenyaprueba\build\lenya\webapp>ant -f d:\lenyaprueba\build\lenya\webapp\lenya\bin\crawl_and_index.xml -Dlucene.xconf=d:\lenyaprueba\build\lenya\webapp\lenya\pubs\default\config\search\lucene-live.xconf index
Buildfile: d:\lenyaprueba\build\lenya\webapp\lenya\bin\crawl_and_index.xml

init:
     [echo] INFO: Init

index:
     [echo] INFO: Index hypertext documents
     [echo] INFO: Show configuration
     [java] Could not find org.apache.lenya.lucene.IndexConfiguration . Make you have it in your classpath
     [echo] INFO: Create index ...
     [java] Could not find org.apache.lenya.lucene.index.Index . Make you have it in your classpath
     [echo] INFO: Index has been created

BUILD SUCCESSFUL
Total time: 2 seconds

I set classpath=D:\lenyaprueba\build\lenya\webapp,
and both classes are certainly in the right place in the tree:
org.apache.lenya.lucene.IndexConfiguration is in
d:\lenyaprueba\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene
and org.apache.lenya.lucene.index.Index is in
d:\lenyaprueba\build\lenya\webapp\WEB-INF\classes\org\apache\lenya\lucene\index

I would appreciate any help
Luis