Posted to java-user@lucene.apache.org by Rishabh Bajpai <r_...@lycos.com> on 2003/06/14 06:57:24 UTC

Strange problem while indexing?

i am using lucene to index xml+html files. the xml contains the metadata associated with the html file.

the process, at a high level, is: 
-create a list of all xml files in a folder
-parse each xml file using a SAX parser
-create name:value pairs out of the tags and values, and index them
-one of the tags contains the url of the html page
-when you encounter that, parse the html file
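
for reference, the tag-to-pair step above can be sketched roughly as below. this is only a minimal sketch using the JDK's SAX classes, not my exact code: the class name TagValueCollector, the collect() helper, and the sample tags (title, url) are made up for illustration, and the lucene indexing call is left out (each pair would become one field of the document).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects (tag name, text value) pairs from one XML file.
public class TagValueCollector extends DefaultHandler {
    // Collected pairs in document order; each entry is {name, value}.
    private final List<String[]> pairs = new ArrayList<>();
    // Text seen since the most recent start or end tag.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            // This is where a pair would be handed to the indexer.
            pairs.add(new String[] { qName, value });
        }
        text.setLength(0);
    }

    public List<String[]> getPairs() {
        return pairs;
    }

    // Parses an XML string and returns the collected pairs.
    public static List<String[]> collect(String xml) throws Exception {
        TagValueCollector handler = new TagValueCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getPairs();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample metadata; the real files have other tags.
        String xml = "<doc><title>Example</title><url>page.html</url></doc>";
        for (String[] p : collect(xml)) {
            System.out.println(p[0] + ":" + p[1]);
        }
    }
}
```

a fresh handler is created for every file here, so no state is carried over from one xml file to the next.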

when i do this for a few files, it seems to work fine. however, as the number of files increases, it starts to throw errors!
initially, i get a "SAXException: Content is not allowed in trailing section." - but i checked and the xml file seems to be well-formed! i even tried indexing this file individually, and it worked!
then i get "Index locked for write: Lock@/export/home.../write.lock"
at times, i also get a "Timed out waiting for: Lock@/export/home/.../commit.lock"
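
the well-formedness check i mentioned can be sketched like this. it is a minimal sketch with the JDK's SAX parser; the class name WellFormedCheck and the check() helper are made up for illustration. as far as i understand, the parser reports a fatal error of this kind when non-whitespace text follows the closing tag of the root element, which is what the second sample input does.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses XML with a do-nothing handler to test well-formedness.
public class WellFormedCheck {
    // Returns null if the XML parses cleanly, otherwise the parser's error message.
    public static String check(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            // Well-formedness violations surface as SAXException (SAXParseException).
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // A well-formed document.
        System.out.println(check("<doc><url>page.html</url></doc>"));
        // Stray text after the root element's closing tag: not well-formed.
        System.out.println(check("<doc><url>page.html</url></doc>stray text"));
    }
}
```

the exact wording of the message depends on the parser implementation, so the sketch only distinguishes "parsed cleanly" from "parser reported an error".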

as a result of this, the index doesn't get updated and the search results are incorrect. i also observed once that while the index is being built, i get results, but once the indexing process exits, i stop getting results. my hunch is that the index update did not get committed?

what is particularly interesting to note is that this problem occurs only some of the time. another observation is that it worked fine for around 50 files, but not for about 100 files.

can anyone help me - or give pointers as to what is going on here?

-rishabh


 


