Posted to user@nutch.apache.org by Yong-gang Cao <ch...@gmail.com> on 2005/06/28 14:24:42 UTC

Why did the index crash? Any clue?

I've been trying to index a large number of web pages (about 6 million),
and I encountered the following exception after indexing 1.34 million records:

[java] 050625 121642 Processed 1340000 records (30.161636 rec/s)
[java] java.io.FileNotFoundException: D:\DynamicDisk\webdb\segments\20050620232113\index\_ykxc.prx (access denied)

I tried to open the incomplete index with Luke, and Luke also reports
that _ykxc.prx is not found.
Why was it lost? There was no manual intervention during indexing.
I've run into this kind of issue more than once. Is it a bug in
Lucene or Nutch?
Any clue about this?
Thanks very much!
-- 
Best wishes to all diligent guys!

Be careful of your anti-virus software while indexing

Posted by Yong-gang Cao <ch...@gmail.com>.
After checking carefully, I don't think the file _ykxc.prx is actually a
virus. It was simply detected as one and quarantined; the scanner reported
it as the Mad.5131 virus, although I can't find any detailed information
about this virus on the Internet.
In our case, the *.prx file is just Lucene's term proximity data file. If
it had been infected, the index would fail to work, but it does work.
Aha, what a coincidence! It's time to think about the weaknesses of
pattern-based anti-virus software.
Be careful with your anti-virus software. It can wreck a whole day's work.
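A minimal sketch, not from the original thread, for checking an index directory after a failed run like the one above: it lists the directory and reports every file that cannot be opened for reading, which is how a file locked by a scanner ("access denied") would typically surface. The class name `IndexFileCheck` is hypothetical, and it uses only the JDK's `java.nio.file` API, not any Lucene or Nutch API. It can only flag files that are still listed; a file the scanner has already moved to quarantine will simply be absent, so compare the listing against what you expect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic (not part of Lucene or Nutch): reports files in an
// index directory that cannot be opened, e.g. locked by an anti-virus scanner.
public class IndexFileCheck {

    // Returns the names of non-directory entries in dir that could not be opened.
    static List<String> unreadableFiles(Path dir) throws IOException {
        List<String> bad = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isDirectory(p)) {
                    continue; // only plain files are checked
                }
                try (InputStream in = Files.newInputStream(p)) {
                    in.read(); // read one byte; an empty file just returns -1
                } catch (IOException e) {
                    bad.add(p.getFileName().toString()); // open or read failed
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> bad = unreadableFiles(dir);
        if (bad.isEmpty()) {
            System.out.println("All files readable in " + dir);
        } else {
            System.out.println("Unreadable files in " + dir + ": " + bad);
        }
    }
}
```

For example, `java IndexFileCheck D:\DynamicDisk\webdb\segments\20050620232113\index` would have listed `_ykxc.prx` if the scanner was still holding it open.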

Re: Why did the index crash? Any clue?

Posted by Yong-gang Cao <ch...@gmail.com>.
Sorry, I figured it out.
The file was deleted by anti-virus software.
Damn virus!
