Posted to user@nutch.apache.org by aicha BEN <ai...@yahoo.com> on 2006/08/29 16:26:36 UTC

problem with RTF parsing

Hi, 

I am trying to index RTF files, without success.
I got rtf_parser_src.zip and built rtf-parser.jar and parse-rtf.jar, and the plugin is listed in nutch-site.xml,
so everything seems to be correct. The crawl runs without any problem, but when I search for a word that occurs in an indexed RTF file I get no results. I don't understand where the problem is or why my query returns nothing.
 
Please could you help me? I can't find anything on this subject.
Thank you. 
Aïcha

Re : problem with RTF parsing

Posted by Renaud Richardet <re...@wyona.com>.
Ben,

Building the jar as described in the README is not enough: you then need to activate the RTF plugin (uncomment the correct lines in src/plugin/build.xml) and rebuild Nutch.
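
To illustrate the activation step, here is a minimal sketch of what the uncommented entry might look like. It assumes that src/plugin/build.xml builds each plugin through its own <ant> call inside a "deploy" target and that the parse-rtf entry ships commented out; the exact lines and targets can differ between Nutch versions, so check your own copy of the file rather than pasting this in.

```xml
<!-- src/plugin/build.xml (excerpt; layout is an assumption, not copied from a release) -->
<target name="deploy">
  <!-- entries for the other plugins stay as they are -->

  <!-- parse-rtf is assumed to ship commented out; removing the comment
       markers around its entry lets the build compile and deploy it -->
  <ant dir="parse-rtf" target="deploy"/>
</target>
```

If the same file has matching commented-out parse-rtf entries in other targets (for example for tests or cleanup), they would need the same treatment. After that, rebuild Nutch from the top-level directory with Ant and run the crawl again so that the RTF files are actually parsed before indexing.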

HTH,
Renaud




-- 
Renaud Richardet
COO America
Wyona    -   Open Source Content Management   -   Apache Lenya
office +1 857 776-3195                  mobile +1 617 230 9112
renaud.richardet <at> wyona.com           http://www.wyona.com


Re : problem with RTF parsing

Posted by aicha BEN <ai...@yahoo.com>.
I built the plugin as explained in the README.txt file in the parse-rtf directory.
The jar files were built.



Re: problem with RTF parsing

Posted by Renaud Richardet <re...@wyona.com>.
Ben,

Did you activate the plugin in src/plugin/build.xml? RTF parsing worked 
for us.

HTH,
Renaud




problem with web site indexing

Posted by Aïcha <ai...@yahoo.com>.
Hi,
 
I am trying to index a whole web site, with all of its pages,
but the only page that ends up in the index is the first page, that is, the page at the URL I put in the input file for the crawl.
In the end I have only one page in the index.
Do I have to do something else to make this work?
 
Thanks in advance!
Aïcha