Posted to solr-user@lucene.apache.org by satya swaroop <ss...@gmail.com> on 2010/07/13 14:11:56 UTC

indexing rich documents

Hi all,
I am new to Solr. I followed the wiki and got the Solr admin page running
successfully, and indexing XML files works fine. Indexing rich documents,
however, does not work. I followed the wiki instructions for rich documents
as well, but when I send a PDF or HTML file I get a "lazy" error. Can anyone
give a detailed description of how to make rich documents indexable?
I use Tomcat on Ubuntu. The Solr home directory is /opt/solr/example and
the Catalina home is /opt/tomcat6.


thanks & regards,
 swaroop

Re: indexing rich documents

Posted by Nikola Garafolic <ni...@srce.hr>.
On 07/13/2010 02:11 PM, satya swaroop wrote:
> Hi all,
> I am new to Solr. I followed the wiki and got the Solr admin page running
> successfully, and indexing XML files works fine. Indexing rich documents,
> however, does not work. I followed the wiki instructions for rich documents
> as well, but when I send a PDF or HTML file I get a "lazy" error. Can anyone
> give a detailed description of how to make rich documents indexable?
> I use Tomcat on Ubuntu. The Solr home directory is /opt/solr/example and
> the Catalina home is /opt/tomcat6.
>
>
> thanks & regards,
>   swaroop
>

I have exactly the same problem, but my environment is different.
I use JBoss AS 5.1.0 GA with HornetQ 2.0.0 and Solr 1.3.0, patched to
support indexing rich text documents.
I copied the example/solr directory to the conf directory on JBoss, and
solr.war to the deploy directory on JBoss. Everything seems to work except
indexing rich text documents. I am using the default schema.xml that is
included in the example/solr/conf directory.
I use all of this for gss ( http://code.google.com/p/gss/ ).
Is there a generic schema.xml file that should work out of the box?
The gss developers sent me a different schema.xml file, but with it I get an
"undefined field text" error in the log. With the default schema.xml file
(the one that came with Solr) I get "undefined field 'body'".

Attached is the file I got from the gss project; it does not work for me
either.

Regards,
Nikola

-- 
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafolic@srce.hr

Re: indexing rich documents

Posted by Lance Norskog <go...@gmail.com>.
Libraries are loaded from the solr/lib directory, not from Solr home itself.
If you are using multicore, that is solr/core/lib.

These directories are searched automatically. You can also tell Solr to
search other directories with the <lib> directive in solrconfig.xml.
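
Since the jars have to sit in the right lib directory, it can help to list what is actually there. The following is only a minimal sketch: the function name find_jars and the "tika-" name prefix are illustrative assumptions (the thread mentions tika-0.7), and you would pass in whatever lib directory applies to your own installation.

```python
from pathlib import Path


def find_jars(lib_dir, name_prefixes=("tika-",)):
    """List .jar files in lib_dir whose names start with one of name_prefixes.

    find_jars and the default "tika-" prefix are hypothetical helpers for this
    sketch, not part of Solr or Tika. Returns a sorted list of file names, or
    an empty list if lib_dir does not exist or is not a directory.
    """
    lib = Path(lib_dir)
    if not lib.is_dir():
        return []
    prefixes = tuple(name_prefixes)
    return sorted(p.name for p in lib.glob("*.jar") if p.name.startswith(prefixes))
```

An empty result for the directory Solr actually searches would suggest that the jars were copied to the wrong place.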

On Tue, Jul 13, 2010 at 11:48 PM, satya swaroop <ss...@gmail.com> wrote:
>
> Here I attach my solrconfig, Tika config, and schema files. If anything is
> wrong, please tell me.
>



-- 
Lance Norskog
goksron@gmail.com

Re: indexing rich documents

Posted by satya swaroop <ss...@gmail.com>.
Here I attach my solrconfig, Tika config, and schema files. If anything is
wrong, please tell me.

Re: indexing rich documents

Posted by satya swaroop <ss...@gmail.com>.
Yes, I checked the extracting request handler page but could not find the
information I need. I installed tika-0.7 and copied the jar files into the
Solr home library. When I send PDF/HTML files, I get a lazy error. I am
using Tomcat and Solr 1.4.

Re: indexing rich documents

Posted by satya swaroop <ss...@gmail.com>.
Hi,
Yes, I followed the wiki. Can you now tell me the procedure for it?
  Regards,
   swaroop

Re: indexing rich documents

Posted by Markus Jelsma <ma...@buyways.nl>.
Hi,

Are you sure you followed the wiki [1] on this subject? There is an example
there, but you need Solr 1.4.0 or higher. I am unsure whether just patching
1.3.0 will really do the trick. The patch must then also include Apache Tika,
which sits under the hood, extracting content and metadata from various
formats.

[1]: http://wiki.apache.org/solr/ExtractingRequestHandler
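
For reference, sending a file to the extracting handler amounts to an HTTP POST of the raw file to the /update/extract path with a literal.id parameter. The sketch below only builds such a request with the Python standard library. The function name build_extract_request, the base URL, and the document id are assumptions for illustration, not values taken from this thread.

```python
import mimetypes
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(base_url, file_path, doc_id, commit=True):
    """Build a POST request that sends one file to Solr's /update/extract.

    base_url is the Solr webapp URL (an assumption, for example
    "http://localhost:8080/solr" under a default Tomcat setup).
    doc_id becomes the unique key of the document via literal.id.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    url = f"{base_url}/update/extract?{urlencode(params)}"

    # Guess the MIME type from the file extension; fall back to a generic type.
    content_type = mimetypes.guess_type(file_path)[0] or "application/octet-stream"

    with open(file_path, "rb") as fh:
        body = fh.read()

    return Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
```

Passing the returned object to urllib.request.urlopen sends it. If the handler is not registered in solrconfig.xml, or its jars cannot be loaded, the server is expected to answer with an error rather than index the file.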

Cheers,

On Tuesday 13 July 2010 14:11:56 satya swaroop wrote:
> Hi all,
> I am new to Solr. I followed the wiki and got the Solr admin page running
> successfully, and indexing XML files works fine. Indexing rich documents,
> however, does not work. I followed the wiki instructions for rich documents
> as well, but when I send a PDF or HTML file I get a "lazy" error. Can anyone
> give a detailed description of how to make rich documents indexable?
> I use Tomcat on Ubuntu. The Solr home directory is /opt/solr/example and
> the Catalina home is /opt/tomcat6.
>
>
> thanks & regards,
>  swaroop
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350