Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/04/26 10:48:30 UTC

Solr Indexing Rich Documents

I have a large corpus of rich documents, i.e. PDF and DOC files. I think
I can use the example jar of Solr directly. However, what should I take
care of in a real-time environment? Also, how do you send such documents
to Solr for indexing? I don't think post.jar handles those file types. I
should mention that I don't store the documents in a database.

Re: Solr Indexing Rich Documents

Posted by Ahmet Arslan <io...@yahoo.com>.

Here is the documentation page: http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#filesystemrepository


Re: Solr Indexing Rich Documents

Posted by Furkan KAMACI <fu...@gmail.com>.

Yes, on the file system.

Re: Solr Indexing Rich Documents

Posted by Ahmet Arslan <io...@yahoo.com>.

Hi,

Where do you store your rich documents? On the file system?

Re: Solr Indexing Rich Documents

Posted by Furkan KAMACI <fu...@gmail.com>.

Is there an example on the wiki for ManifoldCF?

Re: Solr Indexing Rich Documents

Posted by Ahmet Arslan <io...@yahoo.com>.

Hi Furkan,

post.jar is meant to be used as an example, for a quick start, etc. For production (incremental updates, deletes), consider using http://manifoldcf.apache.org for indexing rich documents. It utilises the ExtractingRequestHandler feature of Solr.
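
To illustrate what "incremental updates, deletes" means for a corpus kept on the file system, here is a minimal Python sketch. It is not how ManifoldCF is implemented; the function names `snapshot` and `plan_changes` are hypothetical, and it assumes that a changed modification time is a good enough signal that a file needs re-indexing.

```python
import os


def snapshot(root):
    """Map each file path under root to its last-modified time."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            state[path] = os.path.getmtime(path)
    return state


def plan_changes(previous, current):
    """Compare two snapshots and return (to_index, to_delete) as sorted lists.

    to_index holds files that are new or whose modification time changed;
    to_delete holds files that were present before and are now gone.
    """
    to_index = sorted(p for p, mtime in current.items() if previous.get(p) != mtime)
    to_delete = sorted(p for p in previous if p not in current)
    return to_index, to_delete
```

A crawler would store the snapshot from the previous run, call `plan_changes` with a fresh `snapshot(root)`, send the first list to Solr for (re)indexing and issue deletes for the second.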

Re: Solr Indexing Rich Documents

Posted by Furkan KAMACI <fu...@gmail.com>.

Thanks for the answer. I now get an error: a FileNotFound exception, as I
mentioned in another thread. I'm trying to solve it now.

Re: Solr Indexing Rich Documents

Posted by Jack Krupansky <ja...@basetechnology.com>.

It's called SolrCell or the ExtractingRequestHandler (/update/extract), 
which the newer post.jar knows to use for some file types:
http://wiki.apache.org/solr/ExtractingRequestHandler

-- Jack Krupansky
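
For readers who want to call /update/extract without post.jar, the following is a minimal Python sketch, not something taken from the thread. It assumes a Solr instance at a base URL such as http://localhost:8983/solr with the ExtractingRequestHandler enabled, and it uses the handler's `literal.id` and `commit` request parameters; the function names and the example file path are hypothetical.

```python
import urllib.parse
import urllib.request


def build_extract_url(solr_base, doc_id, commit=True):
    """Return the /update/extract URL for one document.

    literal.id sets the unique key of the resulting Solr document;
    commit=true asks Solr to commit right after this request.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def post_rich_document(solr_base, path, doc_id, content_type="application/pdf"):
    """Send the raw file bytes to the extract handler and return the HTTP status."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call might look like `post_rich_document("http://localhost:8983/solr", "docs/report.pdf", "report-1")`. Committing on every request is convenient for a quick test; for a large corpus it is usually better to pass `commit=False` inside `build_extract_url` and commit once at the end.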
