Posted to solr-user@lucene.apache.org by Eugene Dzhurinsky <bo...@redwerk.com> on 2009/11/03 17:08:01 UTC

Re: adding and updating a lot of document to Solr, metadata extraction etc

On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
> About large XML files and http overhead: you can tell solr to load the
> file directly from a file system. This will stream thousands of
> documents in one XML file without loading everything in memory at
> once.
> 
> This is a new book on Solr. It will help you through this early learning phase.
> 
> http://www.packtpub.com/solr-1-4-enterprise-search-server

Thank you, but we have to prepare a proof of concept with a stable
version. I haven't seen any 1.4.0 artifacts released to repo1.maven.org so far.

Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
and it looks like this is the preferred approach in my case.

I have a lot of HTML pages on disk and some metadata stored in SQL tables.
It seems I need to provide some sort of EntityProcessor and DataSource to
the DataImportHandler. I will also need to pass some properties to the data
source to tell it how to retrieve the data (table names, etc.).
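
To make the data flow concrete, here is a rough sketch of what the import
has to do, written outside of DIH. It is only an illustration in Python, not
DIH code: the table name `pages`, its columns, and the Solr field names
`id`, `title` and `content` are all assumptions made up for the example. It
joins each metadata row with the page body read from disk and builds a Solr
<add> update message.

```python
import sqlite3
from xml.sax.saxutils import escape


def build_add_xml(conn, read_page):
    """Join SQL metadata with page bodies and return a Solr <add> message."""
    docs = []
    # Hypothetical schema: pages(id, title, path); adjust to the real tables.
    rows = conn.execute("SELECT id, title, path FROM pages ORDER BY id")
    for doc_id, title, path in rows:
        # Field names are assumptions; they must match the Solr schema in use.
        fields = {
            "id": str(doc_id),
            "title": title,
            "content": read_page(path),
        }
        body = "".join(
            '<field name="%s">%s</field>' % (name, escape(value))
            for name, value in fields.items()
        )
        docs.append("<doc>%s</doc>" % body)
    return "<add>%s</add>" % "".join(docs)


def read_page_from_disk(path):
    """Read one HTML page; assumes the files are UTF-8 encoded."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # In-memory SQLite stands in for the real SQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (id INTEGER, title TEXT, path TEXT)")
    conn.execute("INSERT INTO pages VALUES (1, 'Home', '/data/html/home.html')")
    # A stub reader keeps the demo independent of real files on disk.
    print(build_add_xml(conn, lambda path: "<html><body>Hello</body></html>"))
```

In real use, read_page_from_disk would be passed instead of the stub reader,
and the resulting XML would be posted to the Solr update handler. A custom
EntityProcessor and DataSource would have to do the same join inside DIH.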

So maybe there is a tutorial or how-to describing how to create custom
classes for importing data into Solr 1.3.0?

Thank you in advance!

-- 
Eugene N Dzhurinsky

Re: adding and updating a lot of document to Solr, metadata extraction etc

Posted by Israel Ekpo <is...@gmail.com>.

On Tue, Nov 10, 2009 at 8:26 AM, Eugene Dzhurinsky <bo...@redwerk.com> wrote:

> On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote:
> > The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
> > much better off using the DIH from 1.4.
> >
> > This is the current Solr release candidate binary:
> > http://people.apache.org/~gsingers/solr/1.4.0/
>
> In fact, we are prohibited from using release candidates or nightly builds;
> we may use only official releases of Solr :(
>
> --
> Eugene N Dzhurinsky
>


Well, the official release is out, and you can pick it up from your closest
mirror here:

http://www.apache.org/dyn/closer.cgi/lucene/solr/


-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.

Re: adding and updating a lot of document to Solr, metadata extraction etc

Posted by Eugene Dzhurinsky <bo...@redwerk.com>.

On Tue, Nov 03, 2009 at 05:49:23PM -0800, Lance Norskog wrote:
> The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
> much better off using the DIH from 1.4.
> 
> This is the current Solr release candidate binary:
> http://people.apache.org/~gsingers/solr/1.4.0/

In fact, we are prohibited from using release candidates or nightly builds;
we may use only official releases of Solr :(

-- 
Eugene N Dzhurinsky

Re: adding and updating a lot of document to Solr, metadata extraction etc

Posted by Lance Norskog <go...@gmail.com>.

The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
much better off using the DIH from 1.4.

This is the current Solr release candidate binary:
http://people.apache.org/~gsingers/solr/1.4.0/

-- 
Lance Norskog
goksron@gmail.com