Posted to solr-user@lucene.apache.org by Grant Ingersoll <gr...@yahoo.com> on 2006/03/05 15:18:58 UTC

Index Builder

What/where is the Index Builder that is referred to in  http://wiki.apache.org/solr/CollectionBuilding?  I can see how to do a bunch of adds over HTTP but this seems like it is less than optimal when I have all the files on the same machine and am trying to bootstrap the system.  What's the best way to do this?

Thanks,
Grant


----------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com
		

Re: Index Builder

Posted by Yonik Seeley <ys...@gmail.com>.
On 3/5/06, Grant Ingersoll <gr...@yahoo.com> wrote:
> So, I was thinking I could write a driver program that takes in my files and then calls the API directly.  Is this doable?

It's doable...
While it will be more efficient, it's not clear how much you will
gain, esp if you run with multiple CPUs (IndexWriting is highly
synchronized).

Check out the UpdateHandler abstract class:
  public abstract int addDoc(AddUpdateCommand cmd) throws IOException;
  public abstract void delete(DeleteUpdateCommand cmd) throws IOException;
  public abstract void deleteByQuery(DeleteUpdateCommand cmd) throws IOException;
  public abstract void commit(CommitUpdateCommand cmd) throws IOException;
  public abstract void close() throws IOException;

While the implementation of the UpdateHandler is pluggable, there
isn't a place to plug in different client handlers (like there is with
RequestHandler).  You could create another servlet in the same webapp
and get the current UpdateHandler (SolrCore.updateHandler) and use
that to update the index.

Seems like there isn't a getter for SolrCore.updateHandler... feel
free to submit a patch if you want to go this route.

You could even drop down to a lower level and use DocumentBuilder to
create your own Lucene Document instances and write them with an
IndexWriter yourself.

-Yonik


>  Do you do it all through HTTP requests or through a driver that calls the API?
> I think I would prefer the API calls for bulk loading.  Where should I look for these?
>
> -Grant
>
> Yonik Seeley <ys...@gmail.com> wrote: On 3/5/06, Grant Ingersoll  wrote:
> > What/where is the Index Builder that is referred to in  http://wiki.apache.org/solr/CollectionBuilding?
>
> It's currently client-supplied (i.e. there isn't one).
>
> Having all Solr users have to write their own builders (code that gets
> data from a source and posts XML documents) certainly isn't optimal.
>
> It would be nice if we could give Solr a database URL with some SQL,
> and have it automatically slurp and index the records.  It would also
> be nice to be able to grab documents from a CSV or other simple
> structured text file and index them.
>
> These ideas are already on the task list on the (currently down) Wiki.
>
> -Yonik

Re: Index Builder

Posted by Chris Hostetter <ho...@fucit.org>.
: I had a feeling that was the case.  So, I was thinking I could write a
: driver program that takes in my files and then calls the API directly.
: Is this doable?  How do you guys do it on your live site?  Do you do it
: all through HTTP requests or through a driver that calls the API?  I
: think I would prefer the API calls for bulk loading.  Where should I
: look for these?

Once upon a time, I argued for having a robust update API, and a way to
write "updater plugins" that would run within the Solr JVM ... and I was
talked out of it in favor of doing everything over HTTP.  So yeah ... that's
what I do: build/update entirely over HTTP.
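
For anyone reading this in the archive, a builder that works entirely over HTTP can be quite small. What follows is only a rough sketch, not code from Solr itself: the class name HttpIndexBuilder, the field names, and the sample values are made up for illustration, and the update URL is whatever your own instance exposes (it is passed in, not assumed). It builds an <add> message for one document and, if a URL is given, POSTs it followed by a <commit/>.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side builder; not part of Solr.
class HttpIndexBuilder {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Build one <add><doc>...</doc></add> message from field name/value pairs.
    static String addMessage(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
               .append(escapeXml(e.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    // POST an XML message to the given update URL and return the HTTP status code.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Illustrative document; the field names must match your own schema.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("title", "Tom & Jerry");
        String xml = addMessage(doc);
        System.out.println(xml);

        // Only talk to a server when an update URL is supplied as the first argument.
        if (args.length > 0) {
            System.out.println("add -> HTTP " + post(args[0], xml));
            System.out.println("commit -> HTTP " + post(args[0], "<commit/>"));
        }
    }
}
```

Run without arguments it only prints the message, which is handy for checking the XML before pointing it at a live index. Sending one document per request keeps the sketch short; a real builder would probably batch several <doc> elements into each <add>.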

From what I remember of the internal update API, you could probably write
a new subclass of UpdateHandler that you register in the solrconfig.xml
which pulled most of the data from wherever you want -- but it would still
need to be triggered by (minimal) "<add>" messages over HTTP.

Alternatively, you could write your own Servlet with load-on-startup="true"
that used the internal update methods directly.




-Hoss


Re: Index Builder

Posted by Grant Ingersoll <gr...@yahoo.com>.
I had a feeling that was the case.  So, I was thinking I could write a driver program that takes in my files and then calls the API directly.  Is this doable?  How do you guys do it on your live site?  Do you do it all through HTTP requests or through a driver that calls the API?  I think I would prefer the API calls for bulk loading.  Where should I look for these?

-Grant

Yonik Seeley <ys...@gmail.com> wrote: On 3/5/06, Grant Ingersoll  wrote:
> What/where is the Index Builder that is referred to in  http://wiki.apache.org/solr/CollectionBuilding?

It's currently client-supplied (i.e. there isn't one).

Having all Solr users have to write their own builders (code that gets
data from a source and posts XML documents) certainly isn't optimal.

It would be nice if we could give Solr a database URL with some SQL,
and have it automatically slurp and index the records.  It would also
be nice to be able to grab documents from a CSV or other simple
structured text file and index them.

These ideas are already on the task list on the (currently down) Wiki.

-Yonik




Re: Index Builder

Posted by Yonik Seeley <ys...@gmail.com>.
On 3/5/06, Grant Ingersoll <gr...@yahoo.com> wrote:
> What/where is the Index Builder that is referred to in  http://wiki.apache.org/solr/CollectionBuilding?

It's currently client-supplied (i.e. there isn't one).

Having all Solr users have to write their own builders (code that gets
data from a source and posts XML documents) certainly isn't optimal.

It would be nice if we could give Solr a database URL with some SQL,
and have it automatically slurp and index the records.  It would also
be nice to be able to grab documents from a CSV or other simple
structured text file and index them.
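
Until something like that exists, the CSV half is easy to approximate on the client side. The sketch below is an assumption-laden illustration, not a Solr feature: the class name CsvRecords is made up, the first line is taken to hold the field names, and quoted values or embedded commas are deliberately not handled. It turns CSV text into one field-name-to-value map per record, which a builder could then convert into documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Solr.
class CsvRecords {

    // Parse simple comma-separated text whose first line names the fields.
    // No support for quoting or commas inside values.
    static List<Map<String, String>> parse(String csv) {
        List<Map<String, String>> records = new ArrayList<>();
        String[] lines = csv.split("\\r?\\n");
        if (lines.length == 0 || lines[0].trim().isEmpty()) {
            return records; // no header line, so nothing to index
        }
        String[] names = lines[0].split(",", -1);
        for (int row = 1; row < lines.length; row++) {
            if (lines[row].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = lines[row].split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            // Pair each value with its column name; extra columns are ignored.
            for (int i = 0; i < names.length && i < values.length; i++) {
                record.put(names[i].trim(), values[i].trim());
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        String csv = "id,title\n1,Index Builder\n2,Bulk loading\n";
        for (Map<String, String> record : parse(csv)) {
            System.out.println(record);
        }
    }
}
```

Keeping the parsing separate from the posting means the same record maps could come from a database query instead, which is the other source mentioned above.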

These ideas are already on the task list on the (currently down) Wiki.

-Yonik