Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/03/02 05:19:07 UTC

Re: Multi-threaded post.jar?

Hi,

Sure, lots of things could be done with creative curl usage... but there
is still something to be said for having an ecosystem of nice,
devops-friendly tools...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Wed, Feb 27, 2013 at 8:01 AM, Upayavira <uv...@odoko.co.uk> wrote:

> I took the cheap and cheerful approach, and created another class that
> wraps SimplePostTool. It makes lots of assumptions, such as that the
> shell will already have expanded any globs/wildcards, and just assigns
> various arguments to the various threads. It is good enough for what I
> need.
>
> The idea of a shell is an interesting one. But is there anything we
> couldn't already achieve with creative use of 'curl'?
>
> Upayavira
>
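A minimal sketch of the kind of wrapper described above, under the same assumption that the shell has already expanded globs into plain file arguments. ParallelPost, Poster, partition and run are hypothetical names, not Solr classes; the per-file posting step is left behind the Poster interface rather than guessing at SimplePostTool's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical wrapper: spreads already-expanded file arguments across N threads. */
final class ParallelPost {

    /** Hypothetical per-file step; a real tool would send the file to Solr here. */
    interface Poster {
        void post(String file) throws Exception;
    }

    /** Round-robin assignment: argument i goes to bucket i % threads. */
    static List<List<String>> partition(List<String> args, int threads) {
        List<List<String>> buckets = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < args.size(); i++) {
            buckets.get(i % threads).add(args.get(i));
        }
        return buckets;
    }

    /** Posts each bucket on its own thread, waits for all, returns the number of files posted. */
    static int run(List<String> args, int threads, Poster poster) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> bucket : partition(args, threads)) {
                results.add(pool.submit(() -> {
                    for (String file : bucket) {
                        poster.post(file);
                    }
                    return bucket.size();
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // blocks; a worker's failure surfaces as ExecutionException
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while posting", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("posting failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Round-robin keeps the buckets within one file of each other in count; it does not balance by file size, so one very large file can still leave a single thread running alone at the end.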
> On Tue, Feb 26, 2013, at 04:34 AM, Otis Gospodnetic wrote:
> > Upayavira, did you ever do this?
> >
> > Ha, look at my email from 20 days ago and this:
> > https://github.com/javanna/elasticshell
> >
> > Otis
> >
> > On Wed, Feb 6, 2013 at 2:38 PM, Otis Gospodnetic
> > <otis.gospodnetic@gmail.com> wrote:
> >
> > > Btw, wouldn't this be a chance to create a Solr CLI tool, much like
> > > es2unix? Maybe with a shell? I'm offline now, but I recently came across
> > > a Java lib that makes this easy... jclam jsomething ...
> > >
> > > Otis
> > >
> > > On Feb 6, 2013 8:48 AM, "Jan Høydahl" <ja...@cominvent.com> wrote:
> > >
> > >> By dependencies I meant external jar dependencies. Perhaps extensions
> > >> could have deps while leaving the "core" compilable without them?
> > >>
> > >> --
> > >> Jan Høydahl, search solution architect
> > >> Cominvent AS - www.cominvent.com
> > >> Solr Training - www.solrtraining.com
> > >>
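One possible shape for Jan's suggestion, sketched under assumptions: the core depends only on the JDK and discovers optional extensions at runtime through java.util.ServiceLoader, so an extension jar (with its own dependencies) only has to be on the classpath when it is actually used. PostExtension and ExtensionRegistry are hypothetical names, not existing Solr code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical plug-in point; implementations can live in separate jars with their own deps. */
interface PostExtension {
    /** Short name used to select the extension, e.g. from a command-line flag. */
    String name();

    /** Does the extension's work for one input (a file path, URL, ...). */
    void handle(String input) throws Exception;
}

/** Core side: compiles and runs with nothing but the JDK on the classpath. */
final class ExtensionRegistry {

    /** Lists extensions registered via META-INF/services files on the classpath (possibly none). */
    static List<PostExtension> discover() {
        List<PostExtension> found = new ArrayList<>();
        for (PostExtension ext : ServiceLoader.load(PostExtension.class)) {
            found.add(ext);
        }
        return found;
    }

    /** Returns the extension with the given name, or null if no jar providing it is present. */
    static PostExtension find(String name) {
        for (PostExtension ext : discover()) {
            if (ext.name().equals(name)) {
                return ext;
            }
        }
        return null;
    }
}
```

An extension jar would ship an implementation class plus a one-line META-INF/services file naming it (the file itself is named after the fully qualified interface name); the core never references the implementation directly, so compiling the core alone still works.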
> > >> On 5 Feb 2013, at 17:10, Upayavira <uv...@odoko.co.uk> wrote:
> > >>
> > >> > By dependencies, do you mean other Java classes? I was thinking of
> > >> > splitting it out into a few classes, each of which is clearer in its
> > >> > purpose.
> > >> >
> > >> > Upayavira
> > >> >
> > >> > On Tue, Feb 5, 2013, at 02:26 PM, Jan Høydahl wrote:
> > >> >> A wiki page already exists: http://wiki.apache.org/solr/post.jar
> > >> >>
> > >> >> I'm happy to consider a refactoring, especially if it makes it
> > >> >> SIMPLER to read and interact with, and doesn't add a ton of
> > >> >> mandatory dependencies.
> > >> >> It should probably still be possible to say something like
> > >> >>
> > >> >>  javac org/apache/solr/util/SimplePostTool.java
> > >> >>  java -cp . org.apache.solr.util.SimplePostTool -h
> > >> >>
> > >> >> That's just how I've been thinking so far, though. If other
> > >> >> committers are happy to abandon the simplicity and instead create
> > >> >> a feature-rich tool based on best practices, with dependencies,
> > >> >> then I won't object.
> > >> >>
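A minimal JDK-only sketch of the posting step, in the spirit of keeping the plain javac / java -cp . workflow Jan describes. It assumes an XML update handler URL such as http://localhost:8983/solr/update and a Content-Type of application/xml; XmlPoster is a hypothetical class, not the actual SimplePostTool code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical JDK-only poster: sends one XML file, unchanged, to a Solr update URL. */
final class XmlPoster {
    private final URL updateUrl;

    XmlPoster(URL updateUrl) {
        this.updateUrl = updateUrl;
    }

    /** POSTs the file and returns the HTTP status; throws IOException on a non-2xx response. */
    int post(Path xmlFile) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) updateUrl.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml");
        // Fixed-length streaming sends the file as it is read instead of buffering it all first.
        conn.setFixedLengthStreamingMode(Files.size(xmlFile));
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(xmlFile, out);
        }
        int status = conn.getResponseCode();
        // Read the response fully so HttpURLConnection can reuse the connection (keep-alive).
        InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (body != null) {
            try (InputStream in = body) {
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // discard; Solr's response is not inspected in this sketch
                }
            }
        }
        if (status / 100 != 2) {
            throw new IOException("HTTP " + status + " from " + updateUrl + " for " + xmlFile);
        }
        return status;
    }
}
```

Usage would be along the lines of new XmlPoster(URI.create("http://localhost:8983/solr/update").toURL()).post(Paths.get("docs.xml")); commits are deliberately left out, since when to commit is a separate decision.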
> > >> >> On 5 Feb 2013, at 05:22, Upayavira <uv...@odoko.co.uk> wrote:
> > >> >>
> > >> >>> Thx Jan,
> > >> >>>
> > >> >>> All I know is that I've got a data set of 500k Solr-formatted
> > >> >>> documents, and I want it to be as easy as possible to get them into
> > >> >>> Solr. I also want to be able to show the benefit of multithreading.
> > >> >>> The takeaway would really be "make sure your code uses multiple
> > >> >>> threads to push to Solr" rather than "use post.jar in production".
> > >> >>> I see post.jar as a demonstration tool, nothing more, and am
> > >> >>> considering adding another feature to enhance that.
> > >> >>>
> > >> >>> However, I did stall once I started looking at the SimplePostTool
> > >> >>> class, because it is losing its connection with the term 'Simple'.
> > >> >>> Adding multithreading, however useful, correct, whatever, would
> > >> >>> push it completely over the edge. Thus, I think the proper approach
> > >> >>> is to refactor the tool into a number of classes, and only then
> > >> >>> think about adding multithreading as a completely separate affair.
> > >> >>> I'm more than happy to have a go at that refactoring, especially if
> > >> >>> you're prepared to review it.
> > >> >>>
> > >> >>> I guess the other thing that is much needed is a wiki page that
> > >> >>> details the features of the tool, and also explains that its role
> > >> >>> is educational, rather than anything else.
> > >> >>>
> > >> >>> Upayavira
> > >> >>>
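A sketch of how the benefit of multithreading could be demonstrated: the same list of files is timed with one worker and with several, using a fixed thread pool. ThreadBenchmark and the postOne callback are hypothetical; in practice postOne would send one file to Solr, and any gain depends on the server's indexing capacity, so measured numbers only hold for that setup. The main method uses a sleep as a stand-in for a real post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical harness: times the same posting work at different thread counts. */
final class ThreadBenchmark {

    /** Runs postOne on every item using the given number of threads; returns elapsed ms. */
    static long timeMillis(List<String> items, int threads, Consumer<String> postOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String item : items) {
                pending.add(pool.submit(() -> postOne.accept(item))); // one task per file
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted", e);
        } catch (ExecutionException e) {
            throw new RuntimeException("a post failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            files.add("doc" + i + ".xml"); // hypothetical file names
        }
        // Stand-in for a real HTTP post: just waits 50 ms per file.
        Consumer<String> fakePost = file -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int threads : new int[] {1, 2, 4}) {
            System.out.println(threads + " thread(s): " + timeMillis(files, threads, fakePost) + " ms");
        }
    }
}
```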
> > >> >>> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
> > >> >>>> Hi,
> > >> >>>>
> > >> >>>> Hmm, the tool is already getting bloated for a one-class, no-deps
> > >> >>>> tool :)
> > >> >>>> I guess real-life code examples using SolrJ and other libs (such as
> > >> >>>> a robots.txt lib, commons-cli, etc.) would be useful too, but
> > >> >>>> whether that should be an extension of SimplePostTool or a totally
> > >> >>>> new tool from scratch is something to discuss. Please bring on your
> > >> >>>> ideas of how you plan to extend it, perhaps even simplifying the
> > >> >>>> code in the process?
> > >> >>>>
> > >> >>>> On 3 Feb 2013, at 17:19, Upayavira <uv...@odoko.co.uk> wrote:
> > >> >>>>
> > >> >>>>> I have a scenario in which I need to post 500,000 documents to
> > >> >>>>> Solr as a test. I have these documents in XML files, already in
> > >> >>>>> Solr's XML format.
> > >> >>>>>
> > >> >>>>> Posting them to Solr with post.jar takes 1m55s. With a bit of
> > >> >>>>> bash jiggery-pokery, I was able to get this down to 1m08s by
> > >> >>>>> running four concurrent post.jar instances, which strikes me as a
> > >> >>>>> significant improvement.
> > >> >>>>>
> > >> >>>>> I'm considering adding multithreaded capabilities to post.jar,
> > >> >>>>> but before I go to that effort, I wanted to see if anyone else
> > >> >>>>> would consider it a useful feature. Given that SimplePostTool is
> > >> >>>>> becoming far from simple, I wanted to see whether the feature is
> > >> >>>>> likely to be accepted before I put in the effort. I would also
> > >> >>>>> need to decide which parts of the tool to add it to. Currently I
> > >> >>>>> only want it for posting XML docs, but the tool also has crawling
> > >> >>>>> capabilities.
> > >> >>>>>
> > >> >>>>> Thoughts?
> > >> >>>>>
> > >> >>>>> Upayavira
> > >> >>>>
> > >> >>
> > >>
> > >>
>
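For reference, the timings in Upayavira's first message above work out as follows. This is plain arithmetic on the reported figures (1m55s = 115 s, 1m08s = 68 s, 500,000 documents, four concurrent instances), not a new measurement; SpeedupMath is just an illustrative helper.

```java
import java.util.Locale;

/** Illustrative arithmetic on the timings reported in the thread (not a benchmark). */
final class SpeedupMath {

    /** How many times faster the parallel run finished. */
    static double speedup(double serialSeconds, double parallelSeconds) {
        return serialSeconds / parallelSeconds;
    }

    /** Average indexing throughput over the whole run. */
    static double docsPerSecond(long docs, double seconds) {
        return docs / seconds;
    }

    /** Fraction of ideal linear scaling achieved with the given number of workers. */
    static double efficiency(double speedup, int workers) {
        return speedup / workers;
    }

    public static void main(String[] args) {
        double serial = 115;  // 1m55s with a single post.jar
        double parallel = 68; // 1m08s with four concurrent post.jar instances
        double s = speedup(serial, parallel);
        System.out.printf(Locale.ROOT, "speedup     %.2fx%n", s);                                      // 1.69x
        System.out.printf(Locale.ROOT, "time saved  %.0f%%%n", 100 * (1 - parallel / serial));        // 41%
        System.out.printf(Locale.ROOT, "serial      %.0f docs/s%n", docsPerSecond(500_000, serial));   // 4348 docs/s
        System.out.printf(Locale.ROOT, "parallel    %.0f docs/s%n", docsPerSecond(500_000, parallel)); // 7353 docs/s
        System.out.printf(Locale.ROOT, "efficiency  %.0f%%%n", 100 * efficiency(s, 4));                // 42%
    }
}
```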