Posted to solr-user@lucene.apache.org by David Thompson <gu...@yahoo.com> on 2010/07/09 20:17:59 UTC
PDF remote streaming extract with lots of multiValues
How would I go about setting a large number of literal values in a call to index
a remote PDF? I'm currently calling:
http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&stream.url=http://otherhost/some/file.pdf
And that works great, except now I'm coming across use cases where I need to send in
hundreds, up to thousands, of different values for 'mycategory'. So with
mycategory defined as a multiValued string, I can call:
http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&literal.mycategory=foo&literal.mycategory=bar&stream.url=http://otherhost/some/file.pdf
and that works as expected. But when I try to embed thousands of
literal.mycategory parameters in the call, eventually my container says 'look,
I've been forgiving about letting you GET URLs far longer than 1500 characters,
but this is ridiculous' and barfs on it.
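To make the size problem concrete, here is a rough Python sketch of how such a URL gets assembled and how quickly it grows. The helper name build_extract_url and the generated category values (cat0, cat1, ...) are made up for illustration; only the host, handler path and parameter names come from the calls above. It builds the string only and never contacts Solr.

```python
from urllib.parse import urlencode


def build_extract_url(base, doc_id, categories, stream_url):
    """Build an extract URL with one literal.mycategory parameter per value.

    Hypothetical helper for illustration; parameter names mirror the
    calls shown in this message.
    """
    params = [("literal.id", doc_id)]
    # A multiValued field is populated by repeating the same parameter name.
    params += [("literal.mycategory", value) for value in categories]
    params.append(("stream.url", stream_url))
    # urlencode percent-encodes each value, including the nested stream.url.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://host/solr/update/extract"
    pdf = "http://otherhost/some/file.pdf"
    for count in (3, 100, 2000):
        # Made-up category values, just to show how the length scales.
        cats = ["cat%d" % i for i in range(count)]
        url = build_extract_url(base, "abc", cats, pdf)
        print(count, "values ->", len(url), "characters")
```

Each extra value costs at least the length of "&literal.mycategory=" plus the value itself, so a few thousand values puts the URL well past anything a container is likely to accept on a GET.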
I've tried POSTing an <add><doc>...</doc></add> command, but it only pays
attention to parameters in the URL query string, ignoring everything in the
document. I've seen some other threads that seem related, but now I'm just
confused.
What's the best way to tackle this?
-dKt