Posted to solr-user@lucene.apache.org by David Thompson <gu...@yahoo.com> on 2010/07/09 20:17:59 UTC

PDF remote streaming extract with lots of multiValues

How would I go about setting a large number of literal values in a call to index 
a remote PDF?  I'm currently calling:

http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&stream.url=http://otherhost/some/file.pdf


And that works great, except now I'm coming across use cases where I need to send 
in hundreds, up to thousands, of different values for 'mycategory'.  So with 
mycategory defined as a multiValued string, I can call:

 
http://host/solr/update/extract?literal.id=abc&literal.mycategory=blah&literal.mycategory=foo&literal.mycategory=bar&stream.url=http://otherhost/some/file.pdf


and that works as expected.  But when I try to embed thousands of 
literal.mycategory parameters in the call, eventually my container says 'look, 
I've been forgiving about letting you GET URLs far longer than 1500 characters, 
but this is ridiculous' and barfs on it.  
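
For illustration, here is a rough Python sketch of one way the same parameters 
could be moved out of the URL and into a form-encoded POST body.  It only 
builds the request and does not send it.  The helper name 
build_extract_request is made up for this example, and it is an assumption 
(not something confirmed in this thread) that the container and the extract 
handler will read literal.* and stream.url from an 
application/x-www-form-urlencoded body the same way they read them from the 
query string:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(extract_url, doc_id, categories, pdf_url):
    """Build (but do not send) a POST carrying the extract parameters in the body.

    Hypothetical helper: assumes the server accepts these parameters from a
    form-encoded body instead of the query string.
    """
    # A list of pairs lets the same key repeat, once per multiValued entry.
    params = [("literal.id", doc_id)]
    params += [("literal.mycategory", category) for category in categories]
    params.append(("stream.url", pdf_url))

    # urlencode percent-escapes each value, so the remote PDF URL is safe to embed.
    body = urlencode(params).encode("utf-8")
    return Request(
        extract_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_extract_request(
    "http://host/solr/update/extract",
    "abc",
    ["blah", "foo", "bar"],
    "http://otherhost/some/file.pdf",
)
```

Passing the resulting request to urllib.request.urlopen would send it.  The URL 
itself stays short no matter how many mycategory values there are, though the 
container may still enforce its own limit on POST body size.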


I've tried POSTing an <add><doc>...</doc></add> command to the same URL, but the 
extract handler only pays attention to parameters in the URL query string, 
ignoring everything in the document.  I've seen some other threads that seem 
related, but now I'm just confused.  


What's the best way to tackle this?

-dKt