Posted to solr-dev@lucene.apache.org by Eric Pugh <ep...@opensourceconnections.com> on 2007/07/13 16:31:01 UTC

Rich Docs Indexing

Hi all,

I've been working with the RichDocumentRequestHandler
(http://issues.apache.org/jira/browse/SOLR-284) for the past few weeks, and
it seems to be working quite well.  We discovered that when we threw a
27 MB PDF document at it we needed to beef up the Java heap size, and
we haven't come up with a great solution for handling PDF documents
that have a password on them, beyond not indexing them.

I wanted to see if I could get some momentum going on whether this
is something that the committers want in Solr 1.3...   I'd like to
write up a wiki page, similar to the http://wiki.apache.org/solr/UpdateCSV
page, that would give folks a chance to see what this code can do, but
highlight that it is a wiki page about just a patch file.  Would this
be okay, or misleading to folks?

I've updated the patch to revision 555996.

Thanks for your consideration!   PS, is anyone going to be at OSCON  
in two weeks?  I'd love to meet up with some other Solr folks.

Eric

-------------------------------------------------------
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467





Re: Rich Docs Indexing

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 13, 2007, at 10:31 AM, Eric Pugh wrote:
> I wanted to see if I could get some momentum going on seeing if  
> this is something that the committers want in Solr 1.3...   I'd  
> like to write up a wiki page similar to http://wiki.apache.org/solr/UpdateCSV
> page that would give folks a chance to see what this code  
> can do, but highlight that it is a wiki page about just a patch  
> file?  Would this be okay, or misleading to folks?

Eric - kudos!  Thanks for this contribution and effort to document  
it.  There is already precedent here - the Field Collapsing  
contribution has worked thus far too:

	<http://wiki.apache.org/solr/FieldCollapsing>

So go for it!

	Erik, who will one day look at this contribution, but not for a few
	weeks, sorry


Re: Rich Docs Indexing

Posted by Yonik Seeley <yo...@apache.org>.
On 7/13/07, Eric Pugh <ep...@opensourceconnections.com> wrote:
>  I'd like to write up a wiki page that would give folks a chance to see what this code
> can do, but highlight that it is a wiki page about just a patch file?

That's fine.  And if you don't link it to the main page, there
shouldn't be any confusion.

-Yonik