Posted to solr-user@lucene.apache.org by jrusnak <jr...@live.unc.edu> on 2014/07/21 16:51:52 UTC

Edit Example Post.jar to read ALL file types

I am working with Solr 4.8.1 to set up an enterprise search system.

The file system I am working with has numerous files with unique extensions
(e.g. .20039, .20040, .20041, etc.).

I am using the post.jar file included in the binary download (source:
SimplePostTool.java
<http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/util/SimplePostTool.java>)
to post these files to the Solr server, and I would like to edit this jar
file to recognize /any/ file extension it comes across.

Is there a way to do this in the SimplePostTool.java source? I am currently
working to better understand the Filetype and DEFAULT_FILE_TYPE variables,
as well as the mimeMap; these are what currently let me add file extensions
manually.

However, I would like the tool to read in files no matter what their
extension is and default their MIME type to text/plain.
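A minimal sketch of the fallback being asked for, assuming an extension-to-MIME-type table in the spirit of the mimeMap mentioned above: known extensions map to their listed type, and anything else (a numeric extension like .20039, or no extension at all) defaults to text/plain. `MimeLookup` and `mimeTypeFor` are hypothetical names and the table is a small illustrative subset; this is not the actual SimplePostTool code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper illustrating the fallback idea; NOT SimplePostTool's own code.
class MimeLookup {
    // Type used for any extension missing from the table (the behaviour asked for above).
    static final String FALLBACK_TYPE = "text/plain";

    // Small illustrative subset of extension -> MIME type mappings (assumed, not exhaustive).
    private static final Map<String, String> MIME_BY_EXT = new HashMap<>();
    static {
        MIME_BY_EXT.put("xml", "application/xml");
        MIME_BY_EXT.put("json", "application/json");
        MIME_BY_EXT.put("pdf", "application/pdf");
        MIME_BY_EXT.put("doc", "application/msword");
        MIME_BY_EXT.put("txt", "text/plain");
    }

    // Returns the mapped type for a bare file name's extension, or text/plain if unknown.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return FALLBACK_TYPE; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MIME_BY_EXT.getOrDefault(ext, FALLBACK_TYPE);
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("report.PDF")); // application/pdf
        System.out.println(mimeTypeFor("data.20039")); // text/plain
    }
}
```

The method expects a bare file name rather than a full path, since a dot in a directory name would otherwise be mistaken for an extension.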



--
View this message in context: http://lucene.472066.n3.nabble.com/Edit-Example-Post-jar-to-read-ALL-file-types-tp4148312.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Edit Example Post.jar to read ALL file types

Posted by jrusnak <jr...@live.unc.edu>.
I am copy-pasting the file extensions /from/ the text document /into/ the
source code, not /from/ the source code. My typing mistake.




Re: Edit Example Post.jar to read ALL file types

Posted by jrusnak <jr...@live.unc.edu>.
So by using the SimplePostTool I can define the application type and handling
of specific documents (such as Word, PowerPoint, XML, PNG, etc.). I have
defined these, and they are handled based on their type. In my file system,
however, I have a large number of files that can be read as plain text but
do not have the .txt extension because of the way they were saved. I would
like them to be read in as text/plain.

Since posting, I have found a workaround: I am using a batch file to read
all of the directory's file extensions into a text document and copy/pasting
the extensions into the SimplePostTool source code. Though not ideal, it
does get the job done.
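The batch-file step could also be done in Java. The sketch below, using a hypothetical `ExtensionScanner` class, walks a directory tree with `Files.walk`, collects every distinct lowercase extension, and prints them as one sorted, comma-separated line ready to paste. Files without an extension are skipped; the output format is simply an assumption about what is convenient to paste.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical replacement for the batch-file step: list every distinct extension in a tree.
class ExtensionScanner {

    // Lowercase extension of a bare file name, or null if it has none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Walks the tree under root and collects the distinct extensions, sorted.
    static Set<String> distinctExtensions(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .map(p -> extensionOf(p.getFileName().toString()))
                    .filter(ext -> ext != null)
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // One comma-separated line, e.g. for pasting into a list of accepted types.
        System.out.println(String.join(",", distinctExtensions(root)));
    }
}
```

After compiling, run it as `java ExtensionScanner /path/to/files`; with no argument it scans the current directory.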

Thanks for the blog link; I will look into it.




Re: Edit Example Post.jar to read ALL file types

Posted by Erick Erickson <er...@gmail.com>.
So how do you expect these to be indexed? I mean, what happens
if you run across a Word document? How about an mp3? Just
blasting all files up seems chancy. And doesn't just
'java -jar post.jar *' do what you ask?

This seems like an XY problem: _why_ do you want
to do this? Unless the files being sent to Solr are
properly formatted, they won't be ingested. There's some special
logic that handles XML files and expects the very precise Solr
format... Solr would have no idea what to do with the
extensions in your example.
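One way to address the concern about Word documents and mp3s is to check a file's bytes before sending it as text/plain. The sketch below is a common heuristic offered as an assumption, not something proposed in this thread: a file counts as text only if it contains no NUL bytes and decodes cleanly as UTF-8 (which includes plain ASCII). Files in a legacy single-byte encoding such as Windows-1252 may be rejected. `PlainTextSniffer` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pre-check: only send a file as text/plain if its bytes look like text.
class PlainTextSniffer {

    // True if the bytes contain no NUL and decode as strict UTF-8 (ASCII included).
    static boolean looksLikePlainText(byte[] data) {
        for (byte b : data) {
            if (b == 0) {
                return false; // NUL bytes are typical of binary formats (.doc, .mp3, ...)
            }
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false; // invalid UTF-8: probably binary, or a different encoding
        }
    }

    // Reads the whole file into memory; fine for a sketch, wasteful for very large files.
    static boolean looksLikePlainText(Path file) throws IOException {
        return looksLikePlainText(Files.readAllBytes(file));
    }
}
```

Files that fail the check could be skipped or logged for manual review instead of being indexed as text.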

Perhaps a better approach would be to control the indexing
from a SolrJ client. Here's a blog if you want to follow
that approach.
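As a rough illustration of controlling indexing from client code, here is a JDK-only sketch rather than SolrJ itself (SolrJ would need the solr-solrj jar). The client decides which fields each document gets and posts them as JSON to Solr's update handler. The field names `id` and `content`, the update URL, and the class name `JsonUpdateSketch` are assumptions; the fields must exist in your schema.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical dependency-free stand-in for a SolrJ client: the caller decides which
// files are text and which fields they fill, then posts a JSON update to Solr.
class JsonUpdateSketch {

    // Escapes a string for use inside a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds a one-document JSON update body; "id" and "content" are assumed field names.
    static String buildUpdateBody(String id, String content) {
        return "[{\"id\":\"" + jsonEscape(id) + "\",\"content\":\"" + jsonEscape(content) + "\"}]";
    }

    // POSTs the body to an update URL (assumed, e.g. http://localhost:8983/solr/update)
    // and returns the HTTP status code.
    static int post(String updateUrl, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

For example, `post("http://localhost:8983/solr/update?commit=true", buildUpdateBody("a.20039", text))` would send one document and ask Solr to commit, assuming the stock example setup. With SolrJ 4.x the same idea would use `HttpSolrServer` and `SolrInputDocument` in place of the hand-built JSON.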

Best,
Erick

