Posted to java-user@lucene.apache.org by Gaurav Sharma <ga...@gmail.com> on 2008/06/18 16:07:22 UTC
indexing unsupported mime types using Lucene
Hi,
I am using Lucene for indexing and searching documents.
It's working fine for supported documents. Now I want to index documents with
unsupported MIME types.
Right now I am using LIUS, which is built on top of Lucene, for indexing the
documents.
Is there any tool that I can use for indexing the unsupported MIME types?
Thanks in advance.
-Gaurav
--
View this message in context: http://www.nabble.com/indexing-unsupported-mime-types-using-Lucene-tp17983491p17983491.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: too many clauses exception
Posted by Chris Lu <ch...@gmail.com>.
This is easy, use:
BooleanQuery.setMaxClauseCount(4096);
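Some background on why a* hits the limit: in the Lucene releases current at the time of this thread, a wildcard or prefix query is rewritten into a BooleanQuery with one clause per matching indexed term, so a short prefix on a large index can expand past the default of 1024. The toy model below only illustrates that expansion; it is not Lucene code. ClauseLimitSketch, expandPrefix, sampleTerms and the maxClauseCount field are hypothetical names invented for the sketch, and the IllegalStateException stands in for Lucene's own exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model of prefix expansion against a clause limit; not Lucene code. */
final class ClauseLimitSketch {
    /** Hypothetical stand-in for Lucene's limit; 1024 is the default quoted in the thread. */
    static int maxClauseCount = 1024;

    /** Collects one "clause" per indexed term starting with prefix, failing past the limit. */
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> clauses = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matching terms are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauseCount) {
                throw new IllegalStateException("too many clauses: limit is " + maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Builds n distinct terms starting with "a", plus one unrelated term. */
    static SortedSet<String> sampleTerms(int n) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            terms.add("a" + i);
        }
        terms.add("banana");
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = sampleTerms(2000);
        try {
            expandPrefix(terms, "a");
        } catch (IllegalStateException e) {
            System.out.println("default limit: " + e.getMessage());
        }
        // Mirrors the effect of BooleanQuery.setMaxClauseCount(4096) from the reply above.
        maxClauseCount = 4096;
        System.out.println("raised limit: " + expandPrefix(terms, "a").size() + " clauses");
    }
}
```

Raising the limit makes the query run, but every extra clause costs memory and search time, so requiring a longer prefix before the wildcard may be worth considering as well.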
--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!
On Thu, Jul 3, 2008 at 11:23 PM, Gaurav Sharma <ga...@gmail.com>
wrote:
>
>
> Hi,
>
> I am stuck with one more exception.
> When I use a wildcard such as a*, I get a "too many clauses"
> exception. It says the maximum clause count is set to 1024. Is there any way
> to increase this count?
> Can you please help me overcome this?
>
> Thanks in advance.
> -Gaurav
too many clauses exception
Posted by Gaurav Sharma <ga...@gmail.com>.
Hi,
I am stuck with one more exception.
When I use a wildcard such as a*, I get a "too many clauses"
exception. It says the maximum clause count is set to 1024. Is there any way
to increase this count?
Can you please help me overcome this?
Thanks in advance.
-Gaurav
Re: indexing unsupported mime types using Lucene
Posted by Gaurav Sharma <ga...@gmail.com>.
Hi Otis,
I haven't tried Tika.
Is it open source?
Had you heard about LIUS before, or is it talked about in the industry?
And what about Solr? It seems you have worked on Solr and Nutch.
Otis Gospodnetic wrote:
>
> Gaurav, have you tried Tika? (sub-project of Apache Lucene)
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
RE: indexing unsupported mime types using Lucene
Posted by Steven A Rowe <sa...@syr.edu>.
Hi Gaurav,
To which mime types are you referring?
I can't think of a tool designed for this, but one thing you might try is checking whether the input is compressed/packed, and if so first decompressing/unpacking it, and then using the "strings" program (available on Linux and Cygwin) to extract string data.
Steve
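The decompress-then-"strings" idea above can be sketched in plain Java, for cases where shelling out to the strings program is not convenient. This is only a rough approximation under stated assumptions: it recognizes gzip alone (by its two magic bytes), treats printable ASCII plus tab as string data, and uses a minimum run length of 4, loosely following the default of GNU strings. StringsSketch and all of its method names are made up for this example; the returned runs are what you would then feed into a Lucene field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Rough "strings"-like extractor for unknown binary input; a sketch, not a full parser. */
final class StringsSketch {
    /** True if the data starts with the gzip magic bytes 0x1F 0x8B. */
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2 && (data[0] & 0xFF) == 0x1F && (data[1] & 0xFF) == 0x8B;
    }

    /** Decompresses gzip data fully into memory. */
    static byte[] gunzip(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses data with gzip; only used here to build demo input. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Returns runs of printable ASCII (or tab) that are at least minLength long. */
    static List<String> extractStrings(byte[] data, int minLength) {
        byte[] input = looksGzipped(data) ? gunzip(data) : data;
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : input) {
            int c = b & 0xFF;
            if ((c >= 0x20 && c <= 0x7E) || c == '\t') {
                run.append((char) c);
            } else {
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0);
            }
        }
        if (run.length() >= minLength) {
            found.add(run.toString());
        }
        return found;
    }

    /** Demo input: text fragments separated by non-printable bytes. */
    static byte[] sample() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] first = "Lucene".getBytes(StandardCharsets.US_ASCII);
        byte[] tooShort = "ab".getBytes(StandardCharsets.US_ASCII);
        byte[] last = "index me".getBytes(StandardCharsets.US_ASCII);
        out.write(0x00);
        out.write(0x01);
        out.write(first, 0, first.length);
        out.write(0x00);
        out.write(tooShort, 0, tooShort.length);
        out.write(0xFF);
        out.write(last, 0, last.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(extractStrings(sample(), 4));
        System.out.println(extractStrings(gzip(sample()), 4));
    }
}
```

Text recovered this way is noisy (no structure, no metadata, and nothing for non-ASCII encodings), so it is best treated as a fallback for formats that no real parser handles.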