Posted to java-user@lucene.apache.org by Gaurav Sharma <ga...@gmail.com> on 2008/06/18 16:07:22 UTC

indexing unsupported mime types using Lucene

Hi,
I am using Lucene for indexing and searching documents.
It's working fine for supported documents. Now I want to index documents with
unsupported MIME types.
Right now I am using LIUS, which is built on top of Lucene, for indexing the
documents.

Is there any tool I can use for indexing the unsupported MIME types?
Thanks in advance.
-Gaurav


-- 
View this message in context: http://www.nabble.com/indexing-unsupported-mime-types-using-Lucene-tp17983491p17983491.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: too many clauses exception

Posted by Chris Lu <ch...@gmail.com>.
This is easy. Raise the limit with:
        BooleanQuery.setMaxClauseCount(4096);
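
For context, here is a rough sketch of why a wildcard such as a* runs into the limit. In Lucene releases of this period, a wildcard or prefix query is rewritten into a BooleanQuery with one clause per matching index term, and BooleanQuery.TooManyClauses is thrown once the count passes the configured maximum (1024 by default). The toy model below only imitates that behaviour with plain Java collections. The class, method, and exception names are hypothetical and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model only: every name here is hypothetical, none of this is Lucene's API.
final class PrefixExpansionSketch {

    /** Stand-in for Lucene's BooleanQuery.TooManyClauses. */
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int max) {
            super("maxClauseCount is set to " + max);
        }
    }

    /** Expands a prefix into one clause per matching term, failing once the cap is exceeded. */
    static List<String> expand(SortedSet<String> terms, String prefix, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        // tailSet(prefix) starts at the first term >= prefix in sorted order.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClausesException(maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Builds a fake term dictionary: `count` terms starting with "a", plus two others. */
    static SortedSet<String> sampleTerms(int count) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < count; i++) {
            terms.add("a" + i);
        }
        terms.add("banana");
        terms.add("cherry");
        return terms;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = sampleTerms(2000);
        try {
            expand(terms, "a", 1024); // cap the size of Lucene's default
        } catch (TooManyClausesException e) {
            System.out.println("Expansion failed: " + e.getMessage());
        }
        System.out.println("With a cap of 4096: " + expand(terms, "a", 4096).size() + " clauses");
    }
}
```

Raising the cap makes the query run, but every matching term still becomes a clause, so a very broad wildcard can remain slow and memory-hungry on a large index.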

-- 
Chris Lu

On Thu, Jul 3, 2008 at 11:23 PM, Gaurav Sharma <ga...@gmail.com>
wrote:

>
>
> Hi,
>
> I am stuck with one more exception.
> When I use a wildcard such as a*, I get a "too many clauses"
> exception. It says the maximum clause count is set to 1024. Is there any way
> to increase this count?
> Can you please help me overcome this?
>
> Thanks in advance.
> -Gaurav

too many clauses exception

Posted by Gaurav Sharma <ga...@gmail.com>.

Hi,

I am stuck with one more exception.
When I use a wildcard such as a*, I get a "too many clauses"
exception. It says the maximum clause count is set to 1024. Is there any way
to increase this count?
Can you please help me overcome this?

Thanks in advance.
-Gaurav





Re: indexing unsupported mime types using Lucene

Posted by Gaurav Sharma <ga...@gmail.com>.
Hi Otis,

I haven't tried Tika.
Is it open source?

Had you heard about LIUS before, or is it talked about in the industry?
And what about Solr? It seems you have worked on Solr and Nutch.

Otis Gospodnetic wrote:
> 
> Gaurav, have you tried Tika? (sub-project of Apache Lucene)
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch




RE: indexing unsupported mime types using Lucene

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Gaurav,

To which MIME types are you referring?

I can't think of a tool designed for this, but one thing you might try is checking whether the input is compressed/packed, and if so first decompressing/unpacking it, and then using the "strings" program (available on Linux and Cygwin) to extract string data.
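
A minimal sketch of what a strings-style extraction does, in case running the external program is inconvenient: scan the raw bytes and keep every run of printable ASCII characters that reaches a minimum length (GNU strings uses 4 by default). This only approximates the real tool, which has more options and encodings. The class and method names are hypothetical, and the extracted text would still need to be added to a Lucene document field separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that approximates the Unix "strings" program for ASCII text.
final class StringsSketch {

    /** Returns every run of printable ASCII bytes that is at least minLength long. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 0x20 && c <= 0x7E) {
                run.append((char) c); // printable ASCII extends the current run
            } else {
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0); // a non-printable byte ends the run
            }
        }
        if (run.length() >= minLength) {
            found.add(run.toString()); // flush a run that reaches the end of the data
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StringsSketch.java <path-to-binary-file>
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String s : extract(data, 4)) {
            System.out.println(s);
        }
    }
}
```

The output is noisy for most binary formats, so it is best treated as a fallback when no real parser for the format is available.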

Steve

On 06/18/2008 at 10:07 AM, Gaurav Sharma wrote:
> 
> Hi, I am using Lucene for indexing and searching documents. It's
> working fine for supported documents. Now I want to index documents with
> unsupported MIME types. Right now I am using LIUS, which is built on top of
> Lucene, for indexing the documents.
> 
> Is there any tool I can use for indexing the
> unsupported MIME types?
> Thanks in advance.
> -Gaurav

 

