Posted to java-user@lucene.apache.org by "Tardif, Sebastien" <ST...@anacomp.com> on 2005/07/15 20:57:02 UTC

Runtime full text search like in Microsoft Windows Search

How can I use Lucene to get the kind of very limited but fast search that
Microsoft Windows Search provides?

The use case is that users have a CD with a lot of files, and I provide
them with a nice user interface. They have the option to generate a full
text search index, but they should also be able to search without one.
I know that will be slow, but Microsoft Windows Search can still search
500 MB in less than 30 seconds for simple matching.

How can I use Lucene for this simpler search when no index exists?

Or do I have to hook into an operating-system-specific API, such as Win32
on Windows?

Re: Runtime full text search like in Microsoft Windows Search

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 15, 2005, at 3:12 PM, xing@mac.com wrote:

> If Microsoft Search does as you describe, isn't it just:
>
> 1) Open file
> 2) Determine file type
> 3) Convert file content to UTF-8, if it is text based and you have the
> API to read it: .html, .txt, .doc, .xls, etc.
> 4) Perform string or regex search
> 5) Continue to next file
>
> As far as I know, Lucene is not designed for unindexed search.

The new MemoryIndex might be perfect for this sort of thing.  I
suspect Microsoft's search doesn't allow anything but a term or
exact-phrase kind of query, so MemoryIndex might be doing more work
(and thus be slower) in a fair comparison.  However, you'd be able to
do rich queries using MemoryIndex, and it has been heavily tuned for
performance.  The slow part will simply be reading the files and
converting them.

     Erik




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Runtime full text search like in Microsoft Windows Search

Posted by "xing@mac.com" <xi...@mac.com>.
If Microsoft Search does as you describe, isn't it just:

1) Open file
2) Determine file type
3) Convert file content to UTF-8, if it is text based and you have the API
to read it: .html, .txt, .doc, .xls, etc.
4) Perform string or regex search
5) Continue to next file

As far as I know, Lucene is not designed for unindexed search.
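
The five steps above can be sketched in plain Java without Lucene. This is
only a minimal illustration, not how Windows Search works internally: the
class name BruteForceSearch and the helper isPlainText are hypothetical, the
file type is guessed from the extension alone, and only plain-text types are
read. Formats such as .doc or .xls would need a separate text extractor that
is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BruteForceSearch {

    // Step 2: decide by extension whether we know how to read the file.
    // Assumption: only these plain-text types are supported in this sketch.
    static boolean isPlainText(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".txt") || name.endsWith(".html") || name.endsWith(".htm");
    }

    // Steps 1-5: walk the tree, read each supported file, and keep the
    // files whose text contains the needle (case-insensitive).
    static List<Path> search(Path root, String needle) throws IOException {
        String wanted = needle.toLowerCase(Locale.ROOT);
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths.filter(Files::isRegularFile)
                              .filter(BruteForceSearch::isPlainText)
                              .sorted()
                              .collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path file : candidates) {
            // Step 3: decode as UTF-8; malformed bytes become U+FFFD.
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            // Step 4: plain substring match.
            if (text.toLowerCase(Locale.ROOT).contains(wanted)) {
                hits.add(file);
            }
        }
        return hits;
    }

    // Usage: java BruteForceSearch <root-directory> <search-text>
    public static void main(String[] args) throws IOException {
        for (Path hit : search(Paths.get(args[0]), args[1])) {
            System.out.println(hit);
        }
    }
}
```

Every query rereads every file, so the cost is dominated by I/O and
decoding, which is especially noticeable on a CD drive.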



RE: Runtime full text search like in Microsoft Windows Search

Posted by Nathan Brackett <nb...@net-temps.com>.
I imagine you could index the info you want to search quickly into a
RAMDirectory (assuming it isn't too much info), then run simple or complex
searches on that, but I think that might take longer than simple regex
searching on the files. It would only give you a gain if you were going to
run repeated searches on that set of data.
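
The trade-off described here can be shown with a toy in-memory inverted
index. To be clear, this is not Lucene's RAMDirectory or any Lucene API: the
class TinyMemoryIndex and its methods are hypothetical, written only to show
why the up-front indexing cost pays off when the same data is searched more
than once.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyMemoryIndex {

    // Maps each term to the names of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Tokenize the text once; this is the cost paid up front.
    void add(String docName, String text) {
        // Split on runs of characters that are neither letters nor digits.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Each later lookup is one map access instead of a rescan of all text.
    Set<String> search(String term) {
        Set<String> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(docs);
    }
}
```

For a single query, building the postings map is extra work on top of
reading the files, so a direct scan is likely to win. From the second query
onward, lookups no longer touch the file contents at all.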




Re: Runtime full text search like in Microsoft Windows Search

Posted by Otis Gospodnetic <ot...@yahoo.com>.
As somebody already said, you can have an in-memory index with
RAMDirectory.  You can also pre-build a Lucene index for that CD: a CD is
"static" (you can't add, remove, or change files on it), so you can build
the index and burn it onto the CD at the same time you put the Word
files on it.

As for getting the indexable text out of Word and other documents, look
at the code for the Lucene book - http://lucenebook.com - there is a
little framework there, that parses and indexes a number of common file
types.

Otis

