Posted to java-user@lucene.apache.org by Rama Krishna <sr...@hotmail.com> on 2002/05/29 11:48:04 UTC

MS Word Search ??

Hi,

I am trying to build a search engine that searches MS Word, Excel, PowerPoint
and Adobe PDF documents. I am not sure whether I can use Lucene for this.
Please help me out in this regard.


Regards,
Ramakrishna




--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: MS Word Search ??

Posted by Guru <gu...@equadriga.com>.
Why not?

You can use Lucene for any file (with any extension), as long as you can
extract the file's text before indexing it.

regards
guru





Re: Query parser error

Posted by Peter Carlson <ca...@bookandhammer.com>.
Try the newest release and read the QueryParser syntax documentation:

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

Support for escaping special characters was just added.

--Peter
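
A minimal sketch of what escaping looks like in practice, assuming a release whose QueryParser accepts a backslash before a special character (as the syntax page above describes). The class QueryEscaper and its escape method are hypothetical helpers written for this illustration; they are not part of the Lucene API discussed in this thread.

```java
/** Hypothetical helper (not a Lucene class): backslash-escapes query syntax characters. */
final class QueryEscaper {
    // Characters the query parser syntax page lists as special.
    // The two-character operators && and || are handled by escaping
    // each '&' and '|' individually (an assumption of this sketch).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Returns the term with a backslash inserted before every special character. */
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The term from the stack trace in this thread, escaped before parsing.
        System.out.println(escape("{are"));
    }
}
```

Escaping only gets the term past the parser. Whether a character such as { can actually be matched also depends on the analyzer used at index time, since many analyzers drop punctuation from the indexed tokens.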







Query parser error

Posted by Harpreet S Walia <ha...@sansuisoftware.com>.
Hi

I am trying to search for words that contain characters such as { and [. I am using the standard Lucene jar (v1.2-rc4).
When I search for words containing these characters, I get an exception:

org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 8.  Encountered:  after : "{are"
	at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
	at org.apache.lucene.queryParser.QueryParser.jj_scan_token(Unknown Source)
	at org.apache.lucene.queryParser.QueryParser.jj_3_1(Unknown Source)
	at org.apache.lucene.queryParser.QueryParser.jj_2_1(Unknown Source)
	at org.apache.lucene.queryParser.QueryParser.Clause(Unknown Source)
	at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
	at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
	at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
	at com.sansui.lucene.Searcher.searchDocuments(Searcher.java:79)


I presume that Lucene is treating these as special characters. Is there a way to avoid this error and search for these kinds of words? What changes would be required?

Can someone shed some light on how QueryParser treats the query contents?

Thanks and regards,
Harpreet

Date Range Searches using QueryParser

Posted by Peter Carlson <ca...@bookandhammer.com>.
Hi,

Has anyone used the Range Search built into the queryParser to search by
date?

For example, something like

April 1, 2002 -> 0czi1cego
April 10, 2002 -> 0czuu5woo

Then do a search like

date:[0czi1cego-0czuu5woo]


I am running into a problem: the results start at the correct date, but they
always run through the last date in the system (for me this is 6/1/02).

It looks like the search takes the first date in the range and continues
until the end, ignoring the upper bound.
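
For readers wondering where terms like 0czi1cego come from, here is a minimal sketch. It assumes the terms are the date's millisecond timestamp written in base 36 and left-padded with zeros to nine characters, which is consistent with the two values quoted above. The class DateTermSketch and its methods are hypothetical names for this illustration, not Lucene API, and the sketch does not diagnose the range problem itself.

```java
import java.time.Instant;

/** Hypothetical sketch of the fixed-width base-36 date terms quoted in this message. */
final class DateTermSketch {
    private static final int RADIX = 36; // digits 0-9 followed by a-z
    private static final int WIDTH = 9;  // fixed width keeps string order equal to time order

    /** Encodes a non-negative millisecond timestamp as a zero-padded base-36 term. */
    static String encode(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("negative timestamps are not supported");
        }
        String digits = Long.toString(millis, RADIX);
        if (digits.length() > WIDTH) {
            throw new IllegalArgumentException("timestamp too large for " + WIDTH + " digits");
        }
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Decodes a term produced by encode back into milliseconds. */
    static long decode(String term) {
        return Long.parseLong(term, RADIX);
    }

    /** What an inclusive range [lower-upper] is expected to match: plain string comparison. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Decode the two terms from the message to see which instants they represent.
        System.out.println(Instant.ofEpochMilli(decode("0czi1cego")));
        System.out.println(Instant.ofEpochMilli(decode("0czuu5woo")));
    }
}
```

Because every term has the same width, comparing terms as strings gives the same order as comparing the dates, so a term for a date after the upper bound should fall outside an inclusive range.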

Does anyone else have experience with this?

Thanks

--Peter




Re: MS Word Search ??

Posted by Ewout Prangsma <e....@daisysoftware.com>.
On Wednesday 29 May 2002 11:56, Karl Øie wrote:
> b: convert the documents to something that is accessible through java like
> xml, etc...

We're using wvWare (wvware.com) to convert Word documents to HTML (or text)
and xpdf to convert PDF to text, and then indexing the converted output. Any
links on indexing with POI converters (or other Java converters) are very
welcome!
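
A minimal sketch of the conversion step described above, run from Java before the text is handed to the indexer. It assumes the xpdf and wvWare command-line tools are installed and on the PATH under the names pdftotext and wvText, and that both take an input file followed by an output file; check your installation. The class TextExtractionSketch and its methods are hypothetical names for this illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: choose an external converter by file extension and run it. */
final class TextExtractionSketch {

    /** Builds the converter command line for the given input file (tool names are assumptions). */
    static List<String> converterCommand(Path input, Path output) {
        String name = input.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            return Arrays.asList("pdftotext", input.toString(), output.toString());
        }
        if (name.endsWith(".doc")) {
            return Arrays.asList("wvText", input.toString(), output.toString());
        }
        throw new IllegalArgumentException("No converter configured for " + input);
    }

    /** Runs the converter and fails loudly if it exits with a non-zero status. */
    static void convert(Path input, Path output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(converterCommand(input, output))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Converter exited with status " + exitCode + " for " + input);
        }
    }

    public static void main(String[] args) {
        // Only prints the command; nothing is executed here.
        System.out.println(converterCommand(Paths.get("report.pdf"), Paths.get("report.txt")));
    }
}
```

The resulting text file can then be read and added to the index like any other plain-text document.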

Ewout

>
> the best way is to convert as the java APIs for MSOffice documents still
> are under development
>
> regards, karl øie

-- 
Ewout Prangsma, Director
Daisy Software
Phone/fax: +31-77-3270305/3270306
Email: e.prangsma@daisysoftware.com
Website: www.daisysoftware.com
Chamber of Commerce (KvK) Venlo no. 12046144






Re: MS Word Search ??

Posted by Karl Øie <ka...@gan.no>.
To search MS Office documents, you must first be able to do one of the
following:

a: access the documents directly from Java, through APIs such as POI

b: convert the documents to something that is accessible from Java, such as
XML

Converting is the better option for now, because the Java APIs for MS Office
documents are still under development.

Regards, Karl Øie




