Posted to java-user@lucene.apache.org by Rama Krishna <sr...@hotmail.com> on 2002/05/29 11:48:04 UTC
MS Word Search ??
Hi,
I am trying to build a search engine that searches MS Word, Excel, PowerPoint,
and Adobe PDF documents. I am not sure whether I can use Lucene for this.
Please help me out in this regard.
Regards,
Ramakrishna
_________________________________________________________________
Chat with friends online, try MSN Messenger: http://messenger.msn.com
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>
Re: MS Word Search ??
Posted by Guru <gu...@equadriga.com>.
Why not?
You can use Lucene for any file (with any extension), as long as you can extract its text first.
regards
guru
----- Original Message -----
From: "Rama Krishna" <sr...@hotmail.com>
To: <lu...@jakarta.apache.org>
Sent: Wednesday, May 29, 2002 3:18 PM
Subject: MS Word Search ??
> Hi,
>
> I am trying to build a search engine that searches MS Word, Excel, PowerPoint,
> and Adobe PDF documents. I am not sure whether I can use Lucene for this.
> Please help me out in this regard.
>
>
> Regards,
> Ramakrishna
Re: Query parser error
Posted by Peter Carlson <ca...@bookandhammer.com>.
Try the newest release and read the QueryParser syntax documentation:
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
Support for escaping special characters was just added.
--Peter
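For readers hitting the same lexical error, here is a minimal sketch of the escaping idea in plain Java. It assumes the special characters are the ones the query parser syntax page lists in later releases (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) and that a backslash is the escape character. The class QueryEscaper and its escape method are hypothetical helpers written for illustration, not part of the Lucene API, and a release without escape support will still reject the escaped string.

```java
final class QueryEscaper {
    // Assumed set of special characters; "&&" and "||" are covered by
    // escaping every '&' and '|'.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    /** Returns the term with a backslash in front of each special character. */
    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("{are"));  // prints \{are
        System.out.println(escape("a[1]"));  // prints a\[1\]
    }
}
```

Escaping only gets the term past the parser. Whether a term such as {are then matches anything still depends on whether the analyzer used at index time kept the brace in the indexed token.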
On 5/29/02 10:53 PM, "Harpreet S Walia" <ha...@sansuisoftware.com> wrote:
> Hi
>
> I am trying to search for words that contain characters such as { and [. I am
> using the standard Lucene jar (v1.2-rc4).
> When I search for words containing these characters I get an exception saying
>
> org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column
> 8. Encountered: after : "{are"
> at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown
> Source)
> at org.apache.lucene.queryParser.QueryParser.jj_scan_token(Unknown Source)
> at org.apache.lucene.queryParser.QueryParser.jj_3_1(Unknown Source)
> at org.apache.lucene.queryParser.QueryParser.jj_2_1(Unknown Source)
> at org.apache.lucene.queryParser.QueryParser.Clause(Unknown Source)
> at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
> at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
> at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
> at com.sansui.lucene.Searcher.searchDocuments(Searcher.java:79)
>
>
> I presume that Lucene is treating these as special characters. Is there
> a way to avoid this error and search for these kinds of words? What kind of
> changes are required for this?
>
> Can someone shed some light on how QueryParser treats the contents?
>
> Thanks and regards,
> Harpreet
>
Query parser error
Posted by Harpreet S Walia <ha...@sansuisoftware.com>.
Hi
I am trying to search for words that contain characters such as { and [. I am using the standard Lucene jar (v1.2-rc4).
When I search for words containing these characters I get an exception saying
org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 8. Encountered: after : "{are"
at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.jj_scan_token(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.jj_3_1(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.jj_2_1(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.Clause(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
at com.sansui.lucene.Searcher.searchDocuments(Searcher.java:79)
I presume that Lucene is treating these as special characters. Is there a way to avoid this error and search for these kinds of words? What kind of changes are required for this?
Can someone shed some light on how QueryParser treats the contents?
Thanks and regards,
Harpreet
Date Range Searches using QueryParser
Posted by Peter Carlson <ca...@bookandhammer.com>.
Hi,
Has anyone used the range search built into the QueryParser to search by
date?
For example, with dates encoded as something like
April 1, 2002 -> 0czi1cego
April 10, 2002 -> 0czuu5woo
and then a search like
date:[0czi1cego-0czuu5woo]
I am running into a problem where the results start at the correct date but
always run through the last date in the system (for me this is 6/1/02).
It looks like the search takes the first date in the range and keeps going
until the end, ignoring the upper bound.
Does anyone else have experience with this?
Thanks
--Peter
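To check which documents an inclusive range ought to match, independent of the query parser, a small sketch in plain Java can help. It assumes the encoded values above are millisecond timestamps written in base 36 and left-padded with zeros to nine characters, which is what the two sample strings look like. The class SortableDate and its methods are hypothetical names for illustration and are not taken from Lucene.

```java
final class SortableDate {
    private static final int RADIX = Character.MAX_RADIX; // 36
    // Nine base-36 digits hold about 1.0e14 ms, well past the year 5000.
    private static final int WIDTH = 9;

    /** Encodes a non-negative millisecond timestamp as a fixed-width string. */
    static String encode(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("negative timestamp: " + millis);
        }
        String digits = Long.toString(millis, RADIX);
        if (digits.length() > WIDTH) {
            throw new IllegalArgumentException("timestamp too large: " + millis);
        }
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0'); // padding keeps string order equal to numeric order
        }
        return sb.append(digits).toString();
    }

    /** Decodes a string produced by encode back to milliseconds. */
    static long decode(String encoded) {
        return Long.parseLong(encoded, RADIX);
    }

    /** Inclusive range check on encoded values, using plain string order. */
    static boolean inRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = "0czi1cego";
        String upper = "0czuu5woo";
        String afterUpper = encode(decode(upper) + 1);
        System.out.println(inRange(lower, lower, upper));      // prints true
        System.out.println(inRange(afterUpper, lower, upper)); // prints false
    }
}
```

Because every value has the same width and the digits 0-9 sort before a-z, comparing the strings gives the same order as comparing the timestamps, so any document whose encoded date sorts after the upper bound should be excluded from the range.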
Re: MS Word Search ??
Posted by Ewout Prangsma <e....@daisysoftware.com>.
On Wednesday 29 May 2002 11:56, Karl Øie wrote:
> b: convert the documents to something that is accessible through Java, like
> XML, etc.
We're using wvWare (wvware.com) to convert Word documents to HTML (or text)
and xpdf to convert PDF to text, and then we index the converted output. Any
links on indexing using POI converters (or other Java converters) are very welcome!
Ewout
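The convert-then-index approach described above can be sketched in Java by shelling out to the converters and then handing the resulting text file to the indexer. This is only a sketch under assumptions: it assumes wvWare's wvText and xpdf's pdftotext are installed on the PATH and that both accept an input file followed by an output file. The class TextExtraction and its methods are hypothetical names, and the indexing step itself is left out.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class TextExtraction {
    /** Picks the external converter command for a file by its extension. */
    static List<String> commandFor(Path input, Path output) {
        String name = input.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".doc")) {
            // Assumed wvWare usage: wvText <input.doc> <output.txt>
            return Arrays.asList("wvText", input.toString(), output.toString());
        }
        if (name.endsWith(".pdf")) {
            // Assumed xpdf usage: pdftotext <input.pdf> <output.txt>
            return Arrays.asList("pdftotext", input.toString(), output.toString());
        }
        throw new IllegalArgumentException("no converter configured for " + name);
    }

    /** Runs the converter and fails if it exits with a non-zero status. */
    static void convert(Path input, Path output) throws IOException, InterruptedException {
        List<String> command = commandFor(input, output);
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException(command.get(0) + " exited with status " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: TextExtraction <input> <output.txt>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Keeping the command selection separate from the process launch makes it easy to add further formats later, or to swap an external tool for a Java library once one is mature enough.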
>
> The best way is to convert, as the Java APIs for MS Office documents are
> still under development.
>
> Regards, Karl Øie
>
> On Wednesday 29 May 2002 11:48, Rama Krishna wrote:
> > Hi,
> >
> > I am trying to build a search engine that searches MS Word, Excel,
> > PowerPoint, and Adobe PDF documents. I am not sure whether I can use
> > Lucene for this. Please help me out in this regard.
> >
> >
> > Regards,
> > Ramakrishna
--
Ewout Prangsma, Director
Daisy Software
Telephone/fax: +31-77-3270305/3270306
Email: e.prangsma@daisysoftware.com
Website: www.daisysoftware.com
Chamber of Commerce (KvK) Venlo no. 12046144
Re: MS Word Search ??
Posted by Karl Øie <ka...@gan.no>.
To search MS Office documents you must first be able to either
a: access the documents through Java with APIs like POI, or
b: convert the documents to something that is accessible through Java, like
XML.
The best way is to convert, as the Java APIs for MS Office documents are still
under development.
Regards, Karl Øie
On Wednesday 29 May 2002 11:48, Rama Krishna wrote:
> Hi,
>
> I am trying to build a search engine that searches MS Word, Excel, PowerPoint,
> and Adobe PDF documents. I am not sure whether I can use Lucene for this.
> Please help me out in this regard.
>
>
> Regards,
> Ramakrishna