Posted to java-user@lucene.apache.org by Furash Gary <fu...@mcao.maricopa.gov> on 2006/07/10 18:07:02 UTC

General Approach: Analyzer versus Query

For some things, it's obvious that you would have to put them both on
the front end (during indexing) and on the back end.  E.g., if you want
to do a soundex search, you'd want to encode the words with their
soundex version during index creation, and when you query, encode the
user's search input as a soundex version.
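
A minimal sketch of that idea, assuming plain American Soundex. The class
name SoundexSketch and its method are hypothetical, not part of Lucene. The
same method would be applied to each token at index time and to each word
of the user's input at query time, so both sides compare codes rather than
spellings.

```java
import java.util.Locale;

// Hypothetical helper, not a Lucene class: plain American Soundex.
class SoundexSketch {
    // Digit codes for A..Z; '0' marks letters that carry no code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        // Keep ASCII letters only, so every character indexes safely into CODES.
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(s.charAt(0));
        char prev = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != prev) {
                out.append(code);
            }
            // H and W do not separate letters with the same code; vowels do.
            if (c != 'H' && c != 'W') {
                prev = code;
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes to letter + three digits
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Index side and query side must use the same encoding.
        String indexedTerm = soundex("Robert");
        String queryTerm = soundex("Rupert");
        System.out.println(indexedTerm + " " + queryTerm); // prints "R163 R163"
    }
}
```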

In other cases, it's not clear to me what the right approach is.  Let's
say users are sorting by a formatted number: e.g., YYYYNNNNNN, like
2005123456.  Users might put in "05123456" or "2005123456".  In either
case, that query would be considered a hit, with 2005123456 being more
exact than 05123456, but still a good hit.

I could (1) up front, put in both versions of the numbers or (2) during
query, play with the number and search both ways.  What's the best
practice approach?
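
To make the two options concrete, here is a rough sketch in plain Java.
Everything in it is an assumption for illustration: the class and method
names are hypothetical, the short form is taken to be the full number minus
its two century digits, and an 8-digit input is expanded to both the 20xx
and 19xx readings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for numbers of the form YYYYNNNNNN (10 digits).
class CaseNumberVariants {

    // Option (1): the terms to put in the index for one canonical number.
    static List<String> indexTerms(String canonical) {
        if (!canonical.matches("\\d{10}")) {
            throw new IllegalArgumentException("expected YYYYNNNNNN: " + canonical);
        }
        List<String> terms = new ArrayList<>();
        terms.add(canonical);              // 2005123456
        terms.add(canonical.substring(2)); // 05123456
        return terms;
    }

    // Option (2): the canonical numbers to search for, given raw user input.
    static List<String> queryTerms(String input) {
        String digits = input.trim();
        List<String> terms = new ArrayList<>();
        if (digits.matches("\\d{10}")) {
            terms.add(digits);             // already in full form
        } else if (digits.matches("\\d{8}")) {
            terms.add("20" + digits);      // assumed: either century is possible
            terms.add("19" + digits);
        }
        return terms;                      // empty when the input fits neither shape
    }

    public static void main(String[] args) {
        System.out.println(indexTerms("2005123456"));
        System.out.println(queryTerms("05123456"));
    }
}
```

With option (2), each returned string could become one clause of a
BooleanQuery, which is also the natural place to weight the full-form
match above the expanded one.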

Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Lucene WordExtractor

Posted by mcarcelen <mc...@isoco.com>.

Thanks suba
Sorry

-----Original Message-----
From: Suba Suresh [mailto:subas@wolfram.com] 
Sent: Tuesday, 11 July 2006 15:51
To: java-user@lucene.apache.org
Subject: Re: Lucene WordExtractor

There is a separate user mailing list for poi. Use it.

There are three jar files. Check the scratchpad jar. You have to pass a
FileInputStream (not the filename) as the argument to the WordExtractor
constructor.

suba suresh.

mcarcelen wrote:
> Hi all!
> I'm working with poi-bin-3.0-alpha2-20060616
> I'm trying to extract text from a Word document using the class
> org.apache.poi.hwpf.extractor.WordExtractor but I get the following error:
> "Exception in thread main java.lang.NoSuchMethodError"
> I have also tried with the parameter -doc and the name of the Word document
> but without success
> I've also run the classes QuickTest.class and HWPFDocument.class, again with
> errors
> Can anyone help me?
> Thanks all
> Cheers 
> Teresa

Re: Lucene WordExtractor

Posted by Suba Suresh <su...@wolfram.com>.

There is a separate user mailing list for poi. Use it.

There are three jar files. Check the scratchpad jar. You have to pass a
FileInputStream (not the filename) as the argument to the WordExtractor
constructor.

suba suresh.

mcarcelen wrote:
> Hi all!
> I'm working with poi-bin-3.0-alpha2-20060616
> I'm trying to extract text from a Word document using the class
> org.apache.poi.hwpf.extractor.WordExtractor but I get the following error:
> "Exception in thread main java.lang.NoSuchMethodError"
> I have also tried with the parameter -doc and the name of the Word document
> but without success
> I've also run the classes QuickTest.class and HWPFDocument.class, again with
> errors
> Can anyone help me?
> Thanks all
> Cheers 
> Teresa

Lucene WordExtractor

Posted by mcarcelen <mc...@isoco.com>.

Hi all!
I'm working with poi-bin-3.0-alpha2-20060616
I'm trying to extract text from a Word document using the class
org.apache.poi.hwpf.extractor.WordExtractor but I get the following error:
"Exception in thread main java.lang.NoSuchMethodError"
I have also tried with the parameter -doc and the name of the Word document
but without success
I've also run the classes QuickTest.class and HWPFDocument.class, again with
errors
Can anyone help me?
Thanks all
Cheers 
Teresa



Re: General Approach: Analyzer versus Query

Posted by James Pine <ge...@yahoo.com>.

Would Lucene's FuzzyQuery be useful in this case? I
suppose it would depend on how meaningful the
sequences of numbers are.

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/FuzzyQuery.html
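
FuzzyQuery's similarity is based on Levenshtein edit distance, so a quick
way to judge whether it fits is to compute that distance by hand for a few
pairs. The sketch below is a plain-Java illustration (the class name is
hypothetical, and it does not reproduce Lucene's actual scoring): the
short form "05123456" is two edits away from "2005123456", which is the
same distance as a different number with two digits swapped, and farther
than a number that differs only in its last digit.

```java
// Hypothetical helper: classic two-row Levenshtein edit distance.
class EditDistanceSketch {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("05123456", "2005123456"));   // prints 2
        System.out.println(levenshtein("2005123465", "2005123456")); // prints 2
        System.out.println(levenshtein("2005123457", "2005123456")); // prints 1
    }
}
```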

--- Chris Hostetter <ho...@fucit.org> wrote:

> : I could (1) up front, put in both versions of the numbers or (2) during
> : query, play with the number and search both ways.  What's the best
> : practice approach?
> 
> In the immortal words of Erik Hatcher...
> 
> 	"It Depends :)"
> 
> #1 takes up more space on disk and in memory, and makes it impossible to
> sort on that field (you can only sort on fields with 0 or 1 terms per doc)
> ... #2 makes the query take a little longer ... you really won't notice a
> difference if you have an index of 1000 Documents and one user, but you
> might if you have 1,000,000 items and 100 concurrent users.
> 
> 
> 
> -Hoss

Re: General Approach: Analyzer versus Query

Posted by Chris Hostetter <ho...@fucit.org>.

: I could (1) up front, put in both versions of the numbers or (2) during
: query, play with the number and search both ways.  What's the best
: practice approach?

In the immortal words of Erik Hatcher...

	"It Depends :)"

#1 takes up more space on disk and in memory, and makes it impossible to
sort on that field (you can only sort on fields with 0 or 1 terms per doc)
... #2 makes the query take a little longer ... you really won't notice a
difference if you have an index of 1000 Documents and one user, but you
might if you have 1,000,000 items and 100 concurrent users.



-Hoss

