Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2004/04/22 19:08:15 UTC

Re: more rigid stopword list ?

Moving to lucene-user list.

One of my Lucene articles includes a more comprehensive stop word list
for English:

http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2#references

Otis

--- hgadm@cswebmail.com wrote:
> Dear all,
> 
> For my taste the stopwords included in Lucene (e.g.
> StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
> with the SnowballAnalyzer - and I guess also with the
> StandardAnalyzer) are not strict enough:
> 
> For example in a sentence with "we need ..." I would
> consider "we" and "need" as stopwords but they are not
> stripped by SnowballAnalyzer or StandardAnalyzer. 
> 
> Now:
> Is there a built-in way to use more restrictive
> stripping, or should I create my own analyzer with
> a more restrictive stopword list?
> 
> If so, are you aware of more rigid lists? (A URI
> would be great!)
> 
> Thanks,
> 
> Holger
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Doing a join?

Posted by jitender ahuja <aj...@aalayance.com>.
Hi,
   I take it that by "different files" you mean two different subdirectories
under a common directory, i.e. two separate indexes.
    One stores 3 fields and the other stores 2 fields.
    The Class_Id or Class_Name field (file 1) and the Student_Id field
(file 2) should be both indexed and stored in the index.
    You need to write your own code to implement the join and perform the
search.
    I think the code snippet below, if included in your own searcher code,
should give you the results on the command line.
    I have not tested it; if you run into problems, please send me a small
sample of the two tables to be indexed (not the index files), or something
similar.

Regards,
Jitender

code:
========
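import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

// The statements below belong inside a method that declares "throws Exception"
// (opening an index can throw IOException, parsing can throw ParseException).
String searchStr = "101"; // hypothetical Class_Id value; replace with the class to look up
List results = new ArrayList();  // Student_Id values found in "file1"
List results1 = new ArrayList(); // Student_Name values found in "file2"
Analyzer analyzer = new StandardAnalyzer();

// "file1" is the index directory that stores the 3 fields
Searcher indexSrch = new IndexSearcher("file1");
Query query = QueryParser.parse(searchStr, "Class_Id", analyzer);
Hits hits = indexSrch.search(query);
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    String id = doc.get("Student_Id"); // retrieves the student's ID
    results.add(id);
}

// Feed each entry of "results" to a second query as "searchStr1".
// "file2" is the index directory that stores the 2 fields
Searcher indexSrch1 = new IndexSearcher("file2");
for (int j = 0; j < results.size(); j++) {
    String searchStr1 = (String) results.get(j);
    Query query1 = QueryParser.parse(searchStr1, "Student_Id", analyzer);
    Hits hits1 = indexSrch1.search(query1);
    for (int k = 0; k < hits1.length(); k++) {
        Document doc = hits1.doc(k); // read the k-th hit of this lookup
        String studentName = doc.get("Student_Name");
        results1.add(studentName);
    }
}

// Release the index files once both passes are done.
indexSrch.close();
indexSrch1.close();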
----- Original Message ----- 
From: "Rob Jose" <rj...@surewest.net>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, April 22, 2004 11:56 PM
Subject: Doing a join?


> Is it possible to do a join on two fields when searching a Lucene index?
> For example, I have an index of documents that have a "StudentName" and a
> "StudentId" field and another document that has "ClassId", "ClassName" and
> "StudentId".  I want to do a search on "ClassId" or "ClassName" and get a
> list of "StudentName".  Both of these documents are in one index, but are
> loaded from separate files, so I can't join at creation time.  Any help is
> greatly appreciated.
>
> Rob



Doing a join?

Posted by Rob Jose <rj...@surewest.net>.
Is it possible to do a join on two fields when searching a Lucene index?
For example, I have an index of documents that have a "StudentName" and a
"StudentId" field and another document that has "ClassId", "ClassName" and
"StudentId".  I want to do a search on "ClassId" or "ClassName" and get a
list of "StudentName".  Both of these documents are in one index, but are
loaded from separate files, so I can't join at creation time.  Any help is
greatly appreciated.

Rob


Re: more rigid stopword list ?

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
p.s. there is no need to create a new Analyzer to tweak the stop word  
list.  The analyzers that do stop word removal accept the list as an  
argument to an overloaded constructor.
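
To make the effect of a stricter list concrete, here is a minimal sketch that
uses only the JDK, so it runs without Lucene on the classpath.  The class
name StopWordSketch, the helpers extend() and strip(), and the short
BASE_STOP_WORDS array are all made up for this illustration; the array is a
stand-in and not the actual contents of StopAnalyzer.ENGLISH_STOP_WORDS.
The whitespace tokenizing is likewise a simplification of what a real
analyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Illustrative stand-in for a default list (assumption, not Lucene's list).
    static final String[] BASE_STOP_WORDS = {"a", "and", "the", "to", "of"};

    // Returns base plus extra, without duplicates, in first-seen order.
    static String[] extend(String[] base, String[] extra) {
        Set<String> merged = new LinkedHashSet<String>(Arrays.asList(base));
        merged.addAll(Arrays.asList(extra));
        return merged.toArray(new String[0]);
    }

    // Lowercases the text, splits on whitespace, and drops stop words.
    static List<String> strip(String text, String[] stopWords) {
        Set<String> stop = new HashSet<String>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stop.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] strict = extend(BASE_STOP_WORDS, new String[] {"we", "need"});
        System.out.println(strip("We need the index", BASE_STOP_WORDS)); // [we, need, index]
        System.out.println(strip("We need the index", strict));          // [index]
    }
}
```

An array built like "strict" is the kind of argument the overloaded
constructors take; in the Lucene releases of this period that should be, for
example, new StandardAnalyzer(String[] stopWords), but check the javadoc of
the version you run.  Use the same list at index time and at query time,
otherwise query terms can survive that were never indexed.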

	Erik


On Apr 22, 2004, at 1:08 PM, Otis Gospodnetic wrote:

> Moving to lucene-user list.
>
> One of my Lucene articles includes a more comprehensive stop word list
> for English:
>
> http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html?page=2#references
>
> Otis
>
> --- hgadm@cswebmail.com wrote:
>> Dear all,
>>
>> For my taste the stopwords included in Lucene (e.g.
>> StopAnalyzer.ENGLISH_STOP_WORDS, which is usually used
>> with the SnowballAnalyzer - and I guess also with the
>> StandardAnalyzer) are not strict enough:
>>
>> For example in a sentence with "we need ..." I would
>> consider "we" and "need" as stopwords but they are not
>> stripped by SnowballAnalyzer or StandardAnalyzer.
>>
>> Now:
>> Is there a built-in way to use more restrictive
>> stripping, or should I create my own analyzer with
>> a more restrictive stopword list?
>>
>> If so, are you aware of more rigid lists? (A URI
>> would be great!)
>>
>> Thanks,
>>
>> Holger


Re: RemoteSearchable

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Ah, yet another perfect time to plug Lucene's test suite.  There is an 
example of it in Lucene's own test code.  This is always your best bet 
for prime (perhaps contrived, but effective for learning the API) 
examples.

	Erik

On Apr 30, 2004, at 8:14 PM, Venu Durgam wrote:

>
> I was wondering if you have implemented search using RemoteSearchable.
> Would you kindly email sample source code?
>
> Thanks.
> Venu Durgam
>


RemoteSearchable

Posted by Venu Durgam <vd...@yahoo.com>.
I was wondering if you have implemented search using RemoteSearchable.
Would you kindly email sample source code?

Thanks.
Venu Durgam