Posted to java-user@lucene.apache.org by Kay Kay <ka...@gmail.com> on 2009/01/16 22:47:20 UTC

Nightly source builds of Lucene ..

I am trying to access the nightly Lucene builds at
http://people.apache.org/builds/lucene/java/nightly/ . It does not
seem to have been available for some time. Just curious whether that is
the right place to get them.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Search Across All Fields

Posted by Jamie <ja...@stimulussoft.com>.

Hi Erick

Thanks for the pointer. I don't know how I missed that. Our index sizes
are absolutely huge, so it's not really practical to add an all_text
field. It would be great if you could introduce a macro or something that
one could use to specify all fields.

Thanks anyway!

Jamie


Erick Erickson wrote:
> I think you forgot a set of parentheses, a close paren right before
> the AND and an open paren right after AND
>
> Depending upon how big your index is, a MUCH easier way to do
> this is to index another field, call it all_text say, and add all your
> terms to that field as well as to the individual one, then search your
> all_text field instead....
>
> Best
> Erick
>
>   



Re: Search Across All Fields

Posted by Erick Erickson <er...@gmail.com>.

I think you forgot a set of parentheses, a close paren right before
the AND and an open paren right after AND
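
A minimal sketch of that parenthesization, assuming the query parser's default operator is OR so that the clauses inside each group act as alternatives. The class and method names (AllFieldsQuery, anyField, allTerms) and the short field list are made up for illustration, and terms are not escaped. As far as I can tell, without the extra parentheses the AND only binds the two clauses next to it (attachname:beauty and priority:supply), which would explain the empty result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: builds a Lucene query string that requires every term to
// match in at least one of the given fields. Class and method names are
// hypothetical and not part of Lucene.
final class AllFieldsQuery {

    // One parenthesized group per term: (f1:term f2:term ...)
    static String anyField(List<String> fields, String term) {
        return fields.stream()
                .map(f -> f + ":" + term)
                .collect(Collectors.joining(" ", "(", ")"));
    }

    // Groups are joined with AND, so AND applies to whole groups rather
    // than to the two clauses that happen to sit next to it.
    static String allTerms(List<String> fields, List<String> terms) {
        return terms.stream()
                .map(t -> anyField(fields, t))
                .collect(Collectors.joining(" AND ", "(", ")"));
    }

    public static void main(String[] args) {
        // Assumed subset of the fields; the real list is much longer.
        List<String> fields = Arrays.asList("subject", "body", "attachments");
        String q = allTerms(fields, Arrays.asList("beauty", "supply"))
                + " AND sentdate:[d20081117010000 TO d20090117235900]";
        System.out.println(q);
    }
}
```

Note that this only requires both words to occur somewhere in the document, which is looser than the phrase match in Query 1.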

Depending upon how big your index is, a MUCH easier way to do
this is to index another field, call it all_text say, and add all your
terms to that field as well as to the individual one, then search your
all_text field instead....
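
A rough sketch of the all_text idea, independent of the Lucene API: before a message is indexed, every field value is also copied into one extra catch-all value. The names AllTextField and withAllText, and the sample field values, are assumptions for illustration only; how the combined value is then added to the index document is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: collects every field value of a message into one string that
// could be indexed as an extra catch-all field. Names are hypothetical.
final class AllTextField {

    static final String ALL_TEXT = "all_text";

    // Returns a copy of the fields with an added all_text entry holding
    // all original values, separated by single spaces.
    static Map<String, String> withAllText(Map<String, String> fields) {
        Map<String, String> out = new LinkedHashMap<>(fields);
        out.put(ALL_TEXT, String.join(" ", fields.values()));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> msg = new LinkedHashMap<>();
        msg.put("subject", "beauty supply order");
        msg.put("body", "please confirm the delivery");
        // The search would then target the single all_text field.
        System.out.println(withAllText(msg).get(ALL_TEXT));
    }
}
```

The trade-off is index size, since every term ends up indexed twice: once in its own field and once in all_text.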

Best
Erick

On Fri, Jan 16, 2009 at 6:02 PM, Jamie <ja...@stimulussoft.com> wrote:

> Hi Everyone
>
> I have two queries:
>
> Query 1
> ======
>
> (attachments:"beauty supply") AND sentdate:[d20081117010000 TO
> d20090117235900]
>
> Query 2
> ======
>
> (priority:beauty attach:beauty score:beauty size:beauty sentdate:beauty
> archivedate:beauty receiveddate:beauty from:beauty to:beauty subject:beauty
> cc:beauty bcc:beauty deliveredto:beauty flag:beauty sensitivity:beauty
> sender:beauty recipient:beauty body:beauty attachments:beauty
> attachname:beauty AND priority:supply attach:supply score:supply size:supply
> sentdate:supply archivedate:supply receiveddate:supply from:supply to:supply
> subject:supply cc:supply bcc:supply deliveredto:supply flag:supply
> sensitivity:supply sender:supply recipient:supply body:supply
> attachments:supply attachname:supply) AND sentdate:[d20081117010000 TO
> d20090117235900]
>
> Query 1 returns 138 results, while Query 2 returns 0 results. Any idea why?
> The second query is meant to offer the search across all fields, whereas the
> first query specifies one field. Is there a better way to conduct a search
> across all fields? Am I missing something?
>
> Thanks in advance for your help!
>
> Regards,
>
> Jamie
>
>

Words that need protection from stemming, i.e., protwords.txt

Posted by David Woodward <dw...@loc.gov>.

Hi.

Any good protwords.txt out there?

In a fairly standard Solr analyzer chain, we use the English Porter stemming filter like so:

<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>

For most purposes the Porter stemmer does just fine, but occasionally words come along that really don't work out too well, e.g.,

"maine" is stemmed to "main" - clearly goofing up precision about "Maine" without doing much good for variants of "main".

So - I have an entry for my protwords.txt. What else should go in there?
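
To make the mechanism concrete, here is a small sketch of what a protected-word list does, assuming the usual protwords.txt layout of one word per line with '#' comment lines. The stemmer here is a toy stand-in that drops a trailing "e"; it is not the Porter algorithm, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch only: shows the idea of a protected-word list. The stemmer passed
// in is a stand-in, not the Porter algorithm; all names are hypothetical.
final class ProtectedStemming {

    // Parses protwords-style lines: one word per line; blank lines and
    // lines starting with '#' are ignored.
    static Set<String> parseProtected(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String w = line.trim();
            if (!w.isEmpty() && !w.startsWith("#")) {
                words.add(w);
            }
        }
        return words;
    }

    // Applies the stemmer unless the token is on the protected list.
    static String stem(String token, Set<String> protectedWords,
                       UnaryOperator<String> stemmer) {
        return protectedWords.contains(token) ? token : stemmer.apply(token);
    }

    public static void main(String[] args) {
        Set<String> prot = parseProtected(Arrays.asList("# place names", "maine"));
        // Toy stand-in: drop a trailing "e", which mimics the maine -> main case.
        UnaryOperator<String> toy =
                t -> t.endsWith("e") ? t.substring(0, t.length() - 1) : t;
        System.out.println(stem("maine", prot, toy)); // prints "maine"
        System.out.println(stem("pine", prot, toy));  // prints "pin"
    }
}
```

The lookup is an exact match on the token as it reaches the stemmer, so entries would normally be written in the same case the earlier filters produce.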

Thanks for your ideas,

Dave Woodward

Re: Words that need protection from stemming, i.e., protwords.txt

Posted by Chris Hostetter <ho...@fucit.org>.

: Subject: Words that need protection from stemming, i.e., protwords.txt
: References: <49...@gmail.com>
:  <39...@gmail.com>
:  <49...@stimulussoft.com>
: In-Reply-To: <49...@stimulussoft.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking





-Hoss



Re: Words that need protection from stemming, i.e., protwords.txt

Posted by patrick o'leary <pj...@pjaol.com>.

Porter is a little outdated; I've found KStem much better:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem

You'll still need a good protected word list, but KStem is just a little
nicer

On Fri, Jan 16, 2009 at 6:20 PM, David Woodward <dw...@loc.gov> wrote:

> Hi.
>
> Any good protwords.txt out there?
>
> In a fairly standard solr analyzer chain, we use the English Porter
> analyzer like so:
>
> <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>
> For most purposes the Porter stemmer does just fine, but occasionally words come
> along that really don't work out too well, e.g.,
>
> "maine" is stemmed to "main" - clearly goofing up precision about "Maine"
> without doing much good for variants of "main".
>
> So - I have an entry for my protwords.txt. What else should go in there?
>
> Thanks for your ideas,
>
> Dave Woodward
>
>


Re: Search Across All Fields

Posted by Chris Hostetter <ho...@fucit.org>.

: Subject: Search Across All Fields
: References: <49...@gmail.com>
:     <39...@gmail.com>
: In-Reply-To: <39...@gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking






-Hoss



RE: Search Across All Fields

Posted by "Zhang, Lisheng" <Li...@BroadVision.com>.

Hi,

Inside (priority:beauty ...) there is an AND;
is that the operator you want?

Best regards, Lisheng

-----Original Message-----
From: Jamie [mailto:jamie@stimulussoft.com]
Sent: Friday, January 16, 2009 3:02 PM
To: java-user@lucene.apache.org
Subject: Search Across All Fields


Hi Everyone

I have two queries:

Query 1
======

(attachments:"beauty supply") AND sentdate:[d20081117010000 TO 
d20090117235900]

Query 2
======

(priority:beauty attach:beauty score:beauty size:beauty sentdate:beauty 
archivedate:beauty receiveddate:beauty from:beauty to:beauty 
subject:beauty cc:beauty bcc:beauty deliveredto:beauty flag:beauty 
sensitivity:beauty sender:beauty recipient:beauty body:beauty 
attachments:beauty attachname:beauty AND priority:supply attach:supply 
score:supply size:supply sentdate:supply archivedate:supply 
receiveddate:supply from:supply to:supply subject:supply cc:supply 
bcc:supply deliveredto:supply flag:supply sensitivity:supply 
sender:supply recipient:supply body:supply attachments:supply 
attachname:supply) AND sentdate:[d20081117010000 TO d20090117235900]

Query 1 returns 138 results, while Query 2 returns 0 results. Any idea 
why? The second query is meant to offer the search across all fields, 
whereas the first query specifies one field. Is there a better way to 
conduct a search across all fields? Am I missing something?

Thanks in advance for your help!

Regards,

Jamie



Search Across All Fields

Posted by Jamie <ja...@stimulussoft.com>.

Hi Everyone

I have two queries:

Query 1
======

(attachments:"beauty supply") AND sentdate:[d20081117010000 TO 
d20090117235900]

Query 2
======

(priority:beauty attach:beauty score:beauty size:beauty sentdate:beauty 
archivedate:beauty receiveddate:beauty from:beauty to:beauty 
subject:beauty cc:beauty bcc:beauty deliveredto:beauty flag:beauty 
sensitivity:beauty sender:beauty recipient:beauty body:beauty 
attachments:beauty attachname:beauty AND priority:supply attach:supply 
score:supply size:supply sentdate:supply archivedate:supply 
receiveddate:supply from:supply to:supply subject:supply cc:supply 
bcc:supply deliveredto:supply flag:supply sensitivity:supply 
sender:supply recipient:supply body:supply attachments:supply 
attachname:supply) AND sentdate:[d20081117010000 TO d20090117235900]

Query 1 returns 138 results, while Query 2 returns 0 results. Any idea 
why? The second query is meant to offer the search across all fields, 
whereas the first query specifies one field. Is there a better way to 
conduct a search across all fields? Am I missing something?

Thanks in advance for your help!

Regards,

Jamie



Re: Nightly source builds of Lucene ..

Posted by Kay Kay <ka...@gmail.com>.

Yes - I was referring to the nightly builds of Lucene.

For example, this page - http://lucene.apache.org/java/docs/index.html
(search for "Nightly Source Builds") contains the link to
http://people.apache.org/builds/lucene/java/nightly/ . It might be
worth updating this link.

For now, as an alternative, the gzipped source can be downloaded from
the Hudson URL mentioned:

http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/714/artifact/artifacts/

Chris Hostetter wrote:
> : maybe try:
> : 
> : http://hudson.zones.apache.org/hudson/view/Solr/job/Solr-trunk/
>
> I believe Kay was asking about Lucene-Java nightly builds.
>
> : > I am trying to access the nightly lucene builds here at -
> : > http://people.apache.org/builds/lucene/java/nightly/  .  It does not seem to
> : > be available for sometime.  Just curious if that is the right source to
>
> That's an old URL that no longer works. If you found a link to it
> somewhere on an apache.org site, please let us know so we can fix it.
>
> The problem is that it's currently set up to redirect you here...
>    http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/
> ...this is the same URL that's currently linked to from the developer's
> resource page...
>    http://lucene.apache.org/java/docs/developer-resources.html
> ...but unfortunately it doesn't work.  I believe the problem is that, now
> that we have a central Hudson system, the lucene zone is still doing
> Lucene builds but no longer runs the Hudson app on port 8080.
>
> I'll try to update things to link/redirect to...
>
> http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/lastSuccessfulBuild/artifact/
>
>
>
> -Hoss
>
>



Re: Nightly source builds of Lucene ..

Posted by Chris Hostetter <ho...@fucit.org>.

: maybe try:
: 
: http://hudson.zones.apache.org/hudson/view/Solr/job/Solr-trunk/

I believe Kay was asking about Lucene-Java nightly builds.

: > I am trying to access the nightly lucene builds here at -
: > http://people.apache.org/builds/lucene/java/nightly/  .  It does not seem to
: > be available for sometime.  Just curious if that is the right source to

That's an old URL that no longer works. If you found a link to it
somewhere on an apache.org site, please let us know so we can fix it.

The problem is that it's currently set up to redirect you here...
   http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/
...this is the same URL that's currently linked to from the developer's
resource page...
   http://lucene.apache.org/java/docs/developer-resources.html
...but unfortunately it doesn't work.  I believe the problem is that, now
that we have a central Hudson system, the lucene zone is still doing
Lucene builds but no longer runs the Hudson app on port 8080.

I'll try to update things to link/redirect to...

http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/lastSuccessfulBuild/artifact/



-Hoss



Re: Nightly source builds of Lucene ..

Posted by Ryan McKinley <ry...@gmail.com>.

maybe try:

http://hudson.zones.apache.org/hudson/view/Solr/job/Solr-trunk/



On Jan 16, 2009, at 4:47 PM, Kay Kay wrote:

> I am trying to access the nightly lucene builds here at -
> http://people.apache.org/builds/lucene/java/nightly/ .  It does not
> seem to be available for sometime.  Just curious if that is the right
> source to access the same.
>

