You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Greg Gershman <gr...@yahoo.com> on 2007/03/07 16:07:45 UTC
Negative Filtering (such as for profanity)
I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND -@#$% AND etc). The problem I have run into is that, as a pure negative query is not supported (a query for (-term) DOES NOT return the inverse of a query for (term)), I believe the bit set returned by a purely negative QueryFilter is empty, so no matter how many results returned by the initial query, the result after filtering is always zero documents.
I was wondering if anyone had suggestions as to how else to do this. I've considered simply amending the query string submitted by the user to include a pre-generated String that would exclude the query terms, but I consider this a non-elegant solution. I had also thought about creating a new sub-class of QueryFilter, NegativeQueryFilter. Basically, it would works just like a QueryFilter, taking a positive query (so, I would pass it an OR'ed list of profane words), then the resulting bits are simply flipped. I think this would work, unless I'm missing something. I'm going to experiment with it, I'd appreciate anyone's thoughts on this.
Thanks,
Greg
____________________________________________________________________________________
It's here! Your new message!
Get new email alerts with the free Yahoo! Toolbar.
http://tools.search.yahoo.com/toolbar/features/mail/
Re: Negative Filtering (such as for profanity)
Posted by Paul Elschot <pa...@xs4all.nl>.
On Wednesday 07 March 2007 16:07, Greg Gershman wrote:
> I'm attempting to create a profanity filter. I thought to use a QueryFilter
created with a Query of (-$#!+ AND -@#$% AND etc). The problem I have run
into is that, as a pure negative query is not supported (a query for (-term)
DOES NOT return the inverse of a query for (term)), I believe the bit set
returned by a purely negative QueryFilter is empty, so no matter how many
results returned by the initial query, the result after filtering is always
zero documents.
It may have been suggested already: you can create a Filter
for the positive Query, and then invert the bits in the BitSet of the Filter
to have negative filtering.
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Negative Filtering (such as for profanity)
Posted by Mark Miller <ma...@gmail.com>.
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/MatchAllDocsQuery.html
You can use that Query in front of a NOT query clause.
Greg Gershman wrote:
> I'm attempting to create a profanity filter. I thought to use a QueryFilter created with a Query of (-$#!+ AND -@#$% AND etc). The problem I have run into is that, as a pure negative query is not supported (a query for (-term) DOES NOT return the inverse of a query for (term)), I believe the bit set returned by a purely negative QueryFilter is empty, so no matter how many results returned by the initial query, the result after filtering is always zero documents.
>
> I was wondering if anyone had suggestions as to how else to do this. I've considered simply amending the query string submitted by the user to include a pre-generated String that would exclude the query terms, but I consider this a non-elegant solution. I had also thought about creating a new sub-class of QueryFilter, NegativeQueryFilter. Basically, it would works just like a QueryFilter, taking a positive query (so, I would pass it an OR'ed list of profane words), then the resulting bits are simply flipped. I think this would work, unless I'm missing something. I'm going to experiment with it, I'd appreciate anyone's thoughts on this.
>
> Thanks,
>
> Greg
>
>
>
>
>
> ____________________________________________________________________________________
> It's here! Your new message!
> Get new email alerts with the free Yahoo! Toolbar.
> http://tools.search.yahoo.com/toolbar/features/mail/
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Negative Filtering (such as for profanity)
Posted by Grant Ingersoll <gs...@apache.org>.
Not sure if this helpful given your proposed solution, but could you
do something on the indexing side, such as:
1. Remove the profanity from the token stream, much like a
stopword. This would also mean stripping it from the display text
2. If your TokenFilter comes across a profanity, somehow mark the
document as containing a profanity via a "profanity" Field (not sure
if there is a way, in Lucene, to add another Field while you are in
the analysis phase, but you could also have it update a table in a db
or something.) Then on search, you could just say (regular query)
+profanity:false
HTH,
Grant
On Mar 7, 2007, at 10:07 AM, Greg Gershman wrote:
> I'm attempting to create a profanity filter. I thought to use a
> QueryFilter created with a Query of (-$#!+ AND -@#$% AND etc). The
> problem I have run into is that, as a pure negative query is not
> supported (a query for (-term) DOES NOT return the inverse of a
> query for (term)), I believe the bit set returned by a purely
> negative QueryFilter is empty, so no matter how many results
> returned by the initial query, the result after filtering is always
> zero documents.
>
> I was wondering if anyone had suggestions as to how else to do
> this. I've considered simply amending the query string submitted
> by the user to include a pre-generated String that would exclude
> the query terms, but I consider this a non-elegant solution. I had
> also thought about creating a new sub-class of QueryFilter,
> NegativeQueryFilter. Basically, it would works just like a
> QueryFilter, taking a positive query (so, I would pass it an OR'ed
> list of profane words), then the resulting bits are simply
> flipped. I think this would work, unless I'm missing something.
> I'm going to experiment with it, I'd appreciate anyone's thoughts
> on this.
>
> Thanks,
>
> Greg
>
>
>
>
>
> ______________________________________________________________________
> ______________
> It's here! Your new message!
> Get new email alerts with the free Yahoo! Toolbar.
> http://tools.search.yahoo.com/toolbar/features/mail/
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/
LuceneFAQ
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org