Posted to java-user@lucene.apache.org by Amin Mohammed-Coleman <am...@gmail.com> on 2009/09/04 11:17:05 UTC

Filtering question/advice

Hi,

I am looking at applying a security filter to our Lucene documents and I was
wondering if I could get feedback on the solution I have come up with.
First I will explain the scenario, followed by the proposed solution:


We have a concept of a Layer, which is a project whereby a broker can trade
with underwriters.  A layer can have more than one underwriter working on
it, and each of them can search for the same layer.  The issue is the
following:

UWA signs business on a Layer L1 using a reference 'HELLO'

UWB signs business on the same Layer L1 using a reference 'BYE'

Both underwriters are legitimately allowed to access Layer L1, so the
security rules will not remove any search hits for L1. However, if UWB
searches for the text 'HELLO' he should not get L1 in his search results, as
he must not learn that L1 includes the reference HELLO for UWA. With a plain
search he will see this result, which is not acceptable in our case.

The proposed solution is that we do the following:

Document:
uw-reference = HELLO
uw-reference = BYE

With additional per-underwriter fields like

uw-uwa = HELLO
uw-uwb = BYE

So when UWB performs a search of "HELLO" there will be an additional filter
applied which would be like "uw-uwb:HELLO" so the final query would be like:

uw-reference:HELLO + (uw-uwb:HELLO) (approximately)

Th

I created a test case for this solution and it works. The problem is
that if UWB searches for "HELLO" and it exists in another field, such as
data:HELLO, then he should get a result. It is only when the query matches
on the reference that he should not see anything.  My test case fails when
the match is made on the data field, because the security filter (which is
otherwise valid) does not let the document through.  Is there a way around
this?  Hope this makes sense!

Any advice would be highly appreciated
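
A minimal plain-Java model of why the data-field case fails, assuming a
document can be reduced to a map from field name to indexed terms. It does
not use Lucene, and the names (FilterScopeModel, passesGlobalFilter, isHit)
are made up for illustration. What it tries to show is that a filter is
ANDed with the whole query, so a legitimate hit on the data field is
discarded together with the unwanted hit on the reference field.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model, not Lucene code: a document is a map from
// field name to the set of terms indexed in that field.
final class FilterScopeModel {

    static boolean hasTerm(Map<String, Set<String>> doc, String field, String term) {
        Set<String> terms = doc.get(field);
        return terms != null && terms.contains(term);
    }

    // The query matches if the term occurs in any of the searched fields.
    static boolean queryMatches(Map<String, Set<String>> doc, String term, String... fields) {
        for (String field : fields) {
            if (hasTerm(doc, field, term)) {
                return true;
            }
        }
        return false;
    }

    // A filter restricts the whole result set: the document survives only
    // if the searching underwriter's own field ("uw-" + uwId) has the term.
    static boolean passesGlobalFilter(Map<String, Set<String>> doc, String uwId, String term) {
        return hasTerm(doc, "uw-" + uwId, term);
    }

    static boolean isHit(Map<String, Set<String>> doc, String uwId, String term, String... fields) {
        return queryMatches(doc, term, fields) && passesGlobalFilter(doc, uwId, term);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> layer = Map.of(
                "uw-reference", Set.of("hello", "bye"),
                "uw-uwa", Set.of("hello"),
                "uw-uwb", Set.of("bye"),
                "data", Set.of("hello"));

        // UWA searching the reference field: kept, as intended.
        System.out.println(isHit(layer, "uwa", "hello", "uw-reference"));
        // UWB searching the reference field: removed, as intended.
        System.out.println(isHit(layer, "uwb", "hello", "uw-reference"));
        // UWB searching reference and data: removed as well, although the
        // match on the data field should have been visible.
        System.out.println(isHit(layer, "uwb", "hello", "uw-reference", "data"));
    }
}
```

In this model the restriction would have to apply to the reference clause
only, not to the combined result, for the third search to return the layer.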

Re: Filtering question/advice

Posted by Amin Mohammed-Coleman <am...@gmail.com>.
Hi,

Apologies for resending this email but just wondering if I could get some
input on the below.  I am in the final stages of getting a proof of concept
together and this is the final piece of the puzzle.

Sorry again for sending this!

Cheers
Amin

On Fri, Sep 4, 2009 at 10:38 AM, Amin Mohammed-Coleman <am...@gmail.com> wrote:

> Hi
> I include a testcase to show what I am trying to do.  Testcase number 3
> fails.
>
> Thanks
> Amin

Re: Filtering question/advice

Posted by Amin Mohammed-Coleman <am...@gmail.com>.
Hi
Sorry for not getting back to you.  I have been swamped at work and at
home, and have only just managed to check my Lucene emails!

You are right, I made some silly mistakes in the test case and have updated
it accordingly.  The test is still failing, but the properties are now set
correctly:

public class UnderwriterReferenceTest {
    private Directory directory;
    private Analyzer analyzer;
    private IndexSearcher indexSearcher;
    private IndexWriter indexWriter;
    private Document layerDocumentA;


    @Before
    public void setUp() throws Exception {
        directory = new RAMDirectory();
        analyzer = new StandardAnalyzer();
        indexWriter = new IndexWriter(directory, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);

    }

   @Before
   public void setUpDocumentsForIndexing() throws Exception {

       Field uw1 = new Field("uw-refernce", "hello", Field.Store.NO,
               Field.Index.ANALYZED);
       Field uw2 = new Field("uw-refernce", "bye", Field.Store.NO,
               Field.Index.ANALYZED);

       Field uw1UWA = new Field("uw-uwa", "hello", Field.Store.NO,
               Field.Index.ANALYZED);
       Field uw2UWB = new Field("uw-uwb", "bye", Field.Store.NO,
               Field.Index.ANALYZED);


       layerDocumentA = new Document();
       layerDocumentA.add(uw1);
       layerDocumentA.add(uw2);
       layerDocumentA.add(uw1UWA);
       layerDocumentA.add(uw2UWB);
   }


    @Test
    public void testUWBCannotSeeResultIfUWReferenceIsSearchThatDoesNotBelongToUWB()
            throws Exception {
        indexWriter.addDocument(layerDocumentA);
        indexWriter.commit();
        indexWriter.close();

        indexSearcher = new IndexSearcher(directory);

        UnderwriterReferenceFilter filter = new UnderwriterReferenceFilter();
        filter.setUwId("uwb");
        filter.setTermValue("hello");

        QueryParser queryParser = new QueryParser("uw-refernce", analyzer);
        Query q = queryParser.parse("hello");
        long start = System.currentTimeMillis();
        TopDocs topDocs = indexSearcher.search(q, filter, 100);
        long end = System.currentTimeMillis();
        System.out.println("Total time taken to search = " + (end - start)
                + " milli seconds");
        System.out.println("Total results returned = " + topDocs.totalHits);
        assertNotNull(topDocs);
        assertEquals(0, topDocs.totalHits);
    }


    @Test
    public void testUWACanSeeResultIfUWReferenceIsSearchThatDoesBelongToUWA()
            throws Exception {
        indexWriter.addDocument(layerDocumentA);
        indexWriter.commit();
        indexWriter.close();

        indexSearcher = new IndexSearcher(directory);

        UnderwriterReferenceFilter filter = new UnderwriterReferenceFilter();
        filter.setUwId("uwa");
        filter.setTermValue("hello");

        QueryParser queryParser = new QueryParser("uw-refernce", analyzer);
        Query q = queryParser.parse("hello");
        long start = System.currentTimeMillis();
        TopDocs topDocs = indexSearcher.search(q, filter, 100);
        long end = System.currentTimeMillis();
        System.out.println("Total time taken to search = " + (end - start)
                + " milli seconds");
        System.out.println("Total results returned = " + topDocs.totalHits);
        assertNotNull(topDocs);
        assertEquals(1, topDocs.totalHits);
    }

    @Test
    public void testUWBCanSeeResultIfSearchTermMatchesOnSomethingElse()
            throws Exception {

        Field data = new Field("data", "hello", Field.Store.NO,
                Field.Index.ANALYZED);
        layerDocumentA.add(data);
        indexWriter.addDocument(layerDocumentA);
        indexWriter.commit();
        indexWriter.close();

        indexSearcher = new IndexSearcher(directory);

        UnderwriterReferenceFilter filter = new UnderwriterReferenceFilter();
        filter.setTermValue("hello");
        filter.setUwId("uwb");

        MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
                new String[]{"uw-refernce", "data"}, analyzer);
        queryParser.enable_tracing();
        Query q = queryParser.parse("hello");
        long start = System.currentTimeMillis();

        TopDocs topDocs = indexSearcher.search(q, filter, 100);
        long end = System.currentTimeMillis();
        System.out.println("Total time taken to search = " + (end - start)
                + " milli seconds");
        System.out.println("Total results returned = " + topDocs.totalHits);
        assertNotNull(topDocs);
        // The filter only passes documents containing uw-uwb:hello; this
        // document has uw-uwb:bye, so 0 hits come back and this assert fails.
        assertEquals(1, topDocs.totalHits);

    }

   private class UnderwriterReferenceFilter extends Filter {

    private String uwId;
    private String termValue;

    public void setTermValue(String termValue) {
       this.termValue = termValue;
    }

    public void setUwId(String uwId) {
        this.uwId = uwId;
    }

    @Override
    public DocIdSet getDocIdSet(org.apache.lucene.index.IndexReader reader)
            throws java.io.IOException {
        if (uwId == null || "".equals(uwId)) {
            throw new IllegalArgumentException("uwId not set for filtering");
        }

        OpenBitSet bitSet = new OpenBitSet( reader.maxDoc());
        Term term = new Term("uw-"+uwId, termValue);
        TermDocs termDocs = reader.termDocs( term );
        while ( termDocs.next() ) {
            bitSet.set( termDocs.doc() );
        }

        return bitSet;

    }
   }

}


I ran the test using the grouping mechanism that you mentioned, which worked;
however, I just wanted to know whether the same can be achieved with filters.


Thanks again!

Amin

On Thu, Sep 17, 2009 at 11:59 PM, Chris Hostetter
<ho...@fucit.org> wrote:

>
> FWIW: a test case with multiple asserts is more useful if you clarify
> where it fails ... ie: show us the failure message, or put a comment on
> the line of the assert that fails.
>
> i didn't run your testcase, but skimming it a few things jump out at me
> that might explain whatever problem you are seeing...
>
> :        Field uw1 = new Field("uw-refernce", "hello", Field.Store.NO,
> : Field.Index.ANALYZED);
> :        Field uw2 = new Field("uw-refernce", "bye", Field.Store.NO,
> : Field.Index.ANALYZED);
>                 ...
> :        layerDocumentA = new Document();
> :        layerDocumentA.add(uw1);
> :        layerDocumentA.add(uw1);
>
> ...did you really mean to add uw1 twice? or did you mean to add uw2 as
> well (it's never used)...
>
> :     public void testUWBCanSeeResultIfSearchTermMatchesOnSomethingElse()
> : throws Exception {
>                ...
> :         UnderwriterReferenceFilter filter = new
> : UnderwriterReferenceFilter();
>
> ...you never set any properties on this Filter before you use it. reading
> its implementation, that should cause an IllegalArgumentException.
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Filtering question/advice

Posted by Chris Hostetter <ho...@fucit.org>.
FWIW: a test case with multiple asserts is more useful if you clarify
where it fails ... ie: show us the failure message, or put a comment on
the line of the assert that fails.

i didn't run your testcase, but skimming it a few things jump out at me
that might explain whatever problem you are seeing...

:        Field uw1 = new Field("uw-refernce", "hello", Field.Store.NO,
: Field.Index.ANALYZED);
:        Field uw2 = new Field("uw-refernce", "bye", Field.Store.NO,
: Field.Index.ANALYZED);
		...
:        layerDocumentA = new Document();
:        layerDocumentA.add(uw1);
:        layerDocumentA.add(uw1);

...did you really mean to add uw1 twice? or did you mean to add uw2 as 
well (it's never used)...

:     public void testUWBCanSeeResultIfSearchTermMatchesOnSomethingElse()
: throws Exception {
		...
:         UnderwriterReferenceFilter filter = new
: UnderwriterReferenceFilter();

...you never set any properties on this Filter before you use it. reading
its implementation, that should cause an IllegalArgumentException.
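
One way to rule out this kind of mistake is to pass the filter's settings
through a constructor instead of setters, so that an unconfigured instance
cannot exist. A minimal sketch in plain Java, independent of Lucene;
UnderwriterFilterSettings is a hypothetical holder class, not something from
the test case, and the "uw-" field prefix is taken from the code quoted
above.

```java
// Hypothetical holder for the two values the filter needs; both are
// validated once, at construction time.
final class UnderwriterFilterSettings {
    private final String uwId;
    private final String termValue;

    UnderwriterFilterSettings(String uwId, String termValue) {
        if (uwId == null || uwId.isEmpty()) {
            throw new IllegalArgumentException("uwId not set for filtering");
        }
        if (termValue == null || termValue.isEmpty()) {
            throw new IllegalArgumentException("termValue not set for filtering");
        }
        this.uwId = uwId;
        this.termValue = termValue;
    }

    // Name of the per-underwriter field, for example "uw-uwb".
    String fieldName() {
        return "uw-" + uwId;
    }

    String termValue() {
        return termValue;
    }

    public static void main(String[] args) {
        UnderwriterFilterSettings settings =
                new UnderwriterFilterSettings("uwb", "hello");
        System.out.println(settings.fieldName() + ":" + settings.termValue());

        try {
            new UnderwriterFilterSettings(null, "hello");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A filter built on such a holder would fail when the test constructs it,
rather than later inside the search call.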


-Hoss



Re: Filtering question/advice

Posted by Amin Mohammed-Coleman <am...@gmail.com>.
Hi

Thanks for your response.  Here is the test case:

public class UnderwriterReferenceTest {
    private Directory directory;
    private Analyzer analyzer;
    private IndexSearcher indexSearcher;
    private IndexWriter indexWriter;
    private Document layerDocumentA;


    @Before
    public void setUp() throws Exception {
        directory = new RAMDirectory();
        analyzer = new StandardAnalyzer();
        indexWriter = new IndexWriter(directory, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);

    }

   @Before
   public void setUpDocumentsForIndexing() throws Exception {

       Field uw1 = new Field("uw-refernce", "hello", Field.Store.NO,
               Field.Index.ANALYZED);
       Field uw2 = new Field("uw-refernce", "bye", Field.Store.NO,
               Field.Index.ANALYZED);

       Field uw1UWA = new Field("uw-uwa", "hello", Field.Store.NO,
               Field.Index.ANALYZED);
       Field uw2UWB = new Field("uw-uwb", "bye", Field.Store.NO,
               Field.Index.ANALYZED);


       layerDocumentA = new Document();
       layerDocumentA.add(uw1);
       layerDocumentA.add(uw1);
       layerDocumentA.add(uw1UWA);
       layerDocumentA.add(uw2UWB);
   }


    @Test
    public void testUWBCannotSeeResultIfUWReferenceIsSearchThatDoesNotBelongToUWB()
            throws Exception {
        indexWriter.addDocument(layerDocumentA);
        indexWriter.commit();
        indexWriter.close();

        indexSearcher = new IndexSearcher(directory);

        UnderwriterReferenceFilter filter = new UnderwriterReferenceFilter();
        filter.setUwId("uwb");
        filter.setTermValue("hello");

        QueryParser queryParser = new QueryParser("uw-refernce", analyzer);
        Query q = queryParser.parse("hello");
        long start = System.currentTimeMillis();
        TopDocs topDocs = indexSearcher.search(q, filter, 100);
        long end = System.currentTimeMillis();
        System.out.println("Total time taken to search = " + (end - start)
                + " milli seconds");
        System.out.println("Total results returned = " + topDocs.totalHits);
        assertNotNull(topDocs);
        assertEquals(0, topDocs.totalHits);
    }


    @Test
    public void testUWACanSeeResultIfUWReferenceIsSearchThatDoesBelongToUWA()
            throws Exception {
        indexWriter.addDocument(layerDocumentA);
        indexWriter.commit();
        indexWriter.close();

        indexSearcher = new IndexSearcher(directory);

        UnderwriterReferenceFilter filter = new UnderwriterReferenceFilter();
        filter.setUwId("uwa");
        filter.setTermValue("hello");

        QueryParser queryParser = new QueryParser("uw-refernce", analyzer);
        Query q = queryParser.parse("hello");
        long start = System.currentTimeMillis();
        TopDocs topDocs = indexSearcher.search(q, filter, 100);
        long end = System.currentTimeMillis();
        System.out.println("Total time taken to search = " + (end - start)
                + " milli seconds");
        System.out.println("Total results returned = " + topDocs.totalHits);
        assertNotNull(topDocs);
        assertEquals(1, topDocs.totalHits);
    }

    @Test
    public void testUWBCanSeeResultIfSearchTermMatchesOnSomethingElse()
            throws Exception {

        Field data = new Field("data", "hello", Field.Store.NO,
                Field.Index.ANALYZED);
        layerDocumentA.add(data);
        indexWriter.addDocument(layerDocumentA);
        indexWriter.commit();
        indexWriter.close();

        indexSearcher = new IndexSearcher(directory);

        UnderwriterReferenceFilter filter = new UnderwriterReferenceFilter();

        MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
                new String[]{"uw-refernce", "data"}, analyzer);
        queryParser.enable_tracing();
        Query q = queryParser.parse("hello");
        long start = System.currentTimeMillis();

        TopDocs topDocs = indexSearcher.search(q, filter, 100);
        long end = System.currentTimeMillis();
        System.out.println("Total time taken to search = " + (end - start)
                + " milli seconds");
        System.out.println("Total results returned = " + topDocs.totalHits);
        assertNotNull(topDocs);
        assertEquals(1, topDocs.totalHits);

    }

   private class UnderwriterReferenceFilter extends Filter {

    private String uwId;
    private String termValue;

    public void setTermValue(String termValue) {
       this.termValue = termValue;
    }

    public void setUwId(String uwId) {
        this.uwId = uwId;
    }

    @Override
    public DocIdSet getDocIdSet(org.apache.lucene.index.IndexReader reader)
            throws java.io.IOException {
        if (uwId == null || "".equals(uwId)) {
            throw new IllegalArgumentException("uwId not set for filtering");
        }

        OpenBitSet bitSet = new OpenBitSet( reader.maxDoc());
        Term term = new Term("uw-"+uwId, termValue);
        TermDocs termDocs = reader.termDocs( term );
        while ( termDocs.next() ) {
            bitSet.set( termDocs.doc() );
        }

        return bitSet;

    }
   }

}

As you can see, I am using a filter to filter out results based on uwId.  Is
it possible to use the grouping that you mentioned within a filter, or should
it be implemented some other way, for example by appending the grouping
clauses after receiving the query from the user?


Thanks again for your input!

Cheers
Amin


On Tue, Sep 8, 2009 at 11:24 PM, Chris Hostetter
<ho...@fucit.org> wrote:

> : Hi
> : I include a testcase to show what I am trying to do.  Testcase number 3
> : fails.
>
> the mailing list is finicky about attachments ... the best thing to do is
> to include your test case directly in the body of your email as plain
> text.
>
> : > I created a test case to test this solution and it works. The problem is
> : > that if UWB searches for "HELLO" that exists in another field such as:
> : > data:HELLO then he should get a result. It's only when the query is matched
> : > on reference he should not see anything.  My testcase fails when the match
> : > is made on the data field as the security filter does not pass (valid
> : > filter).  Is there a way around this?  Hope this made sense!
>
> it sounds like perhaps you just need to group your clauses...
>
>  data:HELLO (uw-reference:HELLO +uw-uwb:HELLO)
>
> ...if HELLO is in data, you'll get a match regardless of the other fields.
> if it's not in data, but it is in uw-reference, it will only match if it's
> also in uw-uwb.
>
>
>
>
> -Hoss
>

Re: Filtering question/advice

Posted by Chris Hostetter <ho...@fucit.org>.
: Hi
: I include a testcase to show what I am trying to do.  Testcase number 3
: fails.

the mailing list is finicky about attachments ... the best thing to do is 
to include your test case directly in the body of your email as plain 
text.

: > I created a test case to test this solution and it works. The problem is
: > that if UWB searches for "HELLO" that exists in another field such as:
: > data:HELLO then he should get a result. It's only when the query is matched
: > on reference he should not see anything.  My testcase fails when the match
: > is made on the data field as the security filter does not pass (valid
: > filter).  Is there a way around this?  Hope this made sense!

it sounds like perhaps you just need to group your clauses...

  data:HELLO (uw-reference:HELLO +uw-uwb:HELLO)

...if HELLO is in data, you'll get a match regardless of the other fields.  
if it's not in data, but it is in uw-reference, it will only match if it's 
also in uw-uwb.
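
The grouping above can be assembled mechanically from the user's term and
the underwriter id. A minimal sketch in plain Java, assuming the term is a
single token with no query-syntax characters in it and that the
per-underwriter field is named "uw-" plus the id, as in the test case.
GroupedQueryBuilder and build are hypothetical names, and the resulting
string would still have to be handed to a query parser.

```java
// Hypothetical helper: it only builds the grouped query string; it does
// not parse or run it.
final class GroupedQueryBuilder {

    // Yields: data:TERM (uw-reference:TERM +uw-UWID:TERM)
    // The first clause matches any document with the term in "data"; the
    // group can only match when the underwriter's own field has the term.
    static String build(String term, String uwId) {
        if (term == null || term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        if (uwId == null || uwId.isEmpty()) {
            throw new IllegalArgumentException("uwId must not be empty");
        }
        return "data:" + term
                + " (uw-reference:" + term
                + " +uw-" + uwId + ":" + term + ")";
    }

    public static void main(String[] args) {
        // prints: data:hello (uw-reference:hello +uw-uwb:hello)
        System.out.println(build("hello", "uwb"));
    }
}
```

Because the restriction sits inside the group, it no longer affects the
data clause, which is the difference from applying it as a filter over the
whole result.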




-Hoss



Re: Filtering question/advice

Posted by Amin Mohammed-Coleman <am...@gmail.com>.
Hi
I include a testcase to show what I am trying to do.  Testcase number 3
fails.

Thanks
Amin
