Posted to java-user@lucene.apache.org by Karthick Sundaram <ka...@trigent.com.INVALID> on 2020/01/22 21:12:11 UTC

Can Lucene be used as Rules Engine?

Gentlemen:

 

I am using Lucene as a search engine for the following requirement:

 

There are millions of documents (text files).

Each text file has thousands of words (plain strings separated by spaces).

Example content of text file 1 (showing just a few words): 0001AAA 0001AAB
0001AAC 0061000 PSBP06 MFBP05 ...

Example content of text file 2 (showing just a few words): 0001AAX 0001AAB
0001AAN 0061002 PSBP07 MFBP06 ...

 

Then there are millions of rules stored in the database. To make this easier
to follow, here are a couple of rules:

 

Rule 1:

CONDITION 1: WITH: 0001AAA OR 0001AAC

CONDITION 2: WITH: PSBP06 OR PSBP07

CONDITION 3: WITH: MFBP05

 

Rule 2:

CONDITION 1: WITH: 0001AAN OR 0001AAC

CONDITION 2: WITH: PSBP06

CONDITION 3: WITH: PSBP08

CONDITION 4: NOT WITH: MFBP05

 

The requirement is: for a given rule, find the text files that match at least
one word in each condition of the rule.

I indexed the contents of each text file as a Lucene document with a field
"FileContents" and another field that just stores the file name.

So, for Rule 1, I constructed the query as (0001AAA OR 0001AAC) AND (PSBP06
OR PSBP07) AND (MFBP05)

And for Rule 2, the query is (0001AAN OR 0001AAC) AND (PSBP06) AND (PSBP08)
AND NOT (MFBP05).
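
For illustration, the rule-to-query translation described above could be
sketched in plain Java as below. This only builds the query text in the same
form as the two queries shown; it does not call Lucene. The class names
Condition and RuleQueryBuilder are hypothetical, and the sketch assumes every
rule has at least one WITH condition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one condition of a rule. */
final class Condition {
    final boolean negated;     // true for "NOT WITH", false for "WITH"
    final List<String> words;  // alternatives; any one of them satisfies the condition

    Condition(boolean negated, List<String> words) {
        this.negated = negated;
        this.words = words;
    }
}

final class RuleQueryBuilder {
    /** Renders a rule as text of the form "(A OR B) AND (C) AND NOT (D)". */
    static String toQueryString(List<Condition> conditions) {
        List<String> positive = new ArrayList<>();
        List<String> negative = new ArrayList<>();
        for (Condition c : conditions) {
            // Words inside one condition are alternatives, so join them with OR.
            String group = "(" + String.join(" OR ", c.words) + ")";
            (c.negated ? negative : positive).add(group);
        }
        if (positive.isEmpty()) {
            // Assumption of this sketch: a rule needs at least one WITH condition.
            throw new IllegalArgumentException("rule has no WITH condition");
        }
        // Every WITH group is required; every NOT WITH group is excluded.
        StringBuilder sb = new StringBuilder(String.join(" AND ", positive));
        for (String group : negative) {
            sb.append(" AND NOT ").append(group);
        }
        return sb.toString();
    }
}
```

Passing the four conditions of Rule 2 in order yields the Rule 2 query text
shown above.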

 

These queries work and find the appropriate text files.

 

Now I have another requirement, which is the reverse of the one above:
for a given text file, I need to find the list of rules that match it.

Example: For text file 1, Rule 1 should match, because text file 1 has
0001AAA, which satisfies condition 1; PSBP06, which satisfies condition 2;
and MFBP05, which satisfies condition 3.

Rule 1 has 3 conditions, and text file 1 matches at least one word in each
of them. So Rule 1 is good for text file 1.

Rule 2 should not match text file 1, because PSBP08 is not in it.
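
To make the expected outcome concrete, here is a minimal plain-Java sketch
that checks one rule against one file's words. It does not use Lucene, the
names RuleMatcher and Cond are hypothetical, and it assumes words are compared
exactly (no analysis or lowercasing). It is a direct check of a single rule,
not a way to search millions of rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RuleMatcher {
    /** One condition: a list of alternative words, marked WITH or NOT WITH. */
    static final class Cond {
        final boolean negated;     // true for "NOT WITH"
        final List<String> words;

        Cond(boolean negated, String... words) {
            this.negated = negated;
            this.words = Arrays.asList(words);
        }
    }

    /** Splits space-separated file contents into a set of distinct words. */
    static Set<String> wordsOf(String fileContents) {
        return new HashSet<>(Arrays.asList(fileContents.trim().split("\\s+")));
    }

    /**
     * A rule matches when every WITH condition has at least one word in the
     * file and no NOT WITH condition has any word in the file.
     */
    static boolean matches(List<Cond> rule, Set<String> fileWords) {
        for (Cond c : rule) {
            boolean hit = false;
            for (String w : c.words) {
                if (fileWords.contains(w)) {
                    hit = true;
                    break;
                }
            }
            // WITH fails on a miss; NOT WITH fails on a hit.
            if (hit == c.negated) {
                return false;
            }
        }
        return true;
    }
}
```

With the words shown for text file 1, Rule 1 passes this check and Rule 2
does not.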

 

I don't know whether I can index the rule information in Lucene. A rule
can have one or more conditions, so I can't use a fixed number of fields to
query on. Even with a fixed number of fields, the query would have to check
that each field matches at least one word.

Is it possible to handle this requirement using Lucene, or should I go for
other options?

I am new to Lucene; any help would be appreciated.

 

Thanks,

Kart


Re: Can Lucene be used as Rules Engine?

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Kart.
I still don't fully understand the problem. But implementing a rule engine
usually requires CoveringQuery
(https://lucene.apache.org/core/7_3_1/sandbox/org/apache/lucene/search/CoveringQuery.html),
which checks the number of matching rule clauses against a count stored in a
dedicated field.
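
The counting idea can be sketched in plain Java, assuming each rule records
how many WITH conditions it has and a rule is accepted only when the file
satisfies that many. This is not CoveringQuery's API and uses no Lucene
classes; RuleIndex and its methods are hypothetical names, and how NOT WITH
is handled here is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RuleIndex {
    // word -> {ruleId, conditionNo} pairs for WITH conditions containing the word
    private final Map<String, List<int[]>> withPostings = new HashMap<>();
    // word -> ids of rules that exclude the word through a NOT WITH condition
    private final Map<String, Set<Integer>> notPostings = new HashMap<>();
    // ruleId -> number of WITH conditions the rule requires
    private final Map<Integer, Integer> required = new HashMap<>();

    void addCondition(int ruleId, boolean negated, String... words) {
        if (negated) {
            for (String w : words) {
                notPostings.computeIfAbsent(w, k -> new HashSet<>()).add(ruleId);
            }
            return;
        }
        // Number this WITH condition and bump the rule's required count.
        int conditionNo = required.merge(ruleId, 1, Integer::sum);
        for (String w : words) {
            withPostings.computeIfAbsent(w, k -> new ArrayList<>())
                        .add(new int[] {ruleId, conditionNo});
        }
    }

    /** Returns ids of rules whose WITH conditions are all covered by the file. */
    Set<Integer> matchingRules(Set<String> fileWords) {
        Map<Integer, Set<Integer>> satisfied = new HashMap<>();
        Set<Integer> excluded = new HashSet<>();
        for (String w : fileWords) {
            List<int[]> postings = withPostings.get(w);
            if (postings != null) {
                for (int[] p : postings) {
                    satisfied.computeIfAbsent(p[0], k -> new HashSet<>()).add(p[1]);
                }
            }
            Set<Integer> blocked = notPostings.get(w);
            if (blocked != null) {
                excluded.addAll(blocked);
            }
        }
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Set<Integer>> e : satisfied.entrySet()) {
            int ruleId = e.getKey();
            // Accept only when the satisfied count reaches the required count.
            if (e.getValue().size() == required.get(ruleId) && !excluded.contains(ruleId)) {
                result.add(ruleId);
            }
        }
        return result;
    }
}
```

Only rules that share at least one word with the file are ever touched, which
is the point of looking rules up by word instead of testing each rule in turn.
A rule made up only of NOT WITH conditions is never returned by this sketch.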

-- 
Sincerely yours
Mikhail Khludnev