Posted to users@jackrabbit.apache.org by Roy Teeuwen <ro...@teeuwen.be> on 2018/06/26 18:13:28 UTC

Oak - Creating second fulltext lucene index

Hey guys,

I have the following situation:

- I have a sentence, for example "This is my %%sentence%% I want to search for", and I would like to query for the term %%sentence%% (it could be in any property).
- The default built-in full-text Oak Lucene index uses the StandardAnalyzer, which strips out the %%.
- I could use the WhitespaceAnalyzer, which would index the sentence as This, is, my, %%sentence%%, I, ...
- By using this analyzer and also putting the index on nt:base, I would go against the docs, which state that I should not create a second index on the same node type.
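
To make the analyzer difference above concrete, here is a rough sketch in plain Java. The two methods are simplified stand-ins written for illustration; they are not Lucene's StandardAnalyzer or WhitespaceAnalyzer, and the class and method names are hypothetical. The sketch only shows why a whitespace-only split keeps %%sentence%% as one token while a split on punctuation loses the %% markers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins for the two analyzers; NOT Lucene's implementations.
class TokenizerSketch {

    // Approximates a whitespace analyzer: split on whitespace only, keep all other characters.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Approximates a standard analyzer for this input: split on anything that is
    // not a letter or digit, then lower-case each token.
    static List<String> standardLikeTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "This is my %%sentence%% I want to search for";
        System.out.println(whitespaceTokens(text));
        System.out.println(standardLikeTokens(text));
    }
}
```

With the whitespace split, a search for the exact token %%sentence%% can match; with the punctuation split, only the bare word remains in the token list.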

So my question is: how could I solve this? Is it possible to specify that this index should not be used for normal querying, and then run a rep:native query where I specify the functionName of that specific index, so that I can query it anyway? Any other solutions would be helpful too.

Thanks,
Roy

Re: Oak - Creating second fulltext lucene index

Posted by Roy Teeuwen <ro...@teeuwen.be>.

Hey all,

So that anyone who searches for this later also has the answer: I'm currently using the second option I mentioned. I got it working by creating the following IndexFieldProvider as a POC. It seems to add the values to the default "lucene" index, so I can run the following query: //*[rep:native('lucene', 'inlinevariable:sentence')]

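import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.jackrabbit.oak.api.PropertyState;
import org.apache.jackrabbit.oak.api.Type;
import org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider;
import org.apache.jackrabbit.oak.spi.state.NodeState;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
// Assumed: OSGi Declarative Services annotation (the original post did not show its imports)
import org.osgi.service.component.annotations.Component;

@Component
public class InlineVariableIndexFieldProvider implements IndexFieldProvider {

    // Matches %%name%% and captures the name between the markers
    private static final Pattern INLINE_VARIABLE = Pattern.compile("%%(.*?)%%");

    @Override
    public Iterable<Field> getAugmentedFields(String path, NodeState document, NodeState indexDefinition) {
        Set<Field> fields = new HashSet<>();

        // Only single-valued STRING properties are inspected
        for (PropertyState property : document.getProperties()) {
            if (property.getType().equals(Type.STRING)) {
                addInlineVariables(property.getValue(Type.STRING), fields);
            }
        }

        return fields;
    }

    private void addInlineVariables(String value, Set<Field> fields) {
        Matcher matcher = INLINE_VARIABLE.matcher(value);
        while (matcher.find()) {
            // StringField indexes the captured name as a single, untokenized term
            fields.add(new StringField("inlinevariable", matcher.group(1), Field.Store.NO));
        }
    }

    @Override
    public Set<String> getSupportedTypes() {
        Set<String> supportedTypes = new HashSet<>();
        supportedTypes.add("nt:unstructured");
        return supportedTypes;
    }
}

Greets,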
Roy
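
For anyone who wants to try the extraction step outside of Oak, here is a minimal sketch that isolates the regular expression used in the provider above. The class and method names are hypothetical and exist only for this illustration; nothing here touches the Oak or Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone version of the %%...%% extraction; names are illustrative only.
class InlineVariableExtractor {

    // Reluctant quantifier, so each %%...%% pair is matched separately
    private static final Pattern INLINE_VARIABLE = Pattern.compile("%%(.*?)%%");

    // Returns the text between each pair of %% markers, in order of appearance.
    static List<String> extract(String value) {
        List<String> names = new ArrayList<>();
        Matcher matcher = INLINE_VARIABLE.matcher(value);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(extract("This is my %%sentence%% I want to search for"));
    }
}
```

Because the provider stores each captured name in a StringField, the name is indexed as one exact term. A name containing spaces or upper-case letters would therefore have to be queried exactly as written.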

Re: Oak - Creating second fulltext lucene index

Posted by Roy Teeuwen <ro...@teeuwen.be>.

Hey Thomas,

Thanks for the reply! How would I make sure the cost is always higher?

There is also a second option that I was thinking about; please correct me if I'm wrong:

- Create a new Lucene property index that indexes a non-existent property
- Create an IndexFieldProvider class that parses the %%sentence%% out of a NodeState
- Save the value in a field only when the indexDefinition NodeState is the newly created Lucene property index
- Run a native query against this specific Lucene index for that field name

Would this work?

Thanks,
Roy

Re: Oak - Creating second fulltext lucene index

Posted by Thomas Mueller <mu...@adobe.com.INVALID>.

Hi,

You could use a tag (see http://jackrabbit.apache.org/oak/docs/query/query-engine.html#Query_Option_Index_Tag). So: 

* add the second index with a higher cost than the original index (e.g. using a high costPerExecution / costPerEntry)
* in this second index, set the tag "myindex"
* in the query, use "option(index tag myindex)"

That way, only your query will use that index, and the other queries will use the (lower cost) default index.
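
As a rough illustration of the last step, the sketch below builds a JCR-SQL2 statement that carries the index tag option. It assumes the option is written as "option(index tag <name>)", as in the documentation linked above; the class and method names are hypothetical, and running the statement would still require a JCR session, which is not shown.

```java
// Builds a query string only; executing it needs a JCR QueryManager (not shown).
class TaggedQuerySketch {

    // Returns a full-text JCR-SQL2 query limited to indexes that carry the given tag.
    static String fullTextQuery(String term, String tag) {
        // Assumption: keep tag names to simple identifiers so the option clause stays well-formed
        if (!tag.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unsupported tag name: " + tag);
        }
        // Single quotes inside a SQL-2 string literal are escaped by doubling them
        String escaped = term.replace("'", "''");
        return "select * from [nt:base] where contains(*, '" + escaped + "')"
                + " option(index tag " + tag + ")";
    }

    public static void main(String[] args) {
        System.out.println(fullTextQuery("%%sentence%%", "myindex"));
    }
}
```

Queries that do not carry the option are unaffected by the string above; which index they pick still depends on the cost settings described in the list.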

Regards,
Thomas

Re: Oak - Creating second fulltext lucene index

Posted by Roy Teeuwen <ro...@teeuwen.be>.

Is there no one who could help me out with this issue?
