Posted to users@jackrabbit.apache.org by Roy Teeuwen <ro...@teeuwen.be> on 2018/06/26 18:13:28 UTC
Oak - Creating second fulltext lucene index
Hey guys,
I have the following situation:
- I have a sentence, for example "This is my %%sentence%% I want to search for", and I would like to query for the term %%sentence%% (it could be in any property).
- The default built-in Oak Lucene full-text index uses the StandardAnalyzer, which strips the %% characters.
- I could use the WhitespaceAnalyzer, which would index the sentence as This, is, my, %%sentence%%, I, ...
- If I use this analyzer and also put the index on nt:base, I would go against the docs, which state that I should not create a second full-text index for the same node type.
So my question is: how could I solve this? Is it possible to specify that this index should not be used for normal querying, and then run a rep:native query in which I specify the functionName of that specific index, so that I can still query it? Any other solutions would be helpful too.
Thanks,
Roy
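To make the analyzer difference above concrete, here is a rough, JDK-only approximation of the two tokenization styles. It does not use Lucene: the class and method names are hypothetical, and the real StandardAnalyzer and WhitespaceAnalyzer do more (Unicode word-boundary rules, optional stop words), so treat it purely as an illustration of why the %% markers survive in one case and not in the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, NOT the Lucene analyzers themselves.
final class TokenizerSketch {

    // Whitespace-style: split on runs of whitespace only, so "%%sentence%%" stays intact.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Word-style: keep only runs of letters and digits, lowercased, so "%%" is dropped.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With the example sentence, the whitespace-style split keeps %%sentence%% as a single token, while the word-style split only yields sentence.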
Re: Oak - Creating second fulltext lucene index
Posted by Roy Teeuwen <ro...@teeuwen.be>.
Hey all,
So that anyone who searches for this later also has the answer: I'm currently using the second option I mentioned. I got it working by creating the following IndexFieldProvider as a proof of concept. It seems to add the values to the default "lucene" index, so I can run the following query: //*[rep:native('lucene', 'inlinevariable:sentence')]
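import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.jackrabbit.oak.api.PropertyState;
import org.apache.jackrabbit.oak.api.Type;
import org.apache.jackrabbit.oak.plugins.index.lucene.spi.IndexFieldProvider;
import org.apache.jackrabbit.oak.spi.state.NodeState;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
// Assumption: OSGi Declarative Services annotations; adjust if you use Felix SCR.
import org.osgi.service.component.annotations.Component;

@Component
public class InlineVariableIndexFieldProvider implements IndexFieldProvider {

    // Matches %%name%% and captures the name between the markers.
    private static final Pattern INLINE_VARIABLE = Pattern.compile("%%(.*?)%%");

    @Override
    public Iterable<Field> getAugmentedFields(String path, NodeState document, NodeState indexDefinition) {
        List<Field> fields = new ArrayList<>();
        for (PropertyState property : document.getProperties()) {
            // Only single-valued string properties are scanned.
            if (property.getType().equals(Type.STRING)) {
                addInlineVariables(property.getValue(Type.STRING), fields);
            }
        }
        return fields;
    }

    private void addInlineVariables(String value, List<Field> fields) {
        Matcher matcher = INLINE_VARIABLE.matcher(value);
        while (matcher.find()) {
            // StringField is not analyzed, so the name is indexed as one exact term.
            fields.add(new StringField("inlinevariable", matcher.group(1), Field.Store.NO));
        }
    }

    @Override
    public Set<String> getSupportedTypes() {
        // Fields are only added for nodes of this type.
        return Collections.singleton("nt:unstructured");
    }
}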
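
The extraction step can be tried out without Oak or Lucene on the classpath. The following is a minimal JDK-only sketch of the same regular expression; the class and method names are hypothetical and only serve to show which values would end up in the "inlinevariable" field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone helper mirroring the regex used by the provider above.
final class InlineVariableExtractor {

    // Lazy match, so each %%...%% pair is captured separately.
    private static final Pattern INLINE_VARIABLE = Pattern.compile("%%(.*?)%%");

    // Returns the names found between %% markers, in order of appearance.
    static List<String> extract(String value) {
        List<String> names = new ArrayList<>();
        Matcher matcher = INLINE_VARIABLE.matcher(value);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }
}
```

For the example sentence this returns a single entry, sentence, which is the term matched by the native query inlinevariable:sentence.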
Greets,
Roy
> On 13 Jul 2018, at 13:20, Roy Teeuwen <ro...@teeuwen.be> wrote:
>
> Hey Thomas,
>
> Thanks for the reply! How would I make sure the cost is always higher?
>
> There is also a second option that I was thinking about; please correct me if I'm wrong:
>
> - Create a new lucene property index that searches for a nonexisting property
> - Create an IndexFieldProvider class that parses out the %%sentence%% from a NodeState
> - Save the value in a field name only when the indexDefinition NodeState is the newly created lucene property index
> - Do a native query to this specific lucene index for the field name
>
> Would this work?
>
> Thanks,
> Roy
>
>> On 13 Jul 2018, at 12:28, Thomas Mueller <mu...@adobe.com.INVALID> wrote:
>>
>> Hi,
>>
>> You could use a tag (see http://jackrabbit.apache.org/oak/docs/query/query-engine.html#Query_Option_Index_Tag). So:
>>
>> * add the second index with a higher cost than the original index (e.g. by setting a high costPerExecution / costPerEntry in the index definition)
>> * in this second index, add "myindex" to the "tags" property
>> * in the query, use "option(index tag myindex)"
>>
>> That way, only your query will use that index, and the other queries will use the (lower cost) default index.
>>
>> Regards,
>> Thomas
>>
>>
>
Re: Oak - Creating second fulltext lucene index
Posted by Roy Teeuwen <ro...@teeuwen.be>.
Hey Thomas,
Thanks for the reply! How would I make sure the cost is always higher?
There is also a second option that I was thinking about; please correct me if I'm wrong:
- Create a new lucene property index that searches for a nonexisting property
- Create an IndexFieldProvider class that parses out the %%sentence%% from a NodeState
- Save the value in a field name only when the indexDefinition NodeState is the newly created lucene property index
- Do a native query to this specific lucene index for the field name
Would this work?
Thanks,
Roy
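The last step of this idea, the native query against a named field, can be sketched as a small string builder. This is a JDK-only illustration with hypothetical class and method names; the escaping covers the special characters of the classic Lucene query syntax and XPath single quotes, and it assumes the term contains no whitespace.

```java
// Hypothetical helper that assembles an XPath rep:native query string.
final class NativeQuerySketch {

    // Characters with special meaning in the classic Lucene query syntax.
    private static final String LUCENE_SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    static String escapeTerm(String term) {
        StringBuilder escaped = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (LUCENE_SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    // Builds //*[rep:native('<index>', '<field>:<term>')].
    static String nativeXPath(String indexFunctionName, String field, String term) {
        String luceneQuery = field + ":" + escapeTerm(term);
        // XPath string literals escape a single quote by doubling it.
        return "//*[rep:native('" + indexFunctionName + "', '"
                + luceneQuery.replace("'", "''") + "')]";
    }
}
```

Calling nativeXPath("lucene", "inlinevariable", "sentence") produces //*[rep:native('lucene', 'inlinevariable:sentence')], which could then be passed to the JCR QueryManager as an XPath statement.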
Re: Oak - Creating second fulltext lucene index
Posted by Thomas Mueller <mu...@adobe.com.INVALID>.
Hi,
You could use a tag (see http://jackrabbit.apache.org/oak/docs/query/query-engine.html#Query_Option_Index_Tag). So:
* add the second index with a higher cost than the original index (e.g. by setting a high costPerExecution / costPerEntry in the index definition)
* in this second index, add "myindex" to the "tags" property
* in the query, use "option(index tag myindex)"
That way, only your query will use that index, and the other queries will use the (lower cost) default index.
Regards,
Thomas
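As a rough illustration of the query side of this suggestion, the sketch below builds a JCR-SQL2 full-text statement that carries the index tag option. It is JDK-only, the class and method names are hypothetical, and it follows the option(index tag <name>) spelling from the linked query-engine documentation; how the search term is tokenized still depends on the analyzer configured on the tagged index.

```java
// Hypothetical helper that appends the index tag option to a full-text query.
final class TaggedQuerySketch {

    // Builds: select * from [<nodeType>] where contains(*, '<term>') option(index tag <tag>)
    static String fulltextWithTag(String nodeType, String searchTerm, String tag) {
        // SQL-2 string literals escape a single quote by doubling it.
        String literal = searchTerm.replace("'", "''");
        return "select * from [" + nodeType + "] where contains(*, '" + literal
                + "') option(index tag " + tag + ")";
    }
}
```

The resulting string would be handed to QueryManager.createQuery(statement, Query.JCR_SQL2); queries without the option keep using the lower-cost default index.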
Re: Oak - Creating second fulltext lucene index
Posted by Roy Teeuwen <ro...@teeuwen.be>.
Is there no one who could help me out with this issue?