Posted to oak-issues@jackrabbit.apache.org by "Ankush Nagapure (Jira)" <ji...@apache.org> on 2021/10/06 17:49:00 UTC
[jira] [Created] (OAK-9596) Full text search using Lucene index for binary content
Ankush Nagapure created OAK-9596:
------------------------------------
Summary: Full text search using Lucene index for binary content
Key: OAK-9596
URL: https://issues.apache.org/jira/browse/OAK-9596
Project: Jackrabbit Oak
Issue Type: Task
Components: indexing, lucene
Reporter: Ankush Nagapure
I am trying out Jackrabbit Oak with Lucene on a file-based node store. The index definition node is created successfully, but it seems the index data itself is not created. Lucene index creation code snippets:
{code:java}
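import javax.jcr.Node;
import javax.jcr.Property;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

import org.apache.jackrabbit.commons.JcrUtils;
import org.apache.jackrabbit.oak.jcr.Jcr;
import org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditorProvider;
import org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProvider;
import org.apache.jackrabbit.oak.spi.commit.Observer;
import org.apache.jackrabbit.oak.spi.query.QueryIndexProvider;

public void initRepository() {
    // The provider serves Lucene-backed queries and is also registered as an Observer
    LuceneIndexProvider provider = new LuceneIndexProvider();
    Jcr jcr = new Jcr(nodeStore)
            .withAsyncIndexing("async", 3)   // run the "async" indexing lane every 3 seconds
            .with(new LuceneIndexEditorProvider())
            .with((QueryIndexProvider) provider)
            .with((Observer) provider);
    repository = jcr.createRepository();
    log.info("Repository initialized");
}

public void createLuceneIndex() throws RepositoryException {
    Session session = createAdminSession();
    try {
        // /oak:index/lucene as oak:QueryIndexDefinition (intermediate nodes: oak:Unstructured)
        Node lucene = JcrUtils.getOrCreateByPath("/oak:index/lucene", "oak:Unstructured",
                "oak:QueryIndexDefinition", session, false);
        lucene.setProperty("compatVersion", 2);
        lucene.setProperty("type", "lucene");
        lucene.setProperty("async", "async");
        // indexRules/nt:base/properties/allProps: regex property rule intended to cover all properties
        Node rules = lucene.addNode("indexRules", "nt:unstructured");
        Node allProps = rules.addNode("nt:base")
                .addNode("properties", "nt:unstructured")
                .addNode("allProps", "oak:Unstructured");
        allProps.setProperty(Property.JCR_DATA, ".*");
        allProps.setProperty("isRegexp", true);
        allProps.setProperty("nodeScopeIndex", true);
        session.save();
    } finally {
        session.logout();
    }
    log.info("Lucene index created");
}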
{code}
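For comparison, the Oak Lucene index documentation describes a property rule's `name` property as the property name to index (or, with `isRegexp = true`, a regular expression over property names), and states that when `name` is not set it falls back to the name of the rule node itself. Its "index all properties" example sets `name = ".*"` on an `allProps` rule. The definition above stores `.*` under `jcr:data` instead, so, if that reading is right, the rule would effectively be the pattern `allProps`. The stdlib-only sketch below models that matching with `java.util.regex`; the class and helpers (`PropertyRuleMatch`, `effectiveName`, `matches`) are hypothetical and this is not Oak's implementation (Oak may, for example, match against relative paths).

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class PropertyRuleMatch {

    // Hypothetical helper: the effective property name of a rule. Following the
    // Oak docs, a missing "name" falls back to the name of the rule node itself.
    static String effectiveName(String ruleNodeName, Map<String, Object> rule) {
        Object name = rule.get("name");
        return name != null ? name.toString() : ruleNodeName;
    }

    // Hypothetical helper: with isRegexp = true the effective name is used as a
    // regular expression (modelled with java.util.regex, not Oak's own code).
    static boolean matches(String ruleNodeName, Map<String, Object> rule, String propertyName) {
        String name = effectiveName(ruleNodeName, rule);
        boolean regexp = Boolean.TRUE.equals(rule.get("isRegexp"));
        return regexp ? Pattern.matches(name, propertyName) : name.equals(propertyName);
    }

    public static void main(String[] args) {
        // Rule as created in the ticket: ".*" is stored under jcr:data, not under name.
        Map<String, Object> asPosted = Map.of("jcr:data", ".*", "isRegexp", true, "nodeScopeIndex", true);
        // Rule shaped like the "index all properties" example in the Oak docs.
        Map<String, Object> documented = Map.of("name", ".*", "isRegexp", true, "nodeScopeIndex", true);

        for (String prop : List.of("jcr:data", "jcr:mimeType")) {
            System.out.println(prop + ": asPosted=" + matches("allProps", asPosted, prop)
                    + ", documented=" + matches("allProps", documented, prop));
        }
    }
}
```

Under these assumptions the posted rule matches neither `jcr:data` nor `jcr:mimeType`, while the documented shape matches both. The corresponding change to the definition above would be `allProps.setProperty("name", ".*")` in place of the `Property.JCR_DATA` line; it is worth confirming against the documentation for the Oak version in use.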
After creating the Lucene index, I uploaded the *test.doc* file to the node store using the code below:
{code:java}
log.info("Setting the JCR content for file name: test.doc, under path: " + folderNode.getPath());
final Binary binary = new BinaryImpl(fileBytes);
// jcr:content child node of type nt:resource; the file bytes go into its jcr:data property
final Node content = folderNode.addNode(Node.JCR_CONTENT, NodeType.NT_RESOURCE);
content.setProperty(Property.JCR_DATA, binary);
// JCR session save code here
{code}
Contents of test.doc:
{noformat}
HelloWorld, Test file contents for Full text search using Lucene index.
{noformat}
I used the query below to fetch the results:
{code:java}
final Query query = queryManager.createQuery(
"select * from [nt:base] where contains(*, 'HelloWorld')",
Query.JCR_SQL2);
final QueryResult result = query.execute();
final NodeIterator nodeIter = result.getNodes();
log.info("Number of nodes: " + nodeIter.getSize());{code}
However, this query does not return the node where the file contents are stored. I get the following result:
{code:java}
Number of nodes: 0{code}
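The count above comes from `NodeIterator.getSize()`, which the JCR API allows to return -1 when the total number of results is not known up front, so it is not always a reliable count. A small stdlib-only sketch that counts by iterating instead; `CountResults` and `countRemaining` are hypothetical names, and because `javax.jcr.NodeIterator` extends `java.util.Iterator` (via `RangeIterator`), the JCR iterator could be passed to the helper directly.

```java
import java.util.Iterator;
import java.util.List;

public class CountResults {

    // Counts the remaining elements by walking the iterator, rather than
    // relying on a size hint that may be -1 ("unknown").
    static long countRemaining(Iterator<?> it) {
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for result.getNodes(); with JCR this would be a NodeIterator.
        // The paths are made up for illustration.
        Iterator<String> nodes = List.of("/content/a", "/content/b").iterator();
        System.out.println("Number of nodes: " + countRemaining(nodes));
    }
}
```

Note that iterating consumes the iterator, so a fresh one has to be requested from the `QueryResult` if the nodes are needed afterwards.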
Logs:
{code:java}
"Traversal query (query without index): select * from [nt:base] where contains(*,'HelloWorld'); consider creating an index"{code}
Could you please let me know how to create a proper index and run a full-text search over binary content?
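The traversal warning in the log indicates that no index was selected for the query at all. Separately, because the index is defined with `async = "async"` and the repository runs async indexing every 3 seconds, content saved immediately before a query may not be searchable until the next indexing cycle has completed. In tests, one common workaround is to poll the query until it returns a result or a timeout expires. A stdlib-only sketch, assuming a hypothetical `waitUntil` helper whose `BooleanSupplier` would, in real code, run the JCR query and check that it returns at least one node:

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class AwaitIndexed {

    // Polls the condition until it returns true or the timeout elapses.
    // Returns true if the condition became true in time, false otherwise
    // (also false if the waiting thread is interrupted).
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "run the query and check for results": this condition
        // simply becomes true on the third poll.
        AtomicInteger polls = new AtomicInteger();
        boolean found = waitUntil(() -> polls.incrementAndGet() >= 3,
                Duration.ofSeconds(10), Duration.ofMillis(100));
        System.out.println("found=" + found + " after " + polls.get() + " polls");
    }
}
```

The timeout should comfortably exceed the 3-second async interval configured in `initRepository()`. If the query still reports a traversal after waiting, the cause is more likely the index definition than indexing lag.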
--
This message was sent by Atlassian Jira
(v8.3.4#803005)