Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2014/06/06 13:26:02 UTC

[jira] [Updated] (DERBY-590) How to integrate Derby with Lucene API?

     [ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-590:
-------------------------------------

    Attachment: multifield.diff

Thanks, Rick. Those were the exact changes that were needed.

The attached patch [^multifield.diff] shows an example of how it could be used.

I made two small adjustments:

1) Instead of hard-coding the field names, I made LuceneSupport read them dynamically from a database property (derby.tests.lucene.fields), so that I could verify that the original Lucene tests still pass. (They do still pass, by the way.) Also the field names are stored in the Lucene index property file, so that LuceneQueryVTI can find them too. This is of course just a temporary hack until we figure out the correct API.

2) I made LuceneUtils.defaultQueryParser() always return a MultiFieldQueryParser, since MultiFieldQueryParser seems to behave just like QueryParser in the degenerate case with a single field.

Since I didn't feel like writing a Java source file parser, I changed my example use case to search in XML files, so that I could use the XML parser that is in the JRE. I added a test case to LuceneSupportTest to verify that it could be used for that.

The test case creates an index with two fields: tags and text. The tags field contains only the XML tags, whereas the text field contains only the text elements of the XML file. This way, you can use the index to search for data and metadata separately in the XML documents stored in your table.
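
As a rough illustration of that split (not the code from the patch), the sketch below uses the SAX parser that ships with the JRE to collect tag names and text content separately. The class name XmlFieldSplitter, its nested Fields class, and the split() method are made up for this example; how the patch feeds these values to Lucene is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for illustration only; not part of multifield.diff. */
class XmlFieldSplitter {

    /** Tag names and text chunks collected from one XML document. */
    static final class Fields {
        final List<String> tags = new ArrayList<String>();
        final List<String> text = new ArrayList<String>();
    }

    /** Parses the XML string and separates element names from character data. */
    static Fields split(String xml) {
        final Fields fields = new Fields();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Every opening tag contributes its name to the "tags" side.
                fields.tags.add(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks; this sketch
                // simply records each non-blank chunk as it arrives.
                String chunk = new String(ch, start, length).trim();
                if (!chunk.isEmpty()) {
                    fields.text.add(chunk);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not parsable as XML", e);
        }
        return fields;
    }
}
```

For the input <doc><title>hello</title><body>abc def</body></doc>, the tags list holds doc, title and body, and the text list holds "hello" and "abc def".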

Now, while writing the test case, I found that you will most likely want to use a custom query parser when you use it this way. The reason is that the default query parser extracts tokens from the search terms with the same analyzer that the index writer used. That means that if, as in this case, you use a custom analyzer that parses XML documents, the query parser will also expect the terms in the query to be XML documents. So you'll end up with rather silly-looking queries.

For example, to search for documents that contain the text "abc", you cannot make the query {{text:"abc"}}, but have to wrap it in dummy XML tags to make it parsable: {{text:"<dummy>abc</dummy>"}}.
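
The underlying reason can be reproduced with the JRE's XML parser alone. The sketch below is only an assumption about the general mechanism: it checks whether a search term is well-formed XML, which a bare word is not. The class QueryTermXmlCheck and the method isWellFormedXml() are invented for this example, and how the analyzer in the patch actually reacts to an unparsable term is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical check for illustration only; not part of multifield.diff. */
class QueryTermXmlCheck {

    /** Returns true if the term can be parsed as a complete XML document. */
    static boolean isWellFormedXml(String term) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(term)),
                           new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, e.g. text with no root element.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this check, isWellFormedXml("abc") returns false, whereas isWellFormedXml("<dummy>abc</dummy>") returns true, which is why the bare term has to be wrapped when the XML-parsing analyzer is also applied to the query.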

The custom query parser doesn't need to be very complex, though. The test case in the patch shows one example in the method {{createXMLQueryParser()}}. That method simply creates a MultiFieldQueryParser with a plain StandardAnalyzer. With that parser, you can write queries like:

- {{text:abc}} to search for "abc" in the text elements of the XML

- {{tags:abc}} to search for XML tags called "abc"

- {{abc}} to search for "abc" in both text elements and tags

What do you think? Does it sound like a useful addition?

> How to integrate Derby with Lucene API?
> ---------------------------------------
>
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: LucenePlugin.html, LucenePlugin.html, LucenePlugin.html, derby-590-01-ag-publicAccessToLuceneRoutines.diff, derby-590-01-ah-publicAccessToLuceneRoutines.diff, derby-590-01-am-publicAccessToLuceneRoutines.diff, derby-590-02-aa-cleanupFindbugsErrors.diff, derby-590-03-aa-removeTestingDiagnostic.diff, derby-590-04-aa-removeIDFromListIndexes.diff, derby-590-05-aa-accessDeclaredMembers.diff, derby-590-06-aa-suppressAccessChecks.diff, derby-590-07-aa-accessClassInPackage.sun.misc.diff, derby-590-08-aa-omitLuceneFlag.diff, derby-590-09-aa-localeSensitiveAnalysis.diff, derby-590-10-aa-fixLocaleTest.diff, derby-590-11-aa-moveCode.diff, derby-590-12-aa-newJar.diff, derby-590-13-aa-indexViews.diff, derby-590-14-aa-coarseGrainedAuthorization.diff, derby-590-15-aa-requireHardUpgrade.diff, derby-590-16-aa-adjustUpgradeTest.diff, derby-590-17-aa-closeInputStreamOnPropertiesFile.diff, derby-590-18-aa-cleanupAPI.diff, derby-590-19-aa-cleanupAPI2.diff, derby-590-20-aa-customQueryParser.diff, derby-590-21-aa-noTimeTravel.diff, derby-590-22-aa-cleanupPrivacy.diff, derby-590-23-aa-correctTestLocale.diff, derby-590-24-ad-luceneDirectory.diff, derby-590-26-ac-backupRestore.diff, derby-590-26-ad-backupRestoreEncryption.diff, derby-590-27-aa-publicAPILuceneUtils.diff, derby-590-28-renameLuceneJars.diff, derby-590-29-aa-useLucene_4.7.1.diff, derby-590-30-aa-nullableScoreCeiling.diff, exceptions.diff, lucene_demo.diff, lucene_demo_2.diff, multifield.diff, netbeans.diff, netbeans2.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken? 


