Posted to issues@geode.apache.org by "Anthony Baker (JIRA)" <ji...@apache.org> on 2017/06/17 18:04:19 UTC

[jira] [Closed] (GEODE-2913) Update Lucene documentation

     [ https://issues.apache.org/jira/browse/GEODE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Baker closed GEODE-2913.
--------------------------------

> Update Lucene documentation
> ---------------------------
>
>                 Key: GEODE-2913
>                 URL: https://issues.apache.org/jira/browse/GEODE-2913
>             Project: Geode
>          Issue Type: Bug
>          Components: docs
>            Reporter: Karen Smoler Miller
>            Assignee: Karen Smoler Miller
>             Fix For: 1.2.0
>
>
> Improvements to the code base that need to be reflected in the docs:
> * Change LuceneService.createIndex to use a factory pattern
> {code:java}
> luceneService.createIndex(region, index, ...)
> {code}
> changes to
> {code:java}
> luceneService.createIndexFactory()
>              .addField("field1name")
>              .addField("field2name")
>              .create("indexName", "regionName");
> {code}
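> A minimal sketch of the new factory call in context, assuming an existing Cache named cache and placeholder index/region/field names (the index is defined before the region it indexes):
> {code:java}
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.RegionShortcut;
> import org.apache.geode.cache.lucene.LuceneService;
> import org.apache.geode.cache.lucene.LuceneServiceProvider;
>
> public class CreateIndexExample {
>   public static void createIndexAndRegion(Cache cache) {
>     // Obtain the LuceneService for the existing cache
>     LuceneService luceneService = LuceneServiceProvider.get(cache);
>
>     // Define the index first ...
>     luceneService.createIndexFactory()
>                  .addField("field1name")
>                  .addField("field2name")
>                  .create("indexName", "regionName");
>
>     // ... then create the region it indexes
>     cache.createRegionFactory(RegionShortcut.PARTITION).create("regionName");
>   }
> }
> {code}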
> *  Lucene indexes will *NOT* be stored in off-heap memory.
> * Document how to configure an index on accessors: the Lucene index must still be created before the region is created, even though an accessor member holds no region data (see the sketch after the error message below).
> If the index is not defined on the accessor, an exception like this is thrown while attempting to create the region:
> {quote}
> [error 2017/05/02 15:19:26.018 PDT <main> tid=0x1] java.lang.IllegalStateException: Must create Lucene index full_index on region /data because it is defined in another member.
> Exception in thread "main" java.lang.IllegalStateException: Must create Lucene index full_index on region /data because it is defined in another member.
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(CreateRegionProcessor.java:478)
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(CreateRegionProcessor.java:379)
> {quote}
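> A minimal sketch of the accessor side, using the index and region names from the error above and assuming a PARTITION_PROXY (accessor) region and a placeholder field name:
> {code:java}
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.RegionShortcut;
> import org.apache.geode.cache.lucene.LuceneServiceProvider;
>
> public class AccessorSetupExample {
>   public static void defineIndexOnAccessor(Cache cache) {
>     // The accessor holds no region data, but it must still define the same
>     // Lucene index before creating the region, or creation fails as shown above.
>     LuceneServiceProvider.get(cache)
>         .createIndexFactory()
>         .addField("field1name")
>         .create("full_index", "data");
>
>     // PARTITION_PROXY = accessor: no local buckets; data lives on other members
>     cache.createRegionFactory(RegionShortcut.PARTITION_PROXY).create("data");
>   }
> }
> {code}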
> * There is no need to create a Lucene index on a client with a Proxy cache; the Lucene search is always executed on the server. Besides, _you can't create an index on a client._
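> A minimal client-side sketch, assuming a locator on localhost[10334] and placeholder index, region, and field names:
> {code:java}
> import java.util.Collection;
> import org.apache.geode.cache.client.ClientCache;
> import org.apache.geode.cache.client.ClientCacheFactory;
> import org.apache.geode.cache.client.ClientRegionShortcut;
> import org.apache.geode.cache.lucene.LuceneQuery;
> import org.apache.geode.cache.lucene.LuceneQueryException;
> import org.apache.geode.cache.lucene.LuceneServiceProvider;
>
> public class ClientQueryExample {
>   public static Collection<Object> search() throws LuceneQueryException {
>     ClientCache clientCache = new ClientCacheFactory()
>         .addPoolLocator("localhost", 10334).create();
>     // PROXY region: no local storage and no local Lucene index on the client
>     clientCache.createClientRegionFactory(ClientRegionShortcut.PROXY).create("data");
>
>     // The query is shipped to the servers, where the index lives
>     LuceneQuery<Object, Object> query = LuceneServiceProvider.get(clientCache)
>         .createLuceneQueryFactory()
>         .create("full_index", "data", "field1name:value", "field1name");
>     return query.findValues();
>   }
> }
> {code}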
> * If you configure Invalidates for region entries (alone or as part of expiration), they will *NOT* invalidate the Lucene index entries.
> The problem is that the index still contains the keys while the region no longer has the values, so a query returns matches for entries that effectively no longer exist.
> In this test (sketched in the code after the list below), the first time the query is run it produces N valid results; the second time it is run it produces N empty results:
> ** load entries
> ** run query
> ** invalidate entries
> ** run query again
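> A minimal sketch of that sequence, assuming placeholder index/region/field names and a hypothetical Customer value type:
> {code:java}
> import java.util.Collection;
> import org.apache.geode.cache.Region;
> import org.apache.geode.cache.lucene.LuceneQuery;
> import org.apache.geode.cache.lucene.LuceneQueryException;
> import org.apache.geode.cache.lucene.LuceneService;
>
> public class InvalidateExample {
>   // Hypothetical value type with a single indexed field
>   public static class Customer implements java.io.Serializable {
>     public String field1name;
>     public Customer(String value) { this.field1name = value; }
>   }
>
>   public static void run(Region<String, Customer> data, LuceneService luceneService)
>       throws LuceneQueryException {
>     LuceneQuery<String, Customer> query = luceneService.createLuceneQueryFactory()
>         .create("full_index", "data", "field1name:hello", "field1name");
>
>     data.put("key1", new Customer("hello"));           // load entries
>     // (allow the index's async event queue to flush before querying)
>     Collection<Customer> first = query.findValues();   // N valid results
>
>     data.invalidate("key1");                           // clears the value, NOT the index entry
>     Collection<Customer> second = query.findValues();  // the stale key still matches; results are empty
>   }
> }
> {code}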
> *  Destroying a region will *NOT* automatically destroy any Lucene index associated with that region. Instead, attempting to destroy a region that still has a Lucene index throws an IllegalStateException, because the index's internal files region is colocated with the data region (see the sketch after the stack trace):
> {quote}
> java.lang.IllegalStateException: The parent region [/data] in colocation chain cannot be destroyed, unless all its children [[/cusip_index#_data.files]] are destroyed
> at org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(PartitionedRegion.java:7231)
> at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7243)
> at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:308)
> at DestroyLuceneIndexesAndRegionFunction.destroyRegion(DestroyLuceneIndexesAndRegionFunction.java:46)
> {quote}
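> A minimal sketch of the required order, destroying the index before the region, using the names from the error above and assuming the LuceneService.destroyIndex API:
> {code:java}
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.lucene.LuceneService;
> import org.apache.geode.cache.lucene.LuceneServiceProvider;
>
> public class DestroyExample {
>   public static void destroyIndexThenRegion(Cache cache) {
>     LuceneService luceneService = LuceneServiceProvider.get(cache);
>     // Destroy the Lucene index first (assumes a LuceneService.destroyIndex method is available) ...
>     luceneService.destroyIndex("cusip_index", "data");
>     // ... then the region can be destroyed without the colocation error
>     cache.getRegion("data").destroyRegion();
>   }
> }
> {code}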
> * The process to change a Lucene index using gfsh: 
>       1. export region data
>       2. destroy Lucene index, destroy region 
>       3. create new index, create new region without user-defined business logic callbacks
>       4. import data with option to turn on callbacks (to invoke Lucene Async Event Listener to index the data)
>       5. alter region to add user-defined business logic callbacks
> * Make sure there are no references to replicated regions as they are not supported.
> * Document the security implementation and defaults. If security is configured for the cluster, creating a Lucene index requires the DATA:MANAGE privilege (similar to OQL), but running a Lucene query requires the DATA:WRITE privilege because the query is executed as a function (unlike OQL, which requires only DATA:READ). Here are the required privileges for the gfsh commands:
> ** create index requires DATA:MANAGE:region
> ** describe index requires CLUSTER:READ
> ** list indexes requires CLUSTER:READ
> ** search index requires DATA:WRITE
> ** destroy index requires DATA:MANAGE:region
> * A user cannot create a Lucene index on a region that has eviction configured with the local destroy action. With Lucene indexing, eviction can only be configured with overflow to disk; in that case only the region data is overflowed to disk, *NOT* the Lucene index. Configuring eviction with local destroy throws an UnsupportedOperationException (see the sketch after the stack trace):
> {quote}
> [error 2017/05/02 16:12:32.461 PDT <main> tid=0x1] java.lang.UnsupportedOperationException: Lucene indexes on regions with eviction and action local destroy are not supported
> Exception in thread "main" java.lang.UnsupportedOperationException: Lucene indexes on regions with eviction and action local destroy are not supported
> at org.apache.geode.cache.lucene.internal.LuceneRegionListener.beforeCreate(LuceneRegionListener.java:85)
> at org.apache.geode.internal.cache.GemFireCacheImpl.invokeRegionBefore(GemFireCacheImpl.java:3154)
> at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3013)
> at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2991)
> {quote}
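> A minimal sketch of the supported combination (overflow-to-disk eviction on an indexed region), with placeholder names and limits:
> {code:java}
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.EvictionAction;
> import org.apache.geode.cache.EvictionAttributes;
> import org.apache.geode.cache.RegionShortcut;
> import org.apache.geode.cache.lucene.LuceneServiceProvider;
>
> public class EvictionExample {
>   public static void createOverflowRegionWithIndex(Cache cache) {
>     LuceneServiceProvider.get(cache)
>         .createIndexFactory()
>         .addField("field1name")
>         .create("full_index", "data");
>
>     // OVERFLOW_TO_DISK evicts only the region values; the Lucene index stays in memory.
>     // Using EvictionAction.LOCAL_DESTROY here would trigger the exception above.
>     cache.createRegionFactory(RegionShortcut.PARTITION)
>          .setEvictionAttributes(EvictionAttributes.createLRUEntryAttributes(
>              10000, EvictionAction.OVERFLOW_TO_DISK))
>          .create("data");
>   }
> }
> {code}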
> * We can use the same field name in different objects where the field has a different data type, but this may have unexpected consequences. For example, if I created an index on the field SSN with the following entries:
>       Object_1 object_1 has String SSN = "1111"
>       Object_2 object_2 has Integer SSN = 1111
>       Object_3 object_3 has Float SSN = 1111.0
> Integers and Floats are not converted into strings; they remain IntPoint and FloatPoint values in the Lucene world. The standard analyzer will not try to tokenize these values; it only breaks up string values. So (see the sketch after this list):
> **  If I do a string search for "SSN: 1111", Lucene will return object_1.
> **  If I do an IntRangeQuery with upper limit 1112 and lower limit 1110, Lucene will return object_2.
> **  If I do a FloatRangeQuery with upper limit 1111.5 and lower limit 1111.0, Lucene will return object_3.
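> A minimal sketch of the numeric case, assuming the query factory accepts a custom LuceneQueryProvider and using a placeholder index name:
> {code:java}
> import java.util.Collection;
> import org.apache.geode.cache.lucene.LuceneQuery;
> import org.apache.geode.cache.lucene.LuceneQueryException;
> import org.apache.geode.cache.lucene.LuceneService;
> import org.apache.lucene.document.IntPoint;
>
> public class RangeQueryExample {
>   public static Collection<Object> ssnIntRange(LuceneService luceneService)
>       throws LuceneQueryException {
>     // A numeric range query goes through a LuceneQueryProvider rather than the
>     // string syntax; this one matches only object_2 (Integer SSN = 1111).
>     LuceneQuery<Object, Object> intQuery = luceneService.createLuceneQueryFactory()
>         .create("ssn_index", "data",
>             index -> IntPoint.newRangeQuery("SSN", 1110, 1112));
>     return intQuery.findValues();
>   }
> }
> {code}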
> * Similar to OQL, Lucene queries are not supported within transactions. A LuceneQueryException is thrown on the client/accessor:
> {quote}
> Exception in thread "main" org.apache.geode.cache.lucene.LuceneQueryException: Lucene Query cannot be executed within a transaction
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(LuceneQueryImpl.java:124)
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:98)
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:94)
> at TestClient.executeQuerySingleMethod(TestClient.java:196)
> at TestClient.main(TestClient.java:59)
> {quote}
> On the server side, a corresponding TransactionException is logged.
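> A minimal sketch of the workaround, committing (or suspending) the transaction before running the query, with placeholder names:
> {code:java}
> import java.util.Collection;
> import org.apache.geode.cache.Cache;
> import org.apache.geode.cache.CacheTransactionManager;
> import org.apache.geode.cache.Region;
> import org.apache.geode.cache.lucene.LuceneQuery;
> import org.apache.geode.cache.lucene.LuceneQueryException;
>
> public class TransactionExample {
>   public static Collection<Object> updateThenQuery(Cache cache, Region<String, Object> data,
>       LuceneQuery<String, Object> query, Object value) throws LuceneQueryException {
>     CacheTransactionManager txMgr = cache.getCacheTransactionManager();
>     txMgr.begin();
>     data.put("key1", value);
>     txMgr.commit();              // end (or suspend) the transaction first ...
>     return query.findValues();   // ... the same call inside the transaction would throw
>   }
> }
> {code}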
> * Backups should only be done for regions with Lucene indexes when the system is 'quiet'; i.e. no puts, updates, or deletes are in progress. Otherwise the backups for Lucene indexes will not match the data in the region that is being indexed (i.e. incremental backups will not be consistent between the data region and the Lucene index region due to delayed processing associated with the AEQ). If the region data needs to be restored from backup, then you must follow the same process for changing a Lucene index in order to re-create the index region.
> *  Update the docs section on "Memory Requirements for Cached Data" to include a conservative estimate of 737 bytes per entry of overhead for a Lucene index. All the other caveats mentioned for OQL indexes also apply to Lucene indexes; your mileage may vary.
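> As a rough illustration of that estimate, a Lucene index over 1,000,000 region entries would add on the order of 1,000,000 x 737 bytes, roughly 737 MB, on top of the usual per-entry and OQL-index overhead.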



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)