Posted to dev@usergrid.apache.org by "David Johnson (JIRA)" <ji...@apache.org> on 2014/12/05 22:09:12 UTC

[jira] [Updated] (USERGRID-255) Re-indexer That Removes Source from ES

     [ https://issues.apache.org/jira/browse/USERGRID-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Johnson updated USERGRID-255:
-----------------------------------
    Description: 
The two-dot-o Query Index module currently stores both document source and fields in ElasticSearch. Since we only ever retrieve ID numbers from ES, storing source is unnecessary and wastes resources.
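
To illustrate why source is not needed, here is a minimal sketch of reading only IDs from a search response. The response dict is a made-up sample shaped like an Elasticsearch search result, and `entity_ids` is a hypothetical helper, not Usergrid code.

```python
def entity_ids(search_response):
    """Return the _id of every hit; the _source field is never read."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


# Made-up sample response: with source disabled, hits carry no "_source" key.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_index": "app1", "_type": "user", "_id": "id-1", "_score": 1.0},
            {"_index": "app1", "_type": "user", "_id": "id-2", "_score": 0.8},
        ],
    }
}

print(entity_ids(sample_response))
```

The full entities would then be loaded from Cassandra by ID, which is why nothing is lost when ES stops storing source.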

This is what we need:

1) A way to configure Usergrid to either store source or not store source.

2) An index-rebuild "Tool" (implemented as a REST end-point) that either removes source or adds source, depending on how the system is configured to operate. The Tool must allow us to re-index without downtime. Possible approach:

For each application:

a) Tool creates a new index and adds that index to the application's read and write alias. 

b) Tool removes the old index from the application's write alias so it is no longer written to.

c) Tool deletes the mappings for each newly added index, then re-creates them with the new store-source settings.

d) Tool re-indexes the application's collections.

e) Once re-index is complete, Tool deletes the old index.
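
The approach above could be sketched as an ordered plan of Elasticsearch REST calls. This is only an illustration under assumptions: the index names, alias names and the `rebuild_plan` and `source_mapping` helpers are hypothetical, the typed mapping paths assume the Elasticsearch 1.x REST API (the delete-mapping call was removed in later versions), and the "REINDEX" entry is a placeholder for Usergrid's own re-index of the application's collections, not an ES call.

```python
def source_mapping(doc_type, store_source):
    """Mapping body that enables or disables the _source field for one type."""
    return {doc_type: {"_source": {"enabled": bool(store_source)}}}


def rebuild_plan(old_index, new_index, read_alias, write_alias,
                 doc_types, store_source):
    """Return (method, path, body) tuples for one application, in order."""
    plan = [
        # Create the new index and add it to the read and write aliases.
        ("PUT", "/" + new_index, None),
        ("POST", "/_aliases", {"actions": [
            {"add": {"index": new_index, "alias": read_alias}},
            {"add": {"index": new_index, "alias": write_alias}},
        ]}),
        # Take the old index out of the write alias; it stays readable.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": write_alias}},
        ]}),
    ]
    # Delete and re-create each mapping with the store-source setting.
    for doc_type in doc_types:
        path = "/%s/_mapping/%s" % (new_index, doc_type)
        plan.append(("DELETE", path, None))
        plan.append(("PUT", path, source_mapping(doc_type, store_source)))
    # Placeholder: Usergrid re-indexes the application's collections here.
    plan.append(("REINDEX", new_index, None))
    # Only after the re-index is complete is the old index deleted.
    plan.append(("DELETE", "/" + old_index, None))
    return plan


for step in rebuild_plan("app1_v1", "app1_v2", "app1_read", "app1_write",
                         ["user"], store_source=False):
    print(step)
```

Because the old index stays in the read alias until it is deleted, queries keep working during the rebuild, while all new writes go only to the new index.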





  was:From Todd: When I was testing our indexing schema changes, I noticed our document source was appearing in ES. We should verify if this is happening in production, and if so, turn this off on the client side. Since we store all data in Cassandra, we do not need to also store it into ES. This means we can index more in ES and use a lot less disk space, so it's very important operationally.

        Summary: Re-indexer That Removes Source from ES  (was: Verify we're not storing document source in Elastic Search)

> Re-indexer That Removes Source from ES
> --------------------------------------
>
>                 Key: USERGRID-255
>                 URL: https://issues.apache.org/jira/browse/USERGRID-255
>             Project: Usergrid
>          Issue Type: Story
>          Components: Stack
>            Reporter: Todd Nine
>            Assignee: David Johnson
>
> The two-dot-o Query Index module currently stores both document source and fields in ElasticSearch. Since we only ever retrieve ID numbers from ES, storing source is unnecessary and wastes resources.
> This is what we need:
> 1) A way to configure Usergrid to either store source or not store source.
> 2) An index-rebuild "Tool" (implemented as a REST end-point) that either removes source or adds source, depending on how the system is configured to operate. The Tool must allow us to re-index without downtime. Possible approach:
> For each application:
> a) Tool creates a new index and adds that index to the application's read and write alias. 
> b) Tool removes the old index from the application's write alias so it is no longer written to.
> c) Tool deletes the mappings for each newly added index, then re-creates them with the new store-source settings.
> d) Tool re-indexes the application's collections.
> e) Once re-index is complete, Tool deletes the old index.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)