Posted to dev@usergrid.apache.org by "Todd Nine (JIRA)" <ji...@apache.org> on 2015/04/08 22:29:12 UTC

[jira] [Issue Comment Deleted] (USERGRID-536) Change our index structure for static mapping and cleanup api

     [ https://issues.apache.org/jira/browse/USERGRID-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated USERGRID-536:
-------------------------------
    Comment: was deleted

(was: We're not supporting arrays of dimension > 1.  You can have an array of dimension 1, and we will index any values in this array.  If you have a 2+ dimensional array, we will ignore the dimensions > 2.)

> Change our index structure for static mapping and cleanup api
> -------------------------------------------------------------
>
>                 Key: USERGRID-536
>                 URL: https://issues.apache.org/jira/browse/USERGRID-536
>             Project: Usergrid
>          Issue Type: Story
>          Components: Stack
>            Reporter: Todd Nine
>            Assignee: Todd Nine
>
> Currently, our dynamic mapping causes several issues with Elasticsearch.  We should change our mapping to use a static structure to resolve this operational pain.
> We need to make the following changes.
> h2. Modify our IndexScope
> This should more closely resemble the elements of an edge since this represents an edge. It will simplify the use of our query module and make development clearer.  This scope should be refactored into the following objects.  
> * IndexEdge - Id, name, timestamp, edgeType (source or target)
> * SearchEdge - Id, name, edgeType
> Note: edgeType is the type of the Id within the edge.  Does this Id represent a source Id, or does it represent a target Id?  The entity to be indexed will implicitly be the opposite of the type specified, i.e. if it's a source edge, the document is the target.  If it's a target edge, the document is the source.
> These values should also be stored within our document, so that we can index our documents.  Note that we perform bidirectional indexing in some cases, such as users, groups, etc.  When we do this, we need to ensure that we mark the direction of the edge appropriately.
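The two scope objects above could be modeled roughly as follows. This is only an illustrative sketch in Python, not Usergrid's actual classes: the attribute names (`node_id`, `edge_type`) and the helpers `opposite`, `document_role` and `to_search_edge` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    """Which end of the edge the stored Id represents."""
    SOURCE = "source"
    TARGET = "target"

    def opposite(self) -> "EdgeType":
        # Hypothetical helper: the other end of the edge.
        return EdgeType.TARGET if self is EdgeType.SOURCE else EdgeType.SOURCE


@dataclass(frozen=True)
class SearchEdge:
    """Id, name, edgeType: what is needed to search along an edge."""
    node_id: str
    name: str
    edge_type: EdgeType


@dataclass(frozen=True)
class IndexEdge:
    """Id, name, timestamp, edgeType: what is needed to index along an edge."""
    node_id: str
    name: str
    timestamp: int
    edge_type: EdgeType

    def document_role(self) -> EdgeType:
        # The indexed entity is implicitly the opposite end of the edge.
        return self.edge_type.opposite()

    def to_search_edge(self) -> SearchEdge:
        # Drop the timestamp to obtain the matching search scope.
        return SearchEdge(self.node_id, self.name, self.edge_type)
```

With this shape, bidirectional indexing would amount to writing one document per direction, each carrying an `IndexEdge` with the appropriate `edge_type`.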
> h2. Change default sort ordering
> When sorting is unspecified, we should order by timestamp descending from our index edge.  This ensures that we retain the correct edge time semantics, and will properly order collections and connections.
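As a rough illustration of the default ordering, the sort clause sent to Elasticsearch could be chosen like this. The function name `build_sort` is hypothetical, and the sketch assumes the edge timestamp is stored in the `edgeTimestamp` static field proposed later in this issue.

```python
def build_sort(user_sorts=None):
    """Return the sort clause for a search request.

    Falls back to edge timestamp descending when the caller gave no sort.
    """
    if user_sorts:
        # An explicit sort from the query wins.
        return list(user_sorts)
    # Default: newest edges first, using the static edgeTimestamp field.
    return [{"edgeTimestamp": {"order": "desc"}}]
```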
> h2. Remove the legacy query class
> We don't need the Query class; it has far too many functions to be a well-encapsulated object.  Instead, we should simply take the string QL, the SearchEdge and the limit to return our candidates.  From there, we should parse and visit the query internally to the query logic, NOT externally.
> h2. Create a static mapping
> The mapping should contain the following static fields.
> * entityId - The entity id
> * entityType - The entity type (from the id)
> * entityVersion - The entity version
> * edgeId - The edge Id
> * edgeName - The edge name
> * edgeTimestamp - The edge timestamp
> * edgeType - source | target
> * edgeSearch - edgeId + edgeName + edgeType
> It will then contain an array of "fields".  Each of these fields will have the following format.
> {code}
> { "name":"[entity field name as a path]", "[field type]":[field value] }
> {code}
> We will define a field type for each type of field.  Note that each field tuple will always contain a single field and a single value.  Possible field types are the following.
> * string - This will be mapped as a multi field with 2 mappings: an unanalyzed string and an analyzed string.  The 2 fields will then be "string_u" and "string_a".  The Query visitor will need to update the field name appropriately.
> * long - An unanalyzed long
> * double - An unanalyzed double
> * boolean - An unanalyzed boolean
> * location - A geolocation field
> The entity path will be a flattened path from the root json element to the max json element.  It can be thought of as a path through the tree of json elements.  We will use a dot '.' to delimit the fields, e.g. X.Y.Z for nested objects.  Primitive arrays will contain a field object for each element in the array.
> h2. Indexing
>   When indexing entities, we will no longer modify or prefix field names.  They will be inserted into the value exactly as their path appears after being converted to lower case.
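The field tuples, path flattening and lower-casing described above could be sketched as below. This is a minimal Python illustration under stated assumptions, not the actual indexing code: the function names are hypothetical, the location type is left out, objects inside arrays are flattened under the array's path, and arrays nested inside arrays are skipped because the description does not say how to treat them.

```python
def field_tuples(entity, prefix=""):
    """Flatten a JSON-like dict into {"name": path, "<type>": value} tuples."""
    tuples = []
    for key, value in entity.items():
        # Build the dot-delimited path and lower-case it; no other rewriting.
        path = f"{prefix}.{key}".lower() if prefix else key.lower()
        tuples.extend(_tuples_for(path, value))
    return tuples


def _tuples_for(path, value):
    if isinstance(value, dict):
        # Nested object: keep walking down the tree.
        return field_tuples(value, path)
    if isinstance(value, list):
        out = []
        for element in value:
            if isinstance(element, list):
                continue  # assumption: arrays inside arrays are not indexed
            # One field tuple per array element, all sharing the same path.
            out.extend(_tuples_for(path, element))
        return out
    if isinstance(value, bool):  # must be checked before int
        return [{"name": path, "boolean": value}]
    if isinstance(value, int):
        return [{"name": path, "long": value}]
    if isinstance(value, float):
        return [{"name": path, "double": value}]
    if isinstance(value, str):
        return [{"name": path, "string": value}]
    return []  # nulls and unsupported types are skipped in this sketch
```

For example, `{"Address": {"City": "Austin"}}` would yield the single tuple `{"name": "address.city", "string": "Austin"}`.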
> h2. Querying
>   When querying, the "contains" operation for a string will need to use the "string_a" data type.  When using =, we will need to use the "string_u" data type.  Each criterion will need to use nested object querying, to ensure the property name and property value are both part of the same field tuple.
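A single string criterion could be translated into an Elasticsearch nested query along these lines. This is a hedged sketch only: `nested_criterion` is a hypothetical name, and it assumes the tuple array is mapped as a nested object called `fields` whose string values are addressable as `fields.string_a` and `fields.string_u`. The real field paths depend on how the multi field mapping is finally defined.

```python
def nested_criterion(field_path, operator, value):
    """Build a nested query matching name and value in the same field tuple."""
    # Field names are indexed lower-cased, so lower-case the path here too.
    name_clause = {"term": {"fields.name": field_path.lower()}}

    if operator == "contains":
        # Full-text match against the analyzed string.
        value_clause = {"match": {"fields.string_a": value}}
    elif operator == "=":
        # Exact match against the unanalyzed string.
        value_clause = {"term": {"fields.string_u": value}}
    else:
        raise ValueError(f"unsupported operator in this sketch: {operator}")

    # Both clauses sit inside one nested query, so they must be satisfied
    # by the same element of the "fields" array.
    return {
        "nested": {
            "path": "fields",
            "query": {"bool": {"must": [name_clause, value_clause]}},
        }
    }
```

Without the nested wrapper, a document with tuples (name=a, value=x) and (name=b, value=y) could wrongly match a criterion on name=a, value=y.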
> h3. References
> Multi Field Mapping: http://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html
> Nested Objects: http://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
> Nested Object Search: http://www.elastic.co/guide/en/elasticsearch/guide/master/nested-sorting.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)