Posted to dev@sdap.apache.org by "Frank Greguska (JIRA)" <ji...@apache.org> on 2018/07/24 18:11:00 UTC

[jira] [Created] (SDAP-127) ID should be unique in SOLR schema

Frank Greguska created SDAP-127:
-----------------------------------

             Summary: ID should be unique in SOLR schema
                 Key: SDAP-127
                 URL: https://issues.apache.org/jira/browse/SDAP-127
             Project: Apache Science Data Analytics Platform
          Issue Type: Improvement
            Reporter: Frank Greguska


The "solr_id_s" field is currently the "uniqueKey" for the schema:

https://github.com/apache/incubator-sdap-nexus/blob/107438af45b479348ffb75a667b276ee3c81f9da/data-access/config/schemas/solr/nexustiles/conf/managed-schema#L200


This is fine, but many of the algorithms rely on the plain "id" field when working with tiles (the "id" field is the same as "solr_id_s" but without the prefix used for document routing):
https://github.com/apache/incubator-sdap-nexus/blob/107438af45b479348ffb75a667b276ee3c81f9da/data-access/config/schemas/solr/nexustiles/conf/managed-schema#L120

If possible, the "id" field should also be marked as unique so that it is impossible to generate tiles with identical "id"s.

This problem was found while ingesting SLCP ice shelf data, where two variables from the same granule were being ingested. The ID is generated from the granule name, the section spec, and an optional 'salt' value. In this case no salt was used (which was a mistake), so the tiles were generated with identical "id"s. No error occurred because the tiles had different dataset names, which made their "solr_id_s" values unique.
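
The collision can be illustrated with a minimal sketch. This is not the actual ingestion code: the function names make_tile_id and make_solr_id, the use of a name-based UUID, the "!" routing separator, and the sample granule, section spec, and dataset names are all assumptions made for the example.

```python
import uuid


def make_tile_id(granule_name, section_spec, salt=""):
    """Hypothetical id generation: a deterministic UUID derived from the
    granule name, the section spec, and an optional salt."""
    name = f"{granule_name}{section_spec}{salt}"
    return str(uuid.uuid3(uuid.NAMESPACE_DNS, name))


def make_solr_id(dataset_name, tile_id):
    """Hypothetical routing key: the dataset name as a prefix, then the id."""
    return f"{dataset_name}!{tile_id}"


# Sample values, invented for illustration only.
granule = "ice_shelf_granule.nc"
section = "time:0:1,y:0:100,x:0:100"

# Two variables from the same granule, ingested without a salt.
id_a = make_tile_id(granule, section)
id_b = make_tile_id(granule, section)

# Different dataset names keep the routing keys distinct.
solr_a = make_solr_id("dataset_var_a", id_a)
solr_b = make_solr_id("dataset_var_b", id_b)

# Passing a salt for the second variable avoids the collision.
id_b_salted = make_tile_id(granule, section, salt="var_b")

print(id_a == id_b)        # the plain ids collide
print(solr_a == solr_b)    # the routing keys do not
print(id_a == id_b_salted)
```

Under these assumptions the two tiles share an "id" while their "solr_id_s" values differ, so a uniqueness constraint on "solr_id_s" alone never sees a conflict.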

I am not sure whether it is possible to have more than one unique field in a SOLR schema.
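
If the schema cannot enforce this, one possible fallback is to check for repeated "id" values on the client side before a batch of tile documents is sent for indexing. The sketch below is only an illustration of that idea: the function name find_duplicate_ids and the sample documents are hypothetical, and the check covers only the documents passed to it, not tiles already in the index.

```python
from collections import defaultdict


def find_duplicate_ids(docs):
    """Return a mapping of "id" -> list of "solr_id_s" values for every
    "id" that appears more than once in the given documents."""
    seen = defaultdict(list)
    for doc in docs:
        seen[doc["id"]].append(doc["solr_id_s"])
    return {tile_id: solr_ids
            for tile_id, solr_ids in seen.items()
            if len(solr_ids) > 1}


# Sample documents, invented for illustration only.
batch = [
    {"id": "tile-1", "solr_id_s": "dataset_var_a!tile-1"},
    {"id": "tile-1", "solr_id_s": "dataset_var_b!tile-1"},
    {"id": "tile-2", "solr_id_s": "dataset_var_a!tile-2"},
]

duplicates = find_duplicate_ids(batch)
for tile_id, solr_ids in duplicates.items():
    print(f"duplicate id {tile_id}: {solr_ids}")
```

A check like this would have flagged the two ice shelf tiles, since they share an "id" even though their "solr_id_s" values differ.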



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)