Posted to dev@gora.apache.org by GitBox <gi...@apache.org> on 2021/03/02 04:30:22 UTC

[GitHub] [gora] podorvanova opened a new pull request #234: GORA-664 Add datastore for Elasticsearch

podorvanova opened a new pull request #234:
URL: https://github.com/apache/gora/pull/234


   [Outreachy Winter 2020-2021]
   This PR implements an Elasticsearch datastore for Apache Gora.
   
   Your feedback would be much appreciated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gora] djkevincr merged pull request #234: GORA-664 Add datastore for Elasticsearch

Posted by GitBox <gi...@apache.org>.
djkevincr merged pull request #234:
URL: https://github.com/apache/gora/pull/234


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gora.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [gora] kamaci commented on pull request #234: GORA-664 Add datastore for Elasticsearch

Posted by GitBox <gi...@apache.org>.
kamaci commented on pull request #234:
URL: https://github.com/apache/gora/pull/234#issuecomment-841358468


   Congrats @podorvanova for the PR!





[GitHub] [gora] kamaci commented on a change in pull request #234: GORA-664 Add datastore for Elasticsearch

Posted by GitBox <gi...@apache.org>.
kamaci commented on a change in pull request #234:
URL: https://github.com/apache/gora/pull/234#discussion_r632650676



##########
File path: gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/query/ElasticsearchResult.java
##########
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gora.elasticsearch.query;
+
+import org.apache.gora.persistency.impl.PersistentBase;
+import org.apache.gora.query.Query;
+import org.apache.gora.query.impl.ResultBase;
+import org.apache.gora.store.DataStore;
+
+import java.util.List;
+
+/**
+ * ElasticsearchResult specific implementation of the
+ * {@link org.apache.gora.query.Result} interface.
+ */
+public class ElasticsearchResult<K, T extends PersistentBase> extends ResultBase<K, T> {
+
+    /**
+     * List of resulting persistent objects.
+     */
+    private List<T> persistentObjects;
+
+    /**
+     * List of resulting objects keys.
+     */
+    private List<K> persistentKeys;
+
+    public ElasticsearchResult(DataStore<K, T> dataStore, Query<K, T> query, List<K> persistentKeys, List<T> persistentObjects) {
+        super(dataStore, query);
+        this.persistentKeys = persistentKeys;
+        this.persistentObjects = persistentObjects;
+    }
+
+    @Override
+    public float getProgress() {
+        if (persistentObjects.size() == 0) {
+            return 1;
+        } else {

Review comment:
       The `else` is unnecessary after the early `return`. It can be:
   
   ```
   if (persistentObjects.size() == 0) {
       return 1;
   }
   
   return offset / (float) persistentObjects.size();
   ```
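
For readers who want to try the suggestion in isolation, here is a minimal, self-contained sketch of the early-return form. `ProgressSketch` and its `progress` method are hypothetical stand-ins, not part of Gora; the sketch assumes `offset` counts the records consumed so far and is passed in explicitly rather than read from the result class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the result class under review; not the Gora API.
final class ProgressSketch {

    // Returns the fraction of results consumed so far.
    static float progress(long offset, List<?> persistentObjects) {
        // Guard clause: an empty result set is reported as complete,
        // which also avoids a division by zero below.
        if (persistentObjects.isEmpty()) {
            return 1f;
        }
        // No else branch is needed because the guard above already returned.
        return offset / (float) persistentObjects.size();
    }

    public static void main(String[] args) {
        System.out.println(progress(0, Collections.emptyList()));
        System.out.println(progress(1, Arrays.asList("a", "b", "c", "d")));
    }
}
```

The guard clause keeps the main computation at the top indentation level, which is the readability point of the comment above.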







[GitHub] [gora] kamaci commented on a change in pull request #234: GORA-664 Add datastore for Elasticsearch

Posted by GitBox <gi...@apache.org>.
kamaci commented on a change in pull request #234:
URL: https://github.com/apache/gora/pull/234#discussion_r632649046



##########
File path: gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/mapping/ElasticsearchMappingBuilder.java
##########
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gora.elasticsearch.mapping;
+
+import com.google.inject.ConfigurationException;
+import org.apache.commons.io.IOUtils;
+import org.apache.gora.elasticsearch.store.ElasticsearchStore;
+import org.apache.gora.persistency.impl.PersistentBase;
+import org.apache.gora.util.GoraException;
+import org.jdom.Document;
+import org.jdom.Element;
+import org.jdom.JDOMException;
+import org.jdom.input.SAXBuilder;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.SAXException;
+
+import javax.xml.XMLConstants;
+import javax.xml.transform.Source;
+import javax.xml.transform.stream.StreamSource;
+import javax.xml.validation.Schema;
+import javax.xml.validation.SchemaFactory;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.Charset;
+import java.util.List;
+import java.util.Locale;
+
+/**
+ * Builder for Mapping definitions of Elasticsearch.
+ */
+public class ElasticsearchMappingBuilder<K, T extends PersistentBase> {
+
+    private static final Logger LOG = LoggerFactory.getLogger(ElasticsearchMappingBuilder.class);
+
+    /**
+     * XSD validation file for the XML mapping.
+     */
+    private static final String XSD_MAPPING_FILE = "gora-elasticsearch.xsd";
+
+    // Index description
+    static final String ATT_NAME = "name";
+
+    static final String ATT_TYPE = "type";
+
+    // Class description
+    static final String TAG_CLASS = "class";
+
+    static final String ATT_KEYCLASS = "keyClass";
+
+    static final String ATT_INDEX = "index";
+
+    static final String TAG_FIELD = "field";
+
+    static final String ATT_DOCFIELD = "docfield";
+
+    static final String ATT_SCALINGFACTOR = "scalingFactor";
+
+    /**
+     * Mapping instance being built.
+     */
+    private ElasticsearchMapping elasticsearchMapping;
+
+    private final ElasticsearchStore<K, T> dataStore;
+
+    /**
+     * Constructor for ElasticsearchMappingBuilder.
+     *
+     * @param store ElasticsearchStore instance
+     */
+    public ElasticsearchMappingBuilder(final ElasticsearchStore<K, T> store) {
+        this.elasticsearchMapping = new ElasticsearchMapping();
+        this.dataStore = store;
+    }
+
+    /**
+     * Returns the Elasticsearch Mapping being built.
+     *
+     * @return Elasticsearch Mapping instance
+     */
+    public ElasticsearchMapping getElasticsearchMapping() {
+        return elasticsearchMapping;
+    }
+
+    /**
+     * Sets the Elasticsearch Mapping.
+     *
+     * @param elasticsearchMapping Elasticsearch Mapping instance
+     */
+    public void setElasticsearchMapping(ElasticsearchMapping elasticsearchMapping) {
+        this.elasticsearchMapping = elasticsearchMapping;
+    }
+
+    /**
+     * Reads Elasticsearch mappings from file.
+     *
+     * @param inputStream   Mapping input stream
+     * @param xsdValidation Parameter for enabling XSD validation
+     */
+    public void readMappingFile(InputStream inputStream, boolean xsdValidation) {
+        try {
+            SAXBuilder saxBuilder = new SAXBuilder();
+            if (inputStream == null) {
+                LOG.error("The mapping input stream is null!");
+                throw new GoraException("The mapping input stream is null!");
+            }
+
+            // Convert input stream to a string to use it a few times
+            String mappingStream = IOUtils.toString(inputStream, Charset.defaultCharset());
+
+            // XSD validation for XML file
+            if (xsdValidation) {
+                Source xmlSource = new StreamSource(IOUtils.toInputStream(mappingStream, Charset.defaultCharset()));
+                Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
+                        .newSchema(new StreamSource(getClass().getClassLoader().getResourceAsStream(XSD_MAPPING_FILE)));
+                schema.newValidator().validate(xmlSource);
+                LOG.info("Mapping file is valid.");
+            }
+
+            Document document = saxBuilder.build(IOUtils.toInputStream(mappingStream, Charset.defaultCharset()));
+            if (document == null) {
+                LOG.error("The mapping document is null!");
+                throw new GoraException("The mapping document is null!");
+            }
+
+            Element root = document.getRootElement();
+            // Extract class descriptions
+            @SuppressWarnings("unchecked")
+            List<Element> classElements = root.getChildren(TAG_CLASS);
+            for (Element classElement : classElements) {
+                final Class<T> persistentClass = dataStore.getPersistentClass();
+                final Class<K> keyClass = dataStore.getKeyClass();
+                if (haveKeyClass(keyClass, classElement)
+                        && havePersistentClass(persistentClass, classElement)) {
+                    loadPersistentClass(classElement, persistentClass);
+                    break;
+                }
+            }
+

Review comment:
       You can remove the extra blank line here.







[GitHub] [gora] podorvanova commented on pull request #234: GORA-664 Add datastore for Elasticsearch

Posted by GitBox <gi...@apache.org>.
podorvanova commented on pull request #234:
URL: https://github.com/apache/gora/pull/234#issuecomment-843754824


   Thanks @kamaci for your feedback! I have updated the PR.





[GitHub] [gora] kamaci commented on a change in pull request #234: GORA-664 Add datastore for Elasticsearch

Posted by GitBox <gi...@apache.org>.
kamaci commented on a change in pull request #234:
URL: https://github.com/apache/gora/pull/234#discussion_r632653393



##########
File path: gora-elasticsearch/src/main/java/org/apache/gora/elasticsearch/store/ElasticsearchStoreMetadataAnalyzer.java
##########
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gora.elasticsearch.store;
+
+import org.apache.gora.elasticsearch.utils.ElasticsearchParameters;
+import org.apache.gora.store.impl.DataStoreMetadataAnalyzer;
+import org.apache.gora.util.GoraException;
+import org.elasticsearch.client.RequestOptions;
+import org.elasticsearch.client.RestHighLevelClient;
+import org.elasticsearch.client.indices.GetIndexRequest;
+import org.elasticsearch.client.indices.GetIndexResponse;
+import org.elasticsearch.cluster.metadata.MappingMetadata;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+
+public class ElasticsearchStoreMetadataAnalyzer extends DataStoreMetadataAnalyzer {
+
+    private RestHighLevelClient elasticsearchClient;
+
+    @Override
+    public void initialize() throws GoraException {
+        ElasticsearchParameters parameters = ElasticsearchParameters.load(properties, getConf());
+        elasticsearchClient = ElasticsearchStore.createClient(parameters);
+    }
+
+    @Override
+    public String getType() {
+        return "ELASTICSEARCH";
+    }
+
+    @Override
+    public List<String> getTablesNames() throws GoraException {
+        GetIndexRequest request = new GetIndexRequest("*");
+        GetIndexResponse response;
+        try {
+            response = elasticsearchClient.indices().get(request, RequestOptions.DEFAULT);
+        } catch (IOException ex) {
+            throw new GoraException(ex);
+        }
+        assert response != null;

Review comment:
       Consider logging the problem and throwing an exception instead of using `assert`.
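
One reason for this suggestion is that Java assertions are disabled at runtime unless the JVM is started with `-ea`, so an `assert` does not reliably guard production code. Below is a minimal sketch of the explicit check, under stated assumptions: `IndexNamesSketch`, `StoreException`, and the nullable `String[]` parameter are hypothetical stand-ins for the analyzer, `GoraException`, and the Elasticsearch response, and `java.util.logging` replaces SLF4J only to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names come from Gora or Elasticsearch.
final class IndexNamesSketch {

    private static final Logger LOG =
            Logger.getLogger(IndexNamesSketch.class.getName());

    // Stand-in for a checked store exception such as GoraException.
    static final class StoreException extends Exception {
        StoreException(String message) {
            super(message);
        }
    }

    // 'indices' stands in for the data carried by a possibly-null response.
    static List<String> tableNames(String[] indices) throws StoreException {
        // Explicit check: runs whether or not assertions are enabled.
        if (indices == null) {
            LOG.severe("Could not retrieve index names: response was null.");
            throw new StoreException("Could not retrieve index names: response was null.");
        }
        return new ArrayList<>(Arrays.asList(indices));
    }
}
```

With this shape, callers see a store-level exception with a clear message instead of an `AssertionError` (or, with assertions off, a later `NullPointerException`).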



