You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/07/13 18:52:29 UTC

[GitHub] [solr] madrob commented on a diff in pull request #940: SOLR-16292 : NVector for alternative great-circle distance calculations

madrob commented on code in PR #940:
URL: https://github.com/apache/solr/pull/940#discussion_r920385253


##########
solr/core/src/test/org/apache/solr/util/NVectorUtilTest.java:
##########
@@ -0,0 +1,65 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util;
+
+import org.junit.Test;
+
+import java.text.DecimalFormat;
+
+import static org.junit.Assert.assertEquals;
+
+public class NVectorUtilTest {
+
+    DecimalFormat df = new DecimalFormat("##.####");
+
+    @Test
+    public void latLongToNVector() {
+        double lat = 52.024535;
+        double lon = -0.490155;
+        double[] n = NVectorUtil.latLongToNVector(lat, lon);
+        double[] ll = NVectorUtil.NVectorToLatLong(n);
+        assertSimilar(lat, ll[0]);
+        assertSimilar(lon, ll[1]);
+    }
+
+    void assertSimilar(double expected, double actual) {

Review Comment:
   Why do we need to do string formatting? Wouldn't it be more clear to use the `assertEquals(double, double, delta)` form with delta being... 0.0001 it looks like you want?



##########
solr/core/src/java/org/apache/solr/search/function/distance/NVector.java:
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search.function.distance;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.queries.function.docvalues.DoubleDocValues;
+import org.apache.lucene.queries.function.valuesource.MultiValueSource;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.solr.common.SolrException;
+
+import java.io.IOException;
+import java.util.Map;
+
+import static org.apache.solr.util.NVectorUtil.NVectorDist;
+
+public class NVector extends ValueSource {

Review Comment:
   I think it's confusing to have two classes named NVector in different packages.



##########
solr/core/src/test/org/apache/solr/util/FastInvTrigTest.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util;
+
+import org.apache.solr.SolrTestCase;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Random;
+
+import static org.apache.solr.util.NVectorUtil.EARTH_RADIUS;
+
+public class FastInvTrigTest extends SolrTestCase {
+
+    final static int num_points = 100000;
+    final static double EPSILON = 0.0001;
+    static final Random r = new Random();
+    private static final double TEN_METERS = 0.01;
+
+    static double[][] points = new double[num_points][2];
+
+    @Before
+    public void initAll() {
+        for (int i = 0; i < num_points; i++) {
+            points[i] = generateRandomPoint();
+        }
+    }
+
+    public static double deg2rad(double deg) {
+        return deg * (Math.PI / 180);
+    }
+
+    public static double[] generateRandomPoint() {
+        double u = r.nextDouble();
+        double v = r.nextDouble();
+
+        double latitude = deg2rad(Math.toDegrees(Math.acos(u * 2 - 1)) - 90);
+        double longitude = deg2rad(360 * v - 180);
+        return new double[]{latitude, longitude};
+    }
+
+    @Test
+    public void acos() {
+        for (double i = -1; i <= 1; i = i + 0.00001) {
+            assertTrue(FastInvTrig.acos(i) - Math.acos(i) <= EPSILON);

Review Comment:
   nit: use assertEquals to get better error messages



##########
solr/core/src/java/org/apache/solr/util/NVectorUtil.java:
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util;
+
+public class NVectorUtil {
+
+    public static final double EARTH_RADIUS = 6173.008;//km google:standard mean earth radius;

Review Comment:
   https://www.wolframalpha.com/input?i=earth+radius+km puts it at 6371.009 - transcription error?



##########
solr/core/src/java/org/apache/solr/schema/NVector.java:
##########
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.schema;
+
+import org.apache.lucene.document.StoredField;
+import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.queries.function.valuesource.MultiValueSource;
+import org.apache.lucene.search.SortField;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.response.TextResponseWriter;
+import org.apache.solr.search.QParser;
+import org.apache.solr.uninverting.UninvertingReader;
+import org.apache.solr.util.NVectorUtil;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+public class NVector extends CoordinateFieldType {
+
+    @Override
+    protected void init(IndexSchema schema, Map<String, String> args) {
+        super.init(schema, args);
+        dimension = 3;
+        createSuffixCache(3);
+    }
+
+    @Override
+    public List<IndexableField> createFields(SchemaField field, Object value) {
+        String externalVal = value.toString();
+        String[] point = parseCommaSeparatedList(externalVal, dimension);
+        String[] nvector = NVectorUtil.latLongToNVector(point);
+
+        List<IndexableField> f = new ArrayList<>((dimension * 2) + 1);
+
+        if (field.indexed()) {
+            for (int i = 0; i < dimension; i++) {
+                SchemaField sf = subField(field, i, schema);
+                f.addAll(sf.createFields(nvector[i]));
+            }
+        }
+
+        if (field.stored()) {
+            f.add(createField(field.getName(), externalVal, StoredField.TYPE));
+        }
+        return f;
+    }
+
+    @Override
+    public ValueSource getValueSource(SchemaField field, QParser parser) {
+        ArrayList<ValueSource> vs = new ArrayList<>(dimension);
+        for (int i = 0; i < dimension; i++) {
+            SchemaField sub = subField(field, i, schema);
+            vs.add(sub.getType()
+                .getValueSource(sub, parser));
+        }
+        return new NVectorValueSource(vs);
+    }
+
+
+    /**
+     * Given a string containing <i>dimension</i> values encoded in it, separated by commas,
+     * return a String array of length <i>dimension</i> containing the values.
+     *
+     * @param externalVal The value to parse
+     * @param dimension   The expected number of values for the point
+     * @return An array of the values that make up the point (aka vector)
+     * @throws SolrException if the dimension specified does not match the number found
+     */
+    public static String[] parseCommaSeparatedList(String externalVal, int dimension) throws SolrException {
+        //TODO: Should we support sparse vectors?
+        String[] out = new String[dimension];
+        int idx = externalVal.indexOf(',');
+        int end = idx;
+        int start = 0;
+        int i = 0;
+        if (idx == -1 && dimension == 1 && externalVal.length() > 0) {//we have a single point, dimension better be 1
+            out[0] = externalVal.trim();
+            i = 1;
+        } else if (idx > 0) {//if it is zero, that is an error
+            //Parse out a comma separated list of values, as in: 73.5,89.2,7773.4
+            for (; i < dimension; i++) {
+                while (start < end && externalVal.charAt(start) == ' ') start++;
+                while (end > start && externalVal.charAt(end - 1) == ' ') end--;
+                if (start == end) {
+                    break;
+                }
+                out[i] = externalVal.substring(start, end);
+                start = idx + 1;
+                end = externalVal.indexOf(',', start);

Review Comment:
   Numbers will always be stored as decimals with `.` and separator as `,`? Need to make sure this doesn't mess up in other locales that use comma as the decimal separator.



##########
solr/core/src/test-files/solr/collection1/conf/schema-nvector.xml:
##########
@@ -0,0 +1,930 @@
+<?xml version="1.0" ?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements.  See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!-- The Solr schema file. This file should be named "schema.xml" and
+     should be located where the classloader for the Solr webapp can find it.
+
+     This schema is used for testing, and as such has everything and the
+     kitchen sink thrown in. See example/solr/conf/schema.xml for a
+     more concise example.
+
+  -->
+
+<schema name="test" version="1.0">

Review Comment:
   can we minimize this schema to only the fields that it needs instead of the full 1000 lines of everything?



##########
solr/core/src/java/org/apache/solr/util/FastInvTrig.java:
##########
@@ -0,0 +1,108 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util;
+
+public class FastInvTrig {

Review Comment:
   Did you write this code? Is it borrowed from a library? Is it an implementation of something found in a text book?



##########
solr/core/src/java/org/apache/solr/search/function/distance/NVector.java:
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search.function.distance;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.queries.function.docvalues.DoubleDocValues;
+import org.apache.lucene.queries.function.valuesource.MultiValueSource;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.solr.common.SolrException;
+
+import java.io.IOException;
+import java.util.Map;
+
+import static org.apache.solr.util.NVectorUtil.NVectorDist;
+
+public class NVector extends ValueSource {
+
+    private final MultiValueSource p1;
+    private final MultiValueSource p2;
+    private final double radius;
+
+    public NVector(MultiValueSource p1, MultiValueSource p2, double radius) {
+        this.p1 = p1;
+        this.p2 = p2;
+        if (p1.dimension() != 3 || p2.dimension() != 3) {
+            throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Illegal dimension for value sources");
+        }
+        this.radius = radius;
+    }
+
+    @Override
+    public FunctionValues getValues(Map<Object,Object> context, LeafReaderContext readerContext) throws IOException {
+
+        final FunctionValues vals1 = p1.getValues(context, readerContext);
+        final FunctionValues vals2 = p2.getValues(context, readerContext);
+
+        return new DoubleDocValues(this) {
+
+            @Override
+            public double doubleVal(int doc) throws IOException {
+                double[] dv1 = new double[p1.dimension()];
+                double[] dv2 = new double[p2.dimension()];
+                vals1.doubleVal(doc, dv1);
+                vals2.doubleVal(doc, dv2);
+                return  NVectorDist(dv1, dv2, radius);
+            }
+
+            @Override
+            public String toString(int doc) throws IOException {
+                return name() +
+                    ',' +
+                    vals1.toString(doc) +
+                    ',' +
+                    vals2.toString(doc) +
+                    ')';
+            }
+        };
+    }
+
+    protected String name() {
+        return "nvector";
+    }
+
+    @Override
+    public boolean equals(Object o) {
+        if (this.getClass() != o.getClass()) return false;

Review Comment:
   `o` could be null



##########
solr/core/src/java/org/apache/solr/search/function/distance/NVector.java:
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.search.function.distance;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.queries.function.FunctionValues;
+import org.apache.lucene.queries.function.ValueSource;
+import org.apache.lucene.queries.function.docvalues.DoubleDocValues;
+import org.apache.lucene.queries.function.valuesource.MultiValueSource;
+import org.apache.lucene.search.IndexSearcher;
+import org.apache.solr.common.SolrException;
+
+import java.io.IOException;
+import java.util.Map;
+
+import static org.apache.solr.util.NVectorUtil.NVectorDist;
+
+public class NVector extends ValueSource {
+
+    private final MultiValueSource p1;
+    private final MultiValueSource p2;

Review Comment:
   I'm unclear what these are - I don't think they're lat/long. Are they some offset and phase? Two separate vectors? I don't think p1 and p2 are descriptive enough for non-domain experts (hi! that's me!)



##########
solr/core/src/test/org/apache/solr/util/FastInvTrigTest.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.util;
+
+import org.apache.solr.SolrTestCase;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.util.Random;
+
+import static org.apache.solr.util.NVectorUtil.EARTH_RADIUS;
+
+public class FastInvTrigTest extends SolrTestCase {
+
+    final static int num_points = 100000;
+    final static double EPSILON = 0.0001;
+    static final Random r = new Random();

Review Comment:
   we don't use this random, use the one provided by the test framework so that seeds are reproducible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org