Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/02/28 20:07:18 UTC

[GitHub] [accumulo] harjitdotsingh opened a new pull request #2535: Serializer for BigDecimal #2226

harjitdotsingh opened a new pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535


   This PR contains code for #2226. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819192896



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       I'm wondering if the scale should be written first so that it is sorted on first. I'm thinking of numbers like `4.2E-100`, `4.2E-10`, `4.2E+10`, `4.2E+100`.
   
   ```suggestion
         dos.writeInt(scale);
         dos.write(encodedbigInt);
   ```

##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    var list = Stream.of("2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10",

Review comment:
       Try adding numbers that have the same unscaled value but different exponents.
   
   ```suggestion
       var list = Stream.of("4.2E+10","4.2E+100","4.2E-10","4.2E-100","4.2","2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10",
   ```
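
For reference, here is a small sketch of what the suggested values look like inside `BigDecimal`. It uses only `java.math`; the class name `SameUnscaledDemo` and the helper `scaleOf` are made up for illustration and are not part of the PR.

```java
import java.math.BigDecimal;
import java.util.List;

class SameUnscaledDemo {

  // Scale of the BigDecimal parsed from the given literal (hypothetical helper).
  static int scaleOf(String literal) {
    return new BigDecimal(literal).scale();
  }

  public static void main(String[] args) {
    // Same digits, different exponents: only the scale differs.
    for (String s : List.of("4.2E-100", "4.2E-10", "4.2E+10", "4.2E+100")) {
      BigDecimal bd = new BigDecimal(s);
      System.out.printf("%s: unscaledValue = %s, scale = %d%n", s, bd.unscaledValue(),
          bd.scale());
    }
  }
}
```

All four values share the unscaled value 42, so any byte encoding has to distinguish them by scale alone. Note that for these positive values a larger scale means a smaller number, so the scale does not sort in the same direction as the value.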







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816383559



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+
+    try {
+      int scale = bd.scale();
+      BigInteger bigInt = bd.unscaledValue();
+      byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+      byte[] ret = new byte[4 + encodedbigInt.length];
+      int len = ret.length;
+      DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret));
+      len = len ^ 0x80000000;
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);
+      dos.close();

Review comment:
       sure :-)







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819648133



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       Also, writing the scale first won't have any bearing. When we sort the encoded values, they are just a bunch of bytes. We would have to write our own comparator that does what we want. I'm evaluating a few other options and will keep you posted.
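
To illustrate the "just a bunch of bytes" point, here is a minimal sketch of why the diff flips the sign bit with `scale ^ 0x80000000` before `writeInt`. It assumes the encoded bytes are compared lexicographically as unsigned values; `SignFlipDemo`, `encodeInt`, and `compareEncoded` are hypothetical names, and `ByteBuffer` stands in for the `DataOutputStream` used in the PR.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SignFlipDemo {

  // Big-endian 4-byte form of an int with the sign bit flipped, mirroring
  // `scale ^ 0x80000000` followed by `writeInt` in the diff.
  static byte[] encodeInt(int v) {
    return ByteBuffer.allocate(4).putInt(v ^ 0x80000000).array();
  }

  // Unsigned lexicographic comparison of the two encoded forms.
  static int compareEncoded(int a, int b) {
    return Arrays.compareUnsigned(encodeInt(a), encodeInt(b));
  }

  public static void main(String[] args) {
    System.out.println(compareEncoded(-1, 0) < 0); // true
    System.out.println(compareEncoded(-100, 100) < 0); // true
    System.out.println(compareEncoded(Integer.MIN_VALUE, Integer.MAX_VALUE) < 0); // true
  }
}
```

Without the flip, negative ints have their high bit set and would sort after all non-negative ints under an unsigned byte comparison.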







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816382140



##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,
+        Arrays.asList(new BigDecimal("2.0"), new BigDecimal("2.00"), new BigDecimal("2.000"),
+            new BigDecimal("-3.000"), new BigDecimal("-2.00"), new BigDecimal("0.0000"),
+            new BigDecimal("0.1"), new BigDecimal("0.10"), new BigDecimal("-65537.000"),
+            new BigDecimal("-65537.00"), new BigDecimal("-65537.0")));
+

Review comment:
       I can do that. We are sorting this list and then comparing it with the order of the sorted encoded values to get an expected and an actual result. So how would an ordered list here help? I think we should create an issue so that I can clean up the other coders with the same code as well.
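
The kind of check described in this comment can be sketched as follows. This is not Accumulo's `assertSortOrder`; it is a stand-alone approximation under the assumption that encoded values are compared as unsigned bytes. `SortOrderCheck`, `preservesOrder`, `flipped`, and `raw` are hypothetical names, and a toy `int` encoder is used in place of the lexicoder.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

class SortOrderCheck {

  // True when sorting by `order` and sorting by the encoded bytes (unsigned,
  // lexicographic) produce the same sequence. The input order is irrelevant
  // because both sides are sorted independently.
  static <T> boolean preservesOrder(List<T> values, Comparator<T> order,
      Function<T,byte[]> encoder) {
    List<T> expected = new ArrayList<>(values);
    expected.sort(order);
    List<T> actual = new ArrayList<>(values);
    actual.sort((a, b) -> Arrays.compareUnsigned(encoder.apply(a), encoder.apply(b)));
    return expected.equals(actual);
  }

  // Toy encoder with the sign bit flipped.
  static byte[] flipped(int v) {
    return ByteBuffer.allocate(4).putInt(v ^ 0x80000000).array();
  }

  // Toy encoder without the flip, which breaks ordering for negative values.
  static byte[] raw(int v) {
    return ByteBuffer.allocate(4).putInt(v).array();
  }

  public static void main(String[] args) {
    List<Integer> values = List.of(5, -3, 0, 42, -100);
    System.out.println(preservesOrder(values, Integer::compare, SortOrderCheck::flipped));
    System.out.println(preservesOrder(values, Integer::compare, SortOrderCheck::raw));
  }
}
```

With a check of this shape, the order in which the inputs are listed does not affect the result; what matters is whether the inputs include values that would expose a mismatch between the two orderings.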







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819635919



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       That won't work. Consider:
   
   ```java
   var x = new BigDecimal("2030E-3"); // value == 2.030; x.scale() == 3
   var y = new BigDecimal("303E-2");  // value == 3.03;  y.scale() == 2
   ```
   
   You can't sort by scale first, because scale is not related to the numeric ordering; it only reflects how much information about the precision is stored. I'm actually not even sure whether we should be using precision or scale.
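
The two values above can be run through a scale-first comparison to see the mismatch. This is a sketch only: `SCALE_FIRST` is a hypothetical comparator that imitates the ordering a scale-then-unscaled-value byte layout would give, not code from the PR.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScaleFirstDemo {

  // Hypothetical ordering: compare scale first, then the unscaled value.
  static final Comparator<BigDecimal> SCALE_FIRST = (a, b) -> {
    int c = Integer.compare(a.scale(), b.scale());
    return c != 0 ? c : a.unscaledValue().compareTo(b.unscaledValue());
  };

  // Returns a sorted copy, leaving the input untouched.
  static List<BigDecimal> sorted(List<BigDecimal> in, Comparator<BigDecimal> c) {
    List<BigDecimal> out = new ArrayList<>(in);
    out.sort(c);
    return out;
  }

  public static void main(String[] args) {
    BigDecimal x = new BigDecimal("2030E-3"); // 2.030, scale 3
    BigDecimal y = new BigDecimal("303E-2"); // 3.03, scale 2
    List<BigDecimal> values = List.of(x, y);
    System.out.println(sorted(values, SCALE_FIRST)); // [3.03, 2.030]
    System.out.println(sorted(values, BigDecimal::compareTo)); // [2.030, 3.03]
  }
}
```

The scale-first ordering puts 3.03 before 2.030, which disagrees with `BigDecimal.compareTo`.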







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816420261



##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,
+        Arrays.asList(new BigDecimal("2.0"), new BigDecimal("2.00"), new BigDecimal("2.000"),
+            new BigDecimal("-3.000"), new BigDecimal("-2.00"), new BigDecimal("0.0000"),
+            new BigDecimal("0.1"), new BigDecimal("0.10"), new BigDecimal("-65537.000"),
+            new BigDecimal("-65537.00"), new BigDecimal("-65537.0")));
+

Review comment:
       I was under the impression that the values here represented the expected sort order. If the test itself is sorting first, I'm not sure it's giving us the test coverage we want. I'd have to spend more time looking.







[GitHub] [accumulo] DomGarguilo commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r818844426



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {

Review comment:
       ```suggestion
   /**
    * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
    *
    * @since 2.1.0
    */
   public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
   ```







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r820126842



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       Here's a good example to show that scale is not equivalent to exponent (paste into `jshell PRINTING` to run):
   ```java
   var y = new BigDecimal("210");
   IntStream.rangeClosed(-1, 2).forEach(i -> {
     var x = y.setScale(i);
     printf("unscaledValue() = %s; scale = %s; precision = %s; toString = %s%n", x.unscaledValue(), x.scale(), x.precision(), x);
   });
   ```
   
   Since BigDecimal has arbitrary precision, scale is used to track how the unscaled value must be scaled (how many places to move the decimal) to preserve the intended precision.
   
   Here's the output:
   
   ```
   unscaledValue() = 21; scale = -1; precision = 2; toString = 2.1E+2
   unscaledValue() = 210; scale = 0; precision = 3; toString = 210
   unscaledValue() = 2100; scale = 1; precision = 4; toString = 210.0
   unscaledValue() = 21000; scale = 2; precision = 5; toString = 210.00
   ```
   
   These all `compareTo` one another with a result of `0`, but none of them `equals` any of the others.
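
The `compareTo` versus `equals` distinction can be checked directly with a short sketch. `CompareVsEquals`, `sameValue`, and `sameRepresentation` are hypothetical names; the `BigDecimal` methods are standard `java.math` API.

```java
import java.math.BigDecimal;

class CompareVsEquals {

  // Numeric equality: ignores scale.
  static boolean sameValue(BigDecimal a, BigDecimal b) {
    return a.compareTo(b) == 0;
  }

  // Representation equality: same unscaled value and same scale.
  static boolean sameRepresentation(BigDecimal a, BigDecimal b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("210");
    BigDecimal b = a.setScale(2); // 210.00
    System.out.println(sameValue(a, b)); // true
    System.out.println(sameRepresentation(a, b)); // false
    // stripTrailingZeros() maps both to the same representation.
    System.out.println(a.stripTrailingZeros().equals(b.stripTrailingZeros())); // true
  }
}
```

One possible reading, offered as an observation rather than something decided in this thread: an encoding that round-trips exactly has to keep `210` and `210.00` distinct, while still placing them next to each other in sort order.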







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816305874



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();

Review comment:
       Can this be private final?

##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,
+        Arrays.asList(new BigDecimal("2.0"), new BigDecimal("2.00"), new BigDecimal("2.000"),
+            new BigDecimal("-3.000"), new BigDecimal("-2.00"), new BigDecimal("0.0000"),
+            new BigDecimal("0.1"), new BigDecimal("0.10"), new BigDecimal("-65537.000"),
+            new BigDecimal("-65537.00"), new BigDecimal("-65537.0")));
+

Review comment:
       But also, shouldn't we expect these to be ordered numerically? These are all mixed up. I don't think that's what users are going to want.

##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+
+    try {
+      int scale = bd.scale();
+      BigInteger bigInt = bd.unscaledValue();
+      byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+      byte[] ret = new byte[4 + encodedbigInt.length];
+      int len = ret.length;
+      DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret));
+      len = len ^ 0x80000000;
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);
+      dos.close();
+      return ret;
+    } catch (IOException e) {
+      throw new RuntimeException(e);

Review comment:
       There is an UncheckedIOException that would be suitable instead of a generic RTE.
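
A minimal sketch of that suggestion, not the PR's code: `FailingStream`, `writeInt`, and `causeMessage` are hypothetical names used only for illustration. It shows an `IOException` being rethrown as `java.io.UncheckedIOException`, with the original exception still reachable through `getCause()`.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class UncheckedIoSketch {

  // Hypothetical sink that always fails, used only to trigger the catch block.
  static class FailingStream extends OutputStream {
    @Override
    public void write(int b) throws IOException {
      throw new IOException("simulated write failure");
    }
  }

  // Hypothetical helper: writes one int, rethrowing any IOException unchecked.
  static void writeInt(OutputStream out, int value) {
    try {
      new DataOutputStream(out).writeInt(value);
    } catch (IOException e) {
      // The original IOException stays available through getCause().
      throw new UncheckedIOException(e);
    }
  }

  // Returns the message of the wrapped IOException, or null if nothing was thrown.
  static String causeMessage() {
    try {
      writeInt(new FailingStream(), 42);
      return null;
    } catch (UncheckedIOException e) {
      return e.getCause().getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(causeMessage());
  }
}
```

Callers that care about I/O failures can catch `UncheckedIOException` specifically, which a generic `RuntimeException` would not allow.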

##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,

Review comment:
       I'm not sure BigDecimal's compareTo method is a sufficient comparator here. It treats values as equal if they are equal in magnitude but differ in scale. However, when we serialize, we want to preserve the scale. We want to keep the order predictable, so the scale must be considered as part of the comparison. This test may pass, but it would also pass with other orderings. We need the test to fail whenever the lexicoder produces any ordering other than the expected one.
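
To make the point concrete, here is a small sketch using only `java.math.BigDecimal`; the helper names `sameValue` and `sameRepresentation` are made up for this illustration. It shows that `compareTo` ignores scale while `equals` does not, so a comparator built on `compareTo` alone cannot distinguish `2.0` from `2.00`.

```java
import java.math.BigDecimal;

class CompareToVsEquals {

  // True when the two decimals are numerically equal, regardless of scale.
  static boolean sameValue(String a, String b) {
    return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
  }

  // True only when both the unscaled value and the scale match.
  static boolean sameRepresentation(String a, String b) {
    return new BigDecimal(a).equals(new BigDecimal(b));
  }

  public static void main(String[] args) {
    System.out.println(sameValue("2.0", "2.00"));          // numerically equal
    System.out.println(sameRepresentation("2.0", "2.00")); // scales 1 and 2 differ
  }
}
```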

##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,
+        Arrays.asList(new BigDecimal("2.0"), new BigDecimal("2.00"), new BigDecimal("2.000"),
+            new BigDecimal("-3.000"), new BigDecimal("-2.00"), new BigDecimal("0.0000"),
+            new BigDecimal("0.1"), new BigDecimal("0.10"), new BigDecimal("-65537.000"),
+            new BigDecimal("-65537.00"), new BigDecimal("-65537.0")));
+
+  }
+
+  @Test
+  public void testDecode() {
+    assertDecodes(new BigDecimalLexicoder(), BigDecimal.valueOf(-3.000));

Review comment:
       BigDecimalLexicoder can be constructed once and reused. It should be stateless, so we don't need more than one for the test.

##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+
+    try {
+      int scale = bd.scale();
+      BigInteger bigInt = bd.unscaledValue();
+      byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+      byte[] ret = new byte[4 + encodedbigInt.length];
+      int len = ret.length;
+      DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret));
+      len = len ^ 0x80000000;
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);
+      dos.close();

Review comment:
       This can be a try-with-resources block.
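
As a rough illustration of the suggested shape (a sketch, not the PR's final code), the example below writes an int inside a try-with-resources block. `TrackingStream` and `writeAndReportClosed` are hypothetical names; the tracking subclass exists only to show that the underlying stream is closed automatically without an explicit `dos.close()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class TryWithResourcesSketch {

  // Hypothetical stream that records whether close() was called.
  static class TrackingStream extends ByteArrayOutputStream {
    boolean closed = false;

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  // Writes one int and reports whether the underlying stream ended up closed.
  static boolean writeAndReportClosed(int value) {
    TrackingStream sink = new TrackingStream();
    // The DataOutputStream is closed when the block exits, even on an exception.
    try (DataOutputStream dos = new DataOutputStream(sink)) {
      dos.writeInt(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.closed;
  }

  public static void main(String[] args) {
    System.out.println(writeAndReportClosed(7));
  }
}
```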

##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,
+        Arrays.asList(new BigDecimal("2.0"), new BigDecimal("2.00"), new BigDecimal("2.000"),
+            new BigDecimal("-3.000"), new BigDecimal("-2.00"), new BigDecimal("0.0000"),
+            new BigDecimal("0.1"), new BigDecimal("0.10"), new BigDecimal("-65537.000"),
+            new BigDecimal("-65537.00"), new BigDecimal("-65537.0")));
+

Review comment:
       This might be more readable (formatting might be off):
   
   ```suggestion
           List.of("2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10", "-65537.000", "-65537.00", "-65537.0")
               .stream().map(BigDecimal::new).collect(Collectors.toList()));
   ```







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816382140



##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,
+        Arrays.asList(new BigDecimal("2.0"), new BigDecimal("2.00"), new BigDecimal("2.000"),
+            new BigDecimal("-3.000"), new BigDecimal("-2.00"), new BigDecimal("0.0000"),
+            new BigDecimal("0.1"), new BigDecimal("0.10"), new BigDecimal("-65537.000"),
+            new BigDecimal("-65537.00"), new BigDecimal("-65537.0")));
+

Review comment:
       I can do that. Could you please also create an issue for the other coders so that I can change their tests as well?







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816381063



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+
+    try {
+      int scale = bd.scale();
+      BigInteger bigInt = bd.unscaledValue();
+      byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+      byte[] ret = new byte[4 + encodedbigInt.length];
+      int len = ret.length;
+      DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret));
+      len = len ^ 0x80000000;
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);
+      dos.close();
+      return ret;
+    } catch (IOException e) {
+      throw new RuntimeException(e);

Review comment:
       I was just following what was there in the other code. Let me fix it.







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r821989060



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       I came across this paper while doing some reading. I think we should change the encoding to use bits, and hopefully that would give us what we want. Here is the link:
   [https://arxiv.org/abs/1506.01598]. It has been done in C++; I am thinking of translating it to Java to see if it gets us what we need.
   







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819043425



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *

Review comment:
       @ctubbsii Should we have the lexicoder implement the Comparator interface, or write a separate Comparator for BigDecimals? Thanks







[GitHub] [accumulo] DomGarguilo commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r818875962



##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    List<BigDecimal> list =
+        List.of("2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10", "-65537.000",
+            "-65537.00", "-65537.0").stream().map(BigDecimal::new).collect(Collectors.toList());

Review comment:
       Looks like this could be simplified even more by using `Stream.of()`. The suggested code should be formatted correctly, but I think it changes the imports, so it might fail the checks until they are updated.
   ```suggestion
       var list = Stream.of("2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10",
           "-65537.000", "-65537.00", "-65537.0").map(BigDecimal::new).collect(Collectors.toList());
   ```
   I changed to `var` here because keeping the declared type as `List<BigDecimal>` messed with the formatting, making for some less readable code:
   ```Java
       List<
           BigDecimal> list =
               Stream
                   .of("2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10",
                       "-65537.000", "-65537.00", "-65537.0")
                   .map(BigDecimal::new).collect(Collectors.toList());
   ```







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r818890064



##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.List;
+import java.util.stream.Collectors;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    List<BigDecimal> list =
+        List.of("2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10", "-65537.000",
+            "-65537.00", "-65537.0").stream().map(BigDecimal::new).collect(Collectors.toList());

Review comment:
       Let me use stream and see what that does. Thanks







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816420842



##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,

Review comment:
       The concern I have isn't whether or not we're passing the comparator. The concern I have is that BigDecimal's own comparator does not guarantee a particular ordering of distinct input. It's non-deterministic, because it treats some values as equal when they differ in scale.
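
One possible way to get a fully determined ordering in a test is sketched below. This is an assumption for illustration, not something the PR settles on: `VALUE_THEN_SCALE` and `sorted` are hypothetical names, and breaking ties by ascending scale is an arbitrary choice. The comparator orders by numeric value first and only then by scale, so values such as `2.0`, `2.00`, and `2.000` always land in the same relative order.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScaleAwareOrder {

  // Hypothetical comparator: numeric value first, then scale as the tie-breaker.
  static final Comparator<BigDecimal> VALUE_THEN_SCALE =
      Comparator.<BigDecimal>naturalOrder().thenComparingInt(BigDecimal::scale);

  // Parses the inputs, sorts them with the comparator above, and returns their
  // string forms so the scale of each element stays visible.
  static List<String> sorted(List<String> inputs) {
    List<BigDecimal> values = new ArrayList<>();
    for (String s : inputs) {
      values.add(new BigDecimal(s));
    }
    values.sort(VALUE_THEN_SCALE);
    List<String> out = new ArrayList<>();
    for (BigDecimal v : values) {
      out.add(v.toString());
    }
    return out;
  }

  public static void main(String[] args) {
    // prints [-3.000, 0.1, 0.10, 2.0, 2.00, 2.000]
    System.out.println(sorted(List.of("2.00", "-3.000", "2.0", "0.10", "0.1", "2.000")));
  }
}
```

Whether the byte encoding should follow this particular tie-break direction is a separate question; the sketch only shows that a total order over distinct inputs is easy to state.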







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819057934



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       I just re-read the javadoc for the DataOutputStream, and while it only guarantees that it can be read back via a DataInputStream, leaving some ambiguity about whether it can write any metadata to the stream, the current implementation doesn't do that and it would break things if it started doing so. So, it's fine.
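
For reference, a small sketch of what `writeInt` puts on the wire, combined with the sign-bit flip used in the diff above. `encodeScale` is a hypothetical helper, not a method from the PR. `DataOutputStream.writeInt` is documented to write four bytes, high byte first, and XOR-ing with `0x80000000` makes negative ints compare below positive ones when the bytes are compared as unsigned values.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class WriteIntSketch {

  // Hypothetical helper: flips the sign bit, then writes the int big-endian.
  static byte[] encodeScale(int scale) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream dos = new DataOutputStream(bytes)) {
      dos.writeInt(scale ^ 0x80000000);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] negative = encodeScale(-1); // bytes 7f ff ff ff
    byte[] positive = encodeScale(1);  // bytes 80 00 00 01
    System.out.println(negative.length + " " + positive.length);
    // A negative result means -1 sorts before 1 in unsigned byte order.
    System.out.println(Arrays.compareUnsigned(negative, positive) < 0);
  }
}
```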







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r822041814



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       I don't think that paper covers how to deal with the arbitrary precision of BigDecimal. And it's non-trivial to get the exponent from a normalized BigDecimal without explicitly setting the scale, which loses precision.







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819043425



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *

Review comment:
       Shall we have the Lexicoder implement the Comparator interface, or write a separate Comparator for BigDecimal values? Thanks







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819635919



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       That won't work. Consider:
   
   ```java
   var x = new BigDecimal("2030E-3"); // value == 2.030; x.scale() == 3
   var y = new BigDecimal("303E-2");  // value == 3.03;   y.scale() == 2
   ```
   
   You can't sort by scale first, because scale is not related to the numeric ordering; it only reflects how much information about the precision is stored. I'm actually not even sure whether we should be using precision or scale.
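
   A small sketch of the counterexample above, for anyone who wants to run it. The two comparator methods are hypothetical stand-ins for "sort on scale, then unscaled value" and "sort on unscaled value, then scale"; they are not the PR's byte encoding, only an illustration of why neither field order can match the numeric order.

```java
import java.math.BigDecimal;

public class ScaleVsValue {

  // Hypothetical ordering: scale first, then unscaled value.
  static int scaleFirst(BigDecimal a, BigDecimal b) {
    int c = Integer.compare(a.scale(), b.scale());
    return c != 0 ? c : a.unscaledValue().compareTo(b.unscaledValue());
  }

  // Hypothetical ordering: unscaled value first, then scale.
  static int unscaledFirst(BigDecimal a, BigDecimal b) {
    int c = a.unscaledValue().compareTo(b.unscaledValue());
    return c != 0 ? c : Integer.compare(a.scale(), b.scale());
  }

  public static void main(String[] args) {
    BigDecimal x = new BigDecimal("2030E-3"); // 2.030, unscaled 2030, scale 3
    BigDecimal y = new BigDecimal("303E-2"); // 3.03, unscaled 303, scale 2

    System.out.println(x.compareTo(y) < 0); // numerically x < y
    System.out.println(scaleFirst(x, y) > 0); // but scale-first puts y before x
    System.out.println(unscaledFirst(x, y) > 0); // and so does unscaled-first
  }
}
```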







[GitHub] [accumulo] EdColeman commented on pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
EdColeman commented on pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#issuecomment-1059249203


   There seems to be some confusion about this implementation. It needs to remain functionally equivalent to the other lexicoders: it must provide a consistent sort order (necessary for correct operation) and a uniform user experience, so that users who switch data types do not need to change any of their business logic.
   
   Maybe it would help if you provided concrete use cases where a BigDecimal lexicoder is necessary for your functionality.





[GitHub] [accumulo] keith-turner commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819838176



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       I was thinking the scale was equivalent to an exponent.  The DoubleLexicoder encodes the exponent and then the mantissa for proper sorting.  If the scale is not the exponent, then proper sorting would need something like an exponent encoded first using a fixed width, followed by the mantissa.
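
   As a rough illustration of the difference between scale and exponent, the sketch below computes what the BigDecimal.toString Javadoc calls the adjusted exponent (negated scale plus the number of digits in the unscaled value, less one). The class and method names are assumptions made for this example. It shows only the exponent part; a full encoding would still need to order the digits of the unscaled value correctly.

```java
import java.math.BigDecimal;

public class AdjustedExponent {

  // Power of ten of the leading digit, e.g. 0 for 2.030 and -3 for 0.00123.
  static int adjustedExponent(BigDecimal bd) {
    return bd.precision() - bd.scale() - 1;
  }

  public static void main(String[] args) {
    String[] samples = {"2030E-3", "303E-2", "12345.6", "0.00123"};
    for (String s : samples) {
      BigDecimal bd = new BigDecimal(s);
      System.out.println(bd.toPlainString() + " scale=" + bd.scale()
          + " adjustedExponent=" + adjustedExponent(bd));
    }
  }
}
```

   Note that 2.030 and 3.03 have different scales (3 and 2) but the same adjusted exponent (0), which is why the scale alone cannot play the role of the exponent.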







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816383479



##########
File path: core/src/test/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoderTest.java
##########
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.math.BigDecimal;
+import java.util.Arrays;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.AbstractLexicoderTest;
+import org.junit.jupiter.api.Test;
+
+public class BigDecimalLexicoderTest extends AbstractLexicoderTest {
+
+  @Test
+  public void testSortOrder() {
+
+    assertSortOrder(new BigDecimalLexicoder(), BigDecimal::compareTo,

Review comment:
       If we don't pass the comparator, wouldn't it still use the default comparator, which would still not give us what we want?  Shouldn't we write our own Comparator? Our encoder and decoder know how the data is stored and would compare it correctly.







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r816383559



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+
+    try {
+      int scale = bd.scale();
+      BigInteger bigInt = bd.unscaledValue();
+      byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+      byte[] ret = new byte[4 + encodedbigInt.length];
+      int len = ret.length;
+      DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret));
+      len = len ^ 0x80000000;
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);
+      dos.close();

Review comment:
       sure :-). We should do it for other types also. 







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r818966979



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *

Review comment:
       I think the biggest problem with the native Java sort order is that it's non-deterministic for items that differ only in scale. Lexicoder orderings should be deterministic. So, we'd need to use a comparator that produces the same output, no matter the order of the input, and have the lexicoder be implemented to preserve that ordering. Otherwise, the user experience could be very confusing. For example, the string lexical ordering of "2.0" and "2.00" would place "2.0" before "2.00". However, the comparator for BigDecimal is perfectly happy to order "2.00" before "2.0", because their `compareTo` value is `0`, even though their `equals` value is `false`.
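
   A minimal sketch of the `compareTo`/`equals` mismatch, together with one possible deterministic comparator. The tie-break on scale (smaller scale first) is only an assumption chosen for illustration; the thread does not settle on a specific ordering.

```java
import java.math.BigDecimal;

public class DeterministicOrder {

  // Hypothetical total order: numeric value first, then smaller scale first.
  static int compareDeterministic(BigDecimal a, BigDecimal b) {
    int c = a.compareTo(b);
    return c != 0 ? c : Integer.compare(a.scale(), b.scale());
  }

  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("2.0");
    BigDecimal b = new BigDecimal("2.00");

    System.out.println(a.compareTo(b)); // 0: numerically equal
    System.out.println(a.equals(b)); // false: scales differ
    System.out.println(compareDeterministic(a, b) < 0); // "2.0" always before "2.00"
    System.out.println(compareDeterministic(b, a) > 0); // regardless of argument order
  }
}
```

   Because two BigDecimal values with the same numeric value and the same scale also have the same unscaled value, this comparator returns 0 only when `equals` is true.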

##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       This seems very dependent on DataOutputStream not introducing any additional bytes into the output stream. Since that's an implementation detail, I don't think we should rely on that fact. Why not just use `System.arraycopy` to guarantee you don't overflow the `ret` buffer?
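
   A sketch of what the `System.arraycopy` variant could look like. The helper name and the field order (encoded value, then sign-flipped scale, as in the current diff) are assumptions for illustration. It only shows the buffer handling and does not address the sort-order questions raised elsewhere in this review.

```java
import java.util.Arrays;

public class ArrayCopyConcat {

  // Hypothetical helper: appends a sign-flipped, big-endian 4-byte int to an encoded value.
  static byte[] concat(byte[] encoded, int scale) {
    int flipped = scale ^ 0x80000000; // flip the sign bit so negatives sort first
    byte[] ret = new byte[encoded.length + 4];
    System.arraycopy(encoded, 0, ret, 0, encoded.length);
    ret[encoded.length] = (byte) (flipped >>> 24);
    ret[encoded.length + 1] = (byte) (flipped >>> 16);
    ret[encoded.length + 2] = (byte) (flipped >>> 8);
    ret[encoded.length + 3] = (byte) flipped;
    return ret;
  }

  public static void main(String[] args) {
    // The single byte 9 stands in for the output of BigIntegerLexicoder.encode.
    System.out.println(Arrays.toString(concat(new byte[] {9}, 0)));
    System.out.println(Arrays.toString(concat(new byte[] {9}, -1)));
  }
}
```

   The output length is fixed by construction here, so nothing depends on how a stream wrapper chooses to serialize its data.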










[GitHub] [accumulo] keith-turner commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r821018344



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       I tried to rewrite this locally to use an exponent and it mostly works.  Below is what I was experimenting with.
   
   ```java
   public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
   
     private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
   
     @Override
     public BigDecimal decode(byte[] b) {
       // This concrete implementation is provided for binary compatibility, since the corresponding
       // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
       return super.decode(b);
     }
   
     private static int getExponent(BigDecimal bd){
       return -1*bd.scale()+bd.precision()-1;
     }
   
     /**
      * inverse of {@link #getExponent(BigDecimal)}
      */
     private static int getScale(int exponent, int precision) {
       return (exponent+1-precision) * -1;
     }
   
     private byte[] encodeBigInt(BigInteger bigInt) {
       byte[] bytes = bigInt.toByteArray();
       // flip the sign bit
       bytes[0] = (byte) (0x80 ^ bytes[0]);
       return bytes;
     }
   
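     private BigInteger decodeBigInt(byte[] data, int off, int len) {
       // copy first (java.util.Arrays) so the caller's array is not modified
       byte[] copy = Arrays.copyOfRange(data, off, off + len);
       // unflip the sign bit
       copy[0] = (byte) (0x80 ^ copy[0]);
       return new BigInteger(copy);
     }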
   
     @Override
     public byte[] encode(BigDecimal bd) {
       BigInteger bigInt = bd.unscaledValue();
       byte[] encodedbigInt = encodeBigInt(bigInt);
       // Length is 1 byte for the sign + 4 bytes for the exponent + size of the encoded BigInteger
       byte[] ret = new byte[5 + encodedbigInt.length];
       try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
         // sort first on the sign, make negative numbers come first
         dos.write((byte)bd.signum()  ^ 0x80);
   
         // sort second on the exponent
         int exponent = getExponent(bd);
         if(bd.signum() < 0) {
           // when the number is negative the exponents need to sort in reverse order
           exponent = exponent * -1;
         }
         dos.writeInt(exponent ^ 0x80000000);
   
         // sort third on the unscaled value.
         dos.write(encodedbigInt);
         return ret;
   
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   
     @Override
     protected BigDecimal decodeUnchecked(byte[] b, int offset, int origLen)
         throws IllegalArgumentException {
   
       try {
         DataInputStream dis = new DataInputStream(new ByteArrayInputStream(b, offset, origLen));
         int signum = (byte)(dis.read() ^ 0x80);
   
         int exponent = dis.readInt() ^ 0x80000000;
         if(signum < 0)
           exponent = exponent * -1;
   
         // the unscaled value follows the 1 sign byte and the 4 exponent bytes
         BigInteger bigInt = decodeBigInt(b, offset + 5, origLen - 5);
   
         // TODO is there a better way to get this
         int precision = bigInt.abs().toString().length();
   
         return new BigDecimal(bigInt, getScale(exponent, precision));
       } catch (IOException ioe) {
         throw new UncheckedIOException(ioe);
       }
     }
   }
   ```
   
   I modified the test function to use the following test data.
   
   
   ```java
     @Test
     public void testSortOrder() {
   
       var list = Stream.of("4.2E+10","4.2E+100","4.1E+100","4.3E+100","4.2E-10","4.2E-100",
           "-4.2E+10","-4.2E+100","-4.2E-10","-4.2E-100","-4.1E-100","-4.3E-100","4.2","-4.2","2.1","2.2",
           "2.0", "2.00", "2.000", "-3.000", "-2.00", "0.0000", "0.1", "0.10", "-65537.000", "-65537.123",
           "-65537.00", "-65537.0", "4.1E-10","4.3E-10").map(BigDecimal::new).collect(Collectors.toList());
   
       assertSortOrder(new BigDecimalLexicoder(), list);
   
     }
   ```
   
   And it fails with the following difference in sort order.
   
   ```
   Expected :[-4.2E+100, -4.2E+10, -65537.123, -65537.000, -65537.00, -65537.0, -4.2, -3.000, -2.00, -4.2E-10, -4.3E-100, -4.2E-100, -4.1E-100, 0.0000, 4.2E-100, 4.1E-10, 4.2E-10, 4.3E-10, 0.1, 0.10, 2.0, 2.00, 2.000, 2.1, 2.2, 4.2, 4.2E+10, 4.1E+100, 4.2E+100, 4.3E+100]
   
   Actual   :[-4.2E+100, -4.2E+10, -65537.00, -65537.0, -65537.123, -65537.000, -4.2, -3.000, -2.00, -4.2E-10, -4.3E-100, -4.2E-100, -4.1E-100, 0.0000, 4.2E-100, 4.1E-10, 4.2E-10, 4.3E-10, 0.1, 0.10, 2.00, 2.000, 2.0, 2.1, 2.2, 4.2, 4.2E+10, 4.1E+100, 4.2E+100, 4.3E+100]
   ```
   
   So almost everything in the test data sorts correctly.  The only problem is that `-65537.123` sorts after `-65537.0` (and, similarly, `2.0` sorts after `2.00` and `2.000`).  I understand why it's doing this, but I'm not sure what to do about it for now.
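   
   To make the cause of that misordering concrete, here is a minimal diagnostic sketch. It assumes the same exponent formula and the same sign-bit flip as the experimental code above; the class name `UnscaledOrderDemo` and its helpers are hypothetical and not part of the PR. It only illustrates the problem and does not propose a fix.
   
   ```java
   import java.math.BigDecimal;
   import java.math.BigInteger;
   
   public class UnscaledOrderDemo {
   
     // Same formula as getExponent in the experimental lexicoder above.
     static int exponent(BigDecimal bd) {
       return bd.precision() - bd.scale() - 1;
     }
   
     // Same sign-bit flip as encodeBigInt in the experimental lexicoder above.
     static byte[] flipped(BigInteger bi) {
       byte[] bytes = bi.toByteArray();
       bytes[0] = (byte) (0x80 ^ bytes[0]);
       return bytes;
     }
   
     public static void main(String[] args) {
       for (String s : new String[] {"-65537.123", "-65537.0"}) {
         BigDecimal bd = new BigDecimal(s);
         byte[] enc = flipped(bd.unscaledValue());
         // Both values have the same sign and exponent, so the ordering is decided
         // entirely by the unscaled bytes, which have different lengths here.
         System.out.printf("%s exponent=%d unscaled=%s bytes=%d firstByte=0x%02X%n", s,
             exponent(bd), bd.unscaledValue(), enc.length, enc[0] & 0xff);
       }
     }
   }
   ```
   
   Both values share sign and exponent, so the comparison falls through to the unscaled bytes. Those are variable-length two's-complement arrays (4 bytes versus 3 bytes here), and their first bytes compare in the opposite order from the numeric values.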
   
   
   







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r820126842



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       Here's a good example to show that scale is not equivalent to exponent (paste into `jshell PRINTING` to run):
   ```java
   var y = new BigDecimal("210");
   IntStream.rangeClosed(-1, 2).forEach(i -> {
     var x = y.setScale(i);
     printf("unscaledValue() = %s; scale = %s; precision = %s; toString = %s%n", x.unscaledValue(), x.scale(), x.precision(), x);
   });
   ```
   
   Since BigDecimal has arbitrary precision, scale is used to track how the unscaled value must be scaled (how many places to move the decimal) to preserve the intended precision.
   
   Here's the output:
   
   ```
   unscaledValue() = 21; scale = -1; precision = 2; toString = 2.1E+2
   unscaledValue() = 210; scale = 0; precision = 3; toString = 210
   unscaledValue() = 2100; scale = 1; precision = 4; toString = 210.0
   unscaledValue() = 21000; scale = 2; precision = 5; toString = 210.00
   ```
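   
   For comparison, a small standalone sketch of how an exponent could be derived from scale and precision for the same four values. It assumes the formula `precision - scale - 1` from the experimental `getExponent` posted earlier in this review; the class name `ScaleVsExponent` is hypothetical.
   
   ```java
   import java.math.BigDecimal;
   
   public class ScaleVsExponent {
   
     // Assumed formula, taken from the experimental getExponent in this review thread.
     static int exponent(BigDecimal bd) {
       return bd.precision() - bd.scale() - 1;
     }
   
     public static void main(String[] args) {
       BigDecimal y = new BigDecimal("210");
       for (int i = -1; i <= 2; i++) {
         // setScale needs no rounding here because 210 is divisible by 10
         BigDecimal x = y.setScale(i);
         System.out.printf("scale = %d; precision = %d; exponent = %d; toString = %s%n",
             x.scale(), x.precision(), exponent(x), x);
       }
     }
   }
   ```
   
   The scale changes on every iteration, while the derived exponent stays at 2 for all four representations of 210.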







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r820126842



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       Here's a good example to show that scale is not equivalent to exponent (paste into `jshell PRINTING` to run):
   ```java
   var y = new BigDecimal("210");
   IntStream.rangeClosed(-1, 2).forEach(i -> {
     var x = y.setScale(i);
     printf("unscaledValue() = %s; scale = %s; precision = %s; toString = %s%n", x.unscaledValue(), x.scale(), x.precision(), x);
   });
   ```
   
   Since BigDecimal has arbitrary precision, scale is used to track how the unscaled value must be scaled (how many places to move the decimal) to preserve the intended precision. It's basically the number of `0` digits to add to the number of digits in the unscaled value in order to achieve the desired number of significant digits in the precision.







[GitHub] [accumulo] ctubbsii commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819155895



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *

Review comment:
       I'm not sure either is strictly necessary. What matters is that the lexical ordering of the encoded form produced by this Lexicoder is deterministic: in numerical order first, then in lexical order by precision. So, `new BigDecimal(new BigInteger("203"), 2)` should sort lexically prior to `new BigDecimal(new BigInteger("2030"), 3)`, for example.
   
   The main reason a Comparator is needed is to test that the ordering is preserved in both directions. So at the very least, a better comparator will be needed for testing. It may be sufficient to use `Comparator.naturalOrder().thenComparingInt(BigDecimal::precision)`, but I don't know if there are any edge cases where this breaks.
   
   As for the javadoc, rather than stating that we're maintaining its native Java sort order, we should specify that we're maintaining its numerical order and precision, so that among numerically equivalent values, the ones with the least precision are ordered first. When this lexicoder is reversed, the opposite should be true.
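   
   A minimal sketch of what such a test comparator might look like, using the `203`/`2030` example from above. The class name `PrecisionTieBreak` is hypothetical, and the comparator is written with an explicit type witness, which the chained call appears to need in order to compile.
   
   ```java
   import java.math.BigDecimal;
   import java.math.BigInteger;
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   
   public class PrecisionTieBreak {
   
     // Numerical order first (BigDecimal.compareTo), then least precision first.
     static final Comparator<BigDecimal> ORDER =
         Comparator.<BigDecimal>naturalOrder().thenComparingInt(BigDecimal::precision);
   
     static List<BigDecimal> sorted(List<BigDecimal> in) {
       List<BigDecimal> out = new ArrayList<>(in);
       out.sort(ORDER);
       return out;
     }
   
     public static void main(String[] args) {
       BigDecimal a = new BigDecimal(new BigInteger("2030"), 3); // 2.030
       BigDecimal b = new BigDecimal(new BigInteger("203"), 2); // 2.03
       System.out.println(sorted(List.of(a, b, new BigDecimal("2.1"))));
     }
   }
   ```
   
   One edge case worth checking: `BigDecimal` reports a precision of 1 for every zero value, so `0.0` and `0.00` still compare as equal under this comparator.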







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r819042355



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       Your comments are welcome 😄. I would have to change the implementation a little, because I would now have to convert the int to a byte array and either preserve or switch the ordering. All our other Lexicoders use the same approach; shouldn't we change those as well? Also, if it were writing extra bytes, our decode tests would have failed, or we would have caught it some other way. Let me think about how we can handle it.
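   
   As a rough illustration of the int-to-byte-array conversion mentioned above, here is a sketch that assumes the goal is for unsigned byte-wise comparison of the encoded form to match signed int order. The class name `IntOrderSketch` and its methods are hypothetical and not taken from the PR or from any existing Lexicoder.
   
   ```java
   import java.nio.ByteBuffer;
   import java.util.Arrays;
   
   public class IntOrderSketch {
   
     // Flip the sign bit and write big-endian (ByteBuffer's default order), so that
     // negative values get smaller leading bytes than non-negative ones.
     static byte[] encode(int v) {
       return ByteBuffer.allocate(4).putInt(v ^ 0x80000000).array();
     }
   
     // Inverse of encode: read big-endian and flip the sign bit back.
     static int decode(byte[] b) {
       return ByteBuffer.wrap(b).getInt() ^ 0x80000000;
     }
   
     public static void main(String[] args) {
       int[] values = {Integer.MIN_VALUE, -3, 0, 2, Integer.MAX_VALUE};
       for (int i = 1; i < values.length; i++) {
         byte[] prev = encode(values[i - 1]);
         byte[] cur = encode(values[i]);
         // Arrays.compareUnsigned (Java 9+) compares the arrays as unsigned bytes
         System.out.println(values[i - 1] + " sorts before " + values[i] + ": "
             + (Arrays.compareUnsigned(prev, cur) < 0));
       }
     }
   }
   ```
   
   Reversing the order (for example for negative numbers) would need an additional transformation on top of this; the sketch only covers the ascending case.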







[GitHub] [accumulo] harjitdotsingh commented on a change in pull request #2535: Serializer for BigDecimal #2226

Posted by GitBox <gi...@apache.org>.
harjitdotsingh commented on a change in pull request #2535:
URL: https://github.com/apache/accumulo/pull/2535#discussion_r821989060



##########
File path: core/src/main/java/org/apache/accumulo/core/client/lexicoder/BigDecimalLexicoder.java
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.client.lexicoder;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.apache.accumulo.core.clientImpl.lexicoder.FixedByteArrayOutputStream;
+
+/**
+ * A lexicoder to encode/decode a BigDecimal to/from bytes that maintain its native Java sort order.
+ *
+ * @since 2.1.0
+ */
+public class BigDecimalLexicoder extends AbstractLexicoder<BigDecimal> {
+
+  private final BigIntegerLexicoder bigIntegerLexicoder = new BigIntegerLexicoder();
+
+  @Override
+  public BigDecimal decode(byte[] b) {
+    // This concrete implementation is provided for binary compatibility, since the corresponding
+    // superclass method has type-erased return type Object. See ACCUMULO-3789 and #1285.
+    return super.decode(b);
+  }
+
+  @Override
+  public byte[] encode(BigDecimal bd) {
+    // To encode we separate out the scale and the unscaled value
+    // serialize each value individually and store them
+    int scale = bd.scale();
+    BigInteger bigInt = bd.unscaledValue();
+    byte[] encodedbigInt = bigIntegerLexicoder.encode(bigInt);
+    // Length is set to size of encoded BigInteger + length of the scale value
+    byte[] ret = new byte[4 + encodedbigInt.length];
+    try (DataOutputStream dos = new DataOutputStream(new FixedByteArrayOutputStream(ret))) {
+      scale = scale ^ 0x80000000;
+      dos.write(encodedbigInt);
+      dos.writeInt(scale);

Review comment:
       I came across this paper while doing some reading. I think we should change the encoding to work at the bit level; hopefully that would give us what we want. Here is the link:
   [Decimal Lexicoder](https://arxiv.org/abs/1506.01598). It has been done in C++; I'm thinking of translating it to Java to see if it gets us what we need.
   



