Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2020/10/25 22:47:44 UTC

[GitHub] [incubator-datasketches-java] xinyuwan opened a new issue #337: CompactSketch ArrayIndexOutOfBoundsException

xinyuwan opened a new issue #337:
URL: https://github.com/apache/incubator-datasketches-java/issues/337


   Hi, we are using the Theta Sketches Java library to calculate reach metrics. Following the Java example on the DataSketches website, we use a Union to merge multiple sketches and then get the CompactSketch in binary form.
   
   However, we see failures when getting the CompactSketch from the Union, with the following stack trace:
   
   ```
   Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 137 out of bounds for length 137 
   at org.apache.datasketches.theta.CompactSketch.compactCache(CompactSketch.java:97) 
   at org.apache.datasketches.theta.UnionImpl.getResult(UnionImpl.java:238) 
   at org.apache.datasketches.theta.UnionImpl.getResult(UnionImpl.java:212) 
   ```
   
   
   Can you let us know under what conditions this would happen and what the root cause is?
   
   Thanks,
   Bill
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


[GitHub] [incubator-datasketches-java] leerho commented on issue #337: CompactSketch ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
leerho commented on issue #337:
URL: https://github.com/apache/incubator-datasketches-java/issues/337#issuecomment-718213594


   Our sketches are single-threaded and thus not thread-safe.
   
   Are these sketches being accessed by more than one thread?  This smells
   like a concurrent access problem, for the following reasons:
   
      - The failure is intermittent.
      - The compactCache() method is deterministic.
      - The input argument *curCount* must be correct and remain correct for
      the duration of this method, which includes the for-loop.
      - If, for example, the *srcCache*, which contains a non-compacted
      hash table, receives a new entry while this method is running, the
      cacheOut[j++] assignment will fail with precisely an ArrayIndexOOB error.
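   
   The failure mode described in the list above can be reproduced
   deterministically, without threads. The following is a minimal sketch and
   not the library's source: compactCache() here is a simplified stand-in
   written for illustration, and the class name StaleCountDemo is
   hypothetical. It sizes the output array from *curCount* and then copies
   every valid entry of the table, so a table holding one more valid entry
   than *curCount* claims (as could happen if another thread inserted during
   the loop) makes cacheOut[j++] write past the end.
   
   ```java
   import java.util.Arrays;
   
   class StaleCountDemo {
   
       // Simplified stand-in for a compaction step (an assumption, not the library code):
       // copy the valid hashes (0 < v < thetaLong) from a sparse table into a dense
       // array whose length is taken from curCount.
       static long[] compactCache(long[] srcCache, int curCount, long thetaLong) {
           long[] cacheOut = new long[curCount];
           int j = 0;
           for (long v : srcCache) {
               if (v <= 0L || v >= thetaLong) {
                   continue; // empty slot, or hash not below theta: skip
               }
               cacheOut[j++] = v; // throws if the table holds more valid entries than curCount
           }
           return cacheOut;
       }
   
       public static void main(String[] args) {
           long theta = Long.MAX_VALUE;
           long[] table = {0L, 11L, 0L, 22L, 0L, 0L, 0L, 0L};
   
           // Count and table agree, so compaction succeeds.
           System.out.println(Arrays.toString(compactCache(table, 2, theta)));
   
           // Simulate an entry arriving after curCount was read: the count is now stale.
           table[5] = 33L;
           try {
               compactCache(table, 2, theta);
           } catch (ArrayIndexOutOfBoundsException e) {
               System.out.println("stale count: " + e.getMessage());
           }
       }
   }
   ```
   
   In the second call the failing index equals the array length, the same
   shape as the "Index 137 out of bounds for length 137" in the reported
   stack trace.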
   
   Apparently you have written wrapper classes around these sketches. I
   suggest you synchronize all the method calls in these wrapper classes.
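   
   One way to follow this suggestion is to route every call through a
   wrapper whose methods are all synchronized on the same object. A minimal
   sketch under stated assumptions: ToyUnion is a hypothetical, deliberately
   non-thread-safe stand-in for a union (it is not the DataSketches class),
   and SynchronizedUnion is a hypothetical wrapper name. Only the locking
   pattern matters here: update and getResult share one monitor, so a result
   is never computed while an update is in flight.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   
   class SynchronizedUnionDemo {
   
       // Hypothetical non-thread-safe stand-in: the count and the table are updated
       // separately, so unsynchronized interleaving could leave them inconsistent.
       static final class ToyUnion {
           private long[] table = new long[16];
           private int count = 0;
   
           void update(long hash) {
               if (count == table.length) {
                   table = Arrays.copyOf(table, table.length * 2);
               }
               table[count++] = hash;
           }
   
           long[] getResult() {
               return Arrays.copyOf(table, count);
           }
       }
   
       // Wrapper: every method takes the same lock (this), so calls never overlap.
       static final class SynchronizedUnion {
           private final ToyUnion union = new ToyUnion();
   
           synchronized void update(long hash) {
               union.update(hash);
           }
   
           synchronized long[] getResult() {
               return union.getResult();
           }
       }
   
       // Several threads update and read one shared wrapper; returns the final entry count.
       static int runConcurrently(int threads, int updatesPerThread) {
           SynchronizedUnion shared = new SynchronizedUnion();
           List<Thread> workers = new ArrayList<>();
           for (int t = 0; t < threads; t++) {
               final long base = (long) t * updatesPerThread;
               Thread worker = new Thread(() -> {
                   for (int i = 0; i < updatesPerThread; i++) {
                       shared.update(base + i + 1);
                       shared.getResult(); // reads interleave safely with updates
                   }
               });
               workers.add(worker);
               worker.start();
           }
           for (Thread worker : workers) {
               try {
                   worker.join();
               } catch (InterruptedException e) {
                   Thread.currentThread().interrupt();
                   throw new IllegalStateException("interrupted while waiting for workers", e);
               }
           }
           return shared.getResult().length;
       }
   
       public static void main(String[] args) {
           System.out.println(runConcurrently(4, 1000)); // prints 4000
       }
   }
   ```
   
   In the reporter's code the analogous change would be to make accumulate()
   and toReachData() mutually exclusive, for example by marking both
   synchronized. Whether concurrent access is the actual cause was not
   confirmed in this thread.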
   
   
   
   On Wed, Oct 28, 2020 at 12:32 PM xinyuwan <no...@github.com> wrote:
   
   > Thanks @jmalkin <https://github.com/jmalkin> @leerho
   > <https://github.com/leerho> for the quick response. Let me add more
   > context to the issue:
   > [...]
   




[GitHub] [incubator-datasketches-java] xinyuwan commented on issue #337: CompactSketch ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
xinyuwan commented on issue #337:
URL: https://github.com/apache/incubator-datasketches-java/issues/337#issuecomment-718160923


   Thanks @jmalkin @leerho for the quick response. Let me add more context to the issue:
   
   **Problem**: we encounter this ArrayOOB exception non-deterministically. The same input may fail once and succeed later, and I cannot reproduce the error locally when I make individual calls to getCompactSketch().
   **Library version**: org.apache.datasketches:datasketches-java:1.3.0-incubating
   **Use case**: Here is a description of how we are using the sketch:
   
   1. We are aggregating reach metrics from minute granularity to hourly and then to daily granularity. We do this by inserting UUIDs into an UpdateSketch and serializing its compact form into a Protobuf ByteString.
   2. In the minute-to-hour and hour-to-day aggregation, we deserialize each ByteString back to a Sketch and union them.
   3. Once all minutes of one hour (or hours of one day) have been added to the Union, we call Union.getResult() and serialize the result into a Protobuf ByteString again. The error occurs only during the hour-to-day Union.getResult(), and non-deterministically (not sure if this is because the sketches being merged into the Union are larger at this stage). The error rate is about 5% of total requests.
   4. Throughout the aggregation, we use Nominal Entries (K) = 1024 for both UpdateSketch and Union.
   
   Here are some code snippets:
   1. We have a SingleEntityUnionAccumulator, which takes a counter key enum and a ByteString of the sketch (a compacted update sketch):
   ```
   public class SingleEntityUnionAccumulator {
   
       final SketchOperations sketchOperations;
       final Map<AdImpressionStatsCounter, ReachData> reach;
   
       public SingleEntityUnionAccumulator(@Nonnull final SketchOperations sketchOperations) {
           super(sketchOperations);
       }
   
       public void accumulate(
               final AdImpressionStatsCounter counterKey, final ByteString sketchBytes) {
           ReachData reachData = putReachDataIfAbsent(counterKey);
           Sketch sketch = sketchOperations.byteStringToSketch(sketchBytes);
           reachData.getUnion().update(sketch);
       }
   
       public Optional<SingleEntityReachData> toReachData() {
           // return an empty Optional if there is no reach data to write to BT
           if (MapUtils.isEmpty(this.getReach())) {
               return Optional.empty();
           }
   
           Map<Integer, ReachDataEntry> dataEntryMap =
                   this.getReach().entrySet().stream()
                           .collect(
                                   toMap(
                                           e -> e.getKey().getHash(),
                                           e -> e.getValue().toReachDataEntry()));
           return Optional.of(
                   SingleEntityReachData.newBuilder()
                           .putAllDataEntryByCounter(dataEntryMap)
                           .setEntityHierarchy(this.getHierarchy())
                           .build());
       }
   }
   ```
   2. The toReachData() method is where we see the exception thrown, specifically from e -> e.getValue().toReachDataEntry(), which calls Union.getResult():
   ```
       public ReachDataEntry toReachDataEntry() {
           return ReachDataEntry.newBuilder()
                   .setSketch(sketchOperations.sketchToByteString(getCompactSketch()))
                   .setSeedValue(seedValue)
                   .build();
       }
   
       public CompactSketch getCompactSketch() {
           return this.union.getResult();
       }
   
   ```
   3. SketchOperations is a helper class that does all the SerDe of sketches and unions. In this case:
   ```
       @Override
       public ByteString sketchToByteString(final Sketch sketch) {
           return ByteString.copyFrom(sketch.compact().toByteArray());
       }
   
       @Override
       public Sketch byteStringToSketch(final ByteString sketchBytes) {
           return Sketches.wrapSketch(Memory.wrap(sketchBytes.toByteArray()));
       }
   
   ```
   
   I'm not sure if ArrayIndexOOB indicates that something is wrong on the memory/heap side, but can you let us know if this can be a cause during Union.getResult()?
   
   




[GitHub] [incubator-datasketches-java] leerho commented on issue #337: CompactSketch ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
leerho commented on issue #337:
URL: https://github.com/apache/incubator-datasketches-java/issues/337#issuecomment-716227344


   Unfortunately, this information is not sufficient to give us much of a clue as to what is going on.  Please send us a small Java program that reproduces this problem and we will be glad to help you.
   
   Cheers.
   




[GitHub] [incubator-datasketches-java] leerho closed issue #337: CompactSketch ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
leerho closed issue #337:
URL: https://github.com/apache/incubator-datasketches-java/issues/337


   




[GitHub] [incubator-datasketches-java] jmalkin commented on issue #337: CompactSketch ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
jmalkin commented on issue #337:
URL: https://github.com/apache/incubator-datasketches-java/issues/337#issuecomment-716227296


   Please provide the library version you're using and a code snippet so we can try to reproduce the error.

