Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2022/01/14 03:34:42 UTC

[GitHub] [datasketches-java] hqx871 commented on issue #382: HIP result is not stable

hqx871 commented on issue #382:
URL: https://github.com/apache/datasketches-java/issues/382#issuecomment-1012719806


   @leerho Thanks for your reply. By "HIP result is not stable" I mean that when HllSketch uses the hipAccum (Historical Inverse Probability) estimator, the estimate can change if the order of the unioned sketches changes. Running the code example below, which unions the same ten sketches in ten different orders, I get these results:
   
   [3820.0975208182876, 3813.762046000671, 3827.557127714105, 3824.370010995233, 3821.2456442772445, 3817.0347462105, 3825.5043384701926, 3819.158133388752, 3813.826214879161, 3814.82872512458]
   
   
   ```java
   import org.apache.datasketches.hll.HllSketch;
   import org.apache.datasketches.hll.TgtHllType;
   import org.apache.datasketches.hll.Union;
   import org.apache.datasketches.memory.Memory;
   
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.ThreadLocalRandom;
   
   public class HipDemo
   {
     public static void main(String[] args)
     {
       final int lgK = 15;
       final int rowCount = 2 << 8;                 // 512 updates per sketch
       final int cardinality = 2 << (lgK - 3);      // values drawn from [0, 8192)
       final int sketchNum = 10;
       final List<HllSketch> sketches = new ArrayList<>();
       for (int i = 0; i < sketchNum; i++) {
         sketches.add(generateSketch(lgK, rowCount, cardinality));
       }
   
       List<Double> estimates = new ArrayList<>();
       for (int i = 0; i < 10; i++) {
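         // Changing the order of the unioned sketches produces a different estimate.
         estimates.add(runUnion(randOrder(sketches), lgK));
         // With a fixed order (line below) the estimate is the same on every run.
         //estimates.add(runUnion(sketches, lgK));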
       }
       System.out.println(estimates);
     }
   
     public static double runUnion(List<HllSketch> sketches, int lgK)
     {
       Union union = new Union(lgK);
       for (HllSketch sketch : sketches) {
         union.update(sketch);
       }
       HllSketch result = union.getResult(TgtHllType.HLL_8);
       return result.getEstimate();
     }
   
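     // Randomly splits the sketches into two groups and concatenates them,
     // so the overall order changes while the set of sketches stays the same.
     public static List<HllSketch> randOrder(List<HllSketch> sketches)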
     {
       List<HllSketch> part1 = new ArrayList<>();
       List<HllSketch> part2 = new ArrayList<>();
       ThreadLocalRandom random = ThreadLocalRandom.current();
       for (HllSketch sketch : sketches) {
         if (random.nextBoolean()) {
           part1.add(sketch);
         } else {
           part2.add(sketch);
         }
       }
       List<HllSketch> all = new ArrayList<>();
       all.addAll(part1);
       all.addAll(part2);
       return all;
     }
   
     public static HllSketch generateSketch(int lgK, int count, int cardinality)
     {
       ThreadLocalRandom random = ThreadLocalRandom.current();
       HllSketch sketch = new HllSketch(lgK);
       for (int j = 0; j < count; j++) {
         sketch.update(random.nextInt(cardinality));
       }
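       // Round-trip through the compact serialized form and wrap the bytes,
       // so the union is fed Memory-backed sketches.
       byte[] bytes = sketch.toCompactByteArray();
       sketch = HllSketch.wrap(Memory.wrap(bytes));
       return sketch;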
     }
   }
   ```
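
For illustration only, here is a minimal, self-contained toy of how an HIP-style accumulator can depend on update order even when the final register state does not. It is not the datasketches implementation: the class `ToyHipSketch`, its methods, and the SplitMix64-style hash are all hypothetical stand-ins, and feeding the same items in forward and reversed order is only an assumed analogue of changing the order of the unioned sketches.

```java
import java.util.Arrays;

/** Toy HLL-like register array with an HIP accumulator (hypothetical, not the library code). */
public class ToyHipSketch {
  private final int lgK;
  private final byte[] regs;      // one register per slot, all start at 0
  private double kxq;             // sum over all registers of 2^(-register value)
  private double hipAccum = 0.0;  // running HIP estimate

  public ToyHipSketch(int lgK) {
    this.lgK = lgK;
    this.regs = new byte[1 << lgK];
    this.kxq = regs.length;       // every register is 0 and contributes 2^0 = 1
  }

  /** SplitMix64-style mixer used as a stand-in hash function (an assumption). */
  static long hash(long item) {
    long z = item + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  public void update(long item) {
    long h = hash(item);
    int slot = (int) (h >>> (64 - lgK));  // top lgK bits pick the register
    int value = Math.min(Long.numberOfLeadingZeros(h << lgK) + 1, 64 - lgK + 1);
    int old = regs[slot];
    if (value > old) {
      // HIP step: add the inverse of the probability that a new item changes
      // the state, computed from the state as it was BEFORE this update.
      hipAccum += regs.length / kxq;
      kxq += Math.scalb(1.0, -value) - Math.scalb(1.0, -old);
      regs[slot] = (byte) value;
    }
  }

  public double getHipEstimate() { return hipAccum; }

  public byte[] getRegisters() { return regs.clone(); }

  /** Feeds the items 0..n-1 in forward or reversed order. */
  public static ToyHipSketch build(int lgK, int n, boolean reversed) {
    ToyHipSketch s = new ToyHipSketch(lgK);
    for (int i = 0; i < n; i++) {
      s.update(reversed ? n - 1 - i : i);
    }
    return s;
  }

  public static void main(String[] args) {
    ToyHipSketch forward = build(10, 5000, false);
    ToyHipSketch reversed = build(10, 5000, true);
    System.out.println("same registers: "
        + Arrays.equals(forward.getRegisters(), reversed.getRegisters()));
    System.out.println("HIP forward:  " + forward.getHipEstimate());
    System.out.println("HIP reversed: " + reversed.getHipEstimate());
  }
}
```

The registers only ever keep the maximum value seen per slot, so they end up identical for any order. The accumulator, in contrast, is incremented once per register change, using the register state at that moment, so the number and size of the increments depend on the path taken to reach the final state.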


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org