You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@datasketches.apache.org by "romseygeek (via GitHub)" <gi...@apache.org> on 2024/02/19 12:25:24 UTC

[I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

romseygeek opened a new issue, #510:
URL: https://github.com/apache/datasketches-java/issues/510

   UpdateSketch#update(byte[]) passes its byte array argument to MurmurHash3#hash(byte[],int,int,long) but always uses an offset of 0 and a length of data.length.  Adding a new update method that also takes an offset and a length would allow users to update sketches without having to copy data if their inputs are already references into large byte arrays.
   
   For context, I'm coming from Apache Lucene where term data is almost always passed around as references into these types of large byte arrays, and being able to update sketches like this would be very helpful in cutting down memory copies.  If the maintainers think this would be useful to others I'm happy to open a pull request.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-2016631159

   Thank you for contributing this.  I haven't had a chance to review it, but will get to it soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1987661300

   That would be great!  Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1989711001

    I have been decoding your statement.  I wasn't sure where "here" was, but it must have been in /bloomfilter/ or /sampling/ ?
   
   - The update(Memory) in /filters/ bloomfilter already does what you want.   Unless you also want update(ByteBuffer) as an option.
   - I would ignore /sampling/ directory
   - As you said, /hll/ already has update(BB), so ignore
   
   That leaves:
   - .../tuple/arrayofdoubles/ArrayOfDoublesUpdatableSketch.java
   - .../tuple/UpdatableSketch.java
   - .../cpc/CpcSketch.java  (somehow this was omitted in the first list !)
   - .../theta/UnionImpl.java
   - .../theta/UpdateSketch.java
   
   Feel free to do what you can!  I really appreciate the help!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1986702017

   I will get back to you shortly.  It turns out there are 6 places in the library where this would be relevant, and I hate to have you do all that work, so maybe I should do it.  Here are the approximate relative locations:
   
   - .../filters/bloomfilter/BloomFilter.java:212:  public void update(final byte[] data)...
   - .../hll/BaseHllSketch.java:363:  public void update(final byte[] data)...
   - .../tuple/arrayofdoubles/ArrayOfDoublesUpdatableSketch.java:119:  public void update(final byte[] key, final double[] values)...
   - .../tuple/UpdatableSketch.java:135:  public void update(final byte[] key, final U value)...
   - .../theta/UnionImpl.java:479:  public void update(final byte[] data)...
   - .../theta/UpdateSketch.java:271:  public UpdateReturnState update(final byte[] data)...
   
   Are you still up for it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "romseygeek (via GitHub)" <gi...@apache.org>.
romseygeek commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1980391609

   An `update(ByteByffer)` method on all the counting sketches sounds good to me.  Happy to open PRs for these - would you prefer separate ones for each sketch type or a single one for all three?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "romseygeek (via GitHub)" <gi...@apache.org>.
romseygeek commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1988914705

   I started out adding an `update(Memory)` method here, but that runs into trouble with `Union` which already has an `update(Memory)` method that does something quite different.  So I think adding an `update(ByteBuffer)` method is probably the way to go (which HLL already has so that's one less place to update).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1979374088

   @romseygeek 
   Any comments on my reply of 2 weeks ago?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "romseygeek (via GitHub)" <gi...@apache.org>.
romseygeek commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1987278424

   I'll make a start and see how long it takes me to do a couple of them :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho commented on issue #510:
URL: https://github.com/apache/datasketches-java/issues/510#issuecomment-1955176054

   This is a valuable suggestion. But it begs the question of whether we need to add this capability to all the other array update methods ... String, char[], int[], long[] ... which kinda bloats the API.    Or, instead, add an update(ByteBuffer).  And/Or, add an update(Memory).
   
   The last two methods allow wrapping of any primitive array and creating a view of it to pass to the sketch, all without any copies. (with ByteBuffer and the multibyte primitives, one has to be very careful to be explicit about endianness.)
   
   This suggestion also begs the question of whether we need to do this with our other unique counting sketches: CPC and Tuple.  HLL already has an update(ByteBuffer). 
   
   Thoughts?  
   
   Would you also be willing to help with PRs for the other unique counting sketches?
   
   Cheers, and thanks for the suggestion!!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org


Re: [I] Allow UpdateSketch#update to take a byte array slice (datasketches-java)

Posted by "leerho (via GitHub)" <gi...@apache.org>.
leerho closed issue #510: Allow UpdateSketch#update to take a byte array slice
URL: https://github.com/apache/datasketches-java/issues/510


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org
For additional commands, e-mail: commits-help@datasketches.apache.org