Posted to github@arrow.apache.org by "marsupialtail (via GitHub)" <gi...@apache.org> on 2023/05/09 19:08:42 UTC

[GitHub] [arrow] marsupialtail commented on issue #35508: adding data to tdigest in pyarrow

marsupialtail commented on issue #35508:
URL: https://github.com/apache/arrow/issues/35508#issuecomment-1540750019

   Hey, if you don't mind using a random Python project, try this:
   
   pip3 install ldbpy
   
   Then:
   
   ```
   #!/usr/bin/env python3
   import time

   import ldbpy
   import polars
   import pyarrow as pa
   import pyarrow.compute as pac
   from pyarrow.cffi import ffi

   # Allocate Arrow C Data Interface structs and take their addresses as ints.
   c_schema = ffi.new("struct ArrowSchema*")
   schema_ptr = int(ffi.cast("uintptr_t", c_schema))
   c_array = ffi.new("struct ArrowArray*")
   array_ptr = int(ffi.cast("uintptr_t", c_array))

   # Load one column as a single contiguous Arrow array and export it.
   lineitem = polars.read_parquet("demo-tpch/lineitem.parquet")
   arr = lineitem.to_arrow()["l_tax"].combine_chunks()
   arr._export_to_c(array_ptr, schema_ptr)

   # Time ldbpy: pass the same exported array 20 times, then query two quantiles.
   a = ldbpy.NTDigest(20, 100, 10000)
   start = time.time()
   a.batch_add_arrow([array_ptr] * 20, [schema_ptr] * 20)
   print(a.quantile(0, 0.5))
   print(a.quantile(1, 0.1))
   print(time.time() - start)

   # Time pyarrow's built-in tdigest on the same array for comparison.
   start = time.time()
   print(pac.tdigest(arr, 0.5))
   print(time.time() - start)
   ```

