Posted to user@arrow.apache.org by dl via user <us...@arrow.apache.org> on 2022/07/01 14:47:51 UTC

support for sparse tensors

  
Hi,  
  
I'm trying to understand support for sparse tensors in Arrow. It looks like
there is ["experimental" support using the C++
API](https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors).
When was this introduced? I see in the code base
[here](https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi)
Cython sparse array classes. Can these be accessed using the Python API? Are
they included in the 8.0.0 release? Is there any other support for sparse
arrays/tensors in the Python API? Are there good examples for any of this, in
particular for using the 8.0.0 Python API to create sparse tensors?  
  
Thanks,  
David  
  
  


Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
Just catching up on that conversation. This looks promising. After seeing
David's message I was thinking that I might not be able to add an extension
type for a SparseCSRMatrix directly, but a struct would work to have a single
field hold the data vs. three. Anyway, I'll try to get my head around this.  
  
Thanks!  
  

On 7/7/2022 8:24 AM, Rok Mihevc wrote:

> That sounds like the case David Li is describing. You can use
> SparseCSRMatrix as a field value, but you have to introduce an extension
> type for it [1]. Best see David's suggestion.
>
> [1]: <https://arrow.apache.org/docs/python/extending_types.html>
  


Re: support for sparse tensors

Posted by Rok Mihevc <ro...@gmail.com>.
That sounds like the case David Li is describing. You can use
SparseCSRMatrix as a field value, but you have to introduce an extension
type for it [1]. Best see David's suggestion.

[1]: https://arrow.apache.org/docs/python/extending_types.html

On Thu, Jul 7, 2022 at 3:58 PM dl <dy...@yahoo.com> wrote:

> Thanks. That helps.
>
> Can SparseCSRMatrix be used the way I'm trying to use it, as a field value
> in a table? I think that would need a DataType associated with it to give
> the field.
>
> On 7/6/2022 6:25 PM, Rok Mihevc wrote:
>
> arrow_sparse_csr_matrix.to_numpy() - will return underlying csr components
> arrow_sparse_csr_matrix.to_tensor().to_numpy() - should return a dense
> version of original matrix
>
> On Thu, Jul 7, 2022 at 3:12 AM dl <dy...@yahoo.com> wrote:
>
>> Minor separate question. The method pyarrow.SparseCSRMatrix.to_numpy()
>> doesn't seem to preserve the shape of the matrix. Am I wrong? For example
>> using the code from my original message, printing the result of
>> arrow_sparse_csr_matrix.to_numpy() in one case gives:
>>
>> (array([[0.91263427],
>>        [0.98520395],
>>        [0.98082576],
>>        [0.97490447],
>>        [0.94312307],
>>        [0.90573414],
>>        [0.95057244],
>>        [0.94955576],
>>        [0.90342821]]), array([0, 9], dtype=int64), array([ 0,  4, 33, 38,
>> 46, 49, 61, 64, 83], dtype=int64))
>>
>> vs.
>>
>> >>> acsr.shape
>> (1, 100)
>>
>>
>> On 7/6/2022 4:01 PM, dl wrote:
>>
>> I have tabular data with one record field of type
>> scipy.sparse.csr_matrix. I want to convert this tabular data to a pyarrow
>> table. I had been converting the csr_matrix first to a custom
>> representation using three fields (shape, keys, indices) and building the
>> pyarrow table using a schema with the types of these fields and table data
>> with a separate list for each field (and each list having one entry per
>> input record). I was hoping I could use a single pyarrow.SparseCSRMatrix
>> field  instead of the custom three field representation. Is that possible?
>> Incidentally, the shape of the csr_matrix is typically (1,N) where N may
>> vary for different records. But I don't think "typically (1,N)" matters. It
>> would work with variable shape (M,N). The shape field has type pyarrow.List
>> with value_type = pyarrow.int32().
>>
>> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>>
>> Hey David,
>>
>> I don't think Table is designed in a way that you could "populate" it
>> with a 2D tensor. It should rather be populated with a collection of equal
>> length arrays.
>> Sparse CSR tensor on the other hand is composed of three arrays (indices,
>> indptr, values) and you need a bit more involved logic to manipulate those
>> than regular arrays. See [1] for memory layout definition.
>>
>> What are you looking to accomplish? What access patterns are you
>> expecting?
>>
>> Rok
>>
>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>>
>> On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:
>>
>>> Hi Rok,
>>>
>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
>>> need to build a table with rows which include a field of this type. I don't
>>> see a related example in the test module. I'm doing something like:
>>>
>>> schema = pyarrow.schema(fields, metadata=metadata)
>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>>
>>> where fields is a list of tuples of the form (field_name, pyarrow_type),
>>> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
>>> SparseCSRMatrix field? Or will this not work?
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>
>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
>>> perhaps the most extensive description of what is doable:
>>> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>
>>> Rok
>>>
>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org>
>>> wrote:
>>>
>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>
>>>> import numpy as np
>>>> import pyarrow as pa
>>>> from scipy.sparse import csr_matrix
>>>>
>>>> a = np.random.rand(100)
>>>> a[a < .9] = 0.0
>>>> s = csr_matrix(a)
>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>
>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>
>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>
>>>> I find pyarrow.SparseCSRMatrix mentioned here
>>>> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
>>>> But how do I use that? Is there documentation for that class?
>>>>
>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to understand support for sparse tensors in Arrow. It looks
>>>> like there is "experimental" support using the C++ API
>>>> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
>>>> When was this introduced? I see in the code base here
>>>> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
>>>> Cython sparse array classes. Can these be accessed using the Python API?
>>>> Are they included in the 8.0.0 release? Is there any other support for
>>>> sparse arrays/tensors in the Python API? Are there good examples for any of
>>>> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
Thanks. That helps.  
  
Can SparseCSRMatrix be used the way I'm trying to use it, as a field value in
a table? I think that would need a DataType associated with it to give the
field.  
  

On 7/6/2022 6:25 PM, Rok Mihevc wrote:  

> arrow_sparse_csr_matrix.to_numpy() - will return underlying csr components  
>
>
> arrow_sparse_csr_matrix.to_tensor().to_numpy() - should return a dense
> version of original matrix
  


Re: support for sparse tensors

Posted by Rok Mihevc <ro...@gmail.com>.
arrow_sparse_csr_matrix.to_numpy() - will return underlying csr components
arrow_sparse_csr_matrix.to_tensor().to_numpy() - should return a dense
version of original matrix

On Thu, Jul 7, 2022 at 3:12 AM dl <dy...@yahoo.com> wrote:

> Minor separate question. The method pyarrow.SparseCSRMatrix.to_numpy()
> doesn't seem to preserve the shape of the matrix. Am I wrong? For example
> using the code from my original message, printing the result of
> arrow_sparse_csr_matrix.to_numpy() in one case gives:
>
> (array([[0.91263427],
>        [0.98520395],
>        [0.98082576],
>        [0.97490447],
>        [0.94312307],
>        [0.90573414],
>        [0.95057244],
>        [0.94955576],
>        [0.90342821]]), array([0, 9], dtype=int64), array([ 0,  4, 33, 38,
> 46, 49, 61, 64, 83], dtype=int64))
>
> vs.
>
> >>> acsr.shape
> (1, 100)

Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
Minor separate question. The method pyarrow.SparseCSRMatrix.to_numpy() doesn't
seem to preserve the shape of the matrix. Am I wrong? For example using the
code from my original message, printing the result of
arrow_sparse_csr_matrix.to_numpy() in one case gives:  
  
(array([[0.91263427],  
[0.98520395],  
[0.98082576],  
[0.97490447],  
[0.94312307],  
[0.90573414],  
[0.95057244],  
[0.94955576],  
[0.90342821]]), array([0, 9], dtype=int64), array([ 0, 4, 33, 38, 46, 49, 61,
64, 83], dtype=int64))  
  
vs.  
  
>>> acsr.shape  
(1, 100)  
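
For reference, the three arrays printed above match the usual CSR components (stored values, row pointers, column indices). The shape is carried separately, which would explain why it does not appear in the tuple. Below is a minimal pure-Python sketch of how the components and the shape together determine the dense matrix. `csr_to_dense` is a hypothetical helper written for this illustration, not a pyarrow function.

```python
def csr_to_dense(data, indptr, indices, shape):
    """Rebuild a dense matrix (list of lists) from CSR components."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        # indptr[row]:indptr[row + 1] delimits the stored entries of this row.
        for k in range(indptr[row], indptr[row + 1]):
            dense[row][indices[k]] = data[k]
    return dense


# Components taken from the output above (values flattened to a 1-D list).
data = [0.91263427, 0.98520395, 0.98082576, 0.97490447, 0.94312307,
        0.90573414, 0.95057244, 0.94955576, 0.90342821]
indptr = [0, 9]
indices = [0, 4, 33, 38, 46, 49, 61, 64, 83]

dense = csr_to_dense(data, indptr, indices, shape=(1, 100))
```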
  
  

On 7/6/2022 4:01 PM, dl wrote:  

> I have tabular data with one record field of type scipy.sparse.csr_matrix. I
> want to convert this tabular data to a pyarrow table. I had been
> converting the csr_matrix first to a custom representation using three
> fields (shape, keys, indices) and building the pyarrow table using a schema
> with the types of these fields and table data with a separate list for each
> field (and each list having one entry per input record). I was hoping I
> could use a single pyarrow.SparseCSRMatrix field instead of the custom three
> field representation. Is that possible? Incidentally, the shape of the
> csr_matrix is typically (1,N) where N may vary for different records. But I
> don't think "typically (1,N)" matters. It would work with variable shape
> (M,N). The shape field has type pyarrow.List with value_type =
> pyarrow.int32().  
  


Re: support for sparse tensors

Posted by Rok Mihevc <ro...@gmail.com>.
There's one here:
https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/array/tensor.py#L662-L664
https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/array/arrow_conversion.py#L310-L397

On Wed, Jul 13, 2022 at 8:41 PM dl via user <us...@arrow.apache.org> wrote:

> Are there *any* examples of controlling conversion with the
> __arrow_array__ protocol? I haven't been able to find one. For array type
> inference, is it enough just to add the __arrow_array__ method to the
> custom array class or does the class need to be registered somewhere?
>
> Thanks...
>
> On 7/13/2022 9:45 AM, David Li wrote:
>
> If `l` is a plain list there, I don't think it's possible. The
> __arrow_array__ protocol relies on you to have a type that you can define
> the method on. I also don't think there are other customization hooks for
> pa.array() but maybe someone else knows better.
>
> On Tue, Jul 12, 2022, at 17:18, dl via user wrote:
>
> Hi David,
>
> Are there any good examples for the first section
> <https://arrow.apache.org/docs/python/extending_types.html#controlling-conversion-to-pyarrow-array-with-the-arrow-array-protocol>
> of your reference [1]: Controlling conversion to pyarrow.Array with the
> __arrow_array__ protocol?
>
> I find examples of creating an extension array using an extension type
> with explicit code in test_extension_type.py
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py>,
> e.g. in test_ext_array_basics. I'm thinking it might be possible to have
> the array type inferred by pyarrow.array() or pyarrow.Table.from_arrays()
> using an extension array type as suggested there. Am I right about this? If
> so is there a good example? I haven't been able to get this to work.
>
> For the record, here is what I can do.
>
> l = list()
> for i in range(4):
>     s = csr_matrix(random_dense())
>     struct = [('shape', s.shape),
>               ('keys', s.data),
>               ('indexes', s.indices)]
>     l.append(struct)
> struct_type = pa.struct([('shape', pa.list_(pa.int32())),
>                          ('keys', pa.list_(pa.float64())),
>                          ('indexes', pa.list_(pa.int64()))])
> arrow_array = pa.array(l, struct_type)
> extension_array = pa.ExtensionArray.from_storage(SparseStructType(), arrow_array)
>
> class SparseStructType(pa.PyExtensionType):
>     storage_type = pa.struct([('shape', pa.list_(pa.int32())),
>                               ('keys', pa.list_(pa.float64())),
>                               ('indexes', pa.list_(pa.int64()))])
>
>     def __init__(self):
>         pa.PyExtensionType.__init__(self, self.storage_type)
>
>     def __reduce__(self):
>         return SparseStructType, ()
>
>
> I would like to be able to do something like
>
>
> extension_array = pa.array(l,SparseStructType())
>
>
> having the extension type of the array inferred by pa.array. Is that
> possible?
>
> Thanks,
> David
>
>
> On 7/6/2022 4:26 PM, David Li wrote:
>
> If I'm not mistaken, what you want is basically an extension type [1] for
> tensors, so you can have a column where each row contains a tensor/matrix.
> This has been discussed for quite some time [2].
>
> Incidentally, you can keep the three-field representation but pack it into
> a single toplevel field with the Struct type.
>
> [1]: https://arrow.apache.org/docs/python/extending_types.html
> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>
> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>
> I have tabular data with one record field of type scipy.sparse.csr_matrix.
> I want to convert this tabular data to a pyarrow table. I had been
> converting the csr_matrix first to a custom representation using three
> fields (shape, keys, indices) and building the pyarrow table using a schema
> with the types of these fields and table data with a separate list for each
> field (and each list having one entry per input record). I was hoping I
> could use a single pyarrow.SparseCSRMatrix field  instead of the custom
> three field representation. Is that possible? Incidentally, the shape of
> the csr_matrix is typically (1,N) where N may vary for different records.
> But I don't think "typically (1,N)" matters. It would work with variable
> shape (M,N). The shape field has type pyarrow.List with value_type =
> pyarrow.int32().
>
>
> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>
> Hey David,
>
> I don't think Table is designed in a way that you could "populate" it with
> a 2D tensor. It should rather be populated with a collection of equal
> length arrays.
> Sparse CSR tensor on the other hand is composed of three arrays (indices,
> indptr, values) and you need a bit more involved logic to manipulate those
> than regular arrays. See [1] for memory layout definition.
>
> What are you looking to accomplish? What access patterns are you expecting?
>
> Rok
>
> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>
> On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:
>
> Hi Rok,
>
> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
> need to build a table with rows which include a field of this type. I don't
> see a related example in the test module. I'm doing something like:
>
> schema = pyarrow.schema(fields, metadata=metadata)
> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>
> where fields is a list of tuples of the form (field_name, pyarrow_type),
> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
> SparseCSRMatrix field? Or will this not work?
>
> Thanks,
> David
>
>
>
> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>
> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
> perhaps the most extensive description of what is doable:
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>
> Rok
>
> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:
>
> So, I guess this is supported in 8.0.0. I can do this:
>
> import numpy as np
> import pyarrow as pa
> from scipy.sparse import csr_matrix
>
>
>
> a = np.random.rand(100)
> a[a < .9] = 0.0
> s = csr_matrix(a)
> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>
>
>
> Now, how do I use that to build a pyarrow table? Stay tuned...
>
>
> On 7/1/2022 8:19 AM, dl wrote:
>
> I find pyarrow.SparseCSRMatrix mentioned here
> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
> But how do I use that? Is there documentation for that class?
>
>
> On 7/1/2022 7:47 AM, dl wrote:
>
>
> Hi,
>
> I'm trying to understand support for sparse tensors in Arrow. It looks
> like there is "experimental" support using the C++ API
> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
> When was this introduced? I see in the code base here
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
> Cython sparse array classes. Can these be accessed using the Python API?
> Are they included in the 8.0.0 release? Is there any other support for
> sparse arrays/tensors in the Python API? Are there good examples for any of
> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>
> Thanks,
> David
>
>
>
>
>
>

Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
Are there _any_ examples of controlling conversion with the __arrow_array__
protocol? I haven't been able to find one. For array type inference, is it
enough just to add the __arrow_array__ method to the custom array class or
does the class need to be registered somewhere?  
  
Thanks...  
  

On 7/13/2022 9:45 AM, David Li wrote:  

> If `l` is a plain list there, I don't think it's possible. The
> __arrow_array__ protocol relies on you to have a type that you can define
> the method on. I also don't think there are other customization hooks for
> pa.array() but maybe someone else knows better.  


Re: support for sparse tensors

Posted by David Li <li...@apache.org>.
If `l` is a plain list there, I don't think it's possible. The __arrow_array__ protocol relies on you to have a type that you can define the method on. I also don't think there are other customization hooks for pa.array() but maybe someone else knows better.

On Tue, Jul 12, 2022, at 17:18, dl via user wrote:
> Hi David,
> 
> Are there any good examples for the first section <https://arrow.apache.org/docs/python/extending_types.html#controlling-conversion-to-pyarrow-array-with-the-arrow-array-protocol> of your reference [1]: Controlling conversion to pyarrow.Array with the __arrow_array__ protocol?
> 
> I find examples of creating an extension array using an extension type with explicit code in test_extension_type.py <https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py>, e.g. in test_ext_array_basics. I'm thinking it might be possible to have the array type inferred by pyarrow.array() or pyarrow.Table.from_arrays() using an extension array type as suggested there. Am I right about this? If so is there a good example? I haven't been able to get this to work.
> 
> For the record, here is what I can do.
> 
> l = list()
> for i in range(4):
>     s = csr_matrix(random_dense())
>     struct = [('shape', s.shape),
>               ('keys', s.data),
>               ('indexes', s.indices)]
>     l.append(struct)
> struct_type = pa.struct([('shape', pa.list_(pa.int32())),
>                          ('keys', pa.list_(pa.float64())),
>                          ('indexes', pa.list_(pa.int64()))])
> arrow_array = pa.array(l,struct_type)
> extension_array = pa.ExtensionArray.from_storage(SparseStructType(), arrow_array)
> 
> class SparseStructType(pa.PyExtensionType):
>     storage_type = pa.struct([('shape', pa.list_(pa.int32())),
>                               ('keys', pa.list_(pa.float64())),
>                               ('indexes', pa.list_(pa.int64()))])
>     def __init__(self):
>         pa.PyExtensionType.__init__(self,self.storage_type)
> 
>     def __reduce__(self):
>         return SparseStructType, ()
> 
> I would like to be able to do something like
> 
> 
> extension_array = pa.array(l,SparseStructType())
> 
> having the extension type of the array inferred by pa.array. Is that possible?
> 
> Thanks,
> David
> 
> 
> On 7/6/2022 4:26 PM, David Li wrote:
>> If I'm not mistaken, what you want is basically an extension type [1] for tensors, so you can have a column where each row contains a tensor/matrix. This has been discussed for quite some time [2].
>> 
>> Incidentally, you can keep the three-field representation but pack it into a single toplevel field with the Struct type. 
>> 
>> [1]: https://arrow.apache.org/docs/python/extending_types.html
>> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>> 
>> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>>> I have tabular data with one record field of type scipy.sparse.csr_matrix. I want to convert this tabular data to a pyarrow table. I had been converting the csr_matrix first to a custom representation using three fields (shape, keys, indices) and building the pyarrow table using a schema with the types of these fields and table data with a separate list for each field (and each list having one entry per input record). I was hoping I could use a single pyarrow.SparseCSRMatrix field instead of the custom three field representation. Is that possible? Incidentally, the shape of the csr_matrix is typically (1,N) where N may vary for different records. But I don't think "typically (1,N)" matters. It would work with variable shape (M,N). The shape field has type pyarrow.List with value_type = pyarrow.int32().
>>> 
>>> 
>>> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>>>> Hey David, 
>>>> 
>>>> I don't think Table is designed in a way that you could "populate" it with a 2D tensor. It should rather be populated with a collection of equal length arrays.
>>>> Sparse CSR tensor on the other hand is composed of three arrays (indices, indptr, values) and you need a bit more involved logic to manipulate those than regular arrays. See [1] for memory layout definition.
>>>> 
>>>> What are you looking to accomplish? What access patterns are you expecting?
>>>> 
>>>> Rok
>>>> 
>>>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>>>> 
>>>> On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:
>>>>> Hi Rok,
>>>>> 
>>>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I need to build a table with rows which include a field of this type. I don't see a related example in the test module. I'm doing something like:
>>>>> 
>>>>> schema = pyarrow.schema(fields, metadata=metadata)
>>>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>>>> 
>>>>> where fields is a list of tuples of the form (field_name, pyarrow_type), e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a SparseCSRMatrix field? Or will this not work?
>>>>> 
>>>>> Thanks,
>>>>> David
>>>>> 
>>>>> 
>>>>> 
>>>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are perhaps the most extensive description of what is doable: https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>>>> 
>>>>>> Rok
>>>>>> 
>>>>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:
>>>>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>>>> 
>>>>>>> import numpy as np
>>>>>>> import pyarrow as pa
>>>>>>> from scipy.sparse import csr_matrix
>>>>>>> 
>>>>>>> 
>>>>>>> a = np.random.rand(100)
>>>>>>> a[a < .9] = 0.0
>>>>>>> s = csr_matrix(a)
>>>>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>>>> 
>>>>>>> 
>>>>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>>>>> I find pyarrow.SparseCSRMatrix mentioned here <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>. But how do I use that? Is there documentation for that class?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I'm trying to understand support for sparse tensors in Arrow. It looks like there is "experimental" support using the C++ API <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>. When was this introduced? I see in the code base here <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi> Cython sparse array classes. Can these be accessed using the Python API? Are they included in the 8.0.0 release? Is there any other support for sparse arrays/tensors in the Python API? Are there good examples for any of this, in particular for using the 8.0.0 Python API to create sparse tensors?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> 
>> 

Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
Hi David,  
  
Are there any good examples for the [first
section](https://arrow.apache.org/docs/python/extending_types.html#controlling-conversion-to-pyarrow-array-with-the-arrow-array-protocol) of your reference
[1]: Controlling conversion to pyarrow.Array with the __arrow_array__
protocol?  
  
I find examples of creating an extension array using an extension type with
explicit code in
[test_extension_type.py](https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py),
e.g. in test_ext_array_basics. I'm thinking it might be possible to have the
array type inferred by pyarrow.array() or pyarrow.Table.from_arrays() using an
extension array type as suggested there. Am I right about this? If so is there
a good example? I haven't been able to get this to work.  
  
For the record, here is what I can do.  

    
    
    import pyarrow as pa
    from scipy.sparse import csr_matrix

    # Storage layout: one struct per sparse matrix.
    struct_type = pa.struct([('shape', pa.list_(pa.int32())),
                             ('keys', pa.list_(pa.float64())),
                             ('indexes', pa.list_(pa.int64()))])

    # The extension type has to be defined before it is instantiated below.
    class SparseStructType(pa.PyExtensionType):
        storage_type = struct_type

        def __init__(self):
            pa.PyExtensionType.__init__(self, self.storage_type)

        def __reduce__(self):
            return SparseStructType, ()

    l = list()
    for i in range(4):
        # random_dense() is a helper that is not shown here.
        s = csr_matrix(random_dense())
        struct = [('shape', s.shape),
                  ('keys', s.data),
                  ('indexes', s.indices)]
        l.append(struct)
    arrow_array = pa.array(l, struct_type)
    extension_array = pa.ExtensionArray.from_storage(SparseStructType(), arrow_array)

  
I would like to be able to do something like  
  

    
    
    extension_array = pa.array(l,SparseStructType())

  
having the extension type of the array inferred by pa.array. Is that possible?  
  
Thanks,  
David  
  

On 7/6/2022 4:26 PM, David Li wrote:  

> If I'm not mistaken, what you want is basically an extension type [1] for
> tensors, so you can have a column where each row contains a tensor/matrix.
> This has been discussed for quite some time [2].  
>
>
>  
>
>
> Incidentally, you can keep the three-field representation but pack it into a
> single toplevel field with the Struct type.  
>
>
>  
>
>
> [1]: <https://arrow.apache.org/docs/python/extending_types.html>  
>
>
> [2]: <https://issues.apache.org/jira/browse/ARROW-1614>


Re: support for sparse tensors

Posted by Rok Mihevc <ro...@gmail.com>.
> Not being familiar with SciPy/NumPy APIs off the top of my head: won't
> that create a PyArrow array whose rows are the individual values of the
> matrix?

I believe so.

> Is that what's desired, one matrix/array, or is it one matrix/row?

I am not sure :). David Lee can you elaborate?

On Thu, Jul 7, 2022 at 2:08 AM David Li <li...@apache.org> wrote:

> Not being familiar with SciPy/NumPy APIs off the top of my head: won't
> that create a PyArrow array whose rows are the individual values of the
> matrix? Is that what's desired, one matrix/array, or is it one matrix/row?
>
> On Wed, Jul 6, 2022, at 19:38, Rok Mihevc wrote:
>
> If you're starting with a single (1,N) scipy.csr_matrix and just want to
> go to an array you can also:
>
> scipy_csr_matrix = csr_matrix((data, indices, indptr), shape=shape)
> sparse_tensor = pa.SparseCSRMatrix.from_scipy(scipy_csr_matrix)
> arr = pa.array(sparse_tensor.to_tensor().to_numpy()[0])
>
> But that assumes 1-dimension and goes to dense representation.
>
> On Thu, Jul 7, 2022 at 1:27 AM David Li <li...@apache.org> wrote:
>
>
> If I'm not mistaken, what you want is basically an extension type [1] for
> tensors, so you can have a column where each row contains a tensor/matrix.
> This has been discussed for quite some time [2].
>
> Incidentally, you can keep the three-field representation but pack it into
> a single toplevel field with the Struct type.
>
> [1]: https://arrow.apache.org/docs/python/extending_types.html
> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>
> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>
> I have tabular data with one record field of type scipy.sparse.csr_matrix.
> I want to convert this tabular data to a pyarrow table. I had been
> converting the csr_matrix first to a custom representation using three
> fields (shape, keys, indices) and building the pyarrow table using a schema
> with the types of these fields and table data with a separate list for each
> field (and each list having one entry per input record). I was hoping I
> could use a single pyarrow.SparseCSRMatrix field  instead of the custom
> three field representation. Is that possible? Incidentally, the shape of
> the csr_matrix is typically (1,N) where N may vary for different records.
> But I don't think "typically (1,N)" matters. It would work with variable
> shape (M,N). The shape field has type pyarrow.List with value_type =
> pyarrow.int32().
>
>
> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>
> Hey David,
>
> I don't think Table is designed in a way that you could "populate" it with
> a 2D tensor. It should rather be populated with a collection of equal
> length arrays.
> Sparse CSR tensor on the other hand is composed of three arrays (indices,
> indptr, values) and you need a bit more involved logic to manipulate those
> than regular arrays. See [1] for memory layout definition.
>
> What are you looking to accomplish? What access patterns are you expecting?
>
> Rok
>
> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>
> On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:
>
> Hi Rok,
>
> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
> need to build a table with rows which include a field of this type. I don't
> see a related example in the test module. I'm doing something like:
>
> schema = pyarrow.schema(fields, metadata=metadata)
> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>
> where fields is a list of tuples of the form (field_name, pyarrow_type),
> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
> SparseCSRMatrix field? Or will this not work?
>
> Thanks,
> David
>
>
>
> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>
> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
> perhaps the most extensive description of what is doable:
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>
> Rok
>
> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:
>
> So, I guess this is supported in 8.0.0. I can do this:
>
> import numpy as np
> import pyarrow as pa
> from scipy.sparse import csr_matrix
>
> a = np.random.rand(100)
> a[a < .9] = 0.0
> s = csr_matrix(a)
> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>
> Now, how do I use that to build a pyarrow table? Stay tuned...
>
>
> On 7/1/2022 8:19 AM, dl wrote:
>
> I find pyarrow.SparseCSRMatrix mentioned here
> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
> But how do I use that? Is there documentation for that class?
>
>
> On 7/1/2022 7:47 AM, dl wrote:
>
>
> Hi,
>
> I'm trying to understand support for sparse tensors in Arrow. It looks
> like there is "experimental" support using the C++ API
> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
> When was this introduced? I see in the code base here
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
> Cython sparse array classes. Can these be accessed using the Python API.
> Are they included in the 8.0.0 release? Is there any other support for
> sparse arrays/tensors in the Python API? Are there good examples for any of
> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>
> Thanks,
> David
>
>
>
>
>

Re: support for sparse tensors

Posted by David Li <li...@apache.org>.
Not being familiar with SciPy/NumPy APIs off the top of my head: won't that create a PyArrow array whose rows are the individual values of the matrix? Is that what's desired, one matrix/array, or is it one matrix/row?

On Wed, Jul 6, 2022, at 19:38, Rok Mihevc wrote:
> If you're starting with a single (1,N) scipy.csr_matrix and just want to go to an array you can also:
> 
> scipy_csr_matrix = csr_matrix((data, indices, indptr), shape=shape)
> sparse_tensor = pa.SparseCSRMatrix.from_scipy(scipy_csr_matrix)
> arr = pa.array(sparse_tensor.to_tensor().to_numpy()[0])
> 
> But that assumes a single row (1,N) and goes through a dense representation.
> 
> On Thu, Jul 7, 2022 at 1:27 AM David Li <li...@apache.org> wrote:
>> If I'm not mistaken, what you want is basically an extension type [1] for tensors, so you can have a column where each row contains a tensor/matrix. This has been discussed for quite some time [2].
>> 
>> Incidentally, you can keep the three-field representation but pack it into a single toplevel field with the Struct type. 
>> 
>> [1]: https://arrow.apache.org/docs/python/extending_types.html
>> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>> 
>> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>>> I have tabular data with one record field of type scipy.sparse.csr_matrix. I want to convert this tabular data to a pyarrow table. I had been first converting the csr_matrix first to a custom representation using three fields (shape, keys, indices) and building the pyarrow table using a schema with the types of these fields and table data with a separate list for each field (and each list having one entry per input record). I was hoping I could use a single pyarrow.SparseCSRMatrix field  instead of the custom three field representation. Is that possible? Incidentally, the shape of the csr_matrix is typically (1,N) where N may vary for different records. But I don't think "typically (1,N)" matters. It would work with variable shape (M,N). The shape field has type pyarrow.List with value_type = pyarrow.int32().
>>> 
>>> 
>>> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>>>> Hey David, 
>>>> 
>>>> I don't think Table is designed in a way that you could "populate" it with a 2D tensor. It should rather be populated with a collection of equal length arrays.
>>>> Sparse CSR tensor on the other hand is composed of three arrays (indices, indptr, values) and you need a bit more involved logic to manipulate those than regular arrays. See [1] for memory layout definition.
>>>> 
>>>> What are you looking to accomplish? What access patterns are you expecting?
>>>> 
>>>> Rok
>>>> 
>>>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>>>> 
>>>> On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:
>>>>> Hi Rok,
>>>>> 
>>>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I need to build a table with rows which include a field of this type. I don't see a related example in the test module. I'm doing something like:
>>>>> 
>>>>> schema = pyarrow.schema(fields, metadata=metadata)
>>>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>>>> 
>>>>> where fields is a list of tuples of the form (field_name, pyarrow_type), e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a SparseCSRMatrix field? Or will this not work?
>>>>> 
>>>>> Thanks,
>>>>> David
>>>>> 
>>>>> 
>>>>> 
>>>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are perhaps the most extensive description of what is doable: https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>>>> 
>>>>>> Rok
>>>>>> 
>>>>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:
>>>>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>>>> 
>>>>>>> import numpy as np
>>>>>>> import pyarrow as pa
>>>>>>> from scipy.sparse import csr_matrix
>>>>>>> 
>>>>>>> a = np.random.rand(100)
>>>>>>> a[a < .9] = 0.0
>>>>>>> s = csr_matrix(a)
>>>>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>>>> 
>>>>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>>>>> I find pyarrow.SparseCSRMatrix mentioned here <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>. But how do I use that? Is there documentation for that class?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I'm trying to understand support for sparse tensors in Arrow. It looks like there is "experimental" support using the C++ API <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>. When was this introduced? I see in the code base here <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi> Cython sparse array classes. Can these be accessed using the Python API. Are they included in the 8.0.0 release? Is there any other support for sparse arrays/tensors in the Python API? Are there good examples for any of this, in particular for using the 8.0.0 Python API to create sparse tensors?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> 
>> 

Re: support for sparse tensors

Posted by Rok Mihevc <ro...@gmail.com>.
If you're starting with a single (1,N) scipy.csr_matrix and just want to go
to an array you can also:

scipy_csr_matrix = csr_matrix((data, indices, indptr), shape=shape)
sparse_tensor = pa.SparseCSRMatrix.from_scipy(scipy_csr_matrix)
arr = pa.array(sparse_tensor.to_tensor().to_numpy()[0])

But that assumes a single row (1,N) and goes through a dense representation.

On Thu, Jul 7, 2022 at 1:27 AM David Li <li...@apache.org> wrote:

> If I'm not mistaken, what you want is basically an extension type [1] for
> tensors, so you can have a column where each row contains a tensor/matrix.
> This has been discussed for quite some time [2].
>
> Incidentally, you can keep the three-field representation but pack it into
> a single toplevel field with the Struct type.
>
> [1]: https://arrow.apache.org/docs/python/extending_types.html
> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>
> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>
> I have tabular data with one record field of type scipy.sparse.csr_matrix.
> I want to convert this tabular data to a pyarrow table. I had been first
> converting the csr_matrix first to a custom representation using three
> fields (shape, keys, indices) and building the pyarrow table using a schema
> with the types of these fields and table data with a separate list for each
> field (and each list having one entry per input record). I was hoping I
> could use a single pyarrow.SparseCSRMatrix field  instead of the custom
> three field representation. Is that possible? Incidentally, the shape of
> the csr_matrix is typically (1,N) where N may vary for different records.
> But I don't think "typically (1,N)" matters. It would work with variable
> shape (M,N). The shape field has type pyarrow.List with value_type =
> pyarrow.int32().
>
>
> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>
> Hey David,
>
> I don't think Table is designed in a way that you could "populate" it with
> a 2D tensor. It should rather be populated with a collection of equal
> length arrays.
> Sparse CSR tensor on the other hand is composed of three arrays (indices,
> indptr, values) and you need a bit more involved logic to manipulate those
> than regular arrays. See [1] for memory layout definition.
>
> What are you looking to accomplish? What access patterns are you expecting?
>
> Rok
>
> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>
> On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:
>
> Hi Rok,
>
> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
> need to build a table with rows which include a field of this type. I don't
> see a related example in the test module. I'm doing something like:
>
> schema = pyarrow.schema(fields, metadata=metadata)
> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>
> where fields is a list of tuples of the form (field_name, pyarrow_type),
> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
> SparseCSRMatrix field? Or will this not work?
>
> Thanks,
> David
>
>
>
> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>
> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
> perhaps the most extensive description of what is doable:
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>
> Rok
>
> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:
>
> So, I guess this is supported in 8.0.0. I can do this:
>
> import numpy as np
> import pyarrow as pa
> from scipy.sparse import csr_matrix
>
> a = np.random.rand(100)
> a[a < .9] = 0.0
> s = csr_matrix(a)
> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>
> Now, how do I use that to build a pyarrow table? Stay tuned...
>
>
> On 7/1/2022 8:19 AM, dl wrote:
>
> I find pyarrow.SparseCSRMatrix mentioned here
> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
> But how do I use that? Is there documentation for that class?
>
>
> On 7/1/2022 7:47 AM, dl wrote:
>
>
> Hi,
>
> I'm trying to understand support for sparse tensors in Arrow. It looks
> like there is "experimental" support using the C++ API
> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
> When was this introduced? I see in the code base here
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
> Cython sparse array classes. Can these be accessed using the Python API.
> Are they included in the 8.0.0 release? Is there any other support for
> sparse arrays/tensors in the Python API? Are there good examples for any of
> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>
> Thanks,
> David
>
>
>
>

Re: support for sparse tensors

Posted by David Li <li...@apache.org>.
If I'm not mistaken, what you want is basically an extension type [1] for tensors, so you can have a column where each row contains a tensor/matrix. This has been discussed for quite some time [2].

Incidentally, you can keep the three-field representation but pack it into a single top-level field with the Struct type.

[1]: https://arrow.apache.org/docs/python/extending_types.html
[2]: https://issues.apache.org/jira/browse/ARROW-1614

On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
> I have tabular data with one record field of type scipy.sparse.csr_matrix. I want to convert this tabular data to a pyarrow table. I had been first converting the csr_matrix first to a custom representation using three fields (shape, keys, indices) and building the pyarrow table using a schema with the types of these fields and table data with a separate list for each field (and each list having one entry per input record). I was hoping I could use a single pyarrow.SparseCSRMatrix field  instead of the custom three field representation. Is that possible? Incidentally, the shape of the csr_matrix is typically (1,N) where N may vary for different records. But I don't think "typically (1,N)" matters. It would work with variable shape (M,N). The shape field has type pyarrow.List with value_type = pyarrow.int32().
> 
> 
> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>> Hey David, 
>> 
>> I don't think Table is designed in a way that you could "populate" it with a 2D tensor. It should rather be populated with a collection of equal length arrays.
>> Sparse CSR tensor on the other hand is composed of three arrays (indices, indptr, values) and you need a bit more involved logic to manipulate those than regular arrays. See [1] for memory layout definition.
>> 
>> What are you looking to accomplish? What access patterns are you expecting?
>> 
>> Rok
>> 
>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>> 
>> On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:
>>> Hi Rok,
>>> 
>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I need to build a table with rows which include a field of this type. I don't see a related example in the test module. I'm doing something like:
>>> 
>>> schema = pyarrow.schema(fields, metadata=metadata)
>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>> 
>>> where fields is a list of tuples of the form (field_name, pyarrow_type), e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a SparseCSRMatrix field? Or will this not work?
>>> 
>>> Thanks,
>>> David
>>> 
>>> 
>>> 
>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are perhaps the most extensive description of what is doable: https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>> 
>>>> Rok
>>>> 
>>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:
>>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>> 
>>>>> import numpy as np
>>>>> import pyarrow as pa
>>>>> from scipy.sparse import csr_matrix
>>>>> 
>>>>> a = np.random.rand(100)
>>>>> a[a < .9] = 0.0
>>>>> s = csr_matrix(a)
>>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>> 
>>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>> 
>>>>> 
>>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>>> I find pyarrow.SparseCSRMatrix mentioned here <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>. But how do I use that? Is there documentation for that class?
>>>>>> 
>>>>>> 
>>>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I'm trying to understand support for sparse tensors in Arrow. It looks like there is "experimental" support using the C++ API <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>. When was this introduced? I see in the code base here <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi> Cython sparse array classes. Can these be accessed using the Python API. Are they included in the 8.0.0 release? Is there any other support for sparse arrays/tensors in the Python API? Are there good examples for any of this, in particular for using the 8.0.0 Python API to create sparse tensors?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> 
>>>>>>> 

Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
I have tabular data in which one record field is of type scipy.sparse.csr_matrix. I
want to convert this tabular data to a pyarrow table. Until now I have been
converting each csr_matrix to a custom representation using three fields
(shape, keys, indices) and building the pyarrow table from a schema with the
types of these fields and table data with a separate list for each field (each
list having one entry per input record). I was hoping I could use a single
pyarrow.SparseCSRMatrix field instead of the custom three-field
representation. Is that possible? Incidentally, the shape of the csr_matrix is
typically (1,N), where N may vary between records, but I don't think
"typically (1,N)" matters: it would also work with a variable shape (M,N). The shape
field has type pyarrow.List with value_type = pyarrow.int32().  
  

On 7/6/2022 2:53 PM, Rok Mihevc wrote:  

> Hey David,
>
>  
>
>
> I don't think Table is designed in a way that you could "populate" it with a
> 2D tensor. It should rather be populated with a collection of equal length
> arrays.
>
> Sparse CSR tensor on the other hand is composed of three arrays (indices,
> indptr, values) and you need a bit more involved logic to manipulate those
> than regular arrays. See [1] for memory layout definition.
>
>  
>
>
> What are you looking to accomplish? What access patterns are you expecting?
>
>  
>
>
> Rok
>
>  
>
>
> [1] <https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs>
>
>  
>
>
> On Wed, Jul 6, 2022 at 10:48 PM dl
> <[dydxlaw@yahoo.com](mailto:dydxlaw@yahoo.com)> wrote:  
>
>

>> Hi Rok,  
>  
>  What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
> need to build a table with rows which include a field of this type. I don't
> see a related example in the test module. I'm doing something like:  
>  
>  schema = pyarrow.schema(fields, metadata=metadata)  
>  table = pyarrow.Table.from_arrays(table_data, schema=schema)  
>  
>  where fields is a list of tuples of the form (field_name, pyarrow_type),
> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
> SparseCSRMatrix field? Or will this not work?  
>  
>  Thanks,  
>  David  
>  
>  
>
>>

>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:  
>
>>

>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
perhaps the most extensive description of what is doable:
<https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py>

>>>

>>>  
>
>>>

>>> Rok

>>>

>>>  
>
>>>

>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user
<[user@arrow.apache.org](mailto:user@arrow.apache.org)> wrote:  
>
>>>

>>>> So, I guess this is supported in 8.0.0. I can do this:  
>
>>>>  
>>>>  
>>>>     import numpy as np

>>>>     import pyarrow as pa

>>>>     from scipy.sparse import csr_matrix

>>>>  
>>>>  
>>>>  
>>>>     a = np.random.rand(100)

>>>>     a[a < .9] = 0.0

>>>>     s = csr_matrix(a)

>>>>     arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)

>>>>  
>>>>

>>>> Now, how do I use that to build a pyarrow table? Stay tuned...  
>  
>
>>>>

>>>> On 7/1/2022 8:19 AM, dl wrote:  
>
>>>>

>>>>> I find pyarrow.SparseCSRMatrix mentioned
[here](https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix).
But how do I use that? Is there documentation for that class?  
>  
>
>>>>>

>>>>> On 7/1/2022 7:47 AM, dl wrote:  
>
>>>>>

>>>>>>  
>  Hi,  
>  
>  I'm trying to understand support for sparse tensors in Arrow. It looks like
> there is ["experimental" support using the C++
> API](https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-
> tensors). When was this introduced? I see in the code base
> [here](https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi)
> Cython sparse array classes. Can these be accessed using the Python API. Are
> they included in the 8.0.0 release? Is there any other support for sparse
> arrays/tensors in the Python API? Are there good examples for any of this,
> in particular for using the 8.0.0 Python API to create sparse tensors?  
>  
>  Thanks,  
>  David  

  


Re: support for sparse tensors

Posted by Rok Mihevc <ro...@gmail.com>.
Hey David,

I don't think Table is designed in a way that you could "populate" it with
a 2D tensor. It should rather be populated with a collection of equal-length
arrays.
A sparse CSR tensor, on the other hand, is composed of three arrays (indices,
indptr, values), and manipulating those takes somewhat more involved logic
than regular arrays. See [1] for the memory layout definition.

What are you looking to accomplish? What access patterns are you expecting?

Rok

[1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
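
To make the three-array layout concrete, here is a small NumPy-only sketch of how indices, indptr and values together encode a matrix. The 2x4 matrix and the helper name csr_row_to_dense are invented for illustration; this shows the general CSR idea, not Arrow's own implementation:

```python
import numpy as np

# CSR components for the 2x4 matrix [[0, 5, 0, 0], [7, 0, 0, 9]].
values = np.array([5.0, 7.0, 9.0])
indices = np.array([1, 0, 3])   # column index of each stored value
indptr = np.array([0, 1, 3])    # row i spans values[indptr[i]:indptr[i + 1]]


def csr_row_to_dense(row, n_cols):
    """Expand one CSR row into a dense 1-D array (hypothetical helper)."""
    out = np.zeros(n_cols)
    start, stop = indptr[row], indptr[row + 1]
    out[indices[start:stop]] = values[start:stop]
    return out
```

Reading a single row needs a lookup in indptr followed by slices of the other two arrays, which is the extra logic compared with indexing a regular array.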

On Wed, Jul 6, 2022 at 10:48 PM dl <dy...@yahoo.com> wrote:

> Hi Rok,
>
> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
> need to build a table with rows which include a field of this type. I don't
> see a related example in the test module. I'm doing something like:
>
> schema = pyarrow.schema(fields, metadata=metadata)
> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>
> where fields is a list of tuples of the form (field_name, pyarrow_type),
> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
> SparseCSRMatrix field? Or will this not work?
>
> Thanks,
> David
>
>
> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>
> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
> perhaps the most extensive description of what is doable:
> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>
> Rok
>
> On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:
>
>> So, I guess this is supported in 8.0.0. I can do this:
>>
>> import numpy as np
>> import pyarrow as pa
>> from scipy.sparse import csr_matrix
>>
>> a = np.random.rand(100)
>> a[a < .9] = 0.0
>> s = csr_matrix(a)
>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>
>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>
>> On 7/1/2022 8:19 AM, dl wrote:
>>
>> I find pyarrow.SparseCSRMatrix mentioned here
>> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
>> But how do I use that? Is there documentation for that class?
>>
>> On 7/1/2022 7:47 AM, dl wrote:
>>
>>
>> Hi,
>>
>> I'm trying to understand support for sparse tensors in Arrow. It looks
>> like there is "experimental" support using the C++ API
>> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
>> When was this introduced? I see in the code base here
>> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
>> Cython sparse array classes. Can these be accessed using the Python API.
>> Are they included in the 8.0.0 release? Is there any other support for
>> sparse arrays/tensors in the Python API? Are there good examples for any of
>> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>>
>> Thanks,
>> David
>>
>>
>>
>>
>>
>

Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
Hi Rok,  
  
What data type would I use for a pyarrow SparseCSRMatrix in a schema? I need
to build a table with rows which include a field of this type. I don't see a
related example in the test module. I'm doing something like:  
  
schema = pyarrow.schema(fields, metadata=metadata)  
table = pyarrow.Table.from_arrays(table_data, schema=schema)  
  
where fields is a list of tuples of the form (field_name, pyarrow_type), e.g.
('field1', pyarrow.string()). What should pyarrow_type be for a
SparseCSRMatrix field? Or will this not work?  
  
Thanks,  
David  
  
  

On 7/1/2022 9:18 AM, Rok Mihevc wrote:  

> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
> perhaps the most extensive description of what is doable:
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py>
>
>  
>
>
> Rok
>
>  
>
>
> On Fri, Jul 1, 2022 at 5:38 PM dl via user
> <[user@arrow.apache.org](mailto:user@arrow.apache.org)> wrote:  
>
>

>> So, I guess this is supported in 8.0.0. I can do this:  
>
>>  
>>  
>>     import numpy as np

>>     import pyarrow as pa

>>     from scipy.sparse import csr_matrix

>>  
>>  
>>  
>>     a = np.random.rand(100)

>>     a[a < .9] = 0.0

>>     s = csr_matrix(a)

>>     arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)

>>  
>>

>> Now, how do I use that to build a pyarrow table? Stay tuned...  
>  
>
>>

>> On 7/1/2022 8:19 AM, dl wrote:  
>
>>

>>> I find pyarrow.SparseCSRMatrix mentioned
[here](https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix).
But how do I use that? Is there documentation for that class?  
>  
>
>>>

>>> On 7/1/2022 7:47 AM, dl wrote:  
>
>>>

>>>>  
>  Hi,  
>  
>  I'm trying to understand support for sparse tensors in Arrow. It looks like
> there is ["experimental" support using the C++
> API](https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-
> tensors). When was this introduced? I see in the code base
> [here](https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi)
> Cython sparse array classes. Can these be accessed using the Python API. Are
> they included in the 8.0.0 release? Is there any other support for sparse
> arrays/tensors in the Python API? Are there good examples for any of this,
> in particular for using the 8.0.0 Python API to create sparse tensors?  
>  
>  Thanks,  
>  David  

  


Re: support for sparse tensors

Posted by Rok Mihevc <ro...@gmail.com>.
We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are
perhaps the most extensive description of what is doable:
https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py

Rok

On Fri, Jul 1, 2022 at 5:38 PM dl via user <us...@arrow.apache.org> wrote:

> So, I guess this is supported in 8.0.0. I can do this:
>
> import numpy as np
> import pyarrow as pa
> from scipy.sparse import csr_matrix
>
> a = np.random.rand(100)
> a[a < .9] = 0.0
> s = csr_matrix(a)
> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>
> Now, how do I use that to build a pyarrow table? Stay tuned...
>
> On 7/1/2022 8:19 AM, dl wrote:
>
> I find pyarrow.SparseCSRMatrix mentioned here
> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
> But how do I use that? Is there documentation for that class?
>
> On 7/1/2022 7:47 AM, dl wrote:
>
>
> Hi,
>
> I'm trying to understand support for sparse tensors in Arrow. It looks
> like there is "experimental" support using the C++ API
> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
> When was this introduced? I see in the code base here
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
> Cython sparse array classes. Can these be accessed using the Python API.
> Are they included in the 8.0.0 release? Is there any other support for
> sparse arrays/tensors in the Python API? Are there good examples for any of
> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>
> Thanks,
> David
>
>
>
>
>

Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
So, I guess this is supported in 8.0.0. I can do this:  

    
    
    import numpy as np
    import pyarrow as pa
    from scipy.sparse import csr_matrix

    a = np.random.rand(100)
    a[a < .9] = 0.0
    s = csr_matrix(a)
    arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
    

Now, how do I use that to build a pyarrow table? Stay tuned...  
  

On 7/1/2022 8:19 AM, dl wrote:  

> I find pyarrow.SparseCSRMatrix mentioned
> [here](https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix).
> But how do I use that? Is there documentation for that class?  
>  
>
>
> On 7/1/2022 7:47 AM, dl wrote:  
>
>

>>  
>  Hi,  
>  
>  I'm trying to understand support for sparse tensors in Arrow. It looks like
> there is ["experimental" support using the C++
> API](https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-
> tensors). When was this introduced? I see in the code base
> [here](https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi)
> Cython sparse array classes. Can these be accessed using the Python API. Are
> they included in the 8.0.0 release? Is there any other support for sparse
> arrays/tensors in the Python API? Are there good examples for any of this,
> in particular for using the 8.0.0 Python API to create sparse tensors?  
>  
>  Thanks,  
>  David  

  


Re: support for sparse tensors

Posted by dl via user <us...@arrow.apache.org>.
I find pyarrow.SparseCSRMatrix mentioned
[here](https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix).
But how do I use that? Is there documentation for that class?  
  

On 7/1/2022 7:47 AM, dl wrote:  

>  
>  Hi,  
>  
>  I'm trying to understand support for sparse tensors in Arrow. It looks like
> there is ["experimental" support using the C++
> API](https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-
> tensors). When was this introduced? I see in the code base
> [here](https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi)
> Cython sparse array classes. Can these be accessed using the Python API. Are
> they included in the 8.0.0 release? Is there any other support for sparse
> arrays/tensors in the Python API? Are there good examples for any of this,
> in particular for using the 8.0.0 Python API to create sparse tensors?  
>  
>  Thanks,  
>  David  
>  
>  
>