Posted to jira@arrow.apache.org by "krishna deepak (Jira)" <ji...@apache.org> on 2021/09/10 14:55:00 UTC
[jira] [Comment Edited] (ARROW-13939) how to do resampling of arrow table using cython

    [ https://issues.apache.org/jira/browse/ARROW-13939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413228#comment-17413228 ] 

krishna deepak edited comment on ARROW-13939 at 9/10/21, 2:54 PM:
------------------------------------------------------------------

I tried the following to start with:
{code:python}

    cdef shared_ptr[CTable] table = pyarrow_unwrap_table(obj)
    cdef CTable *table_ptr = table.get()
    cdef list timeframes = [3, 5, 15]

    if table_ptr == NULL:
        raise TypeError("not a table")

    cdef CChunkedArray *column1
    cdef CChunkedArray *column2
    cdef CArray *array1
    cdef CArray *array2
    cdef int num_rows = table_ptr.num_rows()
    cdef shared_ptr[CSchema] schema = table_ptr.schema()

    cdef CResult[shared_ptr[CScalar]] val

    column1 = table_ptr.column(0).get()
    column2 = table_ptr.column(1).get()

    # Visit each chunk of the first column.
    for chunk_i in range(column1.num_chunks()):
        array1 = column1.chunk(chunk_i).get()

        if array1 == NULL:
            break

        length = array1.length()
        val = array1.GetScalar(0)
        if val.ok():
            val.ValueOrDie()  # this call is what fails to compile
{code}

Compiling this gives the following error:

{code:java}
Object of type 'CResult[shared_ptr[CScalar]]' has no attribute 'ValueOrDie'
{code}

This contradicts the documentation.

[~willjones127] [~westonpace]


> how to do resampling of arrow table using cython
> ------------------------------------------------
>
>                 Key: ARROW-13939
>                 URL: https://issues.apache.org/jira/browse/ARROW-13939
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Python
>            Reporter: krishna deepak
>            Priority: Minor
>
> Can someone please point me to resources on how to write resampling code in Cython for an Arrow table?
>  # Will iterating over the whole table be slow in Cython?
>  # What is the best way to append new elements? Can I create an empty table with the same schema and keep appending to it, or should I collect values in vectors/lists and then build the table from them?
> Performance is very important to me. Any help is highly appreciated.
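
For the second question, a rough sketch of the "collect into lists, then build the table once" option may help frame the discussion. It is plain Python rather than Cython, and it rests on assumptions that are not stated in the issue: that resampling means grouping rows into fixed-width buckets keyed by an integer timestamp, that the timestamps are sorted in ascending order, and that open/high/low/close are the aggregates wanted. The function name `resample_ohlc` and the column names are hypothetical.

```python
def resample_ohlc(timestamps, prices, timeframe):
    """Aggregate (timestamp, price) rows into buckets `timeframe` units wide.

    Assumes integer timestamps sorted in ascending order (an assumption,
    not something the issue states). Returns a dict of column lists.
    """
    out = {"start": [], "open": [], "high": [], "low": [], "close": []}
    for ts, price in zip(timestamps, prices):
        start = ts - ts % timeframe  # left edge of the bucket containing ts
        if out["start"] and out["start"][-1] == start:
            # Same bucket as the previous row: update the running aggregates.
            out["high"][-1] = max(out["high"][-1], price)
            out["low"][-1] = min(out["low"][-1], price)
            out["close"][-1] = price
        else:
            # New bucket: append one value to every output column.
            out["start"].append(start)
            out["open"].append(price)
            out["high"].append(price)
            out["low"].append(price)
            out["close"].append(price)
    return out


# Made-up sample data, one row per time unit.
timestamps = list(range(10))
prices = [10, 12, 11, 13, 9, 14, 15, 8, 16, 17]

# One resampled result per timeframe, using the timeframes from the comment.
resampled = {tf: resample_ohlc(timestamps, prices, tf) for tf in (3, 5, 15)}
```

Since Arrow tables are immutable, growing plain column lists and converting once at the end avoids rebuilding the table on every append. Each resulting dict of lists could then be passed to `pyarrow.table(...)`, which accepts a mapping of column names to values.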


