Posted to user@arrow.apache.org by H G <za...@gmail.com> on 2022/07/04 10:44:09 UTC

[Python] iloc equivalent for selection by position and setting values?

iloc equivalent for selection by position and setting values?

import pyarrow as pa
import pandas as pd
df = pd.DataFrame({'year': [2020, 2022, 2019, 2021],
                   'n_legs': [2, 4, 5, 100],
                   'animals': ["Flamingo", "Horse", "Brittle stars", None]})
table = pa.Table.from_pandas(df)

df.loc[df["animals"].isnull(), "animals"] = "new_value"  # how do we perform this in pyarrow?

I did open this on GitHub, but I assume that is not the right forum for questions.

Thanks

Re: [Python] iloc equivalent for selection by position and setting values?

Posted by Rok Mihevc <ro...@gmail.com>.
Arrow arrays are immutable by design, so I believe updating values in place
is currently not possible. Using the approach Michael pointed out, you can
create a new array to replace the old one.
See this discussion [1] for more nuance.

Rok

[1] https://lists.apache.org/thread/kph2sk0nqc0yfcb39dmjmh3ljg4dpyfx


Re: [Python] iloc equivalent for selection by position and setting values?

Posted by Michael <mi...@gmail.com>.
Maybe this?

import pyarrow as pa
import pyarrow.compute as pc
import pandas as pd

df = pd.DataFrame({'year': [2020, 2022, 2019, 2021],
                   'n_legs': [2, 4, 5, 100],
                   'animals': ["Flamingo", "Horse", "Brittle stars", None]})
table = pa.Table.from_pandas(df)

old_animals = table.column("animals")  # similar to df.loc[:, 'animals']

new_animals = pc.fill_null(old_animals, "new_value")
table2 = table.set_column(2, "animals", new_animals)


I'm not entirely sure whether this solution copies the original table. If
it does, you may want to drop the old column and add a new column rather
than creating table2.

I based this on this:
https://arrow.apache.org/cookbook/py/data.html#replacing-a-column-in-an-existing-table




Re: [Python] iloc equivalent for selection by position and setting values?

Posted by "Lee, David" <Da...@blackrock.com>.
For some reason the fill_null() compute function is missing from the latest docs.

https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_compute.py

Look at the unit tests in test_fill_null().


Re: [Python] iloc equivalent for selection by position and setting values?

Posted by H G <za...@gmail.com>.
Thanks for the input. Filtering to get the null rows is possible using
table.filter(table['animals'].is_null())

However, I am struggling to set a value on the filtered rows. Any suggestions?


Re: [Python] iloc equivalent for selection by position and setting values?

Posted by Michael <mi...@gmail.com>.
This section of the cookbook might help:
https://arrow.apache.org/cookbook/py/data.html#filtering-arrays-using-a-mask

Also these methods in the compute module.

https://arrow.apache.org/docs/python/api/compute.html#selecting-multiplexing
https://arrow.apache.org/docs/python/api/compute.html#selections

Not at my computer, so apologies for not giving a direct example. I think
coalesce might be the method you need.


-- 

Michael