Posted to issues@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/16 14:26:22 UTC

[GitHub] [arrow] jorisvandenbossche opened a new issue, #34588: [C++][Python] Add compute kernel for "dictionary_decode"

jorisvandenbossche opened a new issue, #34588:
URL: https://github.com/apache/arrow/issues/34588

   The pyarrow DictionaryArray class has a `dictionary_decode()` method, as a counterpart of `dictionary_encode()`. However, this is not an actual kernel; it is implemented manually in Python (it's also just a simple "take", of course):
   
   https://github.com/apache/arrow/blob/fe88d9ad5c346786842913c8d2a369db099b5406/python/pyarrow/array.pxi#L2506-L2510
   
   It would be nice to make an actual kernel for this, just like we have a "dictionary_encode" kernel (and, e.g., "run_end_encode" and "run_end_decode" kernels). Among other things, this would make it available as a function in pyarrow.compute (and, for example, make it easier to call on a chunked array).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1559792209

   What is the point of this? Not every function needs to be wrapped in a compute function.


[GitHub] [arrow] jorisvandenbossche commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1559963696

   > (since `Array.dictionary_decode()` already exists
   
   That might be what you meant, but to be nitpicky: that actually doesn't exist; only `DictionaryArray.dictionary_decode()` exists (which touches on my point that, up to now, we haven't really been adding type-specific methods on ChunkedArray, since we only have a single chunked class and not type-specific subclasses).
   
   Another potential (consistency) argument: `run_end_decode` also exists as a compute kernel.


[GitHub] [arrow] westonpace closed issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace closed issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"
URL: https://github.com/apache/arrow/issues/34588


[GitHub] [arrow] westonpace commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1504276524

   > Hi, where is the code of dictionary_encode in cpp?
   
   This can be done today with the cast function (and I'm pretty sure that is how it's done when we need to decode in Acero):
   
   >>> import pyarrow as pa
   >>> import pyarrow.compute as pc
   >>> x = pa.array(["a", "a", "b"], pa.dictionary(pa.int8(), pa.string()))
   >>> x
   <pyarrow.lib.DictionaryArray object at 0x7f9eaaf60900>
   
   -- dictionary:
     [
       "a",
       "b"
     ]
   -- indices:
     [
       0,
       0,
       1
     ]
   >>> pc.cast(x, pa.string())
   <pyarrow.lib.StringArray object at 0x7f9eaaf80b20>
   [
     "a",
     "a",
     "b"
   ]
   
   I agree that an explicit `dictionary_decode` alias would be nice.  Maybe we can use a MetaFunction?


[GitHub] [arrow] pitrou commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1559965062

   Hmm, I see. `pc.dictionary_decode` kind of makes sense, then, even though it's a bit of an anti-pattern to make everything a compute function.


[GitHub] [arrow] pitrou commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1559959332

   Adding `ChunkedArray.dictionary_decode()` would sound a bit more logical (since `Array.dictionary_decode()` already exists) than creating a wrapper compute function, IMHO.


[GitHub] [arrow] jorisvandenbossche commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1559954037

   Quoting the last part of my top post: 
   
   > Among other things, this makes it available as a function in pyarrow.compute (and for example ensures you can call this on a chunked array more easily).


[GitHub] [arrow] R-JunmingChen commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "R-JunmingChen (via GitHub)" <gi...@apache.org>.
R-JunmingChen commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1502671735

   take


[GitHub] [arrow] R-JunmingChen commented on issue #34588: [C++][Python] Add compute kernel for "dictionary_decode"

Posted by "R-JunmingChen (via GitHub)" <gi...@apache.org>.
R-JunmingChen commented on issue #34588:
URL: https://github.com/apache/arrow/issues/34588#issuecomment-1486666937

   Hi, where is the code of dictionary_encode in cpp?
   I searched in arrow/cpp/src/arrow/compute and there is no file named like vector_dictionary_encode.

