Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 02:12:59 UTC

[GitHub] [arrow] westonpace edited a comment on pull request #8984: ARROW-5336: [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries

westonpace edited a comment on pull request #8984:
URL: https://github.com/apache/arrow/pull/8984#issuecomment-758347141


   @jorisvandenbossche It's pretty close but there are a few differences.
   
   - The pandas code allows the index type to expand (e.g. from uint8_t to uint16_t).  In fact, it looks like it always sets it to int32_t.  Also, Arrow doesn't allow dictionary indices to be negative.
   - The pandas code puts -1 in the map for a null value.  Arrow instead represents a null either as a null in the validity bitmap of the indices array or as a null item in the dictionary itself, referenced by a valid index.  Both Arrow approaches are legal, but the pandas approach is neither of them.
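
To make the two differences concrete, here is a minimal sketch in plain Python. It uses no Arrow or pandas API: lists stand in for buffers, `None` in the indices stands in for a cleared validity bit, and the names `unify_dictionaries`, `concatenate`, `to_sentinel_codes`, and `max_index` are hypothetical. It also assumes the index type stays fixed during concatenation, which is how I read the first point above; it is not meant to mirror the actual implementation in this PR.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

# Hypothetical stand-in for a dictionary-encoded array: (indices, dictionary).
# None in the indices plays the role of a cleared validity bit.
DictArray = Tuple[List[Optional[int]], List[Hashable]]


def unify_dictionaries(
    dictionaries: Sequence[Sequence[Hashable]],
) -> Tuple[List[Hashable], List[List[int]]]:
    """Merge dictionaries; return the unified dictionary and, for each
    input, a map from old index to index in the unified dictionary."""
    unified: List[Hashable] = []
    position = {}
    transpose_maps: List[List[int]] = []
    for dictionary in dictionaries:
        mapping = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            mapping.append(position[value])
        transpose_maps.append(mapping)
    return unified, transpose_maps


def concatenate(arrays: Sequence[DictArray], max_index: int = 255) -> DictArray:
    """Concatenate dictionary arrays without widening the index type.
    max_index=255 models a uint8 index (an assumption for this sketch)."""
    unified, maps = unify_dictionaries([dictionary for _, dictionary in arrays])
    if len(unified) - 1 > max_index:
        raise OverflowError("unified dictionary does not fit the index type")
    out: List[Optional[int]] = []
    for (indices, _), mapping in zip(arrays, maps):
        # Nulls stay null; valid indices are remapped and remain non-negative.
        out.extend(None if i is None else mapping[i] for i in indices)
    return out, unified


def to_sentinel_codes(indices: Sequence[Optional[int]]) -> List[int]:
    """Sentinel-style view of the same data: nulls become -1."""
    return [-1 if i is None else i for i in indices]


a = ([0, 1, None], ["x", "y"])
b = ([1, 0], ["y", "z"])
indices, dictionary = concatenate([a, b])
print(indices)                     # [0, 1, None, 2, 1]
print(dictionary)                  # ['x', 'y', 'z']
print(to_sentinel_codes(indices))  # [0, 1, -1, 2, 1]
```

The remapped indices never go negative because nulls are tracked separately, whereas the sentinel view needs a signed code type to hold -1.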
   
   I'll defer to @pitrou on whether we want to combine them, but it seems simpler to just leave them separate for now.

