Posted to issues@arrow.apache.org by "paddymul (via GitHub)" <gi...@apache.org> on 2024/03/14 19:47:16 UTC

[I] Examples of serializing arrow from python through ipywidgets to JS [arrow]

paddymul opened a new issue, #40561:
URL: https://github.com/apache/arrow/issues/40561

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I can't work out how to serialize an Arrow table in Python, using either an ipywidgets `traitlets.Bytes` trait or a base64-encoded string, so that it reifies correctly in JS.
   
   Let's start with base64:
   
   ```python
   import base64

   import pandas as pd
   import pyarrow as pa

   df = pd.DataFrame({'a': [10, 50], 'b': ['paddy', 'margaret']})

   # Convert the DataFrame to Arrow and serialize its first record batch
   table = pa.Table.from_pandas(df)
   batch = table.to_batches()[0]
   pabuffer = batch.serialize()

   # Base64-encode the serialized bytes for transport to JS
   b64_table = base64.b64encode(pabuffer.to_pybytes())
   print(b64_table)
   ```
   
   Given the following JS code, that string doesn't reify properly:
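   ```typescript
   import { tableFromIPC } from 'apache-arrow';

   // Decode a base64 string into the raw bytes it encodes
   export function base64ToBytes(base64: string) {
       const binString = atob(base64);
       //@ts-ignore
       return Uint8Array.from(binString, (m) => m.codePointAt(0));
   }

   // base64table is the base64 string printed by the Python code above
   const b64bytes = base64ToBytes(base64table);
   const t2 = tableFromIPC(b64bytes);
   ```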
   I get the error `TypeError: this.schema is undefined`.
   
   An Arrow table does reify when I create the base64 string in JS and decode that string instead.
   
   
   
   I have made a sample repo for experimenting with, benchmarking, and documenting dataframe serialization: https://github.com/paddymul/df_cereal
   
   I'm eager to collaborate and help with documentation for using arrow-js. I think it solves a lot of serialization problems, in both performance and typing, compared with JSON. It's a little hard to approach as is, because the docs are written for someone who already understands Arrow rather than for someone who knows less about Arrow but wants to use it for serialization.
   
   
   
   
   
   ### Component(s)
   
   JavaScript, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Examples of serializing arrow from python through ipywidgets to JS [arrow]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on issue #40561:
URL: https://github.com/apache/arrow/issues/40561#issuecomment-1998581800

   To be clear, you can also use Arrow IPC, but I use Parquet so that the serialized data is smaller when users are working in remote Python environments.




Re: [I] Examples of serializing arrow from python through ipywidgets to JS [arrow]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on issue #40561:
URL: https://github.com/apache/arrow/issues/40561#issuecomment-1998579095

   https://github.com/developmentseed/lonboard serializes all array or table data via Arrow and Parquet. The Python side [defines traits to validate and store](https://github.com/developmentseed/lonboard/blob/8e97480d79790df1b35f14b758fe1635f0450b8b/lonboard/traits.py#L113) `Table`, `Array`, and `ChunkedArray` objects. Each of those has [custom serializers](https://github.com/developmentseed/lonboard/blob/8e97480d79790df1b35f14b758fe1635f0450b8b/lonboard/traits.py#L142) which [serialize to Parquet](https://github.com/developmentseed/lonboard/blob/8e97480d79790df1b35f14b758fe1635f0450b8b/lonboard/_serialization.py#L70-L72) (for my uses, a list of Parquet files rather than one Parquet file with many chunks). Then in JS it [parses those Parquet buffers to an Arrow JS table](https://github.com/developmentseed/lonboard/blob/8e97480d79790df1b35f14b758fe1635f0450b8b/src/model/layer.ts#L48).




Re: [I] Examples of serializing arrow from python through ipywidgets to JS [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #40561:
URL: https://github.com/apache/arrow/issues/40561#issuecomment-1998346078

   cc @kylebarron in case you have done things like this or might have pointers

