Posted to user@arrow.apache.org by Andrew Clancy <ni...@achren.org> on 2020/12/14 21:32:31 UTC

[javascript] cant get timestamps in arrow 2.0

Hi all,

I have a simple Feather file created via pandas to_feather with a
datetime64[ns] column, and I cannot read the timestamps in JavaScript with
apache-arrow@2.0.0.

See this notebook:
https://observablehq.com/@nite/apache-arrow-timestamp-investigation

I'm guessing I'm missing something. Has anyone got any suggestions, or
decent examples of reading a file created in pandas? I've seen examples
for apache-arrow@0.3.1 where dates are stored as an array of 2 ints.
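
For anyone reading along, here is a minimal Python sketch of what a timestamp stored as "an array of 2 ints" could look like. It assumes (the thread does not confirm this) that the two ints are the low and high 32-bit words of a signed 64-bit count of nanoseconds since the Unix epoch, low word first. The helper names split_int64, join_int64 and ns_pair_to_datetime are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_int64(value: int) -> Tuple[int, int]:
    """Split a signed 64-bit integer into (low, high) unsigned 32-bit words."""
    unsigned = value & 0xFFFFFFFFFFFFFFFF  # two's-complement view of the value
    return unsigned & 0xFFFFFFFF, unsigned >> 32


def join_int64(low: int, high: int) -> int:
    """Rebuild the signed 64-bit integer from its (low, high) 32-bit words."""
    unsigned = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    # Values with the top bit set represent negative numbers.
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned


def ns_pair_to_datetime(low: int, high: int) -> datetime:
    """Interpret (low, high) as nanoseconds since the epoch (assumed layout)."""
    nanoseconds = join_int64(low, high)
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    return EPOCH + timedelta(microseconds=nanoseconds // 1000)
```

Because Python's datetime stops at microseconds, the last three digits of a datetime64[ns] value are truncated in this sketch.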

File was created with:

import pandas as pd
# Assign the loaded frame so that df is defined before it is written out.
df = pd.read_parquet('sample.parquet')
df.to_feather('sample-seconds.feather')

Final Q: I'm assuming this is the best place for this question? Happy to
post elsewhere if there are other forums, or if this should be a JIRA
ticket?

Thanks!
Andy

Re: [javascript] cant get timestamps in arrow 2.0

Posted by Andrew Clancy <ni...@achren.org>.
Yeah, I figure this is a browser JavaScript limitation - anything with
access to core zip libraries on a machine should be able to implement this
fairly cheaply.
I'm surprised that browsers don't provide C++ zip/unzip APIs via JavaScript
yet - jszip/pako etc. all fall over unzipping > 500 MB in my recent
investigations (and are slow).
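
To illustrate the point about native compression libraries outside the browser, here is a minimal Python sketch that decompresses a stream in chunks, so the whole input never has to sit in memory at once. It uses the standard-library zlib module purely as a stand-in: as far as I know, Feather files use LZ4 or ZSTD rather than zlib. The function name decompress_chunks and the 64 KiB cap are arbitrary choices for illustration.

```python
import zlib
from typing import Iterable, Iterator


def decompress_chunks(chunks: Iterable[bytes], max_out: int = 64 * 1024) -> Iterator[bytes]:
    """Yield decompressed pieces from an iterable of compressed chunks."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        data = chunk
        while data:
            # Cap the output of each call at max_out bytes; input that was not
            # consumed because of the cap is kept in unconsumed_tail.
            piece = decomp.decompress(data, max_out)
            if piece:
                yield piece
            data = decomp.unconsumed_tail
    # Emit whatever output is still buffered inside the decompressor.
    tail = decomp.flush()
    if tail:
        yield tail
```

A caller would feed it chunks read from a file or socket and write each yielded piece straight to its destination instead of joining them.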

On Fri, 18 Dec 2020 at 04:26, Jacob Quinn <qu...@gmail.com> wrote:

>> Today, I think only C++ (and libraries that bind to it) have compression
>> implemented.  I think a new PR for java was just opened in the last few
>> days.
>>
>
> Note the Julia implementation (Arrow.jl) supports compressing when writing
> and decompressing when reading. (Not that it really helps for the
> javascript side of things here, but just wanted to point it out as the
> Julia code is relatively new to the arrow project).

Re: [javascript] cant get timestamps in arrow 2.0

Posted by Jacob Quinn <qu...@gmail.com>.
>
>  Today, I think only C++ (and libraries that bind to it) have compression
> implemented.  I think a new PR for java was just opened in the last few
> days.
>

Note that the Julia implementation (Arrow.jl) supports compressing when writing
and decompressing when reading. (Not that it really helps for the
JavaScript side of things here, but I just wanted to point it out as the
Julia code is relatively new to the Arrow project.)

On Thu, Dec 17, 2020 at 2:10 PM Andrew Clancy <ni...@achren.org> wrote:

> Yep - that's where I was expecting it!
> These guys appear to implement decompression using pako:
> https://github.com/usnistgov/jsfive - might be a good route to look into.

Re: [javascript] cant get timestamps in arrow 2.0

Posted by Andrew Clancy <ni...@achren.org>.
Yep - that's where I was expecting it!
These guys appear to implement decompression using pako:
https://github.com/usnistgov/jsfive - might be a good route to look into.



On Thu, 17 Dec 2020 at 19:19, Micah Kornfield <em...@gmail.com> wrote:

> I don't know the support for the compression codecs in Javascript, but i
> don't think anyone has attempted to implement them.
>
> I couldn't find the compression feature listed on the library status docs
> [1].
>
> But we should add a line item for it.  Today, I think only C++ (and
> libraries that bind to it) have compression implemented.  I think a new PR
> for java was just opened in the last few days.
>
> [1] https://arrow.apache.org/docs/status.html

Re: [javascript] cant get timestamps in arrow 2.0

Posted by Micah Kornfield <em...@gmail.com>.
I don't know the state of support for compression codecs in JavaScript, but I
don't think anyone has attempted to implement them.

I couldn't find the compression feature listed on the library status docs
[1].

But we should add a line item for it.  Today, I think only C++ (and
libraries that bind to it) have compression implemented.  I think a new PR
for Java was just opened in the last few days.

[1] https://arrow.apache.org/docs/status.html

On Thu, Dec 17, 2020 at 10:10 AM Andrew Clancy <ni...@achren.org> wrote:

> So, I figured out the issue here - I had to remove compression from the
> pyarrow feather.write_feather(compression='uncompressed'). Is there any
> way to read a compressed feather file in arrow js?
> See the comment under the first answer here:
> https://stackoverflow.com/questions/64629670/how-to-write-a-pandas-dataframe-to-arrow-file/64648955#64648955
> I couldn't find anything in the arrow docs or notebooks on this - I'm
> assuming that's related to javascript compression libraries being so
> limited.

Re: [javascript] cant get timestamps in arrow 2.0

Posted by Andrew Clancy <ni...@achren.org>.
So, I figured out the issue here - I had to disable compression by passing
compression='uncompressed' to pyarrow's feather.write_feather. Is there any
way to read a compressed Feather file in Arrow JS?
See the comment under the first answer here:
https://stackoverflow.com/questions/64629670/how-to-write-a-pandas-dataframe-to-arrow-file/64648955#64648955
I couldn't find anything in the arrow docs or notebooks on this - I'm
assuming that's related to javascript compression libraries being so
limited.


On Mon, 14 Dec 2020 at 21:32, Andrew Clancy <ni...@achren.org> wrote:

> Hi all,
>
> I have a simple feather file created via a pandas to_feather with a
> datetime64[ns] column, and cannot get timestamps in javascript
> apache-arrow@2.0.0
>
> See this notebook:
> https://observablehq.com/@nite/apache-arrow-timestamp-investigation
>
> I'm guessing I'm missing something, has anyone got any suggestions, or
> decent examples of reading a file created in pandas? I've seen in examples
> of apache-arrow@0.3.1 where dates stored as an array of 2 ints.
>
> File was created with:
>
> import pandas as pd
> df = pd.read_parquet('sample.parquet')
> df.to_feather('sample-seconds.feather')
>
> Final Q: I'm assuming this is the best place for this question? Happy to
> post elsewhere if there's any other forums, or if this should be a JIRA
> ticket?
>
> Thanks!
> Andy
>