Posted to dev@arrow.apache.org by Julian Hyde <jh...@gmail.com> on 2021/02/07 18:14:00 UTC

Arrow papers

A couple of interesting Arrow-related papers have appeared at conferences recently:
Integrating Lightweight Compression Capabilities into Apache Arrow [1]
Magpie: Python at Speed and Scale using Cloud Backends [2]

I’m sharing them so that people are aware of the evolving state-of-the-art.

Julian

[1] https://www.researchgate.net/publication/342996896_Integrating_Lightweight_Compression_Capabilities_into_Apache_Arrow

[2] http://cidrdb.org/cidr2021/papers/cidr2021_paper08.pdf

Re: Arrow papers

Posted by Antoine Pitrou <an...@python.org>.
Hi Johan,

On 08/02/2021 at 10:06, Johan Peltenburg - EWI wrote:
> 
> Would it make sense to keep a list of peer reviewed open access academic literature strongly related to Arrow on the website?
> 
> Perhaps it could be on a special page under the community menu, similar to the "Powered By" page.

Yes, that would definitely make sense.  The website pages are maintained
in this repository:
https://github.com/apache/arrow-site/

I'm not sure which file your suggestion would go in, but perhaps someone
else with better knowledge of the website can help you.

Regards

Antoine.

Re: Arrow papers

Posted by Johan Peltenburg - EWI <J....@tudelft.nl>.
Hi,


Thanks for the links, Julian.


Out of curiosity I checked Google Scholar for "Apache Arrow" hits per publication year.


Year, Hits
2016, 10
2017, 20
2018, 35
2019, 92
2020, 124

Interest in academia is naturally growing with the project.


Would it make sense to keep a list of peer-reviewed, open-access academic literature strongly related to Arrow on the website?

Perhaps it could be on a special page under the community menu, similar to the "Powered By" page.


It would be a great way for new researchers to quickly get up to speed on what people have been doing for and with Arrow from an academic perspective.

I think such a list would be most helpful if it's curated, so that it links to papers that do something significant with or to Arrow rather than just mentioning it briefly.

If the community thinks this is a good idea, I would be happy to volunteer.


Kind regards,


Johan


________________________________
From: Wes McKinney <we...@gmail.com>
Sent: Sunday, February 7, 2021 20:30
To: dev
Subject: Re: Arrow papers

Thanks for sharing these. I was aware of the Microsoft Magpie paper
but not the TU Dresden paper. It would be great to see some academic
groups engage in adding in-memory compression / encodings to the Arrow
format properly in collaboration with the Apache community.

On Sun, Feb 7, 2021 at 12:14 PM Julian Hyde <jh...@gmail.com> wrote:
>
> A couple of interesting Arrow-related papers have appeared at conferences recently:
> Integrating Lightweight Compression Capabilities into Apache Arrow [1]
> Magpie: Python at Speed and Scale using Cloud Backends [2]
>
> I’m sharing them so that people are aware of the evolving state-of-the-art.
>
> Julian
>
> [1] https://www.researchgate.net/publication/342996896_Integrating_Lightweight_Compression_Capabilities_into_Apache_Arrow
>
> [2] http://cidrdb.org/cidr2021/papers/cidr2021_paper08.pdf

Re: Arrow papers

Posted by Wes McKinney <we...@gmail.com>.
Thanks for sharing these. I was aware of the Microsoft Magpie paper
but not the TU Dresden paper. It would be great to see some academic
groups engage in adding in-memory compression / encodings to the Arrow
format properly in collaboration with the Apache community.

On Sun, Feb 7, 2021 at 12:14 PM Julian Hyde <jh...@gmail.com> wrote:
>
> A couple of interesting Arrow-related papers have appeared at conferences recently:
> Integrating Lightweight Compression Capabilities into Apache Arrow [1]
> Magpie: Python at Speed and Scale using Cloud Backends [2]
>
> I’m sharing them so that people are aware of the evolving state-of-the-art.
>
> Julian
>
> [1] https://www.researchgate.net/publication/342996896_Integrating_Lightweight_Compression_Capabilities_into_Apache_Arrow
>
> [2] http://cidrdb.org/cidr2021/papers/cidr2021_paper08.pdf