Posted to user@poi.apache.org by Jörn Franke <jo...@gmail.com> on 2017/01/08 11:11:15 UTC

Using Apache POI on Hadoop/Spark

Dear all,

I released a first version of an open source library that uses Apache POI
to read/write Excel files on Hadoop/Spark/etc.:
https://snippetessay.wordpress.com/2017/01/08/readingwriting-excel-documents-with-the-hadoopoffice-library-on-hadoop-and-spark-first-release/


Feel free to comment or propose suggestions via GitHub issues.

Thank you!

Best regards

Re: Using Apache POI on Hadoop/Spark

Posted by Jörn Franke <jo...@gmail.com>.

Not sure, but why do you need such a large export? You are not going to open it in Excel?

> On 22 Jan 2017, at 23:35, SriniMurthy <sr...@gmail.com> wrote:
> 
> One of my major challenges with large exports is that I have to keep the worksheet in memory. Although there is an option to stream with the XSSF worksheet, extremely large exports still cause headaches with XML-related issues. It would be awesome if I could find a way to break up the writing as well.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

RE: Using Apache POI on Hadoop/Spark

Posted by SriniMurthy <sr...@gmail.com>.

One of my major challenges with large exports is that I have to keep the worksheet in memory. Although there is an option to stream with the XSSF worksheet, extremely large exports still cause headaches with XML-related issues. It would be awesome if I could find a way to break up the writing as well.
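
A minimal sketch of one way to break the writing up, assuming the export can be split into several parts (separate sheets or separate files) instead of one huge worksheet. It only computes which rows go into which part; the class ExportChunker, the method chunkRanges, and the part size are hypothetical names chosen for illustration and are not part of Apache POI or HadoopOffice. Each range could then be handed to its own writer, so that only one part needs to be held in memory at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI or HadoopOffice.
class ExportChunker {

    /**
     * Splits the row range [0, totalRows) into consecutive half-open ranges
     * of at most maxRowsPerPart rows each.
     * Every element of the result is {startInclusive, endExclusive}.
     */
    static List<long[]> chunkRanges(long totalRows, int maxRowsPerPart) {
        if (totalRows < 0 || maxRowsPerPart <= 0) {
            throw new IllegalArgumentException(
                "totalRows must be >= 0 and maxRowsPerPart must be > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < totalRows; start += maxRowsPerPart) {
            // The last part may be shorter than maxRowsPerPart.
            long end = Math.min(start + maxRowsPerPart, totalRows);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        int part = 0;
        for (long[] r : chunkRanges(2_500_000L, 1_000_000)) {
            // A real export would write rows r[0] .. r[1]-1 to its own
            // sheet or file here; this sketch only prints the boundaries.
            System.out.printf("part %d: rows %d to %d%n", part++, r[0], r[1] - 1);
        }
    }
}
```

When the parts are written as .xlsx sheets, the part size has to stay at or below 1,048,576 rows, the maximum number of rows in a single sheet.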

-----Original Message-----
From: "Jörn Franke" <jo...@gmail.com>
Sent: 1/22/2017 1:46 PM
To: "POI Users List" <us...@poi.apache.org>
Subject: Re: Using Apache POI on Hadoop/Spark

Hi Dominik,

Thanks a lot, that looks good. I hope I can also support POI eventually - it is also a great piece of software.

All the best 

Re: Using Apache POI on Hadoop/Spark

Posted by Jörn Franke <jo...@gmail.com>.

Hi Dominik,

Thanks a lot, that looks good. I hope I can also support POI eventually - it is also a great piece of software.

All the best 

> On 22 Jan 2017, at 22:24, Dominik Stadler <do...@gmx.at> wrote:
> 
> Hi,
> 
> Thanks for sharing, a very well-built and documented piece of software!
> 
> I have now added a link to it at
> http://poi.apache.org/related-projects.html#HadoopOffice. Let me know if
> you would like to add a longer project description there.
> 
> Dominik.

Re: Using Apache POI on Hadoop/Spark

Posted by Dominik Stadler <do...@gmx.at>.

Hi,

Thanks for sharing, a very well-built and documented piece of software!

I have now added a link to it at
http://poi.apache.org/related-projects.html#HadoopOffice. Let me know if
you would like to add a longer project description there.

Dominik.

On Sun, Jan 8, 2017 at 12:11 PM, Jörn Franke <jo...@gmail.com> wrote:

> Dear all,
>
> I released a first version of an open source library that uses Apache POI
> to read/write Excel files on Hadoop/Spark/etc.:
> https://snippetessay.wordpress.com/2017/01/08/readingwriting-excel-documents-with-the-hadoopoffice-library-on-hadoop-and-spark-first-release/
>
>
> Feel free to comment or propose suggestions via GitHub issues.
>
> Thank you!
>
> Best regards
>

Re: Using Apache POI on Hadoop/Spark

Posted by Srinivas Murthy <sr...@gmail.com>.

Sweet!

On Sun, Jan 8, 2017 at 3:11 AM, Jörn Franke <jo...@gmail.com> wrote:

> Dear all,
>
> I released a first version of an open source library that uses Apache POI
> to read/write Excel files on Hadoop/Spark/etc.:
> https://snippetessay.wordpress.com/2017/01/08/readingwriting-excel-documents-with-the-hadoopoffice-library-on-hadoop-and-spark-first-release/
>
>
> Feel free to comment or propose suggestions via GitHub issues.
>
> Thank you!
>
> Best regards
>