Posted to dev@arrow.apache.org by Abdulrahman Kaitoua <ab...@windowslive.com> on 2016/11/03 08:10:41 UTC

Question

Dears,

I would like to get more information from you in order for me to use Arrow and be able to contribute in the near future.

What I see in Arrow is that I can read and write Arrow files (from the vector test classes), but I did not see tests for sending data over a network. As I understood from the project proposal (correct me if I am wrong), I can write an Arrow Array from one place and read it from another. Does this mean that Arrow would be a centralised server that holds state, to which some engines connect to write Arrow Arrays while other engines read them (as in the picture below)? How far is Arrow from having this centralised system, and where are we now?

I am working on an application that moves data while changing the schema between the source and the destination, for example moving data from Apache Spark to Apache Flink and changing the schema in between.

Regards,
[cid:76C48A02-6D9A-4B48-9952-F992A981414B]
------------------------------------------------------
Abdulrahman Kaitoua
Ph.D. Candidate at Politecnico di Milano
Department of Electronics, Information and Bioengineering
Piazza Leonardo da Vinci 32 - 20133 Milano, Italy
Tel. Lab: +39 02 2399 3631




Re: Question

Posted by Julien Le Dem <ju...@dremio.com>.
Hi all,
Just to clarify: yes, Arrow intends to define network protocols.
The file format is merely the network messages in a file.
We are also looking into IPC, that is, inter-process communication using
shared memory.


On Thu, Nov 3, 2016 at 5:51 AM, Donald Foss <do...@gmail.com> wrote:

> Abdulrahman, your schema diagram did not come through, at least not in a
> way I could view it in Mac Mail.  Looking at the message source, I don’t
> see the specified Content ID [cid] or inline data element for the graphic.
>
> Generally speaking, I believe the Arrow project defines data structures,
> file formats, and in-memory processing methods and various corresponding
> properties.  As a project follower who eagerly tests nightlies and comments
> on new release candidates, I definitely do not speak for anyone other than
> myself.  That said, I do not believe that Apache Arrow intended to include
> network and messaging protocols.  There are a large number of those
> available, from the ever popular 0MQ, to what is becoming my new favorite,
> Cap’n Proto (https://github.com/sandstorm-io/capnproto), along with its
> Java compatibility repository (https://github.com/sandstorm-io/capnproto).
> Note that I have no relationship with that project except technical
> jealousy.
>
> Side note: FWIW, even though I don’t know exactly what you’re doing, if
> it’s streaming, I generally go with Flink.
>
> —Donald


-- 
Julien

Re: Question

Posted by Donald Foss <do...@gmail.com>.
Abdulrahman, your schema diagram did not come through, at least not in a way I could view it in Mac Mail.  Looking at the message source, I don’t see the specified Content ID [cid] or inline data element for the graphic. 

Generally speaking, I believe the Arrow project defines data structures, file formats, and in-memory processing methods and various corresponding properties.  As a project follower who eagerly tests nightlies and comments on new release candidates, I definitely do not speak for anyone other than myself.  That said, I do not believe that Apache Arrow intended to include network and messaging protocols.  There are a large number of those available, from the ever popular 0MQ, to what is becoming my new favorite, Cap’n Proto (https://github.com/sandstorm-io/capnproto), along with its Java compatibility repository (https://github.com/sandstorm-io/capnproto).  Note that I have no relationship with that project except technical jealousy.

Side note: FWIW, even though I don’t know exactly what you’re doing, if it’s streaming, I generally go with Flink.

—Donald
