Posted to dev@tajo.apache.org by Yu Dongmin <mi...@gmail.com> on 2014/01/22 12:19:10 UTC

Tajo C++ worker

Hello,

I'm very glad to hear that Tajo is becoming stable and production-ready.

Here is some more news. Some of you may have seen Hyunsik's presentations mentioning an experimental project, the C++ Tajo worker.

I've recently been working on the C++ worker. Although it is somewhat behind the schedule I expected, it can now communicate successfully with the Tajo Master and Query Master.

It is not meant to replace the Java worker but to be interchangeable with it. We could use Java workers as is, C++ workers only, or a mix of both.

It is designed as a vectorized execution engine, in the hope of processing certain types of data structures very efficiently.

These are the features supported right now:

- Reading and parsing CSV files on the Hadoop data node
- Filtering rows with LLVM-generated evaluation code
- Simple scalar functions
- Simple group-by aggregation functions
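
For readers unfamiliar with vectorized filtering, here is a minimal sketch of the general idea. It is not code from the C++ worker: the function name `filter_greater_than`, the fixed `value > threshold` predicate, and the selection-vector layout are all assumptions made for illustration. In an engine that generates code with LLVM, a loop like this would presumably be produced per query expression rather than written by hand.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Tajo): evaluates `values[i] > threshold`
// over one batch of a column and writes the indices of the matching rows
// into `selected` (a "selection vector"). Returns the number of matches.
std::size_t filter_greater_than(const std::vector<std::int32_t>& values,
                                std::int32_t threshold,
                                std::vector<std::uint32_t>& selected) {
    selected.resize(values.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Branch-free form: always store the index, but advance the output
        // position only when the predicate holds. count <= i, so the write
        // is always in bounds.
        selected[count] = static_cast<std::uint32_t>(i);
        count += static_cast<std::size_t>(values[i] > threshold);
    }
    selected.resize(count);
    return count;
}
```

Calling it on the column {5, 12, 7, 30, 1, 18} with a threshold of 10 selects row indices 1, 3 and 5. Processing a whole batch in one tight loop, instead of one row per function call, is what gives the compiler room to use SIMD instructions and keeps the data in cache.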

I'm now working on the 'order by' clause and on profiling to reach the expected performance.

While working on these, I'd like to ask the community to allow creating a new git branch, say native_worker, cplus_worker, or a nicer name, so that we can build this project together.

There is still a long way to go on this project, but it could be improved with the help of the Tajo community.


Thanks
Min

Re: Tajo C++ worker

Posted by Jinho Kim <ji...@gmail.com>.

His first name is Dongmin. Haha.

Thank you for your great work!



--Jinho
Best regards



Re: Tajo C++ worker

Posted by Jihoon Son <gh...@gmail.com>.

Hi Yu,

I'm very excited about integrating vectorization and compilation techniques
into Tajo! As Hyunsik said, it would be great to proceed with the work in a new
branch after review, because this is a new feature for Tajo.

I hope your work will be finished and submitted to the master branch
as soon as possible.

Thanks!
Jihoon





-- 
Jihoon Son

Database & Information Systems Group,
Prof. Yon Dohn Chung Lab.
Dept. of Computer Science & Engineering,
Korea University
1, 5-ga, Anam-dong, Seongbuk-gu,
Seoul, 136-713, Republic of Korea

Tel : +82-2-3290-3580
E-mail : jihoonson@korea.ac.kr

Re: Tajo C++ worker

Posted by JaeHwa Jung <jh...@gruter.com>.

Awesome!

Thanks, guys.
Your contributions will make Tajo more powerful. :)






-- 
Thanks,
Jaehwa Jung
Bigdata Platform Team
Gruter

Re: Tajo C++ worker

Posted by Hyunsik Choi <hy...@apache.org>.

Hi Yu,

Thank you for your contribution. I have been looking forward to seeing
this work. Could you submit a patch to the appropriate issue, or create
a JIRA issue? We will create a new branch for it after review.

Hi folks,

Some of us know Yu offline. Since the discussion early this year,
he has mostly worked on the Tajo C++ worker. I also contributed a small
part on JIT code generation via LLVM, and participated in the
design. Another contributor, Hyoung Jun, also contributed some modules.
Its main objective is to build a high-performance query engine that
maximizes hardware utilization by using SIMD and cache-conscious
algorithms. It is under heavy development. As he mentioned, it will be
a long journey.

- hyunsik


Re: Tajo C++ worker

Posted by CharSyam <ch...@gmail.com>.

It's cool.
