Posted to user@tajo.apache.org by Hyunsik Choi <hy...@apache.org> on 2014/09/01 19:22:10 UTC
Re: Tajo History, Technical Background

Hi Chris,

I'm sorry for the late response. Recently, some of the Tajo committers,
including me, went to a workshop. I'll answer your questions in order.

1. As you mentioned, we should update our roadmap. I'll update it.
2. DAG stands for directed acyclic graph. It is used to represent the
data flow and processing steps of each SQL query. The DAG controller
plays an important role in controlling the stages and synchronizing the
nodes that process each stage. The Query class is responsible for this.
3. BMT means benchmark test.
4. As you already found out, GC means garbage collection.
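
To illustrate point 2, here is a minimal sketch in Python of how a
controller could order the stages of a query represented as a DAG, so
that a stage starts only after every stage it reads from has finished.
This is not Tajo's actual code: the function name stage_order, the
stage names, and the example plan are hypothetical and chosen only for
illustration.

```python
from collections import deque


def stage_order(deps):
    """Return stages so that each stage comes after all stages it depends on.

    deps maps a stage name to the list of stages whose output it consumes.
    Raises ValueError if the graph contains a cycle (i.e., it is not a DAG).
    """
    # Collect every stage, including ones that only appear as a dependency.
    stages = set(deps)
    for upstream in deps.values():
        stages.update(upstream)

    # Number of unfinished upstream stages per stage.
    remaining = {s: len(set(deps.get(s, ()))) for s in stages}

    # Reverse edges: which stages consume the output of a given stage.
    consumers = {s: [] for s in stages}
    for stage, upstream in deps.items():
        for u in set(upstream):
            consumers[u].append(stage)

    # Stages without dependencies can start immediately.
    ready = deque(sorted(s for s in stages if remaining[s] == 0))
    order = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        # Finishing a stage may unblock the stages that consume its output.
        for consumer in sorted(consumers[stage]):
            remaining[consumer] -= 1
            if remaining[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(stages):
        raise ValueError("cycle detected: the plan is not a DAG")
    return order


# Hypothetical plan: two scans feed a join, which feeds an aggregation.
plan = {
    "scan_a": [],
    "scan_b": [],
    "join": ["scan_a", "scan_b"],
    "aggregate": ["join"],
}
print(stage_order(plan))
```

In this sketch the two scans have no dependencies and could run in
parallel, while the join has to wait for both of them. That waiting is
the kind of synchronization between stages described in point 2.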

Here are some materials you may need.
* https://cwiki.apache.org/confluence/display/TAJO/Presentations
* http://dl.acm.org/citation.cfm?id=2511134
* http://dbserver.korea.ac.kr/~hyunsik/papers/Tajo_Poster_ICDE_2013.png

If you have any questions, feel free to ask us anything. Also, feel
free to contribute anything, such as code, documentation, QA, or user
feedback.

Best regards,
Hyunsik

On Fri, Aug 29, 2014 at 9:34 PM, Christian Schwabe
<Ch...@gmx.com> wrote:
> Hello Hyunsik,
>
> Sorry for that stupid question about page 21. The acronym GC stands for
> Garbage Collection in this context.
>
> Best regards,
> Chris
>
> On 29.08.2014 at 14:02, Christian Schwabe
> <Ch...@gmx.com> wrote:
>
>
> Hello Hyunsik
>
> I found this presentation
> (http://www.slideshare.net/gruter/hadoop-summit-2014-query-optimization-and-jitbased-vectorized-execution-in-apache-tajo),
> which explains query processing in Tajo in detail, a while ago, but I wanted
> to deal with the basics of Tajo first. I think I understand them now and
> would like to ask more detailed questions.
> However, I still have some open questions about this presentation that I
> would like to clarify here.
>
> Page 6: Can you explain in more detail exactly what tasks the modules in the
> Tajo Master have?
> Page 7: What is a "DAG"-Controller? What does the abbreviation "DAG" mean?
> Can you explain in more detail what exactly happens in every step of the figure?
> Page 12: What is a "BMT"-Controller? What does the abbreviation "BMT" mean?
> Page 21: What is a "GC"-Controller? What does the abbreviation "GC" mean?
>
> I thank you and your team for your versatile help and for answering
> all the questions I have had in the past.
>
>
> P.S.: I am currently writing my thesis, but at the same time I would
> like to give something back for the support I have received from this
> community. Is it possible for me to take on smaller tasks, such as grooming
> the documentation or other things that are accessible to me?
>
>
> Warm regards,
> Chris
>
>
>
> On 27.08.2014 at 11:03:56, Christian Schwabe wrote:
>
> Hello Hyunsik,
>
> Thank you very much for your detailed description of how Tajo was created.
> Tajo became an Apache Top-Level Project in March 2014. What exactly does
> this status mean? What added value does it bring for you?
> The current progress of Tajo is very promising. What exactly have you
> planned for the near future?
>
> On the roadmap (http://wiki.apache.org/tajo/Roadmap) all entries are
> outdated. This is quite a problem given the rapid progress of Tajo.
> Documentation and transparency should not be lost sight of ;)
>
> Warm regards,
> Chris
>
>
> On 26.08.2014 at 04:42, Hyunsik Choi <hy...@apache.org> wrote:
>
> Hi,
>
> I'm sorry for the late reply. My name is Hyunsik Choi. I am one of the
> founders of Tajo and am now the PMC chair of the Tajo project.
>
> I'm going to explain the origin of Tajo. It was a research project in
> the Database Lab., Korea University. It started in May 2010. Initially,
> we started it as an alternative to Hive. We designed Tajo to take
> advantage of both shared-nothing parallel databases and specialized
> distributed data processing systems, like MapReduce, Dryad, and
> Dremel.
>
> Jihoon Son and I mainly worked on the Tajo prototype. Later, Tajo
> became the subject of my Ph.D. dissertation. At that time, I was also
> working on a paper, "Parallel data processing with MapReduce: a
> survey", ACM SIGMOD Record 2011
> (http://dl.acm.org/citation.cfm?id=2094118). I was investigating lots
> of distributed processing systems and learned many things from them.
> So, I made an effort to reflect the great design considerations of other
> distributed processing systems in the design of Tajo.
>
> Initially, the design goals were scalability, high throughput,
> advanced query optimization, and fault tolerance. We are still
> pursuing them.
>
> Since 2013, Gruter, a big data company, has supported the Tajo project,
> and it employs some full-time contributors (i.e., 3 PMC members and one
> committer), including me.
>
> As you mentioned, the Tajo documentation does not reflect the current
> status of the Tajo project because Tajo is evolving very rapidly and we
> do not have enough contributors to update the documentation continuously.
> We've just periodically updated the documentation for each release. We
> are recruiting contributors for code and documentation.
>
> Q. How did you come to the name of Tajo?
>
> When we decided to propose Tajo as an ASF incubation project, the
> members of the DB Lab. voted for a proper name suited to the Hadoop
> ecosystem. We wanted to use an animal name like other systems in the
> Hadoop ecosystem. Finally, we chose Tajo, meaning ostrich in Korean.
>
> If you have more questions about Tajo, feel free to ask anything.
>
> Best regards,
> Hyunsik
>
> On Sun, Aug 24, 2014 at 2:29 AM, Hyunsik Choi <hyunsik@apache.org> wrote:
>
> Hi Chris,
>
> Nice question! Tajo also has an interesting history. I'll give the
> details of the history tomorrow because it is too late here :)
>
> Best regards,
> Hyunsik
>
> On Sat, Aug 23, 2014 at 1:28 AM, Christian Schwabe
> <Ch...@gmx.com> wrote:
>
> Hello everyone,
>
> For about three months now I have been working with Tajo. In that time I have
> gained an insight into the documentation; in particular, I now know how to
> get started with Tajo and which errors can be made. I have also got an
> overview of the Jira tickets and read the existing documentation.
>
> I'm fascinated by how fast this community has grown, how far you have come
> so far, and the potential of Tajo.
> What I would now like to look at more closely is the historical and technical
> view of Tajo.
> That means I ask myself questions like: How did you come to the name of
> Tajo? When was the first milestone actually set? Everywhere I read the year
> 2013. But is this actually the first time anyone thought about Tajo? Who is
> the initiator of this project?
> Above all, the technical processes would interest me, and certainly others,
> very much. Apart from a few presentations on tajo.apache.org >> News there
> is little documentation, or I have not found it yet.
> Apart from the Jira tickets and documentation
> (https://cwiki.apache.org/confluence/display/TAJO/Apache+TAJO+Home,
> http://tajo.apache.org/docs/0.8.0/index.html ), I have the impression that
> transparency has been somewhat neglected amid the rapid technological
> development. This is only my own personal opinion and does not criticize
> any individual.
> I appreciate your work very much and, with a background in Computer Science
> with Business, can understand what it means in terms of development work.
>
> Can you give me more information on the points mentioned above?
>
> P.S.: I hope I was not misunderstood. I want to look more behind the scenes
> of Tajo and learn to understand the technical background and the birth and
> historical development of Tajo.
>
> Best regards,
> Chris
>
>