Posted to dev@tajo.apache.org by Eli Reisman <ap...@gmail.com> on 2013/11/10 21:02:53 UTC

Re: [DISCUSS] next roadmap

+1 for bumping Hadoop/YARN compatibility to the Hadoop 2.2.x GA release; Giraph
recently made the same move. The Hadoop 2 alpha and beta lines changed the
APIs too often, and starting with Hadoop 2.2 the APIs for interacting with
YARN are guaranteed to stay stable long-term for the first time.

I think because of this (and the low adoption of YARN clusters during the
Hadoop 2-alpha and beta periods) most users with YARN clusters will move to
Hadoop 2.2 quickly.

This _does_ break backward compatibility with earlier Hadoop 2 lines to
some extent, which is something to consider, but again I think it's worth it
since the new API will be stable and we can really build on it. The
abstractions in the new API also wrap away some potential errors that
client applications can make when negotiating resources with a YARN cluster.



On Thu, Oct 24, 2013 at 9:12 PM, Hyunsik Choi <hy...@apache.org> wrote:

> I've rearranged your suggestions, and then I've updated the
> RoadMap page (https://wiki.apache.org/tajo/Roadmap).
> If you have any suggestions or any ideas, feel free to ask me or
> modify the wiki.
>
> Cheers,
> Hyunsik Choi
>
> On Fri, Oct 25, 2013 at 11:58 AM, Hyunsik Choi <hy...@apache.org> wrote:
> > That's good. If you all agree with this, we will plan for the next
> > version to be 0.8.
> >
> > On Thu, Oct 24, 2013 at 10:15 AM, Jihoon Son <gh...@gmail.com> wrote:
> >> +1
> >>
> >> I agree with that.
> >>
> >> Jihoon
> >>
> >> 2013/10/24 Hyunsik Choi <hy...@apache.org>
> >>
> >>> Many projects jump version numbers according to their maturity. For
> >>> example, Giraph has had two releases: 0.1 and 1.0, and Drill already
> >>> released 1.0-M1 as its first release.
> >>>
> >>> If we keep up the current development progress, we expect to be able to
> >>> announce GA of Tajo early next year. So the version jump looks
> >>> reasonable to me. Actually, Tajo has been in development since 2010.
> >>> In some respects, 0.2 is too low.
> >>>
> >>> However, this release was already fixed at 0.2. So, if others agree,
> >>> the next version could have a higher number, closer to 1.0.
> >>>
> >>> - hyunsik
> >>>
> >>> On Thu, Oct 24, 2013 at 12:00 AM, Jihoon Son <gh...@gmail.com> wrote:
> >>> > Keuntae
> >>> >
> >>> > You have a point, but I think that version 0.2 is appropriate for the
> >>> > FIRST release.
> >>> > What do others think?
> >>> >
> >>> > Jihoon
> >>> >
> >>> > 2013/10/23 ktpark <si...@gmail.com>
> >>> >
> >>> >> I agree with you, and I think the current version number (0.2) is too
> >>> >> low to convey the maturity of Tajo :)
> >>> >> How about raising the version number further for general availability?
> >>> >>
> >>> >> Also, window functions would be a great help in boosting Tajo usage at
> >>> >> my company.
> >>> >>
> >>> >> On 2013. 10. 23., at 12:21 PM, Jinho Kim <ji...@gmail.com> wrote:
> >>> >>
> >>> >> > +1
> >>> >> >
> >>> >> > I agree, and we need to bump the Hadoop version up to 2.2.0 GA.
> >>> >> >
> >>> >> > --Jinho
> >>> >> > Best regards
> >>> >> >
> >>> >> >
> >>> >> > 2013/10/23 Hyunsik Choi <hy...@apache.org>
> >>> >> >
> >>> >> >> +1
> >>> >> >> Since Tajo also aims to be part of the Hadoop ecosystem, HCatalog is
> >>> >> >> also very important. Actually, I'm looking forward to this feature.
> >>> >> >>
> >>> >> >> - hyunsik
> >>> >> >>
> >>> >> >> On Wed, Oct 23, 2013 at 11:36 AM, JaeHwa Jung <jhjung@gruter.com> wrote:
> >>> >> >>> Hyunsik.
> >>> >> >>>
> >>> >> >>> I also agree with you.
> >>> >> >>>
> >>> >> >>> And I think that HCatalogInterface is a very important feature.
> >>> >> >>> I expect it will make it easy for lots of Hive users to use Tajo.
> >>> >> >>>
> >>> >> >>> I have already begun working on it,
> >>> >> >>> and the JIRA issue for it is:
> >>> >> >>> https://issues.apache.org/jira/browse/TAJO-16.
> >>> >> >>>
> >>> >> >>> Thanks,
> >>> >> >>> Hyunsik
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> 2013/10/23 Jihoon Son <gh...@gmail.com>
> >>> >> >>>
> >>> >> >>>> Hyunsik,
> >>> >> >>>>
> >>> >> >>>> I totally agree with you.
> >>> >> >>>> "CREATE TABLE ... PARTITION BY" will be very useful for users.
> >>> >> >>>>
> >>> >> >>>> Also, it is important to support the GROUP BY extensions (GROUP BY
> >>> >> >>>> CUBE, ROLLUP, GROUPING SETS).
> >>> >> >>>> The related issue has already been created at
> >>> >> >>>> https://issues.apache.org/jira/browse/TAJO-256,
> >>> >> >>>> and I have started some of the work required to support it.
> >>> >> >>>>
> >>> >> >>>> Thanks,
> >>> >> >>>> Jihoon
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> 2013/10/23 Hyunsik Choi <hy...@apache.org>
> >>> >> >>>>
> >>> >> >>>>> Hi folks,
> >>> >> >>>>>
> >>> >> >>>>> I would like to discuss the next roadmap. There are many TODOs. We
> >>> >> >>>>> need to prioritize the features. Could you suggest which features
> >>> >> >>>>> should have higher priority?
> >>> >> >>>>>
> >>> >> >>>>> I think "CREATE TABLE ... PARTITIONED BY" is a very important feature
> >>> >> >>>>> for large-scale data. Also, more SQL functions should be added to
> >>> >> >>>>> Tajo.
> >>> >> >>>>>
> >>> >> >>>>> Feel free to suggest your ideas. After this discussion, I'll write
> >>> >> >>>>> up the result on the Roadmap wiki page.
> >>> >> >>>>>
> >>> >> >>>>> - hyunsik
> >>> >> >>>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>>
> >>> >> >>>> --
> >>> >> >>>> Jihoon Son
> >>> >> >>>>
> >>> >> >>>> Database & Information Systems Group,
> >>> >> >>>> Prof. Yon Dohn Chung Lab.
> >>> >> >>>> Dept. of Computer Science & Engineering,
> >>> >> >>>> Korea University
> >>> >> >>>> 1, 5-ga, Anam-dong, Seongbuk-gu,
> >>> >> >>>> Seoul, 136-713, Republic of Korea
> >>> >> >>>>
> >>> >> >>>> Tel : +82-2-3290-3580
> >>> >> >>>> E-mail : jihoonson@korea.ac.kr
> >>> >> >>>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> --
> >>> >> >>> ----------------------------
> >>> >> >>> 정재화 / jhjung@gruter.com
> >>> >> >>> (주)그루터
> >>> >> >>> www.gruter.com
> >>> >> >>> Cloud, Search and Social
> >>> >> >>> ----------------------------
> >>> >> >>
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >>
>

Re: [DISCUSS] next roadmap

Posted by Hyunsik Choi <hy...@apache.org>.

Hi Eli,

Thank you for the great advice.

As far as I know, there are few users who use Tajo in YARN cluster
mode. In my opinion, we do not need to keep backward compatibility. I
also think that the new API, which provides wrapper and helper classes,
will simplify Tajo's YARN components.

In addition, I'm considering two YARN cluster modes. The first mode is
an on-demand mode. In the on-demand mode, QueryMaster plays the role of
an ApplicationMaster. In Tajo, a query consists of multiple execution
blocks (i.e., steps). For each execution block, QueryMaster asks
YarnResourceManager to allocate the necessary resources. Workers are
launched on the allocated containers and are reused within an execution
block. The current implementation is based on this mode. This mode has
some initial overhead, because containers must be allocated and workers
launched before an execution block can run.
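
As a rough illustration of the on-demand mode, here is a toy Python model.
It is not the YARN API and not Tajo's implementation: ToyResourceManager,
run_query_on_demand, and the round-robin task placement are all hypothetical
names and choices made up for this sketch. It also assumes that containers
are released when an execution block finishes, which the description above
does not state explicitly.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a resource manager; hands out container ids."""
    capacity: int
    in_use: int = 0
    requests: int = 0  # number of allocation requests served so far

    def __post_init__(self):
        self._ids = count()  # source of fresh container ids

    def allocate(self, n):
        if self.in_use + n > self.capacity:
            raise RuntimeError("not enough free containers")
        self.in_use += n
        self.requests += 1
        return [next(self._ids) for _ in range(n)]

    def release(self, containers):
        self.in_use -= len(containers)


def run_query_on_demand(rm, execution_blocks):
    """execution_blocks: list of (name, tasks, containers_needed) tuples."""
    log = []
    for name, tasks, needed in execution_blocks:
        # One allocation request per execution block.
        containers = rm.allocate(needed)
        # Workers on these containers are reused for every task in the
        # block: tasks are spread round-robin over the containers.
        assignment = {c: [] for c in containers}
        for i, task in enumerate(tasks):
            assignment[containers[i % len(containers)]].append(task)
        log.append((name, assignment))
        # Assumption: containers are given back when the block ends.
        rm.release(containers)
    return log
```

In this model a query with two execution blocks issues two allocation
requests, and each request is where the startup overhead would be paid.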

The second mode is a reserve mode. This is a concept similar to Llama
(http://cloudera.github.io/llama/) and Hoya
(http://hortonworks.com/blog/introducing-hoya-hbase-on-yarn/). In this
mode, an AM launches TajoMaster and Workers. They keep running until they
are shut down. This mode gives very low-latency response times.
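
To make the latency difference between the two modes concrete, here is a
back-of-the-envelope cost model in Python. It is only a sketch: the function
names are hypothetical, the numbers in the usage lines are invented, and it
assumes that the on-demand mode pays a fixed startup cost once per execution
block while the reserve mode pays none at query time.

```python
def on_demand_latency(block_runtimes, startup_cost):
    """Query latency when every execution block first waits for
    container allocation and worker launch (startup_cost seconds)."""
    return sum(startup_cost + runtime for runtime in block_runtimes)


def reserve_latency(block_runtimes):
    """Query latency when workers are already running, so only the
    execution time of each block is paid."""
    return sum(block_runtimes)


# Illustrative numbers only: two execution blocks of 2 s and 3 s,
# with an assumed 1.5 s startup cost per block.
blocks = [2.0, 3.0]
overhead = on_demand_latency(blocks, 1.5) - reserve_latency(blocks)
print(f"on-demand overhead for this query: {overhead} s")
```

The flip side, which this model leaves out, is that a reserved cluster
holds its containers even when no query is running.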

Ultimately, both modes would be useful depending on the use case. But we
have only limited development resources now, so we need to concentrate on
one mode for the time being. I'm wondering which one is more commonly
used in real environments. If you have any ideas, feel free to share them.

Best regards,
Hyunsik Choi
