Posted to user@kylin.apache.org by Nam Đỗ Duy via user <us...@kylin.apache.org> on 2023/12/01 03:10:34 UTC

Pinot/Kylin/Druid quick comparison

Dear Xiaoxiang,
Sirs/Madams,

May I post my boss's question:

What are the pros and cons of the OLAP platform Kylin compared to Pinot and
Druid?

Please kindly let me know

Thank you very much and best regards

Re: Pinot/Kylin/Druid quick comparison

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Thank you Li Yang, I think the development of version 5 would be hard
work for you but the impact is big so please keep me posted!

All the best

On Thu, Mar 14, 2024 at 10:51 AM Li Yang <li...@apache.org> wrote:

> Nam,
>
> We are planning to release a kylin5-beta around March or April. The GA of
> kylin5 would be around July this year if everything goes well.
>
> Cheers
> Yang
>
> On Tue, Mar 5, 2024 at 6:54 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hello Xiaoxiang,
>>
>> How are you? My boss is very interested in Kylin 5, so he would like to
>> know when Kylin 5 will be released. Could you please provide an
>> estimate?
>>
>> Thank you very much and best regards
>>
>>
>>
>>
>>
>> On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>
>> > Good morning Xiaoxiang, hope you are well
>> >
>> > 1. JDBC source is a feature which is in development; it will be
>> > supported later.
>> >
>> > ===============
>> >
>> > May I know when the JDBC source will be available, and whether there is
>> > any change in the Kylin 5 release date?
>> >
>> > Thank you and best regards
>> >
>> >
>> > On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >
>> >> 1. JDBC source is a feature which is in development; it will be
>> >> supported later.
>> >>
>> >> 2. Kylin supports Kerberos now. I will write a doc as soon as possible.
>> >> (I will let you know.)
>> >>
>> >> 3. I think Ranger and Kerberos do not do the same thing: Kerberos is for
>> >> authentication and Ranger is for authorization, so they cannot replace
>> >> each other. Ranger can integrate with Kerberos; please check Ranger's
>> >> website for information.
>> >>
>> >> ------------------------
>> >> With warm regard
>> >> Xiaoxiang Yu
>> >>
>> >>
>> >>
>> >> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >>
>> >> > Thank you Xiaoxiang for your reply.
>> >> >
>> >> > -------------
>> >> > Do you have any suggestions/wishes for Kylin 5 (except the real-time
>> >> > feature)?
>> >> > -------------
>> >> > Yes, please answer these to help me clear this headache:
>> >> >
>> >> > 1. Can Kylin access the existing star schema in an Oracle data
>> >> > warehouse? If not, is there any workaround?
>> >> >
>> >> > 2. My team is using Kerberos for authentication. Do you have any
>> >> > document or case study about integrating Kerberos with Kylin 4.x and
>> >> > Kylin 5.x?
>> >> >
>> >> > 3. Should we use Apache Ranger instead of Kerberos for authentication
>> >> > and for security purposes?
>> >> >
>> >> > Thank you again
>> >> >
>> >> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
>> >> >
>> >> > > I guess the release date should be 2024/01.
>> >> > > Do you have any suggestions/wishes for Kylin 5 (except the real-time
>> >> > > feature)?
>> >> > >
>> >> > > ------------------------
>> >> > > With warm regard
>> >> > > Xiaoxiang Yu
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> > >
>> >> > >> Thank you very much Xiaoxiang. I already did the presentation this
>> >> > >> morning, so there was no time for you to comment. Next time I will
>> >> > >> send it to you in advance. The meeting result was that we will
>> >> > >> implement both Druid and Kylin in the next couple of projects, Druid
>> >> > >> because of its real-time feature. I hope that Kylin will have the
>> >> > >> same feature soon.
>> >> > >>
>> >> > >> May I ask when you will release Kylin 5.0?
>> >> > >>
>> >> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >> > >>
>> >> > >> > Since 2018 there have been a lot of new features and code refactoring.
>> >> > >> > If you like, you can share your slides with me privately; maybe I can
>> >> > >> > give some comments.
>> >> > >> >
>> >> > >> > Here are some references on the advantages of Kylin since 2018:
>> >> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> >> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> >> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> >> > >> >
>> >> > >> > ------------------------
>> >> > >> > With warm regard
>> >> > >> > Xiaoxiang Yu
>> >> > >> >
>> >> > >> >
>> >> > >> >
>> >> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> > >> >
>> >> > >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and
>> >> > >> >> Druid in my team.
>> >> > >> >>
>> >> > >> >> I found this article and would like you to update me on the
>> >> > >> >> advantages of Kylin from 2018 until now (especially with version 5
>> >> > >> >> to be released):
>> >> > >> >>
>> >> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> >> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>> >> > >> >>
>> >> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>> >> > >> >>
>> >> > >> >> > Thank you very much for your prompt response. I still have several
>> >> > >> >> > questions and will seek your help later.
>> >> > >> >> >
>> >> > >> >> > Best regards and have a good day
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> > >> >> >
>> >> > >> >> >> Done. The GitHub default branch has been changed to kylin5.
>> >> > >> >> >>
>> >> > >> >> >> ------------------------
>> >> > >> >> >> With warm regard
>> >> > >> >> >> Xiaoxiang Yu
>> >> > >> >> >>
>> >> > >> >> >>
>> >> > >> >> >>
>> >> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> > >> >> >>
>> >> > >> >> >> > A JIRA ticket has been opened and is waiting for INFRA:
>> >> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238
>> >> > >> >> >> > ------------------------
>> >> > >> >> >> > With warm regard
>> >> > >> >> >> > Xiaoxiang Yu
>> >> > >> >> >> >
>> >> > >> >> >> >
>> >> > >> >> >> >
>> >> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> >> > >> >> >> >
>> >> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> > >> >> >> >> default branch. In case people are impressed by the numbers, I
>> >> > >> >> >> >> hope to turn this situation around.
>> >> > >> >> >> >>
>> >> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> > >> >> >> >>
>> >> > >> >> >> >>> The default branch is for 4.X, which is a maintenance branch; the
>> >> > >> >> >> >>> active branch is kylin5.
>> >> > >> >> >> >>> I will change the default branch to kylin5 later.
>> >> > >> >> >> >>>
>> >> > >> >> >> >>> ------------------------
>> >> > >> >> >> >>> With warm regard
>> >> > >> >> >> >>> Xiaoxiang Yu
>> >> > >> >> >> >>>
>> >> > >> >> >> >>>
>> >> > >> >> >> >>>
>> >> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> > >> >> >> >>>
>> >> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>> Can you see the attached photo?
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>> My boss asked why Druid commits code regularly but Kylin has had
>> >> > >> >> >> >>>> no commits since July.
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>>> I think so.
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> Response time is not the only factor in making a decision. Kylin
>> >> > >> >> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
>> >> > >> >> >> >>>>> model, and Kylin can guarantee reasonable query latency.
>> >> > >> >> >> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
>> >> > >> >> >> >>>>> unified data analytics services for their customers.
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> ------------------------
>> >> > >> >> >> >>>>> With warm regard
>> >> > >> >> >> >>>>> Xiaoxiang Yu
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
>> >> > >> >> >> >>>>>>
>> >> > >> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> >> > >> >> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> >> > >> >> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> >> > >> >> >> >>>>>> execution is done once and stored in the cube to be used many
>> >> > >> >> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
>> >> > >> >> >> >>>>>>
>> >> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> > >> >> >> >>>>>>
>> >> > >> >> >> >>>>>> > The following text is part of an article
>> >> > >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > ===============================================================================
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>> >> > >> >> >> >>>>>> > because of its pre-calculation technology, for example when the
>> >> > >> >> >> >>>>>> > join, group by, and where-condition patterns in SQL are
>> >> > >> >> >> >>>>>> > relatively fixed. The larger the data volume, the more obvious
>> >> > >> >> >> >>>>>> > the advantages of using Kylin. In particular, Kylin's advantages
>> >> > >> >> >> >>>>>> > in deduplication (count distinct), Top N, Percentile, and similar
>> >> > >> >> >> >>>>>> > scenarios are especially large, and it is used in a large number
>> >> > >> >> >> >>>>>> > of scenarios such as dashboards, all kinds of reports,
>> >> > >> >> >> >>>>>> > large-screen displays, traffic statistics, and user behavior
>> >> > >> >> >> >>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build
>> >> > >> >> >> >>>>>> > their data service platforms, serving millions to tens of
>> >> > >> >> >> >>>>>> > millions of queries per day, and most of the queries can be
>> >> > >> >> >> >>>>>> > completed within 2 - 3 seconds. There is no better alternative
>> >> > >> >> >> >>>>>> > for such a high-concurrency scenario.
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
>> >> > >> >> >> >>>>>> > power and is more suitable when query requests are more flexible,
>> >> > >> >> >> >>>>>> > or when detailed queries are needed at low concurrency. Scenarios
>> >> > >> >> >> >>>>>> > include: user-label filtering over very many columns with
>> >> > >> >> >> >>>>>> > arbitrarily combined where conditions, complex ad hoc queries at
>> >> > >> >> >> >>>>>> > modest concurrency, and so on. If the amount of data and access
>> >> > >> >> >> >>>>>> > is large, you need to deploy a distributed ClickHouse cluster,
>> >> > >> >> >> >>>>>> > which is a bigger challenge for operation and maintenance.
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >> > >> >> >> >>>>>> > resource-efficient to compute them on the fly. Since the number
>> >> > >> >> >> >>>>>> > of queries is small, even if each query consumes a lot of
>> >> > >> >> >> >>>>>> > computational resources, it is still cost-effective overall. If
>> >> > >> >> >> >>>>>> > some queries have a fixed pattern and the query volume is large,
>> >> > >> >> >> >>>>>> > Kylin is the better fit: large computational resources are spent
>> >> > >> >> >> >>>>>> > once to save the results, and that upfront computational cost is
>> >> > >> >> >> >>>>>> > amortized over every query, so it is the most economical.
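
The amortization argument in the quoted article can be made concrete with a small calculation. This is only a sketch: the function names and the cost units are hypothetical, and the numbers in the usage note are made up for illustration, not measurements of Kylin or ClickHouse.

```python
import math
from typing import Optional


def on_the_fly_cost(queries: int, cost_per_query: int) -> int:
    """Total cost when every query is computed from raw data."""
    return queries * cost_per_query


def precomputed_cost(queries: int, build_cost: int, cost_per_lookup: int) -> int:
    """Total cost when results are built once and each query is a cheap lookup."""
    return build_cost + queries * cost_per_lookup


def break_even_queries(build_cost: int, cost_per_query: int,
                       cost_per_lookup: int) -> Optional[int]:
    """Smallest query count at which precomputation is no more expensive.

    Returns None when a lookup is not cheaper than an on-the-fly query,
    because the build cost can then never be recovered.
    """
    saving = cost_per_query - cost_per_lookup
    if saving <= 0:
        return None
    return math.ceil(build_cost / saving)
```

With an assumed build cost of 1000 units, 5 units per on-the-fly query and 1 unit per lookup, `break_even_queries(1000, 5, 1)` gives 250: below that volume on-the-fly computation is cheaper, above it the precomputed approach wins.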
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > ------------------------
>> >> > >> >> >> >>>>>> > With warm regard
>> >> > >> >> >> >>>>>> > Xiaoxiang Yu
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
>> >> > >> >> >> >>>>>> >> That's great.
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> This morning there was a new challenge to my team: ClickHouse
>> >> > >> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in
>> >> > >> >> >> >>>>>> >> milliseconds, which is faster than my demonstration (I used
>> >> > >> >> >> >>>>>> >> Kylin to calculate 1 billion rows in 2.9 seconds).
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
>> >> > >> >> >> >>>>>> >> so that I can defend my demonstration?
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>> >> > >> >> >> >>>>>> >> > here is that kylin has lag time due to model update of new
>> >> > >> >> >> >>>>>> >> > segment build, is that correct?"
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > You are correct.
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >> > >> >> >> >>>>>> >> > combination of ... "
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>> >> > >> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3
>> >> > >> >> >> >>>>>> >> > minutes (that is my estimate, but I am quite certain about it).
>> >> > >> >> >> >>>>>> >> > NRT stands for 'near real-time'. It will run a job that
>> >> > >> >> >> >>>>>> >> > periodically does micro-batch aggregation and persistence. The
>> >> > >> >> >> >>>>>> >> > price is that you need to run and monitor a long-running job.
>> >> > >> >> >> >>>>>> >> > This feature is based on Spark Streaming, so you need knowledge
>> >> > >> >> >> >>>>>> >> > of it.
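
To illustrate what "micro-batch aggregation and persistence" means in general, here is a minimal pure-Python simulation. It is not Kylin's NRT implementation and does not use Spark Streaming; the event layout `(timestamp_s, key, value)`, the function names, and the in-memory `store` dictionary are all assumptions made for the sketch.

```python
from collections import defaultdict


def micro_batches(events, interval_s):
    """Group (timestamp_s, key, value) events into fixed-size time windows."""
    batches = defaultdict(list)
    for ts, key, value in events:
        window_start = (ts // interval_s) * interval_s
        batches[window_start].append((key, value))
    return dict(sorted(batches.items()))


def aggregate(batch):
    """Pre-aggregate one batch: sum the values per key."""
    totals = defaultdict(int)
    for key, value in batch:
        totals[key] += value
    return dict(totals)


def run(events, interval_s, store):
    """Process batches in time order and merge each result into the store.

    New data only becomes visible to queries after its batch has been
    merged, which is where the minute-level time lag comes from.
    """
    for _window_start, batch in micro_batches(events, interval_s).items():
        for key, subtotal in aggregate(batch).items():
            store[key] = store.get(key, 0) + subtotal
    return store
```

For example, with a 180-second interval the events `[(0, "a", 1), (10, "b", 2), (185, "a", 3)]` fall into two batches, and after both are merged the store holds `{"a": 4, "b": 2}`.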
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers
>> >> > >> >> >> >>>>>> >> > can tolerate?
>> >> > >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > ------------------------
>> >> > >> >> >> >>>>>> >> > With warm regard
>> >> > >> >> >> >>>>>> >> > Xiaoxiang Yu
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > > Druid is better in
>> >> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > ==========================
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason
>> >> > >> >> >> >>>>>> >> > > here is that kylin has lag time due to model update of new
>> >> > >> >> >> >>>>>> >> > > segment build, is that correct?
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > If that is true, then can you suggest a workaround that combines:
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > (time-lagged Kylin cube) + (realtime DB update) to provide
>> >> > >> >> >> >>>>>> >> > > realtime capability?
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>> >> > >> >> >> >>>>>> >> > > integrate it with the (time-lagged Kylin cube).
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>> >> > >> >> >> >>>>>> >> > > > about how Druid has changed in these two years. The new features
>> >> > >> >> >> >>>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
>> >> > >> >> >> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>> >> > >> >> >> >>>>>> >> > > > which I used two years ago):
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
>> >> > >> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think
>> >> > >> >> >> >>>>>> >> > > >   Druid had better response time for small queries two years ago).
>> >> > >> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop and want to use
>> >> > >> >> >> >>>>>> >> > > >   K8S or a public cloud platform as your deployment platform.
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>> >> > >> >> >> >>>>>> >> > > > better, like:
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>> >> > >> >> >> >>>>>> >> > > >   more exact-match/fine-grained Index for queries containing
>> >> > >> >> >> >>>>>> >> > > >   different `Group By dimensions`.
>> >> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's website did not
>> >> > >> >> >> >>>>>> >> > > >   show that it supports ODBC well.)
>> >> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> > >> >> >> >>>>>> >> > > > I hope this helps; you are free to share your opinion.
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > ------------------------
>> >> > >> >> >> >>>>>> >> > > > With warm regard
>> >> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
>> >> > >> >> >> >>>>>> >> > > >> to Pinot and Druid?
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> Please kindly let me know
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>>
>> >> > >> >> >> >>>>>
>> >> > >> >> >>
>> >> > >> >> >
>> >> > >> >>
>> >> > >> >
>> >> > >>
>> >> > >
>> >> >
>> >>
>> >
>>
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Thank you Li Yang, I think the development of version 5 would be hard
work for you but the impact is big so please keep me posted!

All the best

On Thu, Mar 14, 2024 at 10:51 AM Li Yang <li...@apache.org> wrote:

> Nam,
>
> We are planning to release a kylin5-beta around March or April. The GA of
> kylin5 would be around July this year if everything goes well.
>
> Cheers
> Yang
>
> On Tue, Mar 5, 2024 at 6:54 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hello Xiaoxiang,
>>
>> How are you, my boss is very interested in Kylin 5. so he would like to
>> know when Kylin 5 will be released...could you please provide an
>> estimation?
>>
>> Thank you very much and best regards
>>
>>
>>
>>
>>
>> On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>
>> > Good morning Xiaoxiang, hope you are well
>> >
>> > 1. JDBC source is a feature which in development, it will be supported
>> > later.
>> >
>> > ===============
>> >
>> > May I know when will the JDBC be available? as well as is there any
>> change
>> > in Kylin 5 release date
>> >
>> > Thank you and best regards
>> >
>> >
>> > On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >
>> >> 1. JDBC source is a feature which in development, it will be supported
>> >> later.
>> >>
>> >> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
>> >> (I will let you know.)
>> >>
>> >> 3. I think ranger and Kerberos are not doing the same things, one for
>> >> authentication, one for authorization. So they cannot replace each
>> other.
>> >> Ranger can integrate with Kerberos, please check ranger's website for
>> >> information.
>> >>
>> >> ------------------------
>> >> With warm regard
>> >> Xiaoxiang Yu
>> >>
>> >>
>> >>
>> >> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> wrote:
>> >>
>> >> > Thank you Xiaoxiang for your reply
>> >> >
>> >> > ————————————-
>> >> > Do you have any suggestions/wishes for kylin 5(except real-time
>> >> feature)?
>> >> > ————————————-
>> >> > Yes: please answer to help me clear this headache:
>> >> >
>> >> > 1. Can Kylin access the existing star schema in Oracle datawarehouse
>> ?
>> >> If
>> >> > not then do we have any work around?
>> >> >
>> >> > 2. My team is using kerberos for authentication, do you have any
>> >> > document/casestudy about integrating kerberos with kylin 4.x and
>> kylin
>> >> 5.x
>> >> >
>> >> > 3. Should we use apache ranger instead of kerberos for authentication
>> >> and
>> >> > for security purposes?
>> >> >
>> >> > Thank you again
>> >> >
>> >> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
>> >> >
>> >> > > I guess the release date should be 2024/01 .
>> >> > > Do you have any suggestions/wishes for kylin 5(except real-time
>> >> feature)?
>> >> > >
>> >> > > ------------------------
>> >> > > With warm regard
>> >> > > Xiaoxiang Yu
>> >> > >
>> >> > >
>> >> > >
>> >> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >> > wrote:
>> >> > >
>> >> > >> Thank you very much xiaoxiang, I did the presentation this morning
>> >> > already
>> >> > >> so there is no time for you to comment. Next time I will send you
>> in
>> >> > >> advance. The meeting result was that we will implement both druid
>> and
>> >> > >> kylin
>> >> > >> in the next couple of projects because of its realtime feature.
>> Hope
>> >> > that
>> >> > >> kylin will have same feature soon.
>> >> > >>
>> >> > >> May I ask when will you release kylin 5.0?
>> >> > >>
>> >> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> > >>
>> >> > >> > Since 2018 there are a lot of new features and code refactor.
>> >> > >> > If you like, you can share your ppt to me privately, maybe I can
>> >> > >> > give some comments.
>> >> > >> >
>> >> > >> > Here is the reference of advantages of Kylin since 2018:
>> >> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> >> > >> > -
>> >> > >> >
>> >> > >>
>> >> >
>> >>
>> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> >> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> >> > >> >
>> >> > >> > ------------------------
>> >> > >> > With warm regard
>> >> > >> > Xiaoxiang Yu
>> >> > >> >
>> >> > >> >
>> >> > >> >
>> >> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy
>> <na...@vnpay.vn.invalid>
>> >> > >> wrote:
>> >> > >> >
>> >> > >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin
>> and
>> >> > >> Druid in
>> >> > >> >> my team.
>> >> > >> >>
>> >> > >> >> I found this article and would like you to update me the
>> >> advantages
>> >> > of
>> >> > >> >> Kylin since 2018 until now (especially with version 5 to be
>> >> released)
>> >> > >> >>
>> >> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1
>> of
>> >> 2)?
>> >> > >> >> <
>> >> > >> >>
>> >> > >>
>> >> >
>> >>
>> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
>> >> > >> >> >
>> >> > >> >>
>> >> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn>
>> wrote:
>> >> > >> >>
>> >> > >> >> > Thank you very much for your prompt response, I still have
>> >> several
>> >> > >> >> > questions to seek for your help later.
>> >> > >> >> >
>> >> > >> >> > Best regards and have a good day
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xxyu@apache.org
>> >
>> >> > wrote:
>> >> > >> >> >
>> >> > >> >> >> Done. Github branch changed to kylin5.
>> >> > >> >> >>
>> >> > >> >> >> ------------------------
>> >> > >> >> >> With warm regard
>> >> > >> >> >> Xiaoxiang Yu
>> >> > >> >> >>
>> >> > >> >> >>
>> >> > >> >> >>
>> >> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <
>> xxyu@apache.org>
>> >> > >> wrote:
>> >> > >> >> >>
>> >> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
>> >> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> >> > >> >> >> > ------------------------
>> >> > >> >> >> > With warm regard
>> >> > >> >> >> > Xiaoxiang Yu
>> >> > >> >> >> >
>> >> > >> >> >> >
>> >> > >> >> >> >
>> >> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
>> >> > <namdd@vnpay.vn.invalid
>> >> > >> >
>> >> > >> >> >> wrote:
>> >> > >> >> >> >
>> >> > >> >> >> >> Thank you Xiaoxiang, please update me when you have
>> changed
>> >> > your
>> >> > >> >> >> default
>> >> > >> >> >> >> branch. In case people are impressed by the numbers then
>> I
>> >> hope
>> >> > >> to
>> >> > >> >> turn
>> >> > >> >> >> >> this situation to reverse direction.
>> >> > >> >> >> >>
>> >> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <
>> >> xxyu@apache.org>
>> >> > >> >> wrote:
>> >> > >> >> >> >>
>> >> > >> >> >> >>> The default branch is for 4.X, which is a maintained
>> >> > >> >> >> >>> branch; the active branch is kylin5.
>> >> > >> >> >> >>> I will change the default branch to kylin5 later.
>> >> > >> >> >> >>>
>> >> > >> >> >> >>> ------------------------
>> >> > >> >> >> >>> With warm regard
>> >> > >> >> >> >>> Xiaoxiang Yu
>> >> > >> >> >> >>>
>> >> > >> >> >> >>>
>> >> > >> >> >> >>>
>> >> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
>> >> > >> <na...@vnpay.vn.invalid>
>> >> > >> >> >> >>> wrote:
>> >> > >> >> >> >>>
>> >> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>> Can you see the attached photo?
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>> My boss asked why Druid commits code regularly but Kylin
>> >> > >> >> >> >>>> has had no commits since July.
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <
>> xxyu@apache.org
>> >> >
>> >> > >> wrote:
>> >> > >> >> >> >>>>
>> >> > >> >> >> >>>>> I think so.
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> Response time is not the only factor in making a
>> >> > >> >> >> >>>>> decision. Kylin could be cheaper when the query pattern
>> >> > >> >> >> >>>>> is suitable for the Kylin model, and Kylin can guarantee
>> >> > >> >> >> >>>>> reasonable query latency. ClickHouse will be quicker in
>> >> > >> >> >> >>>>> an ad hoc query scenario.
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> By the way, Youzan and Kyligence combine the two to
>> >> > >> >> >> >>>>> provide unified data analytics services for their
>> >> > >> >> >> >>>>> customers.
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> ------------------------
>> >> > >> >> >> >>>>> With warm regard
>> >> > >> >> >> >>>>> Xiaoxiang Yu
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
>> >> > >> <namdd@vnpay.vn.invalid
>> >> > >> >> >
>> >> > >> >> >> >>>>> wrote:
>> >> > >> >> >> >>>>>
>> >> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
>> >> > >> >> >> >>>>>>
>> >> > >> >> >> >>>>>> If my client uses a cloud computing service like GCP or
>> >> > >> >> >> >>>>>> AWS, which will cost more: the precalculation feature of
>> >> > >> >> >> >>>>>> Kylin, or ClickHouse? (In the case of Kylin, my thought
>> >> > >> >> >> >>>>>> is that the query execution is done once and stored in
>> >> > >> >> >> >>>>>> the cube to be used many times, so Kylin uses less cloud
>> >> > >> >> >> >>>>>> computation. Is that true?)
>> >> > >> >> >> >>>>>>
>> >> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <
>> >> > xxyu@apache.org
>> >> > >> >
>> >> > >> >> >> wrote:
>> >> > >> >> >> >>>>>>
>> >> > >> >> >> >>>>>> > Following text is part of an article(
>> >> > >> >> >> >>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > ===============================================================================
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > Because it precomputes results, Kylin suits aggregation
>> >> > >> >> >> >>>>>> > queries with fixed patterns, for example when the joins,
>> >> > >> >> >> >>>>>> > GROUP BY columns, and WHERE conditions in the SQL are
>> >> > >> >> >> >>>>>> > relatively fixed. The larger the data volume, the more
>> >> > >> >> >> >>>>>> > obvious Kylin's advantages become. They are especially
>> >> > >> >> >> >>>>>> > large for deduplication (count distinct), Top N, and
>> >> > >> >> >> >>>>>> > Percentile, and Kylin is widely used for dashboards,
>> >> > >> >> >> >>>>>> > reports of all kinds, large-screen displays, traffic
>> >> > >> >> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora,
>> >> > >> >> >> >>>>>> > Shell Housing, etc. use Kylin to build their data service
>> >> > >> >> >> >>>>>> > platforms, serving millions to tens of millions of
>> >> > >> >> >> >>>>>> > queries per day, with most queries completing within
>> >> > >> >> >> >>>>>> > 2 - 3 seconds. There is no better alternative for such a
>> >> > >> >> >> >>>>>> > high-concurrency scenario.
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
>> >> > >> >> >> >>>>>> > computing power and is more suitable when query requests
>> >> > >> >> >> >>>>>> > are more flexible, or when detail-level queries are
>> >> > >> >> >> >>>>>> > needed at low concurrency. Typical scenarios include
>> >> > >> >> >> >>>>>> > user-label filtering that arbitrarily combines many
>> >> > >> >> >> >>>>>> > columns and WHERE conditions, and complex ad hoc queries
>> >> > >> >> >> >>>>>> > at low concurrency. If the data volume and access volume
>> >> > >> >> >> >>>>>> > are large, you need to deploy a distributed ClickHouse
>> >> > >> >> >> >>>>>> > cluster, which is a bigger challenge for operation and
>> >> > >> >> >> >>>>>> > maintenance.
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is
>> >> > >> >> >> >>>>>> > more resource-efficient to compute them on the fly.
>> >> > >> >> >> >>>>>> > Since the number of queries is small, even if each query
>> >> > >> >> >> >>>>>> > consumes a lot of computational resources, it is still
>> >> > >> >> >> >>>>>> > cost-effective overall. If some queries have a fixed
>> >> > >> >> >> >>>>>> > pattern and the query volume is large, Kylin is the
>> >> > >> >> >> >>>>>> > better fit: the large upfront cost of computing and
>> >> > >> >> >> >>>>>> > saving the results is amortized over every query, which
>> >> > >> >> >> >>>>>> > makes it the most economical option.
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > ------------------------
>> >> > >> >> >> >>>>>> > With warm regard
>> >> > >> >> >> >>>>>> > Xiaoxiang Yu
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
>> >> > >> >> <namdd@vnpay.vn.invalid
>> >> > >> >> >> >
>> >> > >> >> >> >>>>>> wrote:
>> >> > >> >> >> >>>>>> >
>> >> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming
>> >> > >> >> >> >>>>>> >> feature. That's great.
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse
>> >> > >> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in
>> >> > >> >> >> >>>>>> >> milliseconds, which is faster than my demonstration (I
>> >> > >> >> >> >>>>>> >> used Kylin to calculate 1 billion rows in 2.9 seconds).
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over
>> >> > >> >> >> >>>>>> >> ClickHouse so that I can defend my demonstration?
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <
>> >> > >> xxyu@apache.org
>> >> > >> >> >
>> >> > >> >> >> >>>>>> wrote:
>> >> > >> >> >> >>>>>> >>
>> >> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics,
>> >> > >> >> >> >>>>>> >> > the reason here is that kylin has lag time due to model
>> >> > >> >> >> >>>>>> >> > update of new segment build, is that correct?"
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > You are correct.
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around
>> >> > >> >> >> >>>>>> >> > of combination of ... "
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
>> >> > >> >> >> >>>>>> >> > completed but not released), which can reduce the time
>> >> > >> >> >> >>>>>> >> > lag to about 3 minutes (that is my estimate, but I am
>> >> > >> >> >> >>>>>> >> > quite certain about it).
>> >> > >> >> >> >>>>>> >> > NRT stands for 'near real-time'. It will run a job that
>> >> > >> >> >> >>>>>> >> > does micro-batch aggregation and persistence
>> >> > >> >> >> >>>>>> >> > periodically. The price is that you need to run and
>> >> > >> >> >> >>>>>> >> > monitor a long-running job. This feature is based on
>> >> > >> >> >> >>>>>> >> > Spark Streaming, so you need knowledge of it.
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > I am curious: what is the maximum time lag your
>> >> > >> >> >> >>>>>> >> > customers can tolerate?
>> >> > >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for
>> >> > >> >> >> >>>>>> >> > most cases.
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > ------------------------
>> >> > >> >> >> >>>>>> >> > With warm regard
>> >> > >> >> >> >>>>>> >> > Xiaoxiang Yu
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
>> >> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> > >> >> >> >>>>>> >> wrote:
>> >> > >> >> >> >>>>>> >> >
>> >> > >> >> >> >>>>>> >> > > Druid is better in
>> >> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > ==========================
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the
>> >> > >> >> >> >>>>>> >> > > reason here is that kylin has lag time due to model
>> >> > >> >> >> >>>>>> >> > > update of new segment build, is that correct?
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > If that is true, then can you suggest a workaround
>> >> > >> >> >> >>>>>> >> > > that combines:
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to
>> >> > >> >> >> >>>>>> >> > > provide realtime capability?
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB
>> >> > >> >> >> >>>>>> >> > > update) and integrate it with the (time-lag Kylin
>> >> > >> >> >> >>>>>> >> > > cube).
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
>> >> > >> >> >> xxyu@apache.org>
>> >> > >> >> >> >>>>>> wrote:
>> >> > >> >> >> >>>>>> >> > >
>> >> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
>> >> > >> >> >> >>>>>> >> > > > know too much about how Druid has changed in these
>> >> > >> >> >> >>>>>> >> > > > two years. New features that I know of are: new UI,
>> >> > >> >> >> >>>>>> >> > > > fully on K8s, etc.).
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > Here are some cases where you should consider using
>> >> > >> >> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing
>> >> > >> >> >> >>>>>> >> > > > Kylin 5.0-beta with the Druid I used two years ago):
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
>> >> > >> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I
>> >> > >> >> >> >>>>>> >> > > >   think Druid had better response time for small
>> >> > >> >> >> >>>>>> >> > > >   queries two years ago).
>> >> > >> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop, and
>> >> > >> >> >> >>>>>> >> > > >   want to use the K8S/public cloud platform as your
>> >> > >> >> >> >>>>>> >> > > >   deployment platform.
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which
>> >> > >> >> >> >>>>>> >> > > > Kylin could be better, like:
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin
>> >> > >> >> >> >>>>>> >> > > >   can have a more exact-match/fine-grained index for
>> >> > >> >> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
>> >> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's website
>> >> > >> >> >> >>>>>> >> > > >   did not show that it supports ODBC well.)
>> >> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than
>> >> > >> >> >> >>>>>> >> > > >   Druid.
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> > >> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > ------------------------
>> >> > >> >> >> >>>>>> >> > > > With warm regard
>> >> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>> >> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> > >> >> >> >>>>>> >> > > wrote:
>> >> > >> >> >> >>>>>> >> > > >
>> >> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>> >> > >> >> >> >>>>>> >> > > >> compared to Pinot and Druid?
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> Please kindly let me know
>> >> > >> >> >> >>>>>> >> > > >>
>> >> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
>> >> > >> >> >> >>>>>> >> > > >>

Re: Pinot/Kylin/Druid quick comparision

Posted by Li Yang <li...@apache.org>.
Nam,

We are planning to release a kylin5-beta around March or April. The GA of
kylin5 would be around July this year if everything goes well.

Cheers
Yang

On Tue, Mar 5, 2024 at 6:54 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hello Xiaoxiang,
>
> How are you? My boss is very interested in Kylin 5, so he would like to
> know when Kylin 5 will be released. Could you please provide an
> estimate?
>
> Thank you very much and best regards
>
>
>
>
>
> On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy <na...@vnpay.vn> wrote:
>
> > Good morning Xiaoxiang, hope you are well
> >
> > 1. JDBC source is a feature which is in development; it will be supported
> > later.
> >
> > ===============
> >
> > May I know when the JDBC source will be available, and whether there is
> > any change in the Kylin 5 release date?
> >
> > Thank you and best regards
> >
> >
> > On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> >> 1. JDBC source is a feature which is in development; it will be supported
> >> later.
> >>
> >> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> >> (I will let you know.)
> >>
> >> 3. I think Ranger and Kerberos do different things: Kerberos is for
> >> authentication and Ranger is for authorization, so they cannot replace
> >> each other. Ranger can integrate with Kerberos; please check Ranger's
> >> website for information.
> >>
> >> ------------------------
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >>
> >>
> >> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> wrote:
> >>
> >> > Thank you Xiaoxiang for your reply
> >> >
> >> > ————————————-
> >> > Do you have any suggestions/wishes for kylin 5(except real-time
> >> feature)?
> >> > ————————————-
> >> > Yes: please answer to help me clear this headache:
> >> >
> >> > 1. Can Kylin access the existing star schema in an Oracle data
> >> > warehouse? If not, is there any workaround?
> >> >
> >> > 2. My team is using Kerberos for authentication. Do you have any
> >> > document or case study about integrating Kerberos with Kylin 4.x and
> >> > Kylin 5.x?
> >> >
> >> > 3. Should we use Apache Ranger instead of Kerberos for authentication
> >> > and for security purposes?
> >> >
> >> > Thank you again
> >> >
> >> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >
> >> > > I guess the release date should be 2024/01 .
> >> > > Do you have any suggestions/wishes for kylin 5(except real-time
> >> feature)?
> >> > >
> >> > > ------------------------
> >> > > With warm regard
> >> > > Xiaoxiang Yu
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> > wrote:
> >> > >
> >> > >> Thank you very much Xiaoxiang. I already did the presentation this
> >> > >> morning, so there was no time for you to comment. Next time I will
> >> > >> send it to you in advance. The meeting result was that we will
> >> > >> implement both Druid and Kylin in the next couple of projects,
> >> > >> because of Druid's realtime feature. I hope that Kylin will have
> >> > >> the same feature soon.
> >> > >>
> >> > >> May I ask when you will release Kylin 5.0?
> >> > >>
> >> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org>
> wrote:
> >> > >>
> >> > >> > Since 2018 there have been a lot of new features and code refactoring.
> >> > >> > If you like, you can share your ppt to me privately, maybe I can
> >> > >> > give some comments.
> >> > >> >
> >> > >> > Here is the reference of advantages of Kylin since 2018:
> >> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> >> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> >> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >> > >> >
> >> > >> > ------------------------
> >> > >> > With warm regard
> >> > >> > Xiaoxiang Yu
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid
> >
> >> > >> wrote:
> >> > >> >
> >> > >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin
> >> > >> >> and Druid in my team.
> >> > >> >>
> >> > >> >> I found this article and would like you to update me on the
> >> > >> >> advantages of Kylin since 2018 until now (especially with
> >> > >> >> version 5 to be released):
> >> > >> >>
> >> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> >> > >> >>
> >> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn>
> wrote:
> >> > >> >>
> >> > >> >> > Thank you very much for your prompt response. I still have
> >> > >> >> > several questions and will ask for your help later.
> >> > >> >> >
> >> > >> >> > Best regards and have a good day
> >> > >> >> >
> >> > >> >> >
> >> > >> >> >
> >> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org>
> >> > wrote:
> >> > >> >> >
> >> > >> >> >> Done. Github branch changed to kylin5.
> >> > >> >> >>
> >> > >> >> >> ------------------------
> >> > >> >> >> With warm regard
> >> > >> >> >> Xiaoxiang Yu
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <
> xxyu@apache.org>
> >> > >> wrote:
> >> > >> >> >>
> >> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> > >> >> >> > ------------------------
> >> > >> >> >> > With warm regard
> >> > >> >> >> > Xiaoxiang Yu
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
> >> > <namdd@vnpay.vn.invalid
> >> > >> >
> >> > >> >> >> wrote:
> >> > >> >> >> >
> >> > >> >> >> >> Thank you Xiaoxiang, please update me when you have
> >> > >> >> >> >> changed your default branch. If people are impressed by
> >> > >> >> >> >> the numbers, then I hope this will turn the situation
> >> > >> >> >> >> around.
> >> > >> >> >> >>
> >> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <
> >> xxyu@apache.org>
> >> > >> >> wrote:
> >> > >> >> >> >>
> >> > >> >> >> >>> The default branch is for 4.X, which is a maintained
> >> > >> >> >> >>> branch; the active branch is kylin5.
> >> > >> >> >> >>> I will change the default branch to kylin5 later.
> >> > >> >> >> >>>
> >> > >> >> >> >>> ------------------------
> >> > >> >> >> >>> With warm regard
> >> > >> >> >> >>> Xiaoxiang Yu
> >> > >> >> >> >>>
> >> > >> >> >> >>>
> >> > >> >> >> >>>
> >> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
> >> > >> <na...@vnpay.vn.invalid>
> >> > >> >> >> >>> wrote:
> >> > >> >> >> >>>
> >> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> > >> >> >> >>>>
> >> > >> >> >> >>>> Can you see the attached photo?
> >> > >> >> >> >>>>
> >> > >> >> >> >>>> My boss asked why Druid commits code regularly but Kylin
> >> > >> >> >> >>>> has had no commits since July.
> >> > >> >> >> >>>>
> >> > >> >> >> >>>>
> >> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <
> xxyu@apache.org
> >> >
> >> > >> wrote:
> >> > >> >> >> >>>>
> >> > >> >> >> >>>>> I think so.
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> Response time is not the only factor in making a
> >> > >> >> >> >>>>> decision. Kylin could be cheaper when the query pattern
> >> > >> >> >> >>>>> is suitable for the Kylin model, and Kylin can guarantee
> >> > >> >> >> >>>>> reasonable query latency. ClickHouse will be quicker in
> >> > >> >> >> >>>>> an ad hoc query scenario.
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> By the way, Youzan and Kyligence combine the two to
> >> > >> >> >> >>>>> provide unified data analytics services for their
> >> > >> >> >> >>>>> customers.
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> ------------------------
> >> > >> >> >> >>>>> With warm regard
> >> > >> >> >> >>>>> Xiaoxiang Yu
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
> >> > >> <namdd@vnpay.vn.invalid
> >> > >> >> >
> >> > >> >> >> >>>>> wrote:
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
> >> > >> >> >> >>>>>>
> >> > >> >> >> >>>>>> If my client uses a cloud computing service like GCP or
> >> > >> >> >> >>>>>> AWS, which will cost more: the precalculation feature of
> >> > >> >> >> >>>>>> Kylin, or ClickHouse? (In the case of Kylin, my thought
> >> > >> >> >> >>>>>> is that the query execution is done once and stored in
> >> > >> >> >> >>>>>> the cube to be used many times, so Kylin uses less cloud
> >> > >> >> >> >>>>>> computation. Is that true?)
> >> > >> >> >> >>>>>>
> >> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <
> >> > xxyu@apache.org
> >> > >> >
> >> > >> >> >> wrote:
> >> > >> >> >> >>>>>>
> >> > >> >> >> >>>>>> > Following text is part of an article(
> >> > >> >> >> >>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > ===============================================================================
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > Because it precomputes results, Kylin suits aggregation
> >> > >> >> >> >>>>>> > queries with fixed patterns, for example when the joins,
> >> > >> >> >> >>>>>> > GROUP BY columns, and WHERE conditions in the SQL are
> >> > >> >> >> >>>>>> > relatively fixed. The larger the data volume, the more
> >> > >> >> >> >>>>>> > obvious Kylin's advantages become. They are especially
> >> > >> >> >> >>>>>> > large for deduplication (count distinct), Top N, and
> >> > >> >> >> >>>>>> > Percentile, and Kylin is widely used for dashboards,
> >> > >> >> >> >>>>>> > reports of all kinds, large-screen displays, traffic
> >> > >> >> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora,
> >> > >> >> >> >>>>>> > Shell Housing, etc. use Kylin to build their data service
> >> > >> >> >> >>>>>> > platforms, serving millions to tens of millions of
> >> > >> >> >> >>>>>> > queries per day, with most queries completing within
> >> > >> >> >> >>>>>> > 2 - 3 seconds. There is no better alternative for such a
> >> > >> >> >> >>>>>> > high-concurrency scenario.
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
> >> > >> >> >> >>>>>> > computing power and is more suitable when query requests
> >> > >> >> >> >>>>>> > are more flexible, or when detail-level queries are
> >> > >> >> >> >>>>>> > needed at low concurrency. Typical scenarios include
> >> > >> >> >> >>>>>> > user-label filtering that arbitrarily combines many
> >> > >> >> >> >>>>>> > columns and WHERE conditions, and complex ad hoc queries
> >> > >> >> >> >>>>>> > at low concurrency. If the data volume and access volume
> >> > >> >> >> >>>>>> > are large, you need to deploy a distributed ClickHouse
> >> > >> >> >> >>>>>> > cluster, which is a bigger challenge for operation and
> >> > >> >> >> >>>>>> > maintenance.
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is
> >> > >> >> >> >>>>>> > more resource-efficient to compute them on the fly.
> >> > >> >> >> >>>>>> > Since the number of queries is small, even if each query
> >> > >> >> >> >>>>>> > consumes a lot of computational resources, it is still
> >> > >> >> >> >>>>>> > cost-effective overall. If some queries have a fixed
> >> > >> >> >> >>>>>> > pattern and the query volume is large, Kylin is the
> >> > >> >> >> >>>>>> > better fit: the large upfront cost of computing and
> >> > >> >> >> >>>>>> > saving the results is amortized over every query, which
> >> > >> >> >> >>>>>> > makes it the most economical option.
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > ------------------------
> >> > >> >> >> >>>>>> > With warm regard
> >> > >> >> >> >>>>>> > Xiaoxiang Yu
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
> >> > >> >> <namdd@vnpay.vn.invalid
> >> > >> >> >> >
> >> > >> >> >> >>>>>> wrote:
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming
> >> > >> >> >> >>>>>> >> feature. That's great.
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse
> >> > >> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in
> >> > >> >> >> >>>>>> >> milliseconds, which is faster than my demonstration (I
> >> > >> >> >> >>>>>> >> used Kylin to calculate 1 billion rows in 2.9 seconds).
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over
> >> > >> >> >> >>>>>> >> ClickHouse so that I can defend my demonstration?
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics,
> >> > >> >> >> >>>>>> >> > the reason here is that kylin has lag time due to model
> >> > >> >> >> >>>>>> >> > update of new segment build, is that correct?"
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > You are correct.
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around
> >> > >> >> >> >>>>>> >> > of combination of ... "
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> >> > >> >> >> >>>>>> >> > complete but not yet released), which can reduce the
> >> > >> >> >> >>>>>> >> > time lag to about 3 minutes (that is my estimate, but I
> >> > >> >> >> >>>>>> >> > am quite certain about it).
> >> > >> >> >> >>>>>> >> > NRT stands for "near real-time": it runs a job that
> >> > >> >> >> >>>>>> >> > periodically performs micro-batch aggregation and
> >> > >> >> >> >>>>>> >> > persistence. The price is that you need to run and
> >> > >> >> >> >>>>>> >> > monitor a long-running job. This feature is based on
> >> > >> >> >> >>>>>> >> > Spark Streaming, so you need some knowledge of it.
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > What is the maximum time lag your customers can
> >> > >> >> >> >>>>>> >> > tolerate?
> >> > >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for
> >> > >> >> >> >>>>>> >> > most cases.
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > ------------------------
> >> > >> >> >> >>>>>> >> > With warm regard
> >> > >> >> >> >>>>>> >> > Xiaoxiang Yu
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> > >> >> >> >>>>>> >> > wrote:
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > > Druid is better in
> >> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > ==========================
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > In this important real-time analytics scenario, the
> >> > >> >> >> >>>>>> >> > > reason is that Kylin has a time lag because the model
> >> > >> >> >> >>>>>> >> > > is only updated when a new segment is built. Is that
> >> > >> >> >> >>>>>> >> > > correct?
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > If so, can you suggest a workaround that combines:
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > (time-lagged Kylin cube) + (real-time DB update) to
> >> > >> >> >> >>>>>> >> > > provide real-time capability?
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > IMO, the point is to find that (real-time DB update)
> >> > >> >> >> >>>>>> >> > > component and integrate it with the (time-lagged Kylin
> >> > >> >> >> >>>>>> >> > > cube).
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
> >> > >> >> >> >>>>>> >> > > > know much about how Druid has changed in these two
> >> > >> >> >> >>>>>> >> > > > years. New features that I know of are: new UI, fully
> >> > >> >> >> >>>>>> >> > > > on K8s, etc.).
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > Here are some cases where you should consider using
> >> > >> >> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing
> >> > >> >> >> >>>>>> >> > > > Kylin 5.0-beta with the Druid I used two years ago):
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> > >> >> >> >>>>>> >> > > > - Most queries are small. (Based on my test results,
> >> > >> >> >> >>>>>> >> > > >   I think Druid had better response time for small
> >> > >> >> >> >>>>>> >> > > >   queries two years ago.)
> >> > >> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want
> >> > >> >> >> >>>>>> >> > > >   to use K8s or a public cloud platform as your
> >> > >> >> >> >>>>>> >> > > >   deployment platform.
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which
> >> > >> >> >> >>>>>> >> > > > Kylin could be better, like:
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin
> >> > >> >> >> >>>>>> >> > > >   can have a more exact-match/fine-grained index for
> >> > >> >> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
> >> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's
> >> > >> >> >> >>>>>> >> > > >   website did not show that it supports ODBC well.)
> >> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than
> >> > >> >> >> >>>>>> >> > > >   Druid.
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about
> >> > >> >> >> >>>>>> >> > > > it.
> >> > >> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > ------------------------
> >> > >> >> >> >>>>>> >> > > > With warm regard
> >> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> > >> >> >> >>>>>> >> > > > wrote:
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform
> >> > >> >> >> >>>>>> >> > > >> Kylin compared to Pinot and Druid?
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> Please kindly let me know
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
> >> > >> >> >> >>>>>> >> > > >>

Re: Pinot/Kylin/Druid quick comparision

Posted by Li Yang <li...@apache.org>.
Nam,

We are planning to release a kylin5-beta around March or April. The GA of
kylin5 would be around July this year if everything goes well.

Cheers
Yang

On Tue, Mar 5, 2024 at 6:54 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hello Xiaoxiang,
>
> How are you? My boss is very interested in Kylin 5, so he would like to
> know when it will be released. Could you please provide an estimate?
>
> Thank you very much and best regards
>
>
>
>
>
> On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy <na...@vnpay.vn> wrote:
>
> > Good morning Xiaoxiang, hope you are well
> >
> > 1. JDBC source is a feature which is in development; it will be supported
> > later.
> >
> > ===============
> >
> > May I know when the JDBC source will be available, and whether there is
> > any change to the Kylin 5 release date?
> >
> > Thank you and best regards
> >
> >
> > On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> >> 1. JDBC source is a feature which is in development; it will be supported
> >> later.
> >>
> >> 2. Kylin supports Kerberos now; I will write a doc as soon as possible.
> >> (I will let you know.)
> >>
> >> 3. I think Ranger and Kerberos do not do the same thing: Kerberos is for
> >> authentication and Ranger is for authorization, so they cannot replace
> >> each other. Ranger can integrate with Kerberos; please check Ranger's
> >> website for more information.
> >>
> >> ------------------------
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >>
> >>
> >> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>
> >> > Thank you Xiaoxiang for your reply
> >> >
> >> > ————————————-
> >> > Do you have any suggestions/wishes for kylin 5(except real-time
> >> > feature)?
> >> > ————————————-
> >> > Yes, please help me clear up this headache:
> >> >
> >> > 1. Can Kylin access an existing star schema in an Oracle data
> >> > warehouse? If not, is there any workaround?
> >> >
> >> > 2. My team is using Kerberos for authentication. Do you have any
> >> > document or case study about integrating Kerberos with Kylin 4.x and
> >> > Kylin 5.x?
> >> >
> >> > 3. Should we use Apache Ranger instead of Kerberos for authentication
> >> > and for security purposes?
> >> >
> >> > Thank you again
> >> >
> >> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >
> >> > > I guess the release date should be 2024/01.
> >> > > Do you have any suggestions/wishes for Kylin 5 (except the real-time
> >> > > feature)?
> >> > >
> >> > > ------------------------
> >> > > With warm regard
> >> > > Xiaoxiang Yu
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> > > wrote:
> >> > >
> >> > >> Thank you very much, Xiaoxiang. I already gave the presentation this
> >> > >> morning, so there was no time for you to comment. Next time I will
> >> > >> send it to you in advance. The result of the meeting was that we will
> >> > >> implement both Druid and Kylin in the next couple of projects; Druid
> >> > >> was included because of its real-time feature. I hope Kylin will have
> >> > >> the same feature soon.
> >> > >>
> >> > >> May I ask when you will release Kylin 5.0?
> >> > >>
> >> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> > >>
> >> > >> > Since 2018 there have been a lot of new features and code
> >> > >> > refactoring. If you like, you can share your slides with me
> >> > >> > privately; maybe I can give some comments.
> >> > >> >
> >> > >> > Here are some references on the advantages of Kylin since 2018:
> >> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> >> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> >> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >> > >> >
> >> > >> > ------------------------
> >> > >> > With warm regard
> >> > >> > Xiaoxiang Yu
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid>
> >> > >> > wrote:
> >> > >> >
> >> > >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and
> >> > >> >> Druid in my team.
> >> > >> >>
> >> > >> >> I found this article and would like you to update me on the
> >> > >> >> advantages of Kylin from 2018 until now (especially with version 5
> >> > >> >> about to be released):
> >> > >> >>
> >> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> >> > >> >>
> >> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> >> > >> >>
> >> > >> >> > Thank you very much for your prompt response. I still have several
> >> > >> >> > questions that I will ask for your help with later.
> >> > >> >> >
> >> > >> >> > Best regards and have a good day
> >> > >> >> >
> >> > >> >> >
> >> > >> >> >
> >> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> > >> >> >
> >> > >> >> >> Done. The default GitHub branch has been changed to kylin5.
> >> > >> >> >>
> >> > >> >> >> ------------------------
> >> > >> >> >> With warm regard
> >> > >> >> >> Xiaoxiang Yu
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >>
> >> > >> >> >> > A JIRA ticket has been opened and is waiting for INFRA:
> >> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238
> >> > >> >> >> > ------------------------
> >> > >> >> >> > With warm regard
> >> > >> >> >> > Xiaoxiang Yu
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid>
> >> > >> >> >> > wrote:
> >> > >> >> >> >
> >> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed
> >> > >> >> >> >> your default branch. If people are swayed by the numbers, then
> >> > >> >> >> >> I hope to turn this situation around.
> >> > >> >> >> >>
> >> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >> >>
> >> > >> >> >> >>> The default branch is for 4.X, which is a maintenance branch;
> >> > >> >> >> >>> the active branch is kylin5.
> >> > >> >> >> >>> I will change the default branch to kylin5 later.
> >> > >> >> >> >>>
> >> > >> >> >> >>> ------------------------
> >> > >> >> >> >>> With warm regard
> >> > >> >> >> >>> Xiaoxiang Yu
> >> > >> >> >> >>>
> >> > >> >> >> >>>
> >> > >> >> >> >>>
> >> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> > >> >> >> >>> wrote:
> >> > >> >> >> >>>
> >> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> > >> >> >> >>>>
> >> > >> >> >> >>>> Can you see the attached photo?
> >> > >> >> >> >>>>
> >> > >> >> >> >>>> My boss asked why Druid commits code regularly while Kylin
> >> > >> >> >> >>>> has had no commits since July.
> >> > >> >> >> >>>>
> >> > >> >> >> >>>>
> >> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >> >>>>
> >> > >> >> >> >>>>> I think so.
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> Response time is not the only factor in making a decision.
> >> > >> >> >> >>>>> Kylin could be cheaper when the query pattern suits the
> >> > >> >> >> >>>>> Kylin model, and Kylin can guarantee reasonable query
> >> > >> >> >> >>>>> latency. ClickHouse will be quicker in an ad hoc query
> >> > >> >> >> >>>>> scenario.
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >> > >> >> >> >>>>> unified data analytics services for their customers.
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> ------------------------
> >> > >> >> >> >>>>> With warm regard
> >> > >> >> >> >>>>> Xiaoxiang Yu
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid>
> >> > >> >> >> >>>>> wrote:
> >> > >> >> >> >>>>>
> >> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
> >> > >> >> >> >>>>>>
> >> > >> >> >> >>>>>> If my client uses a cloud computing service like GCP or AWS,
> >> > >> >> >> >>>>>> which will cost more: Kylin's precalculation or ClickHouse?
> >> > >> >> >> >>>>>> (In Kylin's case, my thinking is that the query computation
> >> > >> >> >> >>>>>> is done once and stored in the cube to be reused many times,
> >> > >> >> >> >>>>>> so Kylin uses less cloud computation. Is that true?)
> >> > >> >> >> >>>>>>
> >> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >> >>>>>>
> >> > >> >> >> >>>>>> > The following text is part of an article
> >> > >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > ===============================================================================
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > Because of its precomputation technology, Kylin is
> >> > >> >> >> >>>>>> > suitable for aggregation queries with fixed patterns, for
> >> > >> >> >> >>>>>> > example when the JOIN, GROUP BY, and WHERE clauses in the
> >> > >> >> >> >>>>>> > SQL are relatively fixed. The larger the data volume, the
> >> > >> >> >> >>>>>> > more obvious Kylin's advantages become. Kylin is
> >> > >> >> >> >>>>>> > especially strong at deduplication (count distinct),
> >> > >> >> >> >>>>>> > Top N, and percentile calculations, and it is widely used
> >> > >> >> >> >>>>>> > for dashboards, all kinds of reports, large-screen
> >> > >> >> >> >>>>>> > displays, traffic statistics, and user behavior analysis.
> >> > >> >> >> >>>>>> > Meituan, Aurora, Shell Housing, and others use Kylin to
> >> > >> >> >> >>>>>> > build their data service platforms, serving millions to
> >> > >> >> >> >>>>>> > tens of millions of queries per day, with most queries
> >> > >> >> >> >>>>>> > completing within 2 to 3 seconds. There is no better
> >> > >> >> >> >>>>>> > alternative for such high-concurrency scenarios.
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
> >> > >> >> >> >>>>>> > computing power and is more suitable when queries are
> >> > >> >> >> >>>>>> > flexible, or when detail-level queries are needed at low
> >> > >> >> >> >>>>>> > concurrency. Typical scenarios include user-label
> >> > >> >> >> >>>>>> > filtering with arbitrary combinations of many columns and
> >> > >> >> >> >>>>>> > WHERE conditions, and complex ad hoc queries at low
> >> > >> >> >> >>>>>> > concurrency. If the data volume and query load are large,
> >> > >> >> >> >>>>>> > you need to deploy a distributed ClickHouse cluster,
> >> > >> >> >> >>>>>> > which is harder to operate and maintain.
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is
> >> > >> >> >> >>>>>> > more resource-efficient to compute them at query time.
> >> > >> >> >> >>>>>> > Since the number of queries is small, even if each query
> >> > >> >> >> >>>>>> > consumes a lot of computational resources, it is still
> >> > >> >> >> >>>>>> > cost-effective overall. If some queries have a fixed
> >> > >> >> >> >>>>>> > pattern and the query volume is large, Kylin is the
> >> > >> >> >> >>>>>> > better fit: the upfront cost of precomputing and storing
> >> > >> >> >> >>>>>> > the results is amortized over every query, so it is the
> >> > >> >> >> >>>>>> > most economical option.
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > ------------------------
> >> > >> >> >> >>>>>> > With warm regard
> >> > >> >> >> >>>>>> > Xiaoxiang Yu
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid>
> >> > >> >> >> >>>>>> > wrote:
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>> >> Thank you, Xiaoxiang, for the near-real-time streaming
> >> > >> >> >> >>>>>> >> feature. That's great.
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse
> >> > >> >> >> >>>>>> >> was presented to us as computing over 8 billion rows in
> >> > >> >> >> >>>>>> >> milliseconds, which is faster than my demonstration (I
> >> > >> >> >> >>>>>> >> used Kylin to compute over 1 billion rows in 2.9
> >> > >> >> >> >>>>>> >> seconds).
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over
> >> > >> >> >> >>>>>> >> ClickHouse so that I can defend my demonstration?
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics,
> >> > >> >> >> >>>>>> >> > the reason here is that kylin has lag time due to model
> >> > >> >> >> >>>>>> >> > update of new segment build, is that correct?"
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > You are correct.
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around
> >> > >> >> >> >>>>>> >> > of combination of ... "
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> >> > >> >> >> >>>>>> >> > complete but not yet released), which can reduce the
> >> > >> >> >> >>>>>> >> > time lag to about 3 minutes (that is my estimate, but I
> >> > >> >> >> >>>>>> >> > am quite certain about it).
> >> > >> >> >> >>>>>> >> > NRT stands for "near real-time": it runs a job that
> >> > >> >> >> >>>>>> >> > periodically performs micro-batch aggregation and
> >> > >> >> >> >>>>>> >> > persistence. The price is that you need to run and
> >> > >> >> >> >>>>>> >> > monitor a long-running job. This feature is based on
> >> > >> >> >> >>>>>> >> > Spark Streaming, so you need some knowledge of it.
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > What is the maximum time lag your customers can
> >> > >> >> >> >>>>>> >> > tolerate?
> >> > >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for
> >> > >> >> >> >>>>>> >> > most cases.
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > ------------------------
> >> > >> >> >> >>>>>> >> > With warm regard
> >> > >> >> >> >>>>>> >> > Xiaoxiang Yu
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> > >> >> >> >>>>>> >> > wrote:
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >> > > Druid is better in
> >> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > ==========================
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > In this important real-time analytics scenario, the
> >> > >> >> >> >>>>>> >> > > reason is that Kylin has a time lag because the model
> >> > >> >> >> >>>>>> >> > > is only updated when a new segment is built. Is that
> >> > >> >> >> >>>>>> >> > > correct?
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > If so, can you suggest a workaround that combines:
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > (time-lagged Kylin cube) + (real-time DB update) to
> >> > >> >> >> >>>>>> >> > > provide real-time capability?
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > IMO, the point is to find that (real-time DB update)
> >> > >> >> >> >>>>>> >> > > component and integrate it with the (time-lagged Kylin
> >> > >> >> >> >>>>>> >> > > cube).
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
> >> > >> >> >> >>>>>> >> > > > know much about how Druid has changed in these two
> >> > >> >> >> >>>>>> >> > > > years. New features that I know of are: new UI, fully
> >> > >> >> >> >>>>>> >> > > > on K8s, etc.).
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > Here are some cases where you should consider using
> >> > >> >> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing
> >> > >> >> >> >>>>>> >> > > > Kylin 5.0-beta with the Druid I used two years ago):
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> > >> >> >> >>>>>> >> > > > - Most queries are small. (Based on my test results,
> >> > >> >> >> >>>>>> >> > > >   I think Druid had better response time for small
> >> > >> >> >> >>>>>> >> > > >   queries two years ago.)
> >> > >> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want
> >> > >> >> >> >>>>>> >> > > >   to use K8s or a public cloud platform as your
> >> > >> >> >> >>>>>> >> > > >   deployment platform.
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which
> >> > >> >> >> >>>>>> >> > > > Kylin could be better, like:
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin
> >> > >> >> >> >>>>>> >> > > >   can have a more exact-match/fine-grained index for
> >> > >> >> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
> >> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's
> >> > >> >> >> >>>>>> >> > > >   website did not show that it supports ODBC well.)
> >> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than
> >> > >> >> >> >>>>>> >> > > >   Druid.
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about
> >> > >> >> >> >>>>>> >> > > > it.
> >> > >> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > ------------------------
> >> > >> >> >> >>>>>> >> > > > With warm regard
> >> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> > >> >> >> >>>>>> >> > > > wrote:
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform
> >> > >> >> >> >>>>>> >> > > >> Kylin compared to Pinot and Druid?
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> Please kindly let me know
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
> >> > >> >> >> >>>>>> >> > > >>
> >> > >> >> >> >>>>>> >> > > >
> >> > >> >> >> >>>>>> >> > >
> >> > >> >> >> >>>>>> >> >
> >> > >> >> >> >>>>>> >>
> >> > >> >> >> >>>>>> >
> >> > >> >> >> >>>>>>
> >> > >> >> >> >>>>>
> >> > >> >> >>
> >> > >> >> >
> >> > >> >>
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
>
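
To illustrate the point quoted above about an exact-match index per set of
Group By dimensions, here is a minimal sketch. It is not Kylin's actual
implementation; the rows, the dimension names and the helper functions
(build_index, query_sum) are hypothetical and exist only for this example.
The idea shown is that each expected dimension combination is aggregated
once, and a query is then answered by a lookup instead of a scan.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, amount). Not data from this thread.
DIMS = ("city", "product")
ROWS = [
    ("Hanoi", "card", 10),
    ("Hanoi", "qr", 5),
    ("Hue", "card", 7),
    ("Hue", "card", 3),
]


def build_index(rows, dims):
    """Pre-aggregate SUM(amount) for one set of Group By dimensions."""
    # Key columns always follow the order of DIMS, whatever order was asked.
    positions = [i for i, name in enumerate(DIMS) if name in dims]
    agg = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        agg[key] += row[-1]
    return dict(agg)


# One pre-built index per dimension combination that queries are expected to use.
INDEXES = {
    frozenset(dims): build_index(ROWS, dims)
    for dims in [("city",), ("product",), ("city", "product")]
}


def query_sum(group_by):
    """Answer SUM(amount) GROUP BY <group_by> from the exact-match index."""
    index = INDEXES.get(frozenset(group_by))
    if index is None:
        raise LookupError(f"no pre-built index for {sorted(group_by)}")
    return index


print(query_sum(["city"]))
print(query_sum(["product", "city"]))
```

The trade-off in this sketch is the usual one for precomputation: every
extra dimension combination costs build time and storage, so only the
combinations that queries really use are worth indexing.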

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Hello Xiaoxiang,

How are you? My boss is very interested in Kylin 5, so he would like to
know when it will be released. Could you please provide an estimate?

Thank you very much and best regards





On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Good morning Xiaoxiang, hope you are well
>
> 1. JDBC source is a feature which is in development; it will be supported
> later.
>
> ===============
>
> May I know when the JDBC source will be available, and whether there is any
> change in the Kylin 5 release date?
>
> Thank you and best regards
>
>
> On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>
>> 1. JDBC source is a feature which is in development; it will be supported
>> later.
>>
>> 2. Kylin supports Kerberos now; I will write a doc as soon as possible.
>> (I will let you know.)
>>
>> 3. I think Ranger and Kerberos do not do the same thing: Kerberos is for
>> authentication, Ranger is for authorization. So they cannot replace each
>> other. Ranger can integrate with Kerberos; please check Ranger's website
>> for information.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>
>> > Thank you Xiaoxiang for your reply
>> >
>> > ————————————-
>> > Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?
>> > ————————————-
>> > Yes: please answer to help me clear this headache:
>> >
>> > 1. Can Kylin access an existing star schema in an Oracle data warehouse?
>> > If not, is there any workaround?
>> >
>> > 2. My team is using Kerberos for authentication. Do you have any
>> > document/case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?
>> >
>> > 3. Should we use Apache Ranger instead of Kerberos for authentication
>> > and for security purposes?
>> >
>> > Thank you again
>> >
>> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
>> >
>> > > I guess the release date should be 2024/01.
>> > > Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?
>> > >
>> > > ------------------------
>> > > With warm regard
>> > > Xiaoxiang Yu
>> > >
>> > >
>> > >
>> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >
>> > >> Thank you very much, Xiaoxiang. I already did the presentation this
>> > >> morning, so there was no time for you to comment. Next time I will send
>> > >> it to you in advance. The meeting result was that we will implement both
>> > >> Druid and Kylin in the next couple of projects, Druid because of its
>> > >> real-time feature. I hope that Kylin will have the same feature soon.
>> > >>
>> > >> May I ask when you will release Kylin 5.0?
>> > >>
>> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> > >>
>> > >> > Since 2018 there have been a lot of new features and code refactoring.
>> > >> > If you like, you can share your ppt with me privately; maybe I can
>> > >> > give some comments.
>> > >> >
>> > >> > Here are some references on the advantages of Kylin since 2018:
>> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> > >> >
>> > >> > ------------------------
>> > >> > With warm regard
>> > >> > Xiaoxiang Yu
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >> >
>> > >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and
>> > >> >> Druid in my team.
>> > >> >>
>> > >> >> I found this article and would like you to update me on the advantages
>> > >> >> of Kylin since 2018 (especially with version 5 to be released):
>> > >> >>
>> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>> > >> >>
>> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>> > >> >>
>> > >> >> > Thank you very much for your prompt response. I still have several
>> > >> >> > questions on which I will seek your help later.
>> > >> >> >
>> > >> >> > Best regards and have a good day
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> > >> >> >
>> > >> >> >> Done. The default GitHub branch has been changed to kylin5.
>> > >> >> >>
>> > >> >> >> ------------------------
>> > >> >> >> With warm regard
>> > >> >> >> Xiaoxiang Yu
>> > >> >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> > >> >> >>
>> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
>> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> > >> >> >> > ------------------------
>> > >> >> >> > With warm regard
>> > >> >> >> > Xiaoxiang Yu
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> > >> >> >> >
>> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> > >> >> >> >> default branch. If people are impressed by the numbers, then I hope
>> > >> >> >> >> to turn this situation around.
>> > >> >> >> >>
>> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>
>> > >> >> >> >>> The default branch is for 4.X, which is a maintenance branch; the
>> > >> >> >> >>> active branch is kylin5.
>> > >> >> >> >>> I will change the default branch to kylin5 later.
>> > >> >> >> >>>
>> > >> >> >> >>> ------------------------
>> > >> >> >> >>> With warm regard
>> > >> >> >> >>> Xiaoxiang Yu
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> > >> >> >> >>>>
>> > >> >> >> >>>> Can you see the attached photo?
>> > >> >> >> >>>>
>> > >> >> >> >>>> My boss asked why Druid commits code regularly while Kylin has not
>> > >> >> >> >>>> had a commit since July.
>> > >> >> >> >>>>
>> > >> >> >> >>>>
>> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>>>
>> > >> >> >> >>>>> I think so.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Response time is not the only factor in making a decision. Kylin could
>> > >> >> >> >>>>> be cheaper when the query pattern is suitable for the Kylin model, and
>> > >> >> >> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
>> > >> >> >> >>>>> in an ad hoc query scenario.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> By the way, Youzan and Kyligence combine them together to provide
>> > >> >> >> >>>>> unified data analytics services for their customers.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> ------------------------
>> > >> >> >> >>>>> With warm regard
>> > >> >> >> >>>>> Xiaoxiang Yu
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> > >> >> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> > >> >> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> > >> >> >> >>>>>> execution is done once and stored in a cube to be used many times,
>> > >> >> >> >>>>>> so Kylin uses less cloud computation. Is that true?)
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> > The following text is part of an article
>> > >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > ===============================================================================
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because of
>> > >> >> >> >>>>>> > its precomputation technology, for example when the join, group by, and
>> > >> >> >> >>>>>> > where-condition patterns in SQL are relatively fixed. The larger the data
>> > >> >> >> >>>>>> > volume, the more obvious the advantages of using Kylin. Kylin's advantages
>> > >> >> >> >>>>>> > are especially large for deduplication (count distinct), Top N, Percentile,
>> > >> >> >> >>>>>> > and similar scenarios, and it is widely used for dashboards, all kinds of
>> > >> >> >> >>>>>> > reports, large-screen displays, traffic statistics, and user behavior
>> > >> >> >> >>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build their
>> > >> >> >> >>>>>> > data service platforms, serving millions to tens of millions of queries
>> > >> >> >> >>>>>> > per day, with most queries completing within 2 - 3 seconds. There is no
>> > >> >> >> >>>>>> > better alternative for such a high-concurrency scenario.
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power and
>> > >> >> >> >>>>>> > is more suitable when query requests are more flexible, or when there is
>> > >> >> >> >>>>>> > a need for detail-level queries with low concurrency. Scenarios include:
>> > >> >> >> >>>>>> > user-label filtering in which a very large number of columns and where
>> > >> >> >> >>>>>> > conditions are arbitrarily combined, complex ad hoc queries with modest
>> > >> >> >> >>>>>> > concurrency, and so on. If the data volume and access volume are large,
>> > >> >> >> >>>>>> > you need to deploy a distributed ClickHouse cluster, which is a bigger
>> > >> >> >> >>>>>> > challenge for operation and maintenance.
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> > >> >> >> >>>>>> > resource-efficient to compute them on the fly. Since the number of
>> > >> >> >> >>>>>> > queries is small, even if each query consumes a lot of computational
>> > >> >> >> >>>>>> > resources, it is still cost-effective overall. If some queries have a
>> > >> >> >> >>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
>> > >> >> >> >>>>>> > a large amount of computation is spent up front to save the results,
>> > >> >> >> >>>>>> > and that upfront cost is amortized over every query, so it is the
>> > >> >> >> >>>>>> > most economical.
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > ------------------------
>> > >> >> >> >>>>>> > With warm regard
>> > >> >> >> >>>>>> > Xiaoxiang Yu
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
>> > >> >> >> >>>>>> >> That's great.
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> This morning there has been a new challenge to my team: ClickHouse
>> > >> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>> > >> >> >> >>>>>> >> which is faster than my demonstration (I used Kylin to calculate
>> > >> >> >> >>>>>> >> 1 billion rows in 2.9 seconds).
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>> > >> >> >> >>>>>> >> that I can defend my demonstration?
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>> > >> >> >> >>>>>> >> > here is that kylin has lag time due to model update of new segment
>> > >> >> >> >>>>>> >> > build, is that correct?"
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > You are correct.
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> > >> >> >> >>>>>> >> > combination of ... "
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>> > >> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
>> > >> >> >> >>>>>> >> > (that is my estimate, but I am quite certain about it).
>> > >> >> >> >>>>>> >> > NRT stands for 'near real-time'; it will run a job that periodically
>> > >> >> >> >>>>>> >> > does micro-batch aggregation and persistence. The price is that you
>> > >> >> >> >>>>>> >> > need to run and monitor a long-running job. This feature is based on
>> > >> >> >> >>>>>> >> > Spark Streaming, so you need knowledge of it.
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
>> > >> >> >> >>>>>> >> > tolerate?
>> > >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > ------------------------
>> > >> >> >> >>>>>> >> > With warm regard
>> > >> >> >> >>>>>> >> > Xiaoxiang Yu
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > > Druid is better in
>> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > ==========================
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > In this important scenario of real-time analytics, the reason here
>> > >> >> >> >>>>>> >> > > is that Kylin has lag time due to the model update when a new
>> > >> >> >> >>>>>> >> > > segment is built, is that correct?
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > If that is true, then can you suggest a workaround that combines:
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > (time-lagged Kylin cube) + (real-time DB update) to provide
>> > >> >> >> >>>>>> >> > > real-time capability?
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > IMO, the point here is to find that (real-time DB update) and
>> > >> >> >> >>>>>> >> > > integrate it with the (time-lagged Kylin cube).
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>> > >> >> >> >>>>>> >> > > > about how Druid has changed in these two years. New features that
>> > >> >> >> >>>>>> >> > > > I know of are: new UI, fully on K8s, etc.).
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > Here are some cases in which you should consider using Druid rather
>> > >> >> >> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>> > >> >> >> >>>>>> >> > > > which I used two years ago):
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>> > >> >> >> >>>>>> >> > > > - Most queries are small. (Based on my test results, I think Druid
>> > >> >> >> >>>>>> >> > > >   had better response time for small queries two years ago.)
>> > >> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use K8s or a
>> > >> >> >> >>>>>> >> > > >   public cloud platform as your deployment platform.
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>> > >> >> >> >>>>>> >> > > > better, like:
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
>> > >> >> >> >>>>>> >> > > >   exact-match/fine-grained index for queries containing different
>> > >> >> >> >>>>>> >> > > >   `Group By` dimensions.
>> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's website did not show
>> > >> >> >> >>>>>> >> > > >   that it supports ODBC well.)
>> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> > >> >> >> >>>>>> >> > > > I hope this helps; you are also free to share your opinion.
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > ------------------------
>> > >> >> >> >>>>>> >> > > > With warm regard
>> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>> > >> >> >> >>>>>> >> > > >> Pinot and Druid?
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> Please kindly let me know
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>
>> > >> >> >>
>> > >> >> >
>> > >> >>
>> > >> >
>> > >>
>> > >
>> >
>>
>
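
The cost argument in the article quoted above (precompute once, then
amortize that cost over many queries) can be made concrete with a small
piece of arithmetic. The sketch below is only an illustration: the cost
units and the three function names are hypothetical, not measurements of
Kylin or ClickHouse. It computes the number of queries after which paying
an upfront build cost becomes cheaper than computing every query on the
fly.

```python
def precompute_total(build_cost, cheap_query_cost, n_queries):
    """Total cost when results are precomputed once and then looked up."""
    return build_cost + cheap_query_cost * n_queries


def on_the_fly_total(expensive_query_cost, n_queries):
    """Total cost when every query is computed from raw data."""
    return expensive_query_cost * n_queries


def break_even_queries(build_cost, cheap_query_cost, expensive_query_cost):
    """Smallest query count at which precomputation is strictly cheaper.

    Costs are integers in arbitrary, hypothetical units. Returns None when a
    precomputed query is not cheaper than an on-the-fly one, because then the
    build cost can never be recovered.
    """
    saving_per_query = expensive_query_cost - cheap_query_cost
    if saving_per_query <= 0:
        return None
    # Need build_cost + cheap * n < expensive * n, i.e. n > build_cost / saving.
    return build_cost // saving_per_query + 1


# Hypothetical numbers: building costs 1000 units, a lookup costs 1 unit,
# and an on-the-fly query costs 50 units.
n = break_even_queries(1000, 1, 50)
print(n, precompute_total(1000, 1, n), on_the_fly_total(50, n))
```

With these made-up numbers the break-even point is a few dozen queries,
which matches the article's reasoning: infrequent, flexible queries favour
on-the-fly computation, while frequent, fixed-pattern queries favour
precomputation.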

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Hello Xiaoxiang,

How are you? My boss is very interested in Kylin 5, so he would like to
know when it will be released. Could you please provide an estimate?

Thank you very much and best regards





On Thu, 18 Jan 2024 at 10:05 Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Good morning Xiaoxiang, hope you are well
>
> 1. JDBC source is a feature which is in development; it will be supported
> later.
>
> ===============
>
> May I know when the JDBC source will be available, and whether there is any
> change in the Kylin 5 release date?
>
> Thank you and best regards
>
>
> On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>
>> 1. JDBC source is a feature which is in development; it will be supported
>> later.
>>
>> 2. Kylin supports Kerberos now; I will write a doc as soon as possible.
>> (I will let you know.)
>>
>> 3. I think Ranger and Kerberos do not do the same thing: Kerberos is for
>> authentication, Ranger is for authorization. So they cannot replace each
>> other. Ranger can integrate with Kerberos; please check Ranger's website
>> for information.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>
>> > Thank you Xiaoxiang for your reply
>> >
>> > ————————————-
>> > Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?
>> > ————————————-
>> > Yes: please answer to help me clear this headache:
>> >
>> > 1. Can Kylin access an existing star schema in an Oracle data warehouse?
>> > If not, is there any workaround?
>> >
>> > 2. My team is using Kerberos for authentication. Do you have any
>> > document/case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?
>> >
>> > 3. Should we use Apache Ranger instead of Kerberos for authentication
>> > and for security purposes?
>> >
>> > Thank you again
>> >
>> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
>> >
>> > > I guess the release date should be 2024/01.
>> > > Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?
>> > >
>> > > ------------------------
>> > > With warm regard
>> > > Xiaoxiang Yu
>> > >
>> > >
>> > >
>> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >
>> > >> Thank you very much, Xiaoxiang. I already did the presentation this
>> > >> morning, so there was no time for you to comment. Next time I will send
>> > >> it to you in advance. The meeting result was that we will implement both
>> > >> Druid and Kylin in the next couple of projects, Druid because of its
>> > >> real-time feature. I hope that Kylin will have the same feature soon.
>> > >>
>> > >> May I ask when you will release Kylin 5.0?
>> > >>
>> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> > >>
>> > >> > Since 2018 there have been a lot of new features and code refactoring.
>> > >> > If you like, you can share your ppt with me privately; maybe I can
>> > >> > give some comments.
>> > >> >
>> > >> > Here are some references on the advantages of Kylin since 2018:
>> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> > >> >
>> > >> > ------------------------
>> > >> > With warm regard
>> > >> > Xiaoxiang Yu
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >> >
>> > >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and
>> > >> >> Druid in my team.
>> > >> >>
>> > >> >> I found this article and would like you to update me on the advantages
>> > >> >> of Kylin since 2018 (especially with version 5 to be released):
>> > >> >>
>> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>> > >> >>
>> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>> > >> >>
>> > >> >> > Thank you very much for your prompt response. I still have several
>> > >> >> > questions on which I will seek your help later.
>> > >> >> >
>> > >> >> > Best regards and have a good day
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> > >> >> >
>> > >> >> >> Done. The default GitHub branch has been changed to kylin5.
>> > >> >> >>
>> > >> >> >> ------------------------
>> > >> >> >> With warm regard
>> > >> >> >> Xiaoxiang Yu
>> > >> >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> > >> >> >>
>> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
>> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> > >> >> >> > ------------------------
>> > >> >> >> > With warm regard
>> > >> >> >> > Xiaoxiang Yu
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> > >> >> >> >
>> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> > >> >> >> >> default branch. If people are impressed by the numbers, then I hope
>> > >> >> >> >> to turn this situation around.
>> > >> >> >> >>
>> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>
>> > >> >> >> >>> The default branch is for 4.X, which is a maintenance branch; the
>> > >> >> >> >>> active branch is kylin5.
>> > >> >> >> >>> I will change the default branch to kylin5 later.
>> > >> >> >> >>>
>> > >> >> >> >>> ------------------------
>> > >> >> >> >>> With warm regard
>> > >> >> >> >>> Xiaoxiang Yu
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> > >> >> >> >>>>
>> > >> >> >> >>>> Can you see the attached photo?
>> > >> >> >> >>>>
>> > >> >> >> >>>> My boss asked why Druid commits code regularly while Kylin has not
>> > >> >> >> >>>> had a commit since July.
>> > >> >> >> >>>>
>> > >> >> >> >>>>
>> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>>>
>> > >> >> >> >>>>> I think so.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> Response time is not the only factor in making a decision. Kylin could
>> > >> >> >> >>>>> be cheaper when the query pattern is suitable for the Kylin model, and
>> > >> >> >> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
>> > >> >> >> >>>>> in an ad hoc query scenario.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> By the way, Youzan and Kyligence combine them together to provide
>> > >> >> >> >>>>> unified data analytics services for their customers.
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> ------------------------
>> > >> >> >> >>>>> With warm regard
>> > >> >> >> >>>>> Xiaoxiang Yu
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>
>> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> > >> >> >> >>>>>
>> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> > >> >> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> > >> >> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> > >> >> >> >>>>>> execution is done once and stored in a cube to be used many times,
>> > >> >> >> >>>>>> so Kylin uses less cloud computation. Is that true?)
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>> > The following text is part of an article
>> > >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > ===============================================================================
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because of
>> > >> >> >> >>>>>> > its precomputation technology, for example when the join, group by, and
>> > >> >> >> >>>>>> > where-condition patterns in SQL are relatively fixed. The larger the data
>> > >> >> >> >>>>>> > volume, the more obvious the advantages of using Kylin. Kylin's advantages
>> > >> >> >> >>>>>> > are especially large for deduplication (count distinct), Top N, Percentile,
>> > >> >> >> >>>>>> > and similar scenarios, and it is widely used for dashboards, all kinds of
>> > >> >> >> >>>>>> > reports, large-screen displays, traffic statistics, and user behavior
>> > >> >> >> >>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build their
>> > >> >> >> >>>>>> > data service platforms, serving millions to tens of millions of queries
>> > >> >> >> >>>>>> > per day, with most queries completing within 2 - 3 seconds. There is no
>> > >> >> >> >>>>>> > better alternative for such a high-concurrency scenario.
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
>> > >> >> computing
>> > >> >> >> >>>>>> power and
>> > >> >> >> >>>>>> > is more suitable when the query request is more
>> flexible,
>> > >> or
>> > >> >> when
>> > >> >> >> >>>>>> there is
>> > >> >> >> >>>>>> > a need for detailed queries with low concurrency.
>> > Scenarios
>> > >> >> >> >>>>>> include: very
>> > >> >> >> >>>>>> > many columns and where conditions are arbitrarily
>> > combined
>> > >> >> with
>> > >> >> >> the
>> > >> >> >> >>>>>> user
>> > >> >> >> >>>>>> > label filtering, not a large amount of concurrency of
>> > >> complex
>> > >> >> >> >>>>>> on-the-spot
>> > >> >> >> >>>>>> > query and so on. If the amount of data and access is
>> > large,
>> > >> >> you
>> > >> >> >> >>>>>> need to
>> > >> >> >> >>>>>> > deploy a distributed ClickHouse cluster, which is a
>> > higher
>> > >> >> >> >>>>>> challenge for
>> > >> >> >> >>>>>> > operation and maintenance.
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it
>> is
>> > >> more
>> > >> >> >> >>>>>> > resource-efficient to use now-computing. Since the
>> number
>> > >> of
>> > >> >> >> >>>>>> queries is
>> > >> >> >> >>>>>> > small, even if each query consumes a lot of
>> computational
>> > >> >> >> >>>>>> resources, it is
>> > >> >> >> >>>>>> > still cost-effective overall. If some queries have a
>> > fixed
>> > >> >> >> pattern
>> > >> >> >> >>>>>> and the
>> > >> >> >> >>>>>> > query volume is large, it is more suitable for Kylin,
>> > >> because
>> > >> >> the
>> > >> >> >> >>>>>> query
>> > >> >> >> >>>>>> > volume is large, and by using large computational
>> > >> resources to
>> > >> >> >> save
>> > >> >> >> >>>>>> the
>> > >> >> >> >>>>>> > results, the upfront computational cost can be
>> amortized
>> > >> over
>> > >> >> >> each
>> > >> >> >> >>>>>> query,
>> > >> >> >> >>>>>> > so it is the most economical.
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > ------------------------
>> > >> >> >> >>>>>> > With warm regard
>> > >> >> >> >>>>>> > Xiaoxiang Yu
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
>> > >> >> <namdd@vnpay.vn.invalid
>> > >> >> >> >
>> > >> >> >> >>>>>> wrote:
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real time streaming
>> > >> feature.
>> > >> >> >> >>>>>> That's
>> > >> >> >> >>>>>> >> great.
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> This morning there has been a new challenge to my
>> team:
>> > >> >> >> clickhouse
>> > >> >> >> >>>>>> offered
>> > >> >> >> >>>>>> >> us the speed of calculating 8 billion rows in
>> > millisecond
>> > >> >> which
>> > >> >> >> is
>> > >> >> >> >>>>>> faster
>> > >> >> >> >>>>>> >> than my demonstration (I used Kylin to do
>> calculating 1
>> > >> >> billion
>> > >> >> >> >>>>>> rows in
>> > >> >> >> >>>>>> >> 2.9
>> > >> >> >> >>>>>> >> seconds)
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of kylin over
>> > >> >> clickhouse
>> > >> >> >> so
>> > >> >> >> >>>>>> that I
>> > >> >> >> >>>>>> >> can defend my demonstration.
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <
>> > >> xxyu@apache.org
>> > >> >> >
>> > >> >> >> >>>>>> wrote:
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime
>> analytics,
>> > >> the
>> > >> >> >> reason
>> > >> >> >> >>>>>> here is
>> > >> >> >> >>>>>> >> > that
>> > >> >> >> >>>>>> >> > kylin has lag time due to model update of new
>> segment
>> > >> >> build,
>> > >> >> >> is
>> > >> >> >> >>>>>> that
>> > >> >> >> >>>>>> >> > correct?"
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > You are correct.
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a
>> > work-around
>> > >> of
>> > >> >> >> >>>>>> combination
>> > >> >> >> >>>>>> >> of
>> > >> >> >> >>>>>> >> > ... "
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT
>> streaming(coding is
>> > >> >> >> completed
>> > >> >> >> >>>>>> but not
>> > >> >> >> >>>>>> >> > released),
>> > >> >> >> >>>>>> >> > which can make the time-lag to about 3
>> minutes(that is
>> > >> my
>> > >> >> >> >>>>>> estimation
>> > >> >> >> >>>>>> >> but I
>> > >> >> >> >>>>>> >> > am
>> > >> >> >> >>>>>> >> > quite certain about it).
>> > >> >> >> >>>>>> >> > NRT stands for 'near real-time', it will run a job
>> and
>> > >> do
>> > >> >> >> >>>>>> micro-batch
>> > >> >> >> >>>>>> >> > aggregation and persistence periodically. The
>> price is
>> > >> that
>> > >> >> >> you
>> > >> >> >> >>>>>> need to
>> > >> >> >> >>>>>> >> run
>> > >> >> >> >>>>>> >> > and monitor a long-running
>> > >> >> >> >>>>>> >> >  job. This feature is based on Spark Streaming, so
>> you
>> > >> need
>> > >> >> >> >>>>>> knowledge of
>> > >> >> >> >>>>>> >> > it.
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > I am curious about what is the maximum time-lag
>> your
>> > >> >> customers
>> > >> >> >> >>>>>> >> > can tolerate?
>> > >> >> >> >>>>>> >> > Personally, I guess minute level time-lag is ok for
>> > most
>> > >> >> >> cases.
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > ------------------------
>> > >> >> >> >>>>>> >> > With warm regard
>> > >> >> >> >>>>>> >> > Xiaoxiang Yu
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
>> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
>> > >> >> >> >>>>>> >> wrote:
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >> > > Druid is better in
>> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > ==========================
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > In this important scenario of realtime alalytics,
>> > the
>> > >> >> reason
>> > >> >> >> >>>>>> here is
>> > >> >> >> >>>>>> >> that
>> > >> >> >> >>>>>> >> > > kylin has lag time due to model update of new
>> > segment
>> > >> >> build,
>> > >> >> >> >>>>>> is that
>> > >> >> >> >>>>>> >> > > correct?
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > If that is true, then can you suggest a
>> work-around
>> > of
>> > >> >> >> >>>>>> combination of
>> > >> >> >> >>>>>> >> :
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to
>> > >> provide
>> > >> >> >> >>>>>> >> > > realtime capability ?
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB
>> > >> update)
>> > >> >> and
>> > >> >> >> >>>>>> >> integrate it
>> > >> >> >> >>>>>> >> > > with (time - lag kylin cube).
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
>> > >> >> >> xxyu@apache.org>
>> > >> >> >> >>>>>> wrote:
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago(I
>> > don't
>> > >> >> know
>> > >> >> >> too
>> > >> >> >> >>>>>> much
>> > >> >> >> >>>>>> >> about
>> > >> >> >> >>>>>> >> > > >  the change of Druid in these two years. New
>> > >> features
>> > >> >> >> that I
>> > >> >> >> >>>>>> know
>> > >> >> >> >>>>>> >> are :
>> > >> >> >> >>>>>> >> > > > new UI, fully on K8s etc).
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > Here are some cases you should consider using
>> > Druid
>> > >> >> other
>> > >> >> >> >>>>>> than Kylin
>> > >> >> >> >>>>>> >> > > > at the moment (using Kylin 5.0-beta to compare
>> the
>> > >> >> Druid
>> > >> >> >> >>>>>> which I
>> > >> >> >> >>>>>> >> used
>> > >> >> >> >>>>>> >> > two
>> > >> >> >> >>>>>> >> > > > years ago):
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>> > >> >> >> >>>>>> >> > > > - Most queries are small(Based on my test
>> result,
>> > I
>> > >> >> think
>> > >> >> >> >>>>>> Druid had
>> > >> >> >> >>>>>> >> > > better
>> > >> >> >> >>>>>> >> > > > response time for small queries two years ago.)
>> > >> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop,
>> want to
>> > >> use
>> > >> >> the
>> > >> >> >> >>>>>> >> K8S/public
>> > >> >> >> >>>>>> >> > > >   cloud platform as your deployment platform.
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in
>> which
>> > >> Kylin
>> > >> >> >> could
>> > >> >> >> >>>>>> be
>> > >> >> >> >>>>>> >> better,
>> > >> >> >> >>>>>> >> > > > like:
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries.
>> > Kylin
>> > >> can
>> > >> >> >> have
>> > >> >> >> >>>>>> a more
>> > >> >> >> >>>>>> >> > > > exact-match/fine-grained
>> > >> >> >> >>>>>> >> > > >   Index for queries containing different
>> `Group By
>> > >> >> >> >>>>>> dimensions`.
>> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the
>> moment)
>> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI.(its website did
>> > not
>> > >> >> show
>> > >> >> >> it
>> > >> >> >> >>>>>> supports
>> > >> >> >> >>>>>> >> > ODBC
>> > >> >> >> >>>>>> >> > > > well)
>> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better
>> than
>> > >> Druid.
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say
>> about
>> > >> it.
>> > >> >> >> >>>>>> >> > > > Hope to help you, or you are free to share your
>> > >> >> opinion.
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > ------------------------
>> > >> >> >> >>>>>> >> > > > With warm regard
>> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
>> > >> >> >> >>>>>> >> > > wrote:
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP
>> platform
>> > >> Kylin
>> > >> >> >> >>>>>> compared to
>> > >> >> >> >>>>>> >> > Pinot
>> > >> >> >> >>>>>> >> > > >> and
>> > >> >> >> >>>>>> >> > > >> Druid?
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> Please kindly let me know
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
>> > >> >> >> >>>>>> >> > > >>
>> > >> >> >> >>>>>> >> > > >
>> > >> >> >> >>>>>> >> > >
>> > >> >> >> >>>>>> >> >
>> > >> >> >> >>>>>> >>
>> > >> >> >> >>>>>> >
>> > >> >> >> >>>>>>
>> > >> >> >> >>>>>
>> > >> >> >>
>> > >> >> >
>> > >> >>
>> > >> >
>> > >>
>> > >
>> >
>>
>
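
The near-real-time (NRT) approach mentioned in this thread runs a job that
periodically aggregates and persists micro-batches. A minimal sketch of that
idea in plain Python is below. It is only an illustration, not Kylin's
implementation and not the Spark Streaming API; the function name
`micro_batch_aggregate`, the (timestamp, key, value) event layout, and the
180-second window (chosen to mirror the roughly 3-minute lag estimate) are
all assumptions.

```python
from collections import defaultdict


def micro_batch_aggregate(events, batch_seconds):
    """Group (timestamp, key, value) events into fixed windows and sum per key.

    Returns {window_start: {key: total}}. Each window stands in for one
    periodic job run that aggregates its batch and persists the result.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key, value in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // batch_seconds) * batch_seconds
        windows[window_start][key] += value
    return {start: dict(totals) for start, totals in sorted(windows.items())}


# Hypothetical events: timestamps are seconds since an arbitrary start.
events = [(0, "a", 1), (100, "a", 2), (170, "b", 5), (190, "a", 4)]
result = micro_batch_aggregate(events, batch_seconds=180)
print(result)
```

In a scheme like this, an event only shows up in query results after its
window has been aggregated and persisted, which is why the lag is on the
order of the batch interval rather than zero.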

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Good morning Xiaoxiang, hope you are well

1. JDBC source is a feature which is in development; it will be supported
later.

===============

May I know when the JDBC source will be available? Also, is there any change
to the Kylin 5 release date?

Thank you and best regards


On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> 1. JDBC source is a feature which is in development; it will be supported
> later.
>
> 2. Kylin supports Kerberos now; I will write a doc as soon as possible.
> (I will let you know.)
>
> 3. I think Ranger and Kerberos do not do the same thing: Kerberos is for
> authentication and Ranger is for authorization, so they cannot replace each
> other. Ranger can integrate with Kerberos; please check Ranger's website for
> information.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Thank you Xiaoxiang for your reply
> >
> > ————————————-
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > ————————————-
> > Yes, please answer these to help me clear this headache:
> >
> > 1. Can Kylin access the existing star schema in an Oracle data warehouse?
> > If not, do we have any workaround?
> >
> > 2. My team is using Kerberos for authentication. Do you have any
> > document/case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?
> >
> > 3. Should we use Apache Ranger instead of Kerberos for authentication and
> > for security purposes?
> >
> > Thank you again
> >
> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> > > I guess the release date should be 2024/01.
> > > Do you have any suggestions/wishes for kylin 5 (except real-time feature)?
> > >
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > wrote:
> > >
> > >> Thank you very much Xiaoxiang. I already did the presentation this
> > >> morning, so there was no time for you to comment. Next time I will send
> > >> it to you in advance. The meeting result was that we will implement both
> > >> Druid and Kylin in the next couple of projects, Druid because of its
> > >> real-time feature. I hope that Kylin will have the same feature soon.
> > >>
> > >> May I ask when you will release Kylin 5.0?
> > >>
> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> > >>
> > >> > Since 2018 there have been a lot of new features and code refactoring.
> > >> > If you like, you can share your ppt with me privately; maybe I can
> > >> > give some comments.
> > >> >
> > >> > Here are references on the advantages of Kylin since 2018:
> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> > >> >
> > >> > ------------------------
> > >> > With warm regard
> > >> > Xiaoxiang Yu
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > >> wrote:
> > >> >
> > >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and
> > >> >> Druid in my team.
> > >> >>
> > >> >> I found this article and would like you to update me on the advantages
> > >> >> of Kylin from 2018 until now (especially with version 5 to be released):
> > >> >>
> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> > >> >>
> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >> >>
> > >> >> > Thank you very much for your prompt response. I still have several
> > >> >> > questions on which I will seek your help later.
> > >> >> >
> > >> >> > Best regards and have a good day
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org>
> > wrote:
> > >> >> >
> > >> >> >> Done. Github branch changed to kylin5.
> > >> >> >>
> > >> >> >> ------------------------
> > >> >> >> With warm regard
> > >> >> >> Xiaoxiang Yu
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org>
> > >> wrote:
> > >> >> >>
> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > >> >> >> > ------------------------
> > >> >> >> > With warm regard
> > >> >> >> > Xiaoxiang Yu
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
> > <namdd@vnpay.vn.invalid
> > >> >
> > >> >> >> wrote:
> > >> >> >> >
> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> > >> >> >> >> default branch. In case people are impressed by the numbers, I hope
> > >> >> >> >> to turn this situation around.
> > >> >> >> >>
> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org
> >
> > >> >> wrote:
> > >> >> >> >>
> > >> >> >> >>> The default branch is for 4.X, which is a maintained branch; the
> > >> >> >> >>> active branch is kylin5.
> > >> >> >> >>> I will change the default branch to kylin5 later.
> > >> >> >> >>>
> > >> >> >> >>> ------------------------
> > >> >> >> >>> With warm regard
> > >> >> >> >>> Xiaoxiang Yu
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
> > >> <na...@vnpay.vn.invalid>
> > >> >> >> >>> wrote:
> > >> >> >> >>>
> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> > >> >> >> >>>>
> > >> >> >> >>>> Can you see the attached photo?
> > >> >> >> >>>>
> > >> >> >> >>>> My boss asked why Druid commits code regularly but Kylin has had
> > >> >> >> >>>> no commits since July.
> > >> >> >> >>>>
> > >> >> >> >>>>
> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org>
> > >> wrote:
> > >> >> >> >>>>
> > >> >> >> >>>>> I think so.
> > >> >> >> >>>>>
> > >> >> >> >>>>> Response time is not the only factor in making a decision. Kylin
> > >> >> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> > >> >> >> >>>>> model, and Kylin can guarantee reasonable query latency. ClickHouse
> > >> >> >> >>>>> will be quicker in an ad hoc query scenario.
> > >> >> >> >>>>>
> > >> >> >> >>>>> By the way, Youzan and Kyligence combine them to provide
> > >> >> >> >>>>> unified data analytics services for their customers.
> > >> >> >> >>>>>
> > >> >> >> >>>>> ------------------------
> > >> >> >> >>>>> With warm regard
> > >> >> >> >>>>> Xiaoxiang Yu
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
> > >> <namdd@vnpay.vn.invalid
> > >> >> >
> > >> >> >> >>>>> wrote:
> > >> >> >> >>>>>
> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
> > >> >> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
> > >> >> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
> > >> >> >> >>>>>> execution has been done once and stored in the cube to be used many
> > >> >> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <
> > xxyu@apache.org
> > >> >
> > >> >> >> wrote:
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> > The following text is part of an article
> > >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ===============================================================================
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
> > >> >> >> >>>>>> > of its precomputation technology, for example when the join, group by,
> > >> >> >> >>>>>> > and where-condition patterns in SQL are relatively fixed. The larger
> > >> >> >> >>>>>> > the data volume, the more obvious the advantages of using Kylin. Kylin
> > >> >> >> >>>>>> > is particularly advantageous for deduplication (count distinct), Top N,
> > >> >> >> >>>>>> > and Percentile, and it is used in a large number of scenarios such as
> > >> >> >> >>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
> > >> >> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell Housing,
> > >> >> >> >>>>>> > etc. use Kylin to build their data service platforms, serving millions
> > >> >> >> >>>>>> > to tens of millions of queries per day, with most queries completing
> > >> >> >> >>>>>> > within 2 - 3 seconds. There is no better alternative for such a
> > >> >> >> >>>>>> > high-concurrency scenario.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
> > >> >> >> >>>>>> > and is more suitable when query requests are more flexible, or when
> > >> >> >> >>>>>> > detailed queries are needed at low concurrency. Typical scenarios
> > >> >> >> >>>>>> > include user-tag filtering, where a very large number of columns and
> > >> >> >> >>>>>> > where conditions are combined arbitrarily, and complex ad hoc queries
> > >> >> >> >>>>>> > with low concurrency. If the amount of data and the access volume are
> > >> >> >> >>>>>> > large, you need to deploy a distributed ClickHouse cluster, which is a
> > >> >> >> >>>>>> > greater challenge for operation and maintenance.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> > >> >> >> >>>>>> > resource-efficient to compute them on the fly. Since the number of
> > >> >> >> >>>>>> > queries is small, even if each query consumes a lot of computational
> > >> >> >> >>>>>> > resources, it is still cost-effective overall. If some queries have a
> > >> >> >> >>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
> > >> >> >> >>>>>> > by spending large computational resources up front to save the
> > >> >> >> >>>>>> > results, the upfront computational cost can be amortized over each
> > >> >> >> >>>>>> > query, so it is the most economical.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ------------------------
> > >> >> >> >>>>>> > With warm regard
> > >> >> >> >>>>>> > Xiaoxiang Yu
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
> > >> >> <namdd@vnpay.vn.invalid
> > >> >> >> >
> > >> >> >> >>>>>> wrote:
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near-real-time streaming feature.
> > >> >> >> >>>>>> >> That's great.
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> This morning there has been a new challenge to my team: ClickHouse
> > >> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
> > >> >> >> >>>>>> >> which is faster than my demonstration (I used Kylin to calculate
> > >> >> >> >>>>>> >> 1 billion rows in 2.9 seconds).
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
> > >> >> >> >>>>>> >> that I can defend my demonstration?
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <
> > >> xxyu@apache.org
> > >> >> >
> > >> >> >> >>>>>> wrote:
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
> > >> >> >> >>>>>> >> > here is that Kylin has lag time due to model update of new segment
> > >> >> >> >>>>>> >> > build, is that correct?"
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > You are correct.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> > >> >> >> >>>>>> >> > combination of ... "
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
> > >> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
> > >> >> >> >>>>>> >> > (that is my estimate, but I am quite certain about it).
> > >> >> >> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that does micro-batch
> > >> >> >> >>>>>> >> > aggregation and persistence periodically. The price is that you need
> > >> >> >> >>>>>> >> > to run and monitor a long-running job. This feature is based on
> > >> >> >> >>>>>> >> > Spark Streaming, so you need knowledge of it.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
> > >> >> >> >>>>>> >> > tolerate?
> > >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > ------------------------
> > >> >> >> >>>>>> >> > With warm regard
> > >> >> >> >>>>>> >> > Xiaoxiang Yu
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
> > >> >> >> >>>>>> >> wrote:
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > > Druid is better in
> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > ==========================
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason
> > >> >> >> >>>>>> >> > > here is that Kylin has lag time due to model update of new
> > >> >> >> >>>>>> >> > > segment build, is that correct?
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > If that is true, then can you suggest a workaround that combines:
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
> > >> >> >> >>>>>> >> > > realtime capability?
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
> > >> >> >> >>>>>> >> > > integrate it with (time-lag Kylin cube).
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
> > >> >> >> xxyu@apache.org>
> > >> >> >> >>>>>> wrote:
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know too
> > >> >> >> >>>>>> >> > > > much about the changes in Druid over these two years. New
> > >> >> >> >>>>>> >> > > > features that I know of are: new UI, fully on K8s, etc.).
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
> > >> >> >> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the
> > >> >> >> >>>>>> >> > > > Druid which I used two years ago):
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> > >> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think
> > >> >> >> >>>>>> >> > > >   Druid had better response time for small queries two years ago).
> > >> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use K8S
> > >> >> >> >>>>>> >> > > >   or a public cloud platform as your deployment platform.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> > >> >> >> >>>>>> >> > > > better, like:
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
> > >> >> >> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
> > >> >> >> >>>>>> >> > > >   different `Group By dimensions`.
> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
> > >> >> >> >>>>>> >> > > >   show that it supports ODBC well).
> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> > >> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > ------------------------
> > >> >> >> >>>>>> >> > > > With warm regard
> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
> > >> >> >> >>>>>> >> > > wrote:
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
> > >> >> >> >>>>>> >> > > >> compared to Pinot and Druid?
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> Please kindly let me know
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>>
> > >> >> >> >>>>>
> > >> >> >>
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Thank you very much. Please kindly start on the Kylin-Kerberos document and
the JDBC connectivity. We will actively participate in testing the JDBC
source once it is available, so please let us know.

Best regards

On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> 1. JDBC source is a feature which is in development; it will be supported
> later.
>
> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> (I will let you know.)
>
> 3. I think Ranger and Kerberos do not do the same thing: Kerberos handles
> authentication and Ranger handles authorization, so they cannot replace
> each other. Ranger can integrate with Kerberos; please check Ranger's
> website for information.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Thank you Xiaoxiang for your reply
> >
> > ————————————-
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > ————————————-
> > Yes: please answer to help me clear this headache:
> >
> > 1. Can Kylin access an existing star schema in an Oracle data warehouse?
> > If not, is there any workaround?
> >
> > 2. My team is using Kerberos for authentication. Do you have any
> > document or case study about integrating Kerberos with Kylin 4.x and
> > Kylin 5.x?
> >
> > 3. Should we use Apache Ranger instead of Kerberos for authentication and
> > for security purposes?
> >
> > Thank you again
> >
> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> > > I guess the release date should be 2024/01.
> > > Do you have any suggestions/wishes for Kylin 5 (except the real-time
> > > feature)?
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > wrote:
> > >
> > >> Thank you very much, Xiaoxiang. I already gave the presentation this
> > >> morning, so there was no time for you to comment. Next time I will send
> > >> it to you in advance. The outcome of the meeting was that we will
> > >> implement both Druid and Kylin in the next couple of projects, Druid
> > >> because of its real-time feature. I hope that Kylin will have the same
> > >> feature soon.
> > >>
> > >> May I ask when will you release kylin 5.0?
> > >>
> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> > >>
> > >> > Since 2018 there have been a lot of new features and code refactoring.
> > >> > If you like, you can share your slides with me privately; maybe I can
> > >> > give some comments.
> > >> >
> > >> > Here are some references on the advantages of Kylin since 2018:
> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> > >> >
> > >> > ------------------------
> > >> > With warm regard
> > >> > Xiaoxiang Yu
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > >> wrote:
> > >> >
> > >> >> Hi Xiaoxiang, tomorrow my team holds the main presentation comparing
> > >> >> Kylin and Druid.
> > >> >>
> > >> >> I found this article and would like you to update me on the advantages
> > >> >> Kylin has gained since 2018 (especially with version 5 about to be
> > >> >> released):
> > >> >>
> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> > >> >>
> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >> >>
> > >> >> > Thank you very much for your prompt response, I still have
> several
> > >> >> > questions to seek for your help later.
> > >> >> >
> > >> >> > Best regards and have a good day
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org>
> > wrote:
> > >> >> >
> > >> >> >> Done. Github branch changed to kylin5.
> > >> >> >>
> > >> >> >> ------------------------
> > >> >> >> With warm regard
> > >> >> >> Xiaoxiang Yu
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org>
> > >> wrote:
> > >> >> >>
> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > >> >> >> > ------------------------
> > >> >> >> > With warm regard
> > >> >> >> > Xiaoxiang Yu
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
> > <namdd@vnpay.vn.invalid
> > >> >
> > >> >> >> wrote:
> > >> >> >> >
> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> > >> >> >> >> default branch. In case people are swayed by the numbers, I hope
> > >> >> >> >> to turn this situation around.
> > >> >> >> >>
> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org
> >
> > >> >> wrote:
> > >> >> >> >>
> > >> >> >> >>> The default branch is for 4.X, which is a maintained branch; the
> > >> >> >> >>> active branch is kylin5.
> > >> >> >> >>> I will change the default branch to kylin5 later.
> > >> >> >> >>>
> > >> >> >> >>> ------------------------
> > >> >> >> >>> With warm regard
> > >> >> >> >>> Xiaoxiang Yu
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
> > >> <na...@vnpay.vn.invalid>
> > >> >> >> >>> wrote:
> > >> >> >> >>>
> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> > >> >> >> >>>>
> > >> >> >> >>>> Can you see the attached photo?
> > >> >> >> >>>>
> > >> >> >> >>>> My boss asked why Druid receives commits regularly while Kylin
> > >> >> >> >>>> has had no commits since July.
> > >> >> >> >>>>
> > >> >> >> >>>>
> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org>
> > >> wrote:
> > >> >> >> >>>>
> > >> >> >> >>>>> I think so.
> > >> >> >> >>>>>
> > >> >> >> >>>>> Response time is not the only factor in making a decision. Kylin
> > >> >> >> >>>>> could be cheaper when the query pattern suits the Kylin model,
> > >> >> >> >>>>> and Kylin can guarantee reasonable query latency. ClickHouse
> > >> >> >> >>>>> will be quicker in an ad hoc query scenario.
> > >> >> >> >>>>>
> > >> >> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
> > >> >> >> >>>>> unified data analytics services for their customers.
> > >> >> >> >>>>>
> > >> >> >> >>>>> ------------------------
> > >> >> >> >>>>> With warm regard
> > >> >> >> >>>>> Xiaoxiang Yu
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
> > >> <namdd@vnpay.vn.invalid
> > >> >> >
> > >> >> >> >>>>> wrote:
> > >> >> >> >>>>>
> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> If my client uses a cloud computing service like GCP or AWS,
> > >> >> >> >>>>>> which will cost more: Kylin's pre-calculation or ClickHouse?
> > >> >> >> >>>>>> (For Kylin, my thinking is that the query execution is done
> > >> >> >> >>>>>> once and stored in the cube to be reused many times, so Kylin
> > >> >> >> >>>>>> uses less cloud computation. Is that true?)
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <
> > xxyu@apache.org
> > >> >
> > >> >> >> wrote:
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> > The following text is part of an article
> > >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ===============================================================================
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed
> > >> >> >> >>>>>> > patterns because of its pre-calculation technology, for
> > >> >> >> >>>>>> > example when the join, group by, and where-condition
> > >> >> >> >>>>>> > patterns in the SQL are relatively fixed. The larger the
> > >> >> >> >>>>>> > data volume, the more obvious the advantages of using
> > >> >> >> >>>>>> > Kylin. In particular, Kylin's advantages in
> > >> >> >> >>>>>> > deduplication (count distinct), Top N, Percentile, and
> > >> >> >> >>>>>> > similar scenarios are especially large, and it is widely
> > >> >> >> >>>>>> > used in dashboards, all kinds of reports, large-screen
> > >> >> >> >>>>>> > displays, traffic statistics, and user behavior
> > >> >> >> >>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin
> > >> >> >> >>>>>> > to build their data service platforms, serving millions
> > >> >> >> >>>>>> > to tens of millions of queries per day, and most of the
> > >> >> >> >>>>>> > queries complete within 2 - 3 seconds. There is no
> > >> >> >> >>>>>> > better alternative for such a high-concurrency scenario.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
> > >> >> >> >>>>>> > computing power and is more suitable when query requests
> > >> >> >> >>>>>> > are more flexible, or when detailed queries with low
> > >> >> >> >>>>>> > concurrency are needed. Scenarios include: user-label
> > >> >> >> >>>>>> > filtering where very many columns and where conditions
> > >> >> >> >>>>>> > are combined arbitrarily, complex ad hoc queries with
> > >> >> >> >>>>>> > low concurrency, and so on. If the data volume and
> > >> >> >> >>>>>> > access volume are large, you need to deploy a
> > >> >> >> >>>>>> > distributed ClickHouse cluster, which poses a bigger
> > >> >> >> >>>>>> > challenge for operation and maintenance.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is
> > >> >> >> >>>>>> > more resource-efficient to compute them on the fly.
> > >> >> >> >>>>>> > Since the number of queries is small, even if each query
> > >> >> >> >>>>>> > consumes a lot of computational resources, it is still
> > >> >> >> >>>>>> > cost-effective overall. If some queries have a fixed
> > >> >> >> >>>>>> > pattern and the query volume is large, Kylin is more
> > >> >> >> >>>>>> > suitable: by spending substantial computational
> > >> >> >> >>>>>> > resources up front to save the results, the upfront cost
> > >> >> >> >>>>>> > is amortized over every query, so it is the most
> > >> >> >> >>>>>> > economical.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ------------------------
> > >> >> >> >>>>>> > With warm regard
> > >> >> >> >>>>>> > Xiaoxiang Yu
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
> > >> >> <namdd@vnpay.vn.invalid
> > >> >> >> >
> > >> >> >> >>>>>> wrote:
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming
> > >> >> >> >>>>>> >> feature. That's great.
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse
> > >> >> >> >>>>>> >> offered us the speed of computing over 8 billion rows in
> > >> >> >> >>>>>> >> milliseconds, which is faster than my demonstration (I
> > >> >> >> >>>>>> >> used Kylin to compute over 1 billion rows in 2.9
> > >> >> >> >>>>>> >> seconds).
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over
> > >> >> >> >>>>>> >> ClickHouse so that I can defend my demonstration?
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <
> > >> xxyu@apache.org
> > >> >> >
> > >> >> >> >>>>>> wrote:
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics,
> > >> >> >> >>>>>> >> > the reason here is that kylin has lag time due to model
> > >> >> >> >>>>>> >> > update of new segment build, is that correct?"
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > You are correct.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around
> > >> >> >> >>>>>> >> > of combination of ... "
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> > >> >> >> >>>>>> >> > completed but not released), which can bring the time
> > >> >> >> >>>>>> >> > lag down to about 3 minutes (that is my estimate, but I
> > >> >> >> >>>>>> >> > am quite certain about it).
> > >> >> >> >>>>>> >> > NRT stands for 'near real-time': it runs a job that
> > >> >> >> >>>>>> >> > periodically does micro-batch aggregation and
> > >> >> >> >>>>>> >> > persistence. The price is that you need to run and
> > >> >> >> >>>>>> >> > monitor a long-running job. This feature is based on
> > >> >> >> >>>>>> >> > Spark Streaming, so you need knowledge of it.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > I am curious: what is the maximum time lag your
> > >> >> >> >>>>>> >> > customers can tolerate?
> > >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for
> > >> >> >> >>>>>> >> > most cases.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > ------------------------
> > >> >> >> >>>>>> >> > With warm regard
> > >> >> >> >>>>>> >> > Xiaoxiang Yu
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
> > >> >> >> >>>>>> >> wrote:
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > > Druid is better in
> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > ==========================
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > In this important scenario of real-time analytics, the
> > >> >> >> >>>>>> >> > > reason here is that Kylin has lag time due to the model
> > >> >> >> >>>>>> >> > > update on each new segment build, is that correct?
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > If that is true, can you suggest a workaround that
> > >> >> >> >>>>>> >> > > combines:
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > (time-lagged Kylin cube) + (real-time DB update) to
> > >> >> >> >>>>>> >> > > provide real-time capability?
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > IMO, the point here is to find that (real-time DB
> > >> >> >> >>>>>> >> > > update) and integrate it with the (time-lagged Kylin
> > >> >> >> >>>>>> >> > > cube).
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
> > >> >> >> xxyu@apache.org>
> > >> >> >> >>>>>> wrote:
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
> > >> >> >> >>>>>> >> > > > know much about how Druid has changed in these two
> > >> >> >> >>>>>> >> > > > years; the new features that I know of are a new UI,
> > >> >> >> >>>>>> >> > > > running fully on K8s, etc.).
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > Here are some cases in which you should consider using
> > >> >> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing Kylin
> > >> >> >> >>>>>> >> > > > 5.0-beta with the Druid which I used two years ago):
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > - You have a real-time data source like Kafka.
> > >> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I
> > >> >> >> >>>>>> >> > > >   think Druid had better response times for small
> > >> >> >> >>>>>> >> > > >   queries two years ago).
> > >> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop and want
> > >> >> >> >>>>>> >> > > >   to use a K8s/public cloud platform as your deployment
> > >> >> >> >>>>>> >> > > >   platform.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
> > >> >> >> >>>>>> >> > > > could be better, like:
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
> > >> >> >> >>>>>> >> > > >   have a more exact-match/fine-grained index for queries
> > >> >> >> >>>>>> >> > > >   containing different `Group By dimensions`.
> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> > >> >> >> >>>>>> >> > > > - Better support for 'Join'? (Not sure at the moment.)
> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website
> > >> >> >> >>>>>> >> > > >   did not show that it supports ODBC well).
> > >> >> >> >>>>>> >> > > > - Kylin appears to support ANSI SQL better than Druid.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> > >> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > ------------------------
> > >> >> >> >>>>>> >> > > > With warm regard
> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
> > >> >> >> >>>>>> <na...@vnpay.vn.invalid>
> > >> >> >> >>>>>> >> > > wrote:
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
> > >> >> >> >>>>>> >> > > >> compared to Pinot and Druid?
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> Please kindly let me know
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>>
> > >> >> >> >>>>>
> > >> >> >>
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
>
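
The cost argument quoted in this thread (pre-compute once and amortize the
build over many queries, versus computing every query on demand) can be
illustrated with a small break-even sketch. This is only an illustrative
model and not anything taken from Kylin or ClickHouse: the function names and
the cost units are hypothetical, and real cloud costs also depend on storage,
refresh frequency, and cluster sizing.

```python
import math


def precompute_total_cost(build_cost, cost_per_query, n_queries):
    """Total cost when results are built once up front and then served cheaply."""
    return build_cost + cost_per_query * n_queries


def on_demand_total_cost(cost_per_query, n_queries):
    """Total cost when every query is computed from scratch."""
    return cost_per_query * n_queries


def break_even_queries(build_cost, precomputed_query_cost, on_demand_query_cost):
    """Smallest query count at which pre-computation is no more expensive.

    Returns None when a pre-computed query is not cheaper than an on-demand
    one, because the upfront build can then never pay for itself.
    """
    saving_per_query = on_demand_query_cost - precomputed_query_cost
    if saving_per_query <= 0:
        return None
    return math.ceil(build_cost / saving_per_query)


if __name__ == "__main__":
    # Hypothetical cost units: build = 1000, pre-computed query = 1,
    # on-demand query = 5.
    print("break-even at", break_even_queries(1000, 1, 5), "queries")
```

With these made-up numbers, a fixed query pattern that runs more than 250
times favors pre-computation, while an infrequent ad hoc query does not.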

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Thank you very much. Please kindly start on the Kylin-Kerberos document and
the JDBC connectivity. We will actively participate in testing the JDBC
source once it is available, so please let us know.

Best regards

On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> 1. JDBC source is a feature which is in development; it will be supported
> later.
>
> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> (I will let you know.)
>
> 3. I think Ranger and Kerberos do not do the same thing: Kerberos handles
> authentication and Ranger handles authorization, so they cannot replace
> each other. Ranger can integrate with Kerberos; please check Ranger's
> website for information.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Thank you Xiaoxiang for your reply
> >
> > ————————————-
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > ————————————-
> > Yes: please answer to help me clear this headache:
> >
> > 1. Can Kylin access an existing star schema in an Oracle data warehouse?
> > If not, is there any workaround?
> >
> > 2. My team is using Kerberos for authentication. Do you have any
> > document or case study about integrating Kerberos with Kylin 4.x and
> > Kylin 5.x?
> >
> > 3. Should we use Apache Ranger instead of Kerberos for authentication and
> > for security purposes?
> >
> > Thank you again
> >
> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> > > I guess the release date should be 2024/01.
> > > Do you have any suggestions/wishes for Kylin 5 (except the real-time
> > > feature)?
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > wrote:
> > >
> > >> Thank you very much, Xiaoxiang. I already gave the presentation this
> > >> morning, so there was no time for you to comment. Next time I will send
> > >> it to you in advance. The outcome of the meeting was that we will
> > >> implement both Druid and Kylin in the next couple of projects, Druid
> > >> because of its real-time feature. I hope that Kylin will have the same
> > >> feature soon.
> > >>
> > >> May I ask when will you release kylin 5.0?
> > >>
> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> > >>
> > >> > Since 2018 there have been a lot of new features and code refactoring.
> > >> > If you like, you can share your slides with me privately; maybe I can
> > >> > give some comments.
> > >> >
> > >> > Here are some references on the advantages of Kylin since 2018:
> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> > >> >
> > >> > ------------------------
> > >> > With warm regard
> > >> > Xiaoxiang Yu
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > >> wrote:
> > >> >
> > >> >> Hi Xiaoxiang, tomorrow my team holds the main presentation comparing
> > >> >> Kylin and Druid.
> > >> >>
> > >> >> I found this article and would like you to update me on the advantages
> > >> >> Kylin has gained since 2018 (especially with version 5 about to be
> > >> >> released):
> > >> >>
> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> > >> >>
> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >> >>
> > >> >> > Thank you very much for your prompt response, I still have
> several
> > >> >> > questions to seek for your help later.
> > >> >> >
> > >> >> > Best regards and have a good day
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org>
> > wrote:
> > >> >> >
> > >> >> >> Done. Github branch changed to kylin5.
> > >> >> >>
> > >> >> >> ------------------------
> > >> >> >> With warm regard
> > >> >> >> Xiaoxiang Yu
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org>
> > >> wrote:
> > >> >> >>
> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > >> >> >> > ------------------------
> > >> >> >> > With warm regard
> > >> >> >> > Xiaoxiang Yu
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
> > <namdd@vnpay.vn.invalid
> > >> >
> > >> >> >> wrote:
> > >> >> >> >
> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> > >> >> >> >> default branch. In case people are swayed by the numbers, I hope
> > >> >> >> >> to turn this situation around.
> > >> >> >> >>
> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org
> >
> > >> >> wrote:
> > >> >> >> >>
> > >> >> >> >>> The default branch is for 4.X, which is a maintained branch; the
> > >> >> >> >>> active branch is kylin5.
> > >> >> >> >>> I will change the default branch to kylin5 later.
> > >> >> >> >>>
> > >> >> >> >>> ------------------------
> > >> >> >> >>> With warm regard
> > >> >> >> >>> Xiaoxiang Yu
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
> > >> <na...@vnpay.vn.invalid>
> > >> >> >> >>> wrote:
> > >> >> >> >>>
> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> > >> >> >> >>>>
> > >> >> >> >>>> Can you see the attached photo?
> > >> >> >> >>>>
> > >> >> >> >>>> My boss asked why Druid receives commits regularly while Kylin
> > >> >> >> >>>> has had no commits since July.
> > >> >> >> >>>>
> > >> >> >> >>>>
> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org>
> > >> wrote:
> > >> >> >> >>>>
> > >> >> >> >>>>> I think so.
> > >> >> >> >>>>>
> > >> >> >> >>>>> Response time is not the only factor in making a decision. Kylin
> > >> >> >> >>>>> could be cheaper when the query pattern suits the Kylin model,
> > >> >> >> >>>>> and Kylin can guarantee reasonable query latency. ClickHouse
> > >> >> >> >>>>> will be quicker in an ad hoc query scenario.
> > >> >> >> >>>>>
> > >> >> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
> > >> >> >> >>>>> unified data analytics services for their customers.
> > >> >> >> >>>>>
> > >> >> >> >>>>> ------------------------
> > >> >> >> >>>>> With warm regard
> > >> >> >> >>>>> Xiaoxiang Yu
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
> > >> <namdd@vnpay.vn.invalid
> > >> >> >
> > >> >> >> >>>>> wrote:
> > >> >> >> >>>>>
> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> If my client uses a cloud computing service like GCP or AWS,
> > >> >> >> >>>>>> which will cost more: Kylin's pre-calculation or ClickHouse?
> > >> >> >> >>>>>> (For Kylin, my thinking is that the query execution is done
> > >> >> >> >>>>>> once and stored in the cube to be reused many times, so Kylin
> > >> >> >> >>>>>> uses less cloud computation. Is that true?)
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <
> > xxyu@apache.org
> > >> >
> > >> >> >> wrote:
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> > The following text is part of an article
> > >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ===============================================================================
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed
> > >> >> >> >>>>>> > patterns because of its pre-calculation technology, for
> > >> >> >> >>>>>> > example when the join, group by, and where-condition
> > >> >> >> >>>>>> > patterns in the SQL are relatively fixed. The larger the
> > >> >> >> >>>>>> > data volume, the more obvious the advantages of using
> > >> >> >> >>>>>> > Kylin. In particular, Kylin's advantages in
> > >> >> >> >>>>>> > deduplication (count distinct), Top N, Percentile, and
> > >> >> >> >>>>>> > similar scenarios are especially large, and it is widely
> > >> >> >> >>>>>> > used in dashboards, all kinds of reports, large-screen
> > >> >> >> >>>>>> > displays, traffic statistics, and user behavior
> > >> >> >> >>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin
> > >> >> >> >>>>>> > to build their data service platforms, serving millions
> > >> >> >> >>>>>> > to tens of millions of queries per day, and most of the
> > >> >> >> >>>>>> > queries complete within 2 - 3 seconds. There is no
> > >> >> >> >>>>>> > better alternative for such a high-concurrency scenario.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
> > >> >> computing
> > >> >> >> >>>>>> power and
> > >> >> >> >>>>>> > is more suitable when the query request is more
> flexible,
> > >> or
> > >> >> when
> > >> >> >> >>>>>> there is
> > >> >> >> >>>>>> > a need for detailed queries with low concurrency.
> > Scenarios
> > >> >> >> >>>>>> include: very
> > >> >> >> >>>>>> > many columns and where conditions are arbitrarily
> > combined
> > >> >> with
> > >> >> >> the
> > >> >> >> >>>>>> user
> > >> >> >> >>>>>> > label filtering, not a large amount of concurrency of
> > >> complex
> > >> >> >> >>>>>> on-the-spot
> > >> >> >> >>>>>> > query and so on. If the amount of data and access is
> > large,
> > >> >> you
> > >> >> >> >>>>>> need to
> > >> >> >> >>>>>> > deploy a distributed ClickHouse cluster, which is a
> > higher
> > >> >> >> >>>>>> challenge for
> > >> >> >> >>>>>> > operation and maintenance.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is
> > >> more
> > >> >> >> >>>>>> > resource-efficient to use now-computing. Since the
> number
> > >> of
> > >> >> >> >>>>>> queries is
> > >> >> >> >>>>>> > small, even if each query consumes a lot of
> computational
> > >> >> >> >>>>>> resources, it is
> > >> >> >> >>>>>> > still cost-effective overall. If some queries have a
> > fixed
> > >> >> >> pattern
> > >> >> >> >>>>>> and the
> > >> >> >> >>>>>> > query volume is large, it is more suitable for Kylin,
> > >> because
> > >> >> the
> > >> >> >> >>>>>> query
> > >> >> >> >>>>>> > volume is large, and by using large computational
> > >> resources to
> > >> >> >> save
> > >> >> >> >>>>>> the
> > >> >> >> >>>>>> > results, the upfront computational cost can be
> amortized
> > >> over
> > >> >> >> each
> > >> >> >> >>>>>> query,
> > >> >> >> >>>>>> > so it is the most economical.
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > ------------------------
> > >> >> >> >>>>>> > With warm regard
> > >> >> >> >>>>>> > Xiaoxiang Yu
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real time streaming
> > >> >> >> >>>>>> >> feature. That's great.
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> This morning there has been a new challenge to my team:
> > >> >> >> >>>>>> >> clickhouse offered us the speed of calculating 8 billion
> > >> >> >> >>>>>> >> rows in millisecond which is faster than my demonstration
> > >> >> >> >>>>>> >> (I used Kylin to do calculating 1 billion rows in 2.9
> > >> >> >> >>>>>> >> seconds)
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> Can you briefly suggest the advantages of kylin over
> > >> >> >> >>>>>> >> clickhouse so that I can defend my demonstration.
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics,
> > >> >> >> >>>>>> >> > the reason here is that kylin has lag time due to model
> > >> >> >> >>>>>> >> > update of new segment build, is that correct?"
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > You are correct.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around
> > >> >> >> >>>>>> >> > of combination of ... "
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming(coding is
> > >> >> >> >>>>>> >> > completed but not released), which can make the time-lag
> > >> >> >> >>>>>> >> > to about 3 minutes(that is my estimation but I am quite
> > >> >> >> >>>>>> >> > certain about it).
> > >> >> >> >>>>>> >> > NRT stands for 'near real-time', it will run a job and
> > >> >> >> >>>>>> >> > do micro-batch aggregation and persistence periodically.
> > >> >> >> >>>>>> >> > The price is that you need to run and monitor a
> > >> >> >> >>>>>> >> > long-running job. This feature is based on Spark
> > >> >> >> >>>>>> >> > Streaming, so you need knowledge of it.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > I am curious about what is the maximum time-lag your
> > >> >> >> >>>>>> >> > customers can tolerate?
> > >> >> >> >>>>>> >> > Personally, I guess minute level time-lag is ok for most
> > >> >> >> >>>>>> >> > cases.
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > ------------------------
> > >> >> >> >>>>>> >> > With warm regard
> > >> >> >> >>>>>> >> > Xiaoxiang Yu
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >> > > Druid is better in
> > >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > ==========================
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the
> > >> >> >> >>>>>> >> > > reason here is that kylin has lag time due to model
> > >> >> >> >>>>>> >> > > update of new segment build, is that correct?
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > If that is true, then can you suggest a work-around of
> > >> >> >> >>>>>> >> > > combination of:
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to provide
> > >> >> >> >>>>>> >> > > realtime capability ?
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
> > >> >> >> >>>>>> >> > > and integrate it with (time - lag kylin cube).
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago(I don't
> > >> >> >> >>>>>> >> > > > know too much about the change of Druid in these two
> > >> >> >> >>>>>> >> > > > years. New features that I know are : new UI, fully on
> > >> >> >> >>>>>> >> > > > K8s etc).
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > Here are some cases you should consider using Druid
> > >> >> >> >>>>>> >> > > > other than Kylin at the moment (using Kylin 5.0-beta to
> > >> >> >> >>>>>> >> > > > compare the Druid which I used two years ago):
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> > >> >> >> >>>>>> >> > > > - Most queries are small(Based on my test result, I
> > >> >> >> >>>>>> >> > > >   think Druid had better response time for small
> > >> >> >> >>>>>> >> > > >   queries two years ago.)
> > >> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use
> > >> >> >> >>>>>> >> > > >   the K8S/public cloud platform as your deployment
> > >> >> >> >>>>>> >> > > >   platform.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
> > >> >> >> >>>>>> >> > > > could be better, like:
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
> > >> >> >> >>>>>> >> > > >   have a more exact-match/fine-grained Index for
> > >> >> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
> > >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> > >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> > >> >> >> >>>>>> >> > > > - ODBC driver for different BI.(its website did not
> > >> >> >> >>>>>> >> > > >   show it supports ODBC well)
> > >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> > >> >> >> >>>>>> >> > > > Hope to help you, or you are free to share your
> > >> >> >> >>>>>> >> > > > opinion.
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > ------------------------
> > >> >> >> >>>>>> >> > > > With warm regard
> > >> >> >> >>>>>> >> > > > Xiaoxiang Yu
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> > >> >> >> >>>>>> >> > > >> Sirs/Madams,
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> May I post my boss's question:
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
> > >> >> >> >>>>>> >> > > >> compared to Pinot and Druid?
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> Please kindly let me know
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >> Thank you very much and best regards
> > >> >> >> >>>>>> >> > > >>
> > >> >> >> >>>>>> >> > > >
> > >> >> >> >>>>>> >> > >
> > >> >> >> >>>>>> >> >
> > >> >> >> >>>>>> >>
> > >> >> >> >>>>>> >
> > >> >> >> >>>>>>
> > >> >> >> >>>>>
> > >> >> >>
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Good morning Xiaoxiang, I hope you are well.

1. JDBC source is a feature which is in development, it will be supported
later.

===============

May I know when the JDBC source will be available, and whether there is any
change in the Kylin 5 release date?

Thank you and best regards


On Mon, Dec 11, 2023 at 2:15 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> 1. JDBC source is a feature which in development, it will be supported
> later.
>
> 2. Kylin supports kerberos now, I will write a doc as soon as possible.
> (I will let you know.)
>
> 3. I think ranger and Kerberos are not doing the same things, one for
> authentication, one for authorization. So they cannot replace each other.
> Ranger can integrate with Kerberos, please check ranger's website for
> information.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Thank you Xiaoxiang for your reply
> >
> > ————————————-
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > ————————————-
> > Yes: please answer to help me clear this headache:
> >
> > 1. Can Kylin access the existing star schema in Oracle datawarehouse ? If
> > not then do we have any work around?
> >
> > 2. My team is using kerberos for authentication, do you have any
> > document/casestudy about integrating kerberos with kylin 4.x and kylin 5.x
> >
> > 3. Should we use apache ranger instead of kerberos for authentication and
> > for security purposes?
> >
> > Thank you again
> >
> > On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> > > I guess the release date should be 2024/01 .
> > > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >
> > >> Thank you very much xiaoxiang, I did the presentation this morning
> > >> already so there is no time for you to comment. Next time I will send
> > >> you in advance. The meeting result was that we will implement both druid
> > >> and kylin in the next couple of projects because of its realtime
> > >> feature. Hope that kylin will have same feature soon.
> > >>
> > >> May I ask when will you release kylin 5.0?
> > >>
> > >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> > >>
> > >> > Since 2018 there are a lot of new features and code refactor.
> > >> > If you like, you can share your ppt to me privately, maybe I can
> > >> > give some comments.
> > >> >
> > >> > Here is the reference of advantages of Kylin since 2018:
> > >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> > >> >
> > >> > ------------------------
> > >> > With warm regard
> > >> > Xiaoxiang Yu
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >> >
> > >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and
> > >> >> Druid in my team.
> > >> >>
> > >> >> I found this article and would like you to update me the advantages
> > >> >> of Kylin since 2018 until now (especially with version 5 to be
> > >> >> released)
> > >> >>
> > >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> > >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> > >> >>
> > >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >> >>
> > >> >> > Thank you very much for your prompt response, I still have several
> > >> >> > questions to seek for your help later.
> > >> >> >
> > >> >> > Best regards and have a good day
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> > >> >> >
> > >> >> >> Done. Github branch changed to kylin5.
> > >> >> >>
> > >> >> >> ------------------------
> > >> >> >> With warm regard
> > >> >> >> Xiaoxiang Yu
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> > >> >> >>
> > >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> > >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> > >> >> >> > ------------------------
> > >> >> >> > With warm regard
> > >> >> >> > Xiaoxiang Yu
> > >> >> >> >
> > >> >> >> >
> > >> >> >> >
> > >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> > >> >> >> >
> > >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> > >> >> >> >> default branch. In case people are impressed by the numbers then I
> > >> >> >> >> hope to turn this situation to reverse direction.
> > >> >> >> >>
> > >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xxyu@apache.org> wrote:
> > >> >> >> >>
> > >> >> >> >>> The default branch is for 4.X which is a maintained branch, the
> > >> >> >> >>> active branch is kylin5.
> > >> >> >> >>> I will change the default branch to kylin5 later.
> > >> >> >> >>>
> > >> >> >> >>> ------------------------
> > >> >> >> >>> With warm regard
> > >> >> >> >>> Xiaoxiang Yu
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>>
> > >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >> >> >> >>>
> > >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> > >> >> >> >>>>
> > >> >> >> >>>> Can you see the atttached photo
> > >> >> >> >>>>
> > >> >> >> >>>> My boss asked that why druid commit code regularly but kylin had
> > >> >> >> >>>> not been committed since July
> > >> >> >> >>>>
> > >> >> >> >>>>
> > >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
> > >> >> >> >>>>
> > >> >> >> >>>>> I think so.
> > >> >> >> >>>>>
> > >> >> >> >>>>> Response time is not the only factor to make a decision. Kylin
> > >> >> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> > >> >> >> >>>>> model, and Kylin can guarantee reasonable query latency.
> > >> >> >> >>>>> Clickhouse will be quicker in an ad hoc query scenario.
> > >> >> >> >>>>>
> > >> >> >> >>>>> By the way, Youzan and Kyligence combine them together to provide
> > >> >> >> >>>>> unified data analytics services for their customers.
> > >> >> >> >>>>>
> > >> >> >> >>>>> ------------------------
> > >> >> >> >>>>> With warm regard
> > >> >> >> >>>>> Xiaoxiang Yu
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>>
> > >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> > >> >> >> >>>>>
> > >> >> >> >>>>>> Hi Xiaoxiang, thank you
> > >> >> >> >>>>>>
> > >> >> >> >>>>>> In case my client uses cloud computing service like gcp or aws,
> > >> >> >> >>>>>> which will cost more: precalculation feature of kylin or
> > >> >> >> >>>>>> clickhouse (incase of kylin, I have a thought that the query
> > >> >> >> >>>>>> execution has been done once and stored in cube to be used many
> > >> >> >> >>>>>> times so kylin uses less cloud computation, is that true)?
> > >> >> >> >>>>>>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
1. JDBC source is a feature which is in development; it will be supported
later.

2. Kylin supports Kerberos now. I will write a doc as soon as possible.
(I will let you know.)

3. I think Ranger and Kerberos do different things: Kerberos is for
authentication and Ranger is for authorization, so they cannot replace each
other. Ranger can integrate with Kerberos; please check Ranger's website for
information.
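
To make the difference in point 3 concrete, here is a minimal sketch in
Python. It is only a toy illustration under stated assumptions: the names
Policy, authenticate and handle_query are hypothetical, and nothing here is
the real Kerberos, Ranger or Kylin API. It only shows that a request first
has to prove who the caller is (authentication, the role Kerberos plays) and
is then checked against a policy (authorization, the role Ranger plays).

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical authorization policy (not Ranger's real policy model)."""
    # Maps a user name to the set of (resource, action) pairs that user may perform.
    grants: dict = field(default_factory=dict)

    def is_allowed(self, user, resource, action):
        return (resource, action) in self.grants.get(user, set())


def authenticate(credentials, known_users):
    """Authentication: verify who the caller is.

    Returns the verified user name, or None if the check fails. A real
    Kerberos setup validates tickets rather than shared secrets; the
    secret comparison here is an assumption made to keep the toy short.
    """
    user, secret = credentials
    if user in known_users and known_users[user] == secret:
        return user
    return None


def handle_query(credentials, resource, action, known_users, policy):
    """Run both checks in order: identity first, then permission."""
    user = authenticate(credentials, known_users)
    if user is None:
        return "401 unauthenticated"
    if not policy.is_allowed(user, resource, action):
        return "403 unauthorized"
    return "200 ok"
```

A caller with valid credentials can still be refused by the policy, and a
caller with a generous policy entry is still refused without valid
credentials. That is why one mechanism cannot stand in for the other.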

------------------------
With warm regard
Xiaoxiang Yu



On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang for your reply
>
> ————————————-
> Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> ————————————-
> Yes: please answer to help me clear this headache:
>
> 1. Can Kylin access the existing star schema in an Oracle data warehouse? If
> not, do we have any workaround?
>
> 2. My team is using Kerberos for authentication. Do you have any
> document/case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?
>
> 3. Should we use Apache Ranger instead of Kerberos for authentication and
> for security purposes?
>
> Thank you again
>
> On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > I guess the release date should be 2024/01 .
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> >> Thank you very much Xiaoxiang. I already did the presentation this
> >> morning, so there is no time for you to comment. Next time I will send
> >> it to you in advance. The meeting result was that we will implement both
> >> Druid and Kylin in the next couple of projects, Druid because of its
> >> real-time feature. I hope that Kylin will have the same feature soon.
> >>
> >> May I ask when you will release Kylin 5.0?
> >>
> >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>
> >> > Since 2018 there have been a lot of new features and code refactoring.
> >> > If you like, you can share your PPT with me privately; maybe I can
> >> > give some comments.
> >> >
> >> > Here are references on the advantages of Kylin since 2018:
> >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> >> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >> >
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >
> >> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and
> >> >> Druid in my team.
> >> >>
> >> >> I found this article and would like you to update me on the advantages
> >> >> of Kylin since 2018 until now (especially with version 5 to be released):
> >> >>
> >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> >> >>
> >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> >> >>
> >> >> > Thank you very much for your prompt response. I still have several
> >> >> > questions and will seek your help later.
> >> >> >
> >> >> > Best regards and have a good day
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >> >
> >> >> >> Done. Github branch changed to kylin5.
> >> >> >>
> >> >> >> ------------------------
> >> >> >> With warm regard
> >> >> >> Xiaoxiang Yu
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >> >>
> >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> >> >> > ------------------------
> >> >> >> > With warm regard
> >> >> >> > Xiaoxiang Yu
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> >> >> >> >
> >> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> >> >> >> >> default branch. In case people are impressed by the numbers, I hope
> >> >> >> >> this will turn the situation around.
> >> >> >> >>
> >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >> >> >>
> >> >> >> >>> The default branch is for 4.X, which is a maintenance branch; the
> >> >> >> >>> active branch is kylin5.
> >> >> >> >>> I will change the default branch to kylin5 later.
> >> >> >> >>>
> >> >> >> >>> ------------------------
> >> >> >> >>> With warm regard
> >> >> >> >>> Xiaoxiang Yu
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >> >>>
> >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >> >> >>>>
> >> >> >> >>>> Can you see the attached photo?
> >> >> >> >>>>
> >> >> >> >>>> My boss asked why Druid commits code regularly but Kylin has had no
> >> >> >> >>>> commits since July.
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >> >> >>>>
> >> >> >> >>>>> I think so.
> >> >> >> >>>>>
> >> >> >> >>>>> Response time is not the only factor in making a decision. Kylin
> >> >> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> >> >> >> >>>>> model, and Kylin can guarantee reasonable query latency. ClickHouse
> >> >> >> >>>>> will be quicker in an ad hoc query scenario.
> >> >> >> >>>>>
> >> >> >> >>>>> By the way, Youzan and Kyligence combine them to provide
> >> >> >> >>>>> unified data analytics services for their customers.
> >> >> >> >>>>>
> >> >> >> >>>>> ------------------------
> >> >> >> >>>>> With warm regard
> >> >> >> >>>>> Xiaoxiang Yu
> >> >> >> >>>>>
> >> >> >> >>>>>
> >> >> >> >>>>>
> >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>>> Hi Xiaoxiang, thank you
> >> >> >> >>>>>>
> >> >> >> >>>>>> If my client uses a cloud computing service like GCP or AWS, which
> >> >> >> >>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse?
> >> >> >> >>>>>> (In the case of Kylin, my thought is that the query execution is done
> >> >> >> >>>>>> once and stored in the cube to be used many times, so Kylin uses less
> >> >> >> >>>>>> cloud computation. Is that true?)
> >> >> >> >>>>>>
> >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> >> >> >>>>>>
> >> >> >> >>>>>> > The following text is part of an article
> >> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > ===============================================================================
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
> >> >> >> >>>>>> > because of its precomputation technology, for example when the join,
> >> >> >> >>>>>> > group by, and where-condition patterns in SQL are relatively fixed.
> >> >> >> >>>>>> > The larger the data volume is, the more obvious the advantages of
> >> >> >> >>>>>> > using Kylin are. In particular, Kylin's advantages in deduplication
> >> >> >> >>>>>> > (count distinct), Top N, Percentile and similar scenarios are
> >> >> >> >>>>>> > especially large, and it is used in many scenarios such as
> >> >> >> >>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
> >> >> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell
> >> >> >> >>>>>> > Housing, etc. use Kylin to build their data service platforms,
> >> >> >> >>>>>> > serving millions to tens of millions of queries per day, and most of
> >> >> >> >>>>>> > the queries complete within 2 - 3 seconds. There is no better
> >> >> >> >>>>>> > alternative for such a high-concurrency scenario.
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
> >> >> >> >>>>>> > power and is more suitable when query requests are more flexible, or
> >> >> >> >>>>>> > when there is a need for detail-level queries with low concurrency.
> >> >> >> >>>>>> > Scenarios include: user label filtering over very many columns with
> >> >> >> >>>>>> > arbitrarily combined where conditions, complex ad hoc queries at low
> >> >> >> >>>>>> > concurrency, and so on. If the amount of data and access is large,
> >> >> >> >>>>>> > you need to deploy a distributed ClickHouse cluster, which is a
> >> >> >> >>>>>> > greater challenge for operation and maintenance.
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> >> >> >> >>>>>> > resource-efficient to compute them on the fly. Since the number of
> >> >> >> >>>>>> > queries is small, even if each query consumes a lot of computational
> >> >> >> >>>>>> > resources, it is still cost-effective overall. If some queries have
> >> >> >> >>>>>> > a fixed pattern and the query volume is large, Kylin is more
> >> >> >> >>>>>> > suitable: a large amount of computation is spent up front to save
> >> >> >> >>>>>> > the results, and that upfront cost is amortized over every query, so
> >> >> >> >>>>>> > it is the most economical.
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > ------------------------
> >> >> >> >>>>>> > With warm regard
> >> >> >> >>>>>> > Xiaoxiang Yu
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real time streaming feature.
> >> >> >> >>>>>> >> That's great.
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> This morning there has been a new challenge to my team: ClickHouse
> >> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
> >> >> >> >>>>>> >> which is faster than my demonstration (I used Kylin to calculate
> >> >> >> >>>>>> >> 1 billion rows in 2.9 seconds).
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
> >> >> >> >>>>>> >> that I can defend my demonstration?
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
> >> >> >> >>>>>> >> > here is that kylin has lag time due to model update of new segment
> >> >> >> >>>>>> >> > build, is that correct?"
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > You are correct.
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >> >> >> >>>>>> >> > combination of ... "
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
> >> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
> >> >> >> >>>>>> >> > (that is my estimate, but I am quite certain about it).
> >> >> >> >>>>>> >> > NRT stands for 'near real-time'; it will run a job that does
> >> >> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The price is
> >> >> >> >>>>>> >> > that you need to run and monitor a long-running job. This feature is
> >> >> >> >>>>>> >> > based on Spark Streaming, so you need knowledge of it.
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
> >> >> >> >>>>>> >> > tolerate?
> >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > ------------------------
> >> >> >> >>>>>> >> > With warm regard
> >> >> >> >>>>>> >> > Xiaoxiang Yu
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > > Druid is better in
> >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > ==========================
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason here
> >> >> >> >>>>>> >> > > is that Kylin has lag time due to model update of new segment
> >> >> >> >>>>>> >> > > build, is that correct?
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > If that is true, then can you suggest a work-around that combines:
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
> >> >> >> >>>>>> >> > > realtime capability?
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
> >> >> >> >>>>>> >> > > integrate it with the (time-lag Kylin cube).
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
> >> >> >> >>>>>> >> > > > about how Druid has changed in these two years. New features that
> >> >> >> >>>>>> >> > > > I know of are: new UI, fully on K8s, etc.).
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
> >> >> >> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
> >> >> >> >>>>>> >> > > > I used two years ago):
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
> >> >> >> >>>>>> >> > > > - Most queries are small. (Based on my test results, I think
> >> >> >> >>>>>> >> > > >   Druid had better response time for small queries two years ago.)
> >> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop and want to use the
> >> >> >> >>>>>> >> > > >   K8s/public cloud platform as your deployment platform.
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> >> >> >> >>>>>> >> > > > better, like:
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
> >> >> >> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
> >> >> >> >>>>>> >> > > >   different `Group By dimensions`.
> >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's website did not
> >> >> >> >>>>>> >> > > >   show that it supports ODBC well.)
> >> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >> >> >>>>>> >> > > > Hope this helps; you are also free to share your opinion.
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > ------------------------
> >> >> >> >>>>>> >> > > > With warm regard
> >> >> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >> >> >>>>>> >> > > >> Sirs/Madams,
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> May I post my boss's question:
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> >> >> >> >>>>>> >> > > >> Pinot and Druid?
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> Please kindly let me know
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> Thank you very much and best regards
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >
> >> >> >> >>>>>>
> >> >> >> >>>>>
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
1. JDBC source is a feature which in development, it will be supported
later.

2. Kylin supports kerberos now, I will write a doc as soon as possible.
(I will let you know.)

3. I think ranger and Kerberos are not doing the same things, one for
authentication, one for authorization. So they cannot replace each other.
Ranger can integrate with Kerberos, please check ranger's website for
information.

------------------------
With warm regard
Xiaoxiang Yu



On Sat, Dec 9, 2023 at 8:01 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang for your reply
>
> ————————————-
> Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> ————————————-
> Yes: please answer to help me clear this headache:
>
> 1. Can Kylin access the existing star schema in Oracle datawarehouse ? If
> not then do we have any work around?
>
> 2. My team is using kerberos for authentication, do you have any
> document/casestudy about integrating kerberos with kylin 4.x and kylin 5.x
>
> 3. Should we use apache ranger instead of kerberos for authentication and
> for security purposes?
>
> Thank you again
>
> On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > I guess the release date should be 2024/01 .
> > Do you have any suggestions/wishes for kylin 5(except real-time feature)?
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> wrote:
> >
> >> Thank you very much xiaoxiang, I did the presentation this morning
> already
> >> so there is no time for you to comment. Next time I will send you in
> >> advance. The meeting result was that we will implement both druid and
> >> kylin
> >> in the next couple of projects because of its realtime feature. Hope
> that
> >> kylin will have same feature soon.
> >>
> >> May I ask when will you release kylin 5.0?
> >>
> >> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>
> >> > Since 2018 there are a lot of new features and code refactor.
> >> > If you like, you can share your ppt to me privately, maybe I can
> >> > give some comments.
> >> >
> >> > Here is the reference of advantages of Kylin since 2018:
> >> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> >> > -
> >> >
> >>
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> >> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >> >
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> wrote:
> >> >
> >> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and
> >> Druid in
> >> >> my team.
> >> >>
> >> >> I found this article and would like you to update me the advantages
> of
> >> >> Kylin since 2018 until now (especially with version 5 to be released)
> >> >>
> >> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> >> <
> >> >>
> >>
> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
> >> >> >
> >> >>
> >> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> >> >>
> >> >> > Thank you very much for your prompt response, I still have several
> >> >> > questions to seek for your help later.
> >> >> >
> >> >> > Best regards and have a good day
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org>
> wrote:
> >> >> >
> >> >> >> Done. Github branch changed to kylin5.
> >> >> >>
> >> >> >> ------------------------
> >> >> >> With warm regard
> >> >> >> Xiaoxiang Yu
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org>
> >> wrote:
> >> >> >>
> >> >> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> >> >> > ------------------------
> >> >> >> > With warm regard
> >> >> >> > Xiaoxiang Yu
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy
> <namdd@vnpay.vn.invalid
> >> >
> >> >> >> wrote:
> >> >> >> >
> >> >> >> >> Thank you Xiaoxiang, please update me when you have changed
> your
> >> >> >> default
> >> >> >> >> branch. In case people are impressed by the numbers then I hope
> >> to
> >> >> turn
> >> >> >> >> this situation to reverse direction.
> >> >> >> >>
> >> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org>
> >> >> wrote:
> >> >> >> >>
> >> >> >> >>> The default branch is for 4.X which is a maintained branch,
> the
> >> >> active
> >> >> >> >>> branch is kylin5.
> >> >> >> >>> I will change the default branch to kylin5 later.
> >> >> >> >>>
> >> >> >> >>> ------------------------
> >> >> >> >>> With warm regard
> >> >> >> >>> Xiaoxiang Yu
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
> >> <na...@vnpay.vn.invalid>
> >> >> >> >>> wrote:
> >> >> >> >>>
> >> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >> >> >>>>
> >> >> >> >>>> Can you see the atttached photo
> >> >> >> >>>>
> >> >> >> >>>> My boss asked that why druid commit code regularly but kylin
> >> had
> >> >> not
> >> >> >> >>>> been committed since July
> >> >> >> >>>>
> >> >> >> >>>>
> >> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org>
> >> wrote:
> >> >> >> >>>>
> >> >> >> >>>>> I think so.
> >> >> >> >>>>>
> >> >> >> >>>>> Response time is not the only factor to make a decision.
> Kylin
> >> >> could
> >> >> >> >>>>> be cheaper
> >> >> >> >>>>> when the query pattern is suitable for the Kylin model, and
> >> Kylin
> >> >> >> can
> >> >> >> >>>>> guarantee
> >> >> >> >>>>> reasonable query latency. Clickhouse will be quicker in an
> ad
> >> hoc
> >> >> >> >>>>> query scenario.
> >> >> >> >>>>>
> >> >> >> >>>>> By the way, Youzan and Kyligence combine them together to
> >> provide
> >> >> >> >>>>> unified data analytics services for their customers.
> >> >> >> >>>>>
> >> >> >> >>>>> ------------------------
> >> >> >> >>>>> With warm regard
> >> >> >> >>>>> Xiaoxiang Yu
> >> >> >> >>>>>
> >> >> >> >>>>>
> >> >> >> >>>>>
> >> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
> >> <namdd@vnpay.vn.invalid
> >> >> >
> >> >> >> >>>>> wrote:
> >> >> >> >>>>>
> >> >> >> >>>>>> Hi Xiaoxiang, thank you
> >> >> >> >>>>>>
> >> >> >> >>>>>> In case my client uses cloud computing service like gcp or
> >> aws,
> >> >> >> which
> >> >> >> >>>>>> will cost more: precalculation feature of kylin or
> clickhouse
> >> >> >> (incase
> >> >> >> >>>>>> of
> >> >> >> >>>>>> kylin, I have a thought that the query execution has been
> >> done
> >> >> once
> >> >> >> >>>>>> and
> >> >> >> >>>>>> stored in cube to be used many times so kylin uses less
> cloud
> >> >> >> >>>>>> computation,
> >> >> >> >>>>>> is that true)?
> >> >> >> >>>>>>
> >> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <
> xxyu@apache.org
> >> >
> >> >> >> wrote:
> >> >> >> >>>>>>
> >> >> >> >>>>>> > Following text is part of an article(
> >> >> >> >>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>>
> >> >> >>
> >> >>
> >>
> ===============================================================================
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed
> modes
> >> >> >> because
> >> >> >> >>>>>> of its
> >> >> >> >>>>>> > pre-calculated technology, for example, join, group by,
> and
> >> >> where
> >> >> >> >>>>>> condition
> >> >> >> >>>>>> > modes in SQL are relatively fixed, etc. The larger the
> data
> >> >> >> volume
> >> >> >> >>>>>> is, the
> >> >> >> >>>>>> > more obvious the advantages of using Kylin are; in
> >> particular,
> >> >> >> >>>>>> Kylin is
> >> >> >> >>>>>> > particularly advantageous in the scenarios of de-emphasis
> >> >> (count
> >> >> >> >>>>>> distinct),
> >> >> >> >>>>>> > Top N, and Percentile. In particular, Kylin's advantages
> in
> >> >> >> >>>>>> de-weighting
> >> >> >> >>>>>> > (count distinct), Top N, Percentile and other scenarios
> are
> >> >> >> >>>>>> especially
> >> >> >> >>>>>> > huge, and it is used in a large number of scenarios, such
> >> as
> >> >> >> >>>>>> Dashboard, all
> >> >> >> >>>>>> > kinds of reports, large-screen display, traffic
> statistics,
> >> >> and
> >> >> >> user
> >> >> >> >>>>>> > behavior analysis. Meituan, Aurora, Shell Housing, etc.
> use
> >> >> Kylin
> >> >> >> >>>>>> to build
> >> >> >> >>>>>> > their data service platforms, providing millions to tens
> of
> >> >> >> >>>>>> millions of
> >> >> >> >>>>>> > queries per day, and most of the queries can be completed
> >> >> within
> >> >> >> 2
> >> >> >> >>>>>> - 3
> >> >> >> >>>>>> > seconds. There is no better alternative for such a high
> >> >> >> concurrency
> >> >> >> >>>>>> > scenario.
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
> >> >> computing
> >> >> >> >>>>>> power and
> >> >> >> >>>>>> > is more suitable when the query request is more flexible,
> >> or
> >> >> when
> >> >> >> >>>>>> there is
> >> >> >> >>>>>> > a need for detailed queries with low concurrency.
> Scenarios
> >> >> >> >>>>>> include: very
> >> >> >> >>>>>> > many columns and where conditions are arbitrarily
> combined
> >> >> with
> >> >> >> the
> >> >> >> >>>>>> user
> >> >> >> >>>>>> > label filtering, not a large amount of concurrency of
> >> complex
> >> >> >> >>>>>> on-the-spot
> >> >> >> >>>>>> > query and so on. If the amount of data and access is
> large,
> >> >> you
> >> >> >> >>>>>> need to
> >> >> >> >>>>>> > deploy a distributed ClickHouse cluster, which is a
> higher
> >> >> >> >>>>>> challenge for
> >> >> >> >>>>>> > operation and maintenance.
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > If some queries are very flexible but infrequent, it is
> >> more
> >> >> >> >>>>>> > resource-efficient to use now-computing. Since the number
> >> of
> >> >> >> >>>>>> queries is
> >> >> >> >>>>>> > small, even if each query consumes a lot of computational
> >> >> >> >>>>>> resources, it is
> >> >> >> >>>>>> > still cost-effective overall. If some queries have a
> fixed
> >> >> >> pattern
> >> >> >> >>>>>> and the
> >> >> >> >>>>>> > query volume is large, it is more suitable for Kylin,
> >> because
> >> >> the
> >> >> >> >>>>>> query
> >> >> >> >>>>>> > volume is large, and by using large computational
> >> resources to
> >> >> >> save
> >> >> >> >>>>>> the
> >> >> >> >>>>>> > results, the upfront computational cost can be amortized
> >> over
> >> >> >> each
> >> >> >> >>>>>> query,
> >> >> >> >>>>>> > so it is the most economical.
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > ------------------------
> >> >> >> >>>>>> > With warm regard
> >> >> >> >>>>>> > Xiaoxiang Yu
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >
> >> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
> >> >> <namdd@vnpay.vn.invalid
> >> >> >> >
> >> >> >> >>>>>> wrote:
> >> >> >> >>>>>> >
> >> >> >> >>>>>> >> Thank you Xiaoxiang for the near real time streaming
> >> feature.
> >> >> >> >>>>>> That's
> >> >> >> >>>>>> >> great.
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> This morning there has been a new challenge to my team:
> >> >> >> clickhouse
> >> >> >> >>>>>> offered
> >> >> >> >>>>>> >> us the speed of calculating 8 billion rows in
> millisecond
> >> >> which
> >> >> >> is
> >> >> >> >>>>>> faster
> >> >> >> >>>>>> >> than my demonstration (I used Kylin to do calculating 1
> >> >> billion
> >> >> >> >>>>>> rows in
> >> >> >> >>>>>> >> 2.9
> >> >> >> >>>>>> >> seconds)
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> Can you briefly suggest the advantages of kylin over
> >> >> clickhouse
> >> >> >> so
> >> >> >> >>>>>> that I
> >> >> >> >>>>>> >> can defend my demonstration.
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <
> >> xxyu@apache.org
> >> >> >
> >> >> >> >>>>>> wrote:
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
> >> >> >> >>>>>> >> > here is that kylin has lag time due to model update of new segment
> >> >> >> >>>>>> >> > build, is that correct?"
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > You are correct.
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >> >> >> >>>>>> >> > combination of ... "
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
> >> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
> >> >> >> >>>>>> >> > (that is my estimate, but I am quite certain about it).
> >> >> >> >>>>>> >> > NRT stands for 'near real-time'. It will run a job that does
> >> >> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The price is
> >> >> >> >>>>>> >> > that you need to run and monitor a long-running job. This feature is
> >> >> >> >>>>>> >> > based on Spark Streaming, so you need knowledge of it.
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
> >> >> >> >>>>>> >> > tolerate?
> >> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > ------------------------
> >> >> >> >>>>>> >> > With warm regard
> >> >> >> >>>>>> >> > Xiaoxiang Yu
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
> >> >> >> >>>>>> <na...@vnpay.vn.invalid>
> >> >> >> >>>>>> >> wrote:
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >> > > Druid is better in
> >> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > ==========================
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason here
> >> >> >> >>>>>> >> > > is that Kylin has lag time due to the model update of a new segment
> >> >> >> >>>>>> >> > > build, is that correct?
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > If that is true, then can you suggest a work-around of combination
> >> >> >> >>>>>> >> > > of:
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
> >> >> >> >>>>>> >> > > realtime capability?
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
> >> >> >> >>>>>> >> > > integrate it with (time-lag Kylin cube).
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
> >> >> >> xxyu@apache.org>
> >> >> >> >>>>>> wrote:
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
> >> >> >> >>>>>> >> > > > about how Druid has changed in these two years. New features that
> >> >> >> >>>>>> >> > > > I know of are: new UI, fully on K8s, etc.).
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > Here are some cases in which you should consider using Druid
> >> >> >> >>>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta with the
> >> >> >> >>>>>> >> > > > Druid which I used two years ago):
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
> >> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
> >> >> >> >>>>>> >> > > >   had better response time for small queries two years ago).
> >> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop and want to use a
> >> >> >> >>>>>> >> > > >   K8s/public cloud platform as your deployment platform.
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> >> >> >> >>>>>> >> > > > better, like:
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
> >> >> >> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
> >> >> >> >>>>>> >> > > >   different `Group By dimensions`.
> >> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >> >> >>>>>> >> > > > - Better 'Join' support? (Not sure at the moment.)
> >> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show
> >> >> >> >>>>>> >> > > >   that it supports ODBC well).
> >> >> >> >>>>>> >> > > > - It looks like Kylin supports ANSI SQL better than Druid.
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >> >> >>>>>> >> > > > Hope this helps; feel free to share your opinion.
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > ------------------------
> >> >> >> >>>>>> >> > > > With warm regard
> >> >> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
> >> >> >> >>>>>> <na...@vnpay.vn.invalid>
> >> >> >> >>>>>> >> > > wrote:
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >> >> >>>>>> >> > > >> Sirs/Madams,
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> May I post my boss's question:
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
> >> >> >> >>>>>> >> > > >> to Pinot and Druid?
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> Please kindly let me know
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >> Thank you very much and best regards
> >> >> >> >>>>>> >> > > >>
> >> >> >> >>>>>> >> > > >
> >> >> >> >>>>>> >> > >
> >> >> >> >>>>>> >> >
> >> >> >> >>>>>> >>
> >> >> >> >>>>>> >
> >> >> >> >>>>>>
> >> >> >> >>>>>
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Thank you Xiaoxiang for your reply

————————————-
Do you have any suggestions/wishes for kylin 5(except real-time feature)?
————————————-
Yes: please answer to help me clear this headache:

1. Can Kylin access an existing star schema in an Oracle data warehouse? If
not, is there any workaround?

2. My team is using Kerberos for authentication. Do you have any
document/case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?

3. Should we use Apache Ranger instead of Kerberos for authentication and
for security purposes?

Thank you again

On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:

> I guess the release date should be 2024/01 .
> Do you have any suggestions/wishes for kylin 5(except real-time feature)?
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Thank you very much Xiaoxiang, I did the presentation this morning already,
>> so there was no time for you to comment. Next time I will send it to you in
>> advance. The meeting result was that we will implement both Druid and Kylin
>> in the next couple of projects because of its realtime feature. Hope that
>> Kylin will have the same feature soon.
>>
>> May I ask when you will release Kylin 5.0?
>>
>> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>> > Since 2018 there have been a lot of new features and code refactoring.
>> > If you like, you can share your ppt with me privately; maybe I can
>> > give some comments.
>> >
>> > Here are references on the advantages of Kylin since 2018:
>> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> >
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> wrote:
>> >
>> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
>> >> in my team.
>> >>
>> >> I found this article and would like you to update me on the advantages of
>> >> Kylin since 2018 until now (especially with version 5 to be released):
>> >>
>> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>> >>
>> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>> >>
>> >> > Thank you very much for your prompt response, I still have several
>> >> > questions to seek for your help later.
>> >> >
>> >> > Best regards and have a good day
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >> >
>> >> >> Done. Github branch changed to kylin5.
>> >> >>
>> >> >> ------------------------
>> >> >> With warm regard
>> >> >> Xiaoxiang Yu
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> >>
>> >> >> > A JIRA ticket has been opened, waiting for INFRA :
>> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> >> >> > ------------------------
>> >> >> > With warm regard
>> >> >> > Xiaoxiang Yu
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid
>> >
>> >> >> wrote:
>> >> >> >
>> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> >> >> default branch. If people are impressed by the numbers, then I hope
>> >> >> >> to turn this situation around.
>> >> >> >>
>> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org>
>> >> wrote:
>> >> >> >>
>> >> >> >>> The default branch is for 4.X, which is a maintained branch; the
>> >> >> >>> active branch is kylin5.
>> >> >> >>> I will change the default branch to kylin5 later.
>> >> >> >>>
>> >> >> >>> ------------------------
>> >> >> >>> With warm regard
>> >> >> >>> Xiaoxiang Yu
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
>> <na...@vnpay.vn.invalid>
>> >> >> >>> wrote:
>> >> >> >>>
>> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> >> >>>>
>> >> >> >>>> Can you see the attached photo?
>> >> >> >>>>
>> >> >> >>>> My boss asked why Druid commits code regularly but Kylin has not had
>> >> >> >>>> a commit since July.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> >> >>>>
>> >> >> >>>>> I think so.
>> >> >> >>>>>
>> >> >> >>>>> Response time is not the only factor in making a decision. Kylin could
>> >> >> >>>>> be cheaper when the query pattern is suitable for the Kylin model, and
>> >> >> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be
>> >> >> >>>>> quicker in an ad hoc query scenario.
>> >> >> >>>>>
>> >> >> >>>>> By the way, Youzan and Kyligence combine them to provide
>> >> >> >>>>> unified data analytics services for their customers.
>> >> >> >>>>>
>> >> >> >>>>> ------------------------
>> >> >> >>>>> With warm regard
>> >> >> >>>>> Xiaoxiang Yu
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
>> <namdd@vnpay.vn.invalid
>> >> >
>> >> >> >>>>> wrote:
>> >> >> >>>>>
>> >> >> >>>>>> Hi Xiaoxiang, thank you
>> >> >> >>>>>>
>> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> >> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> >> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> >> >> >>>>>> execution has been done once and stored in the cube to be used many
>> >> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
>> >> >> >>>>>>
>> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xxyu@apache.org
>> >
>> >> >> wrote:
>> >> >> >>>>>>
>> >> >> >>>>>> > The following text is part of an article
>> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >> >> >>>>>> >
>> >> >> >>>>>> > ===============================================================================
>> >> >> >>>>>> >
>> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
>> >> >> >>>>>> > of its precomputation technology, for example when the join, group by,
>> >> >> >>>>>> > and where-condition patterns in SQL are relatively fixed. The larger
>> >> >> >>>>>> > the data volume, the more obvious the advantages of using Kylin. Kylin
>> >> >> >>>>>> > is particularly advantageous for deduplication (count distinct),
>> >> >> >>>>>> > Top N, Percentile, and similar scenarios, and it is widely used for
>> >> >> >>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
>> >> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell
>> >> >> >>>>>> > Housing, etc. use Kylin to build their data service platforms, serving
>> >> >> >>>>>> > millions to tens of millions of queries per day, and most of the
>> >> >> >>>>>> > queries can be completed within 2 - 3 seconds. There is no better
>> >> >> >>>>>> > alternative for such a high-concurrency scenario.
>> >> >> >>>>>> >
>> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
>> >> >> >>>>>> > and is more suitable when query requests are more flexible, or when
>> >> >> >>>>>> > detail-level queries with low concurrency are needed. Scenarios
>> >> >> >>>>>> > include user-label filtering with very many columns and arbitrarily
>> >> >> >>>>>> > combined where conditions, complex ad hoc queries without much
>> >> >> >>>>>> > concurrency, and so on. If the data volume and access volume are
>> >> >> >>>>>> > large, you need to deploy a distributed ClickHouse cluster, which is a
>> >> >> >>>>>> > bigger challenge for operation and maintenance.
>> >> >> >>>>>> >
>> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >> >> >>>>>> > resource-efficient to compute them on the fly. Since the number of
>> >> >> >>>>>> > queries is small, even if each query consumes a lot of computational
>> >> >> >>>>>> > resources, it is still cost-effective overall. If some queries have a
>> >> >> >>>>>> > fixed pattern and the query volume is large, they are better suited to
>> >> >> >>>>>> > Kylin: by spending large computational resources once to save the
>> >> >> >>>>>> > results, the upfront computational cost is amortized over every query,
>> >> >> >>>>>> > so it is the most economical.
>> >> >> >>>>>> >
>> >> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> > ------------------------
>> >> >> >>>>>> > With warm regard
>> >> >> >>>>>> > Xiaoxiang Yu
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
>> >> <namdd@vnpay.vn.invalid
>> >> >> >
>> >> >> >>>>>> wrote:
>> >> >> >>>>>> >
>> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
>> >> >> >>>>>> >> That's great.
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> This morning there has been a new challenge to my team: ClickHouse
>> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>> >> >> >>>>>> >> which is faster than my demonstration (I used Kylin to calculate
>> >> >> >>>>>> >> 1 billion rows in 2.9 seconds).
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>> >> >> >>>>>> >> that I can defend my demonstration?
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <
>> xxyu@apache.org
>> >> >
>> >> >> >>>>>> wrote:
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>> >> >> >>>>>> >> > here is that kylin has lag time due to model update of new segment
>> >> >> >>>>>> >> > build, is that correct?"
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > You are correct.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >> >> >>>>>> >> > combination of ... "
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
>> >> >> >>>>>> >> > (that is my estimate, but I am quite certain about it).
>> >> >> >>>>>> >> > NRT stands for 'near real-time'. It will run a job that does
>> >> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The price is
>> >> >> >>>>>> >> > that you need to run and monitor a long-running job. This feature is
>> >> >> >>>>>> >> > based on Spark Streaming, so you need knowledge of it.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can
>> >> >> >>>>>> >> > tolerate?
>> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > ------------------------
>> >> >> >>>>>> >> > With warm regard
>> >> >> >>>>>> >> > Xiaoxiang Yu
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
>> >> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> >> >>>>>> >> wrote:
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > > Druid is better in
>> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > ==========================
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason here
>> >> >> >>>>>> >> > > is that Kylin has lag time due to the model update of a new segment
>> >> >> >>>>>> >> > > build, is that correct?
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > If that is true, then can you suggest a work-around of combination
>> >> >> >>>>>> >> > > of:
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
>> >> >> >>>>>> >> > > realtime capability?
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>> >> >> >>>>>> >> > > integrate it with (time-lag Kylin cube).
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
>> >> >> xxyu@apache.org>
>> >> >> >>>>>> wrote:
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>> >> >> >>>>>> >> > > > about how Druid has changed in these two years. New features that
>> >> >> >>>>>> >> > > > I know of are: new UI, fully on K8s, etc.).
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > Here are some cases in which you should consider using Druid
>> >> >> >>>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta with the
>> >> >> >>>>>> >> > > > Druid which I used two years ago):
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
>> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
>> >> >> >>>>>> >> > > >   had better response time for small queries two years ago).
>> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop and want to use a
>> >> >> >>>>>> >> > > >   K8s/public cloud platform as your deployment platform.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>> >> >> >>>>>> >> > > > better, like:
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>> >> >> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
>> >> >> >>>>>> >> > > >   different `Group By dimensions`.
>> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> >> >>>>>> >> > > > - Better 'Join' support? (Not sure at the moment.)
>> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show
>> >> >> >>>>>> >> > > >   that it supports ODBC well).
>> >> >> >>>>>> >> > > > - It looks like Kylin supports ANSI SQL better than Druid.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> >> >>>>>> >> > > > Hope this helps; feel free to share your opinion.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > ------------------------
>> >> >> >>>>>> >> > > > With warm regard
>> >> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>> >> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> >> >>>>>> >> > > wrote:
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> >> >>>>>> >> > > >> Sirs/Madams,
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> May I post my boss's question:
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
>> >> >> >>>>>> >> > > >> to Pinot and Druid?
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> Please kindly let me know
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> Thank you very much and best regards
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >>
>> >> >> >>>>>> >
>> >> >> >>>>>>
>> >> >> >>>>>
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Thank you Xiaoxiang for your reply

————————————-
Do you have any suggestions/wishes for kylin 5(except real-time feature)?
————————————-
Yes: please answer to help me clear this headache:

1. Can Kylin access an existing star schema in an Oracle data warehouse? If
not, is there any workaround?

2. My team is using Kerberos for authentication. Do you have any
document/case study about integrating Kerberos with Kylin 4.x and Kylin 5.x?

3. Should we use Apache Ranger instead of Kerberos for authentication and
for security purposes?

Thank you again

On Thu, 7 Dec 2023 at 15:00 Xiaoxiang Yu <xx...@apache.org> wrote:

> I guess the release date should be 2024/01 .
> Do you have any suggestions/wishes for kylin 5(except real-time feature)?
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Thank you very much Xiaoxiang, I did the presentation this morning already,
>> so there was no time for you to comment. Next time I will send it to you in
>> advance. The meeting result was that we will implement both Druid and Kylin
>> in the next couple of projects because of its realtime feature. Hope that
>> Kylin will have the same feature soon.
>>
>> May I ask when you will release Kylin 5.0?
>>
>> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>> > Since 2018 there have been a lot of new features and code refactoring.
>> > If you like, you can share your ppt with me privately; maybe I can
>> > give some comments.
>> >
>> > Here are references on the advantages of Kylin since 2018:
>> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
>> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
>> > - https://kylin.apache.org/5.0/docs/development/roadmap
>> >
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> wrote:
>> >
>> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
>> >> in my team.
>> >>
>> >> I found this article and would like you to update me on the advantages of
>> >> Kylin since 2018 until now (especially with version 5 to be released):
>> >>
>> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>> >>
>> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>> >>
>> >> > Thank you very much for your prompt response, I still have several
>> >> > questions to seek for your help later.
>> >> >
>> >> > Best regards and have a good day
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >> >
>> >> >> Done. Github branch changed to kylin5.
>> >> >>
>> >> >> ------------------------
>> >> >> With warm regard
>> >> >> Xiaoxiang Yu
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> >>
>> >> >> > A JIRA ticket has been opened, waiting for INFRA :
>> >> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> >> >> > ------------------------
>> >> >> > With warm regard
>> >> >> > Xiaoxiang Yu
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid
>> >
>> >> >> wrote:
>> >> >> >
>> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> >> >> default branch. If people are impressed by the numbers, then I hope
>> >> >> >> to turn this situation around.
>> >> >> >>
>> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org>
>> >> wrote:
>> >> >> >>
>> >> >> >>> The default branch is for 4.X, which is a maintained branch; the
>> >> >> >>> active branch is kylin5.
>> >> >> >>> I will change the default branch to kylin5 later.
>> >> >> >>>
>> >> >> >>> ------------------------
>> >> >> >>> With warm regard
>> >> >> >>> Xiaoxiang Yu
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy
>> <na...@vnpay.vn.invalid>
>> >> >> >>> wrote:
>> >> >> >>>
>> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> >> >>>>
>> >> >> >>>> Can you see the attached photo?
>> >> >> >>>>
>> >> >> >>>> My boss asked why Druid commits code regularly but Kylin has not had
>> >> >> >>>> a commit since July.
>> >> >> >>>>
>> >> >> >>>>
>> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> >> >>>>
>> >> >> >>>>> I think so.
>> >> >> >>>>>
>> >> >> >>>>> Response time is not the only factor in making a decision. Kylin could
>> >> >> >>>>> be cheaper when the query pattern is suitable for the Kylin model, and
>> >> >> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be
>> >> >> >>>>> quicker in an ad hoc query scenario.
>> >> >> >>>>>
>> >> >> >>>>> By the way, Youzan and Kyligence combine them to provide
>> >> >> >>>>> unified data analytics services for their customers.
>> >> >> >>>>>
>> >> >> >>>>> ------------------------
>> >> >> >>>>> With warm regard
>> >> >> >>>>> Xiaoxiang Yu
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>>
>> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy
>> <namdd@vnpay.vn.invalid
>> >> >
>> >> >> >>>>> wrote:
>> >> >> >>>>>
>> >> >> >>>>>> Hi Xiaoxiang, thank you
>> >> >> >>>>>>
>> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
>> >> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> >> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> >> >> >>>>>> execution has been done once and stored in the cube to be used many
>> >> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
>> >> >> >>>>>>
>> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xxyu@apache.org
>> >
>> >> >> wrote:
>> >> >> >>>>>>
>> >> >> >>>>>> > The following text is part of an article
>> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >> >> >>>>>> >
>> >> >> >>>>>> > ===============================================================================
>> >> >> >>>>>> >
>> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
>> >> >> >>>>>> > of its precomputation technology, for example when the join, group by,
>> >> >> >>>>>> > and where-condition patterns in SQL are relatively fixed. The larger
>> >> >> >>>>>> > the data volume, the more obvious the advantages of using Kylin. Kylin
>> >> >> >>>>>> > is particularly advantageous for deduplication (count distinct),
>> >> >> >>>>>> > Top N, Percentile, and similar scenarios, and it is widely used for
>> >> >> >>>>>> > dashboards, all kinds of reports, large-screen displays, traffic
>> >> >> >>>>>> > statistics, and user behavior analysis. Meituan, Aurora, Shell
>> >> >> >>>>>> > Housing, etc. use Kylin to build their data service platforms, serving
>> >> >> >>>>>> > millions to tens of millions of queries per day, and most of the
>> >> >> >>>>>> > queries can be completed within 2 - 3 seconds. There is no better
>> >> >> >>>>>> > alternative for such a high-concurrency scenario.
>> >> >> >>>>>> >
>> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power and
>> >> >> >>>>>> > is more suitable when query requests are more flexible, or when there is
>> >> >> >>>>>> > a need for detail-level queries with low concurrency. Scenarios include:
>> >> >> >>>>>> > user-label filtering where very many columns and where conditions are
>> >> >> >>>>>> > arbitrarily combined, complex ad hoc queries with low concurrency, and so
>> >> >> >>>>>> > on. If the amount of data and access is large, you need to deploy a
>> >> >> >>>>>> > distributed ClickHouse cluster, which is a bigger challenge for operation
>> >> >> >>>>>> > and maintenance.
>> >> >> >>>>>> >
>> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >> >> >>>>>> > resource-efficient to compute them on the fly. Since the number of
>> >> >> >>>>>> > queries is small, even if each query consumes a lot of computational
>> >> >> >>>>>> > resources, it is still cost-effective overall. If some queries have a
>> >> >> >>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
>> >> >> >>>>>> > large computational resources are spent once to compute and save the
>> >> >> >>>>>> > results, and that upfront cost is amortized over every query, so it is
>> >> >> >>>>>> > the most economical.
>> >> >> >>>>>> >
>> >> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> > ------------------------
>> >> >> >>>>>> > With warm regard
>> >> >> >>>>>> > Xiaoxiang Yu
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> >
>> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
>> >> >> >>>>>> >
>> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's
>> >> >> >>>>>> >> great.
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> This morning there was a new challenge for my team: ClickHouse offered
>> >> >> >>>>>> >> us the speed of calculating 8 billion rows in milliseconds, which is
>> >> >> >>>>>> >> faster than my demonstration (I used Kylin to calculate 1 billion rows
>> >> >> >>>>>> >> in 2.9 seconds).
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so that
>> >> >> >>>>>> >> I can defend my demonstration?
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> >> >>>>>> >>
>> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason here is
>> >> >> >>>>>> >> > that kylin has lag time due to model update of new segment build, is
>> >> >> >>>>>> >> > that correct?"
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > You are correct.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of combination
>> >> >> >>>>>> >> > of ... "
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but not
>> >> >> >>>>>> >> > released), which can reduce the time lag to about 3 minutes (that is my
>> >> >> >>>>>> >> > estimation, but I am quite certain about it).
>> >> >> >>>>>> >> > NRT stands for 'near real-time'. It runs a job that periodically does
>> >> >> >>>>>> >> > micro-batch aggregation and persistence. The price is that you need to
>> >> >> >>>>>> >> > run and monitor a long-running job. This feature is based on Spark
>> >> >> >>>>>> >> > Streaming, so you need knowledge of it.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers can tolerate?
>> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > ------------------------
>> >> >> >>>>>> >> > With warm regard
>> >> >> >>>>>> >> > Xiaoxiang Yu
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >> > > Druid is better in
>> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > ==========================
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason here is
>> >> >> >>>>>> >> > > that Kylin has lag time due to model update of new segment build, is
>> >> >> >>>>>> >> > > that correct?
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > If that is true, then can you suggest a work-around combining:
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
>> >> >> >>>>>> >> > > realtime capability?
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and integrate
>> >> >> >>>>>> >> > > it with (time-lag Kylin cube).
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much about
>> >> >> >>>>>> >> > > > how Druid has changed in these two years. New features that I know of
>> >> >> >>>>>> >> > > > are: new UI, fully on K8s, etc.).
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather than
>> >> >> >>>>>> >> > > > Kylin at the moment (comparing Kylin 5.0-beta with the Druid which I
>> >> >> >>>>>> >> > > > used two years ago):
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
>> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid had
>> >> >> >>>>>> >> > > >   better response time for small queries two years ago).
>> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop and want to use the
>> >> >> >>>>>> >> > > >   K8S/public cloud platform as your deployment platform.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>> >> >> >>>>>> >> > > > better, like:
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
>> >> >> >>>>>> >> > > >   exact-match/fine-grained index for queries containing different
>> >> >> >>>>>> >> > > >   `Group By dimensions`.
>> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not show
>> >> >> >>>>>> >> > > >   that it supports ODBC well).
>> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > ------------------------
>> >> >> >>>>>> >> > > > With warm regard
>> >> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> >> >>>>>> >> > > >> Sirs/Madams,
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> May I post my boss's question:
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>> >> >> >>>>>> >> > > >> Pinot and Druid?
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> Please kindly let me know
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >> Thank you very much and best regards
>> >> >> >>>>>> >> > > >>
>> >> >> >>>>>> >> > > >
>> >> >> >>>>>> >> > >
>> >> >> >>>>>> >> >
>> >> >> >>>>>> >>
>> >> >> >>>>>> >
>> >> >> >>>>>>
>> >> >> >>>>>
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
I guess the release date should be 2024/01.
Do you have any suggestions/wishes for Kylin 5 (other than the real-time feature)?

------------------------
With warm regard
Xiaoxiang Yu



On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you very much Xiaoxiang. I already gave the presentation this morning,
> so there was no time for you to comment. Next time I will send it to you in
> advance. The meeting result was that we will implement both Druid and Kylin
> in the next couple of projects, Druid because of its realtime feature. Hope
> that Kylin will have the same feature soon.
>
> May I ask when you will release Kylin 5.0?
>
> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > Since 2018 there have been a lot of new features and code refactoring.
> > If you like, you can share your ppt with me privately; maybe I can
> > give some comments.
> >
> > Here are references on the advantages of Kylin since 2018:
> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
> >> in my team.
> >>
> >> I found this article and would like you to update me on the advantages of
> >> Kylin since 2018 until now (especially with version 5 to be released):
> >>
> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> >>
> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> >>
> >> > Thank you very much for your prompt response. I still have several
> >> > questions that I will seek your help with later.
> >> >
> >> > Best regards and have a good day
> >> >
> >> >
> >> >
> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >
> >> >> Done. The default GitHub branch has been changed to kylin5.
> >> >>
> >> >> ------------------------
> >> >> With warm regard
> >> >> Xiaoxiang Yu
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >>
> >> >> > A JIRA ticket has been opened and is waiting for INFRA:
> >> >> > https://issues.apache.org/jira/browse/INFRA-25238
> >> >> > ------------------------
> >> >> > With warm regard
> >> >> > Xiaoxiang Yu
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> >> >> >
> >> >> >> Thank you Xiaoxiang, please update me when you have changed your
> >> >> >> default branch. In case people are impressed by the numbers, I hope to
> >> >> >> turn this situation around.
> >> >> >>
> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >> >>
> >> >> >>> The default branch is for 4.X, which is a maintenance branch; the
> >> >> >>> active branch is kylin5.
> >> >> >>> I will change the default branch to kylin5 later.
> >> >> >>>
> >> >> >>> ------------------------
> >> >> >>> With warm regard
> >> >> >>> Xiaoxiang Yu
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> >> >> >>>
> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >> >>>>
> >> >> >>>> Can you see the attached photo?
> >> >> >>>>
> >> >> >>>> My boss asked why Druid commits code regularly but Kylin has had no
> >> >> >>>> commits since July.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >> >>>>
> >> >> >>>>> I think so.
> >> >> >>>>>
> >> >> >>>>> Response time is not the only factor in making a decision. Kylin could
> >> >> >>>>> be cheaper when the query pattern is suitable for the Kylin model, and
> >> >> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
> >> >> >>>>> in an ad hoc query scenario.
> >> >> >>>>>
> >> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >> >> >>>>> unified data analytics services for their customers.
> >> >> >>>>>
> >> >> >>>>> ------------------------
> >> >> >>>>> With warm regard
> >> >> >>>>> Xiaoxiang Yu
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> >> >> >>>>>
> >> >> >>>>>> Hi Xiaoxiang, thank you
> >> >> >>>>>>
> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS, which
> >> >> >>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse? (In
> >> >> >>>>>> the case of Kylin, my thought is that the query execution is done once
> >> >> >>>>>> and stored in a cube to be used many times, so Kylin uses less cloud
> >> >> >>>>>> computation. Is that true?)
> >> >> >>>>>>
> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >> >>>>>>
> >> >> >>>>>> > The following text is part of an article
> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> > ===============================================================================
> >> >> >>>>>> >
> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
> >> >> >>>>>> > of its precomputation technology, for example when the join, group by,
> >> >> >>>>>> > and where-condition patterns in the SQL are relatively fixed. The larger
> >> >> >>>>>> > the data volume, the more obvious the advantages of using Kylin. In
> >> >> >>>>>> > particular, Kylin's advantages in deduplication (count distinct), Top N,
> >> >> >>>>>> > Percentile and similar scenarios are especially large, and it is used
> >> >> >>>>>> > in a large number of scenarios, such as dashboards, all kinds of
> >> >> >>>>>> > reports, large-screen displays, traffic statistics, and user behavior
> >> >> >>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build their
> >> >> >>>>>> > data service platforms, serving millions to tens of millions of queries
> >> >> >>>>>> > per day, and most of the queries can be completed within 2 - 3 seconds.
> >> >> >>>>>> > There is no better alternative for such a high-concurrency scenario.
> >> >> >>>>>> >
> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high
> >> computing
> >> >> >>>>>> power and
> >> >> >>>>>> > is more suitable when the query request is more flexible, or
> >> when
> >> >> >>>>>> there is
> >> >> >>>>>> > a need for detailed queries with low concurrency. Scenarios
> >> >> >>>>>> include: very
> >> >> >>>>>> > many columns and where conditions are arbitrarily combined
> >> with
> >> >> the
> >> >> >>>>>> user
> >> >> >>>>>> > label filtering, not a large amount of concurrency of
> complex
> >> >> >>>>>> on-the-spot
> >> >> >>>>>> > query and so on. If the amount of data and access is large,
> >> you
> >> >> >>>>>> need to
> >> >> >>>>>> > deploy a distributed ClickHouse cluster, which is a higher
> >> >> >>>>>> challenge for
> >> >> >>>>>> > operation and maintenance.
> >> >> >>>>>> >
> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> >> >> >>>>>> > resource-efficient to use now-computing. Since the number of
> >> >> >>>>>> queries is
> >> >> >>>>>> > small, even if each query consumes a lot of computational
> >> >> >>>>>> resources, it is
> >> >> >>>>>> > still cost-effective overall. If some queries have a fixed
> >> >> pattern
> >> >> >>>>>> and the
> >> >> >>>>>> > query volume is large, it is more suitable for Kylin,
> because
> >> the
> >> >> >>>>>> query
> >> >> >>>>>> > volume is large, and by using large computational resources
> to
> >> >> save
> >> >> >>>>>> the
> >> >> >>>>>> > results, the upfront computational cost can be amortized
> over
> >> >> each
> >> >> >>>>>> query,
> >> >> >>>>>> > so it is the most economical.
> >> >> >>>>>> >
> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> > ------------------------
> >> >> >>>>>> > With warm regard
> >> >> >>>>>> > Xiaoxiang Yu
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
> >> <namdd@vnpay.vn.invalid
> >> >> >
> >> >> >>>>>> wrote:
> >> >> >>>>>> >
> >> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
> >> >> >>>>>> >> That's great.
> >> >> >>>>>> >>
> >> >> >>>>>> >> This morning there has been a new challenge to my team:
> >> >> >>>>>> >> ClickHouse offered us the speed of calculating 8 billion rows
> >> >> >>>>>> >> in milliseconds, which is faster than my demonstration (I used
> >> >> >>>>>> >> Kylin to calculate 1 billion rows in 2.9 seconds).
> >> >> >>>>>> >>
> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
> >> >> >>>>>> >> so that I can defend my demonstration?
> >> >> >>>>>> >>
> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <
> xxyu@apache.org
> >> >
> >> >> >>>>>> wrote:
> >> >> >>>>>> >>
> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
> >> >> >>>>>> >> > reason here is that kylin has lag time due to model update
> >> >> >>>>>> >> > of new segment build, is that correct?"
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > You are correct.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >> >> >>>>>> >> > combination of ... "
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> >> >> >>>>>> >> > completed but not released), which can reduce the time-lag
> >> >> >>>>>> >> > to about 3 minutes (that is my estimate, but I am quite
> >> >> >>>>>> >> > certain about it).
> >> >> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that does
> >> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The
> >> >> >>>>>> >> > price is that you need to run and monitor a long-running
> >> >> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
> >> >> >>>>>> >> > knowledge of it.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > I am curious: what is the maximum time-lag your customers
> >> >> >>>>>> >> > can tolerate?
> >> >> >>>>>> >> > Personally, I guess minute-level time-lag is OK for most
> >> >> >>>>>> >> > cases.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > ------------------------
> >> >> >>>>>> >> > With warm regard
> >> >> >>>>>> >> > Xiaoxiang Yu
> >> >> >>>>>> >> >
> >> >> >>>>>> >> >
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
> >> >> >>>>>> <na...@vnpay.vn.invalid>
> >> >> >>>>>> >> wrote:
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > > Druid is better in
> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > ==========================
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the
> >> >> >>>>>> >> > > reason here is that kylin has lag time due to model
> >> >> >>>>>> >> > > update of new segment build, is that correct?
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > If that is true, then can you suggest a work-around of
> >> >> >>>>>> >> > > combination of:
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
> >> >> >>>>>> >> > > realtime capability?
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
> >> >> >>>>>> >> > > and integrate it with (time-lag kylin cube).
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
> >> >> xxyu@apache.org>
> >> >> >>>>>> wrote:
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
> >> >> >>>>>> >> > > > know much about how Druid has changed in these two
> >> >> >>>>>> >> > > > years. New features that I know of are: new UI, fully
> >> >> >>>>>> >> > > > on K8s, etc.).
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > Here are some cases where you should consider using
> >> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing Kylin
> >> >> >>>>>> >> > > > 5.0-beta with the Druid which I used two years ago):
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I
> >> >> >>>>>> >> > > >   think Druid had better response time for small
> >> >> >>>>>> >> > > >   queries two years ago).
> >> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop, and want
> >> >> >>>>>> >> > > >   to use the K8S/public cloud platform as your
> >> >> >>>>>> >> > > >   deployment platform.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
> >> >> >>>>>> >> > > > could be better, like:
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
> >> >> >>>>>> >> > > >   have a more exact-match/fine-grained index for
> >> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website
> >> >> >>>>>> >> > > >   did not show that it supports ODBC well).
> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >> >>>>>> >> > > > Hope this helps; feel free to share your opinion.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > ------------------------
> >> >> >>>>>> >> > > > With warm regard
> >> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
> >> >> >>>>>> <na...@vnpay.vn.invalid>
> >> >> >>>>>> >> > > wrote:
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >> >>>>>> >> > > >> Sirs/Madams,
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> May I post my boss's question:
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
> >> >> >>>>>> >> > > >> compared to Pinot and Druid?
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> Please kindly let me know
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> Thank you very much and best regards
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> >
> >> >> >>>>>> >>
> >> >> >>>>>> >
> >> >> >>>>>>
> >> >> >>>>>
> >> >>
> >> >
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Thank you very much, Xiaoxiang. I already gave the presentation this morning,
so there was no time for you to comment. Next time I will send it to you in
advance. The result of the meeting was that we will implement both Druid and
Kylin in the next couple of projects, with Druid included because of its
realtime feature. I hope that Kylin will have the same feature soon.

May I ask when you will release Kylin 5.0?

On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:

> Since 2018 there have been a lot of new features and code refactoring.
> If you like, you can share your ppt with me privately, maybe I can
> give some comments.
>
> Here are some references on the advantages of Kylin since 2018:
> - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> -
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> - https://kylin.apache.org/5.0/docs/development/roadmap
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
>> in my team.
>>
>> I found this article and would like you to update me on the advantages of
>> Kylin since 2018 until now (especially with version 5 to be released)
>>
>> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> <
>> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
>> >
>>
>> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>
>> > Thank you very much for your prompt response, I still have several
>> > questions to seek for your help later.
>> >
>> > Best regards and have a good day
>> >
>> >
>> >
>> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >
>> >> Done. Github branch changed to kylin5.
>> >>
>> >> ------------------------
>> >> With warm regard
>> >> Xiaoxiang Yu
>> >>
>> >>
>> >>
>> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >>
>> >> > A JIRA ticket has been opened, waiting for INFRA :
>> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> >> > ------------------------
>> >> > With warm regard
>> >> > Xiaoxiang Yu
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >> wrote:
>> >> >
>> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> >> default branch. In case people are impressed by the numbers, I hope
>> >> >> to turn this situation around.
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> >>
>> >> >>> The default branch is for 4.X, which is a maintained branch; the
>> >> >>> active branch is kylin5.
>> >> >>> I will change the default branch to kylin5 later.
>> >> >>>
>> >> >>> ------------------------
>> >> >>> With warm regard
>> >> >>> Xiaoxiang Yu
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> >>>>
>> >> >>>> Can you see the attached photo?
>> >> >>>>
>> >> >>>> My boss asked why Druid commits code regularly but Kylin has had
>> >> >>>> no commits since July.
>> >> >>>>
>> >> >>>>
>> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>> >> >>>>
>> >> >>>>> I think so.
>> >> >>>>>
>> >> >>>>> Response time is not the only factor to make a decision. Kylin
>> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
>> >> >>>>> model, and Kylin can guarantee reasonable query latency.
>> >> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
>> >> >>>>>
>> >> >>>>> By the way, Youzan and Kyligence combine them together to provide
>> >> >>>>> unified data analytics services for their customers.
>> >> >>>>>
>> >> >>>>> ------------------------
>> >> >>>>> With warm regard
>> >> >>>>> Xiaoxiang Yu
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid
>> >
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> Hi Xiaoxiang, thank you
>> >> >>>>>>
>> >> >>>>>> If my client uses a cloud computing service like GCP or AWS,
>> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> >> >>>>>> execution is done once and stored in a cube to be used many
>> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
>> >> >>>>>>
>> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org>
>> >> wrote:
>> >> >>>>>>
>> >> >>>>>> > Following text is part of an article(
>> >> >>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>>
>> >>
>> ===============================================================================
>> >> >>>>>> >
>> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>> >> >>>>>> > because of its precomputation technology, for example when the
>> >> >>>>>> > join, group by, and where-condition patterns in the SQL are
>> >> >>>>>> > relatively fixed. The larger the data volume, the more obvious
>> >> >>>>>> > the advantages of using Kylin. Kylin is particularly
>> >> >>>>>> > advantageous for deduplication (count distinct), Top N, and
>> >> >>>>>> > Percentile, and it is used in a large number of scenarios, such
>> >> >>>>>> > as dashboards, all kinds of reports, large-screen displays,
>> >> >>>>>> > traffic statistics, and user behavior analysis. Meituan, Aurora,
>> >> >>>>>> > Shell Housing, etc. use Kylin to build their data service
>> >> >>>>>> > platforms, serving millions to tens of millions of queries per
>> >> >>>>>> > day, and most of the queries complete within 2 - 3 seconds.
>> >> >>>>>> > There is no better alternative for such a high-concurrency
>> >> >>>>>> > scenario.
>> >> >>>>>> >
>> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
>> >> >>>>>> > power and is more suitable when query requests are more
>> >> >>>>>> > flexible, or when detailed queries are needed at low
>> >> >>>>>> > concurrency. Scenarios include user-label filtering over very
>> >> >>>>>> > many columns with arbitrarily combined where conditions, complex
>> >> >>>>>> > ad hoc queries without much concurrency, and so on. If the data
>> >> >>>>>> > volume and traffic are large, you need to deploy a distributed
>> >> >>>>>> > ClickHouse cluster, which is a greater challenge for operations
>> >> >>>>>> > and maintenance.
>> >> >>>>>> >
>> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >> >>>>>> > resource-efficient to compute them on the fly. Since the number
>> >> >>>>>> > of queries is small, even if each query consumes a lot of
>> >> >>>>>> > computational resources, it is still cost-effective overall. If
>> >> >>>>>> > some queries have a fixed pattern and the query volume is large,
>> >> >>>>>> > Kylin is the better fit: by spending large computational
>> >> >>>>>> > resources up front to store the results, the upfront cost is
>> >> >>>>>> > amortized over each query, so it is the most economical.
>> >> >>>>>> >
>> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> > ------------------------
>> >> >>>>>> > With warm regard
>> >> >>>>>> > Xiaoxiang Yu
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
>> <namdd@vnpay.vn.invalid
>> >> >
>> >> >>>>>> wrote:
>> >> >>>>>> >
>> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
>> >> >>>>>> >> That's great.
>> >> >>>>>> >>
>> >> >>>>>> >> This morning there has been a new challenge to my team:
>> >> >>>>>> >> ClickHouse offered us the speed of calculating 8 billion rows
>> >> >>>>>> >> in milliseconds, which is faster than my demonstration (I used
>> >> >>>>>> >> Kylin to calculate 1 billion rows in 2.9 seconds).
>> >> >>>>>> >>
>> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
>> >> >>>>>> >> so that I can defend my demonstration?
>> >> >>>>>> >>
>> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org
>> >
>> >> >>>>>> wrote:
>> >> >>>>>> >>
>> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
>> >> >>>>>> >> > reason here is that kylin has lag time due to model update
>> >> >>>>>> >> > of new segment build, is that correct?"
>> >> >>>>>> >> >
>> >> >>>>>> >> > You are correct.
>> >> >>>>>> >> >
>> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >> >>>>>> >> > combination of ... "
>> >> >>>>>> >> >
>> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
>> >> >>>>>> >> > completed but not released), which can reduce the time-lag
>> >> >>>>>> >> > to about 3 minutes (that is my estimate, but I am quite
>> >> >>>>>> >> > certain about it).
>> >> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that does
>> >> >>>>>> >> > micro-batch aggregation and persistence periodically. The
>> >> >>>>>> >> > price is that you need to run and monitor a long-running
>> >> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
>> >> >>>>>> >> > knowledge of it.
>> >> >>>>>> >> >
>> >> >>>>>> >> > I am curious: what is the maximum time-lag your customers
>> >> >>>>>> >> > can tolerate?
>> >> >>>>>> >> > Personally, I guess minute-level time-lag is OK for most
>> >> >>>>>> >> > cases.
>> >> >>>>>> >> >
>> >> >>>>>> >> > ------------------------
>> >> >>>>>> >> > With warm regard
>> >> >>>>>> >> > Xiaoxiang Yu
>> >> >>>>>> >> >
>> >> >>>>>> >> >
>> >> >>>>>> >> >
>> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
>> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> >>>>>> >> wrote:
>> >> >>>>>> >> >
>> >> >>>>>> >> > > Druid is better in
>> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > ==========================
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > In this important scenario of realtime analytics, the
>> >> >>>>>> >> > > reason here is that kylin has lag time due to model
>> >> >>>>>> >> > > update of new segment build, is that correct?
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > If that is true, then can you suggest a work-around of
>> >> >>>>>> >> > > combination of:
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
>> >> >>>>>> >> > > realtime capability?
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
>> >> >>>>>> >> > > and integrate it with (time-lag kylin cube).
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
>> >> xxyu@apache.org>
>> >> >>>>>> wrote:
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
>> >> >>>>>> >> > > > know much about how Druid has changed in these two
>> >> >>>>>> >> > > > years. New features that I know of are: new UI, fully
>> >> >>>>>> >> > > > on K8s, etc.).
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > Here are some cases where you should consider using
>> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing Kylin
>> >> >>>>>> >> > > > 5.0-beta with the Druid which I used two years ago):
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > - You have a real-time datasource like Kafka etc.
>> >> >>>>>> >> > > > - Most queries are small (based on my test results, I
>> >> >>>>>> >> > > >   think Druid had better response time for small
>> >> >>>>>> >> > > >   queries two years ago).
>> >> >>>>>> >> > > > - You don't know how to optimize Spark/Hadoop, and want
>> >> >>>>>> >> > > >   to use the K8S/public cloud platform as your
>> >> >>>>>> >> > > >   deployment platform.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
>> >> >>>>>> >> > > > could be better, like:
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
>> >> >>>>>> >> > > >   have a more exact-match/fine-grained index for
>> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
>> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website
>> >> >>>>>> >> > > >   did not show that it supports ODBC well).
>> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> >>>>>> >> > > > Hope this helps; feel free to share your opinion.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > ------------------------
>> >> >>>>>> >> > > > With warm regard
>> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> >>>>>> >> > > wrote:
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> >>>>>> >> > > >> Sirs/Madams,
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> May I post my boss's question:
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>> >> >>>>>> >> > > >> compared to Pinot and Druid?
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> Please kindly let me know
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> Thank you very much and best regards
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > >
>> >> >>>>>> >> >
>> >> >>>>>> >>
>> >> >>>>>> >
>> >> >>>>>>
>> >> >>>>>
>> >>
>> >
>>
>
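
The amortization argument in the article quoted above (pay a large build
cost once, then answer each query cheaply, versus paying full compute on
every query) can be illustrated with a rough sketch. All numbers and function
names below are made up for illustration; they are not measurements of Kylin
or ClickHouse, and real cloud pricing has more terms (storage, idle nodes,
rebuilds).

```python
def total_cost(num_queries, upfront_cost, cost_per_query):
    """Total cost of serving num_queries, in arbitrary integer cost units."""
    return upfront_cost + num_queries * cost_per_query


def break_even_queries(build_cost, precomputed_query_cost, adhoc_query_cost):
    """Smallest query count at which precomputation is no more expensive
    than computing every query from scratch; None if it never pays off."""
    saving_per_query = adhoc_query_cost - precomputed_query_cost
    if saving_per_query <= 0:
        return None
    # Integer ceiling division of build_cost by saving_per_query.
    return -(-build_cost // saving_per_query)


if __name__ == "__main__":
    # Hypothetical cost units: one index build, then cheap lookups,
    # compared with an engine that scans the data on every query.
    build, precomputed, adhoc = 100_000, 1, 100
    n = break_even_queries(build, precomputed, adhoc)
    print("break-even query count:", n)
    print("precomputed total:", total_cost(n, build, precomputed))
    print("ad hoc total:", total_cost(n, 0, adhoc))
```

Below the break-even count the on-the-fly engine is cheaper, and above it the
precomputed one is, which matches the "flexible but infrequent" versus "fixed
pattern and large volume" split in the article.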

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Thank you very much, Xiaoxiang. I already gave the presentation this morning,
so there was no time for you to comment. Next time I will send it to you in
advance. The result of the meeting was that we will implement both Druid and
Kylin in the next couple of projects, with Druid included because of its
realtime feature. I hope that Kylin will have the same feature soon.

May I ask when you will release Kylin 5.0?

On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <xx...@apache.org> wrote:

> Since 2018 there have been a lot of new features and code refactoring.
> If you like, you can share your ppt with me privately, maybe I can
> give some comments.
>
> Here are some references on the advantages of Kylin since 2018:
> - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> -
> https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> - https://kylin.apache.org/5.0/docs/development/roadmap
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
>> in my team.
>>
>> I found this article and would like you to update me on the advantages of
>> Kylin since 2018 until now (especially with version 5 to be released)
>>
>> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
>> <
>> https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/
>> >
>>
>> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>
>> > Thank you very much for your prompt response, I still have several
>> > questions to seek for your help later.
>> >
>> > Best regards and have a good day
>> >
>> >
>> >
>> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >
>> >> Done. Github branch changed to kylin5.
>> >>
>> >> ------------------------
>> >> With warm regard
>> >> Xiaoxiang Yu
>> >>
>> >>
>> >>
>> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >>
>> >> > A JIRA ticket has been opened, waiting for INFRA :
>> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> >> > ------------------------
>> >> > With warm regard
>> >> > Xiaoxiang Yu
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >> wrote:
>> >> >
>> >> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> >> default branch. In case people are impressed by the numbers, I hope
>> >> >> to turn this situation around.
>> >> >>
>> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> >>
>> >> >>> The default branch is for 4.X, which is a maintained branch; the
>> >> >>> active branch is kylin5.
>> >> >>> I will change the default branch to kylin5 later.
>> >> >>>
>> >> >>> ------------------------
>> >> >>> With warm regard
>> >> >>> Xiaoxiang Yu
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> Hi Xiaoxiang, Sirs / Madams
>> >> >>>>
>> >> >>>> Can you see the attached photo?
>> >> >>>>
>> >> >>>> My boss asked why Druid commits code regularly but Kylin has had
>> >> >>>> no commits since July.
>> >> >>>>
>> >> >>>>
>> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>> >> >>>>
>> >> >>>>> I think so.
>> >> >>>>>
>> >> >>>>> Response time is not the only factor to make a decision. Kylin
>> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
>> >> >>>>> model, and Kylin can guarantee reasonable query latency.
>> >> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
>> >> >>>>>
>> >> >>>>> By the way, Youzan and Kyligence combine them together to provide
>> >> >>>>> unified data analytics services for their customers.
>> >> >>>>>
>> >> >>>>> ------------------------
>> >> >>>>> With warm regard
>> >> >>>>> Xiaoxiang Yu
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid
>> >
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> Hi Xiaoxiang, thank you
>> >> >>>>>>
>> >> >>>>>> If my client uses a cloud computing service like GCP or AWS,
>> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
>> >> >>>>>> ClickHouse? (In the case of Kylin, my thought is that the query
>> >> >>>>>> execution is done once and stored in a cube to be used many
>> >> >>>>>> times, so Kylin uses less cloud computation. Is that true?)
>> >> >>>>>>
>> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org>
>> >> wrote:
>> >> >>>>>>
>> >> >>>>>> > Following text is part of an article(
>> >> >>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>>
>> >>
>> ===============================================================================
>> >> >>>>>> >
>> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>> >> >>>>>> > because of its precomputation technology, for example when the
>> >> >>>>>> > join, group by, and where clauses in the SQL are relatively
>> >> >>>>>> > fixed. The larger the data volume, the more obvious the
>> >> >>>>>> > advantages of using Kylin. Kylin is particularly advantageous
>> >> >>>>>> > for deduplication (count distinct), Top N, and Percentile, and
>> >> >>>>>> > it is used in a large number of scenarios, such as dashboards,
>> >> >>>>>> > all kinds of reports, large-screen displays, traffic statistics,
>> >> >>>>>> > and user behavior analysis. Meituan, Aurora, Shell Housing, etc.
>> >> >>>>>> > use Kylin to build their data service platforms, serving
>> >> >>>>>> > millions to tens of millions of queries per day, and most of the
>> >> >>>>>> > queries complete within 2 - 3 seconds. There is no better
>> >> >>>>>> > alternative for such a high-concurrency scenario.
>> >> >>>>>> >
>> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
>> >> >>>>>> > power and is more suitable when queries are more flexible, or
>> >> >>>>>> > when detail-level queries are needed at low concurrency. Typical
>> >> >>>>>> > scenarios include user-label filtering over very many columns
>> >> >>>>>> > with arbitrarily combined where conditions, and complex ad hoc
>> >> >>>>>> > queries at low concurrency. If the amount of data and traffic is
>> >> >>>>>> > large, you need to deploy a distributed ClickHouse cluster,
>> >> >>>>>> > which is a bigger challenge for operation and maintenance.
>> >> >>>>>> >
>> >> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >> >>>>>> > resource-efficient to compute them on the fly. Since the number
>> >> >>>>>> > of queries is small, even if each query consumes a lot of
>> >> >>>>>> > computational resources, it is still cost-effective overall. If
>> >> >>>>>> > some queries have a fixed pattern and the query volume is large,
>> >> >>>>>> > Kylin is the better fit: the large upfront cost of computing and
>> >> >>>>>> > saving the results is amortized over every query, so it is the
>> >> >>>>>> > most economical option.
>> >> >>>>>> >
>> >> >>>>>> > --- Translated with DeepL.com (free version)
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> > ------------------------
>> >> >>>>>> > With warm regard
>> >> >>>>>> > Xiaoxiang Yu
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> >
>> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
>> <namdd@vnpay.vn.invalid
>> >> >
>> >> >>>>>> wrote:
>> >> >>>>>> >
>> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
>> >> >>>>>> >> That's great.
>> >> >>>>>> >>
>> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse offered
>> >> >>>>>> >> us the speed of calculating 8 billion rows in milliseconds,
>> >> >>>>>> >> which is faster than my demonstration (I used Kylin to
>> >> >>>>>> >> calculate 1 billion rows in 2.9 seconds).
>> >> >>>>>> >>
>> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
>> >> >>>>>> >> so that I can defend my demonstration?
>> >> >>>>>> >>
>> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xxyu@apache.org
>> >
>> >> >>>>>> wrote:
>> >> >>>>>> >>
>> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
>> >> >>>>>> >> > reason here is that kylin has lag time due to model update
>> >> >>>>>> >> > of new segment build, is that correct?"
>> >> >>>>>> >> >
>> >> >>>>>> >> > You are correct.
>> >> >>>>>> >> >
>> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >> >>>>>> >> > combination of ... "
>> >> >>>>>> >> >
>> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
>> >> >>>>>> >> > completed but not released), which can reduce the time lag
>> >> >>>>>> >> > to about 3 minutes (that is my estimate, but I am quite
>> >> >>>>>> >> > certain about it).
>> >> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that
>> >> >>>>>> >> > periodically does micro-batch aggregation and persistence.
>> >> >>>>>> >> > The price is that you need to run and monitor a long-running
>> >> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
>> >> >>>>>> >> > knowledge of it.
>> >> >>>>>> >> >
>> >> >>>>>> >> > I am curious: what is the maximum time lag your customers
>> >> >>>>>> >> > can tolerate?
>> >> >>>>>> >> > Personally, I guess minute-level time lag is OK for most
>> >> >>>>>> >> > cases.
>> >> >>>>>> >> >
>> >> >>>>>> >> > ------------------------
>> >> >>>>>> >> > With warm regard
>> >> >>>>>> >> > Xiaoxiang Yu
>> >> >>>>>> >> >
>> >> >>>>>> >> >
>> >> >>>>>> >> >
>> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
>> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> >>>>>> >> wrote:
>> >> >>>>>> >> >
>> >> >>>>>> >> > > Druid is better in
>> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > ==========================
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > In this important scenario of realtime analytics, the
>> >> >>>>>> >> > > reason here is that kylin has lag time due to model update
>> >> >>>>>> >> > > of new segment build, is that correct?
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > If that is true, then can you suggest a work-around of
>> >> >>>>>> >> > > combination of:
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
>> >> >>>>>> >> > > realtime capability?
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
>> >> >>>>>> >> > > and integrate it with (time-lag kylin cube).
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
>> >> xxyu@apache.org>
>> >> >>>>>> wrote:
>> >> >>>>>> >> > >
>> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
>> >> >>>>>> >> > > > know much about how Druid has changed in these two
>> >> >>>>>> >> > > > years. New features that I know of are: new UI, fully
>> >> >>>>>> >> > > > on K8s, etc.).
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > Here are some cases where you should consider using
>> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing Kylin
>> >> >>>>>> >> > > > 5.0-beta with the Druid I used two years ago):
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>> >> >>>>>> >> > > > - Most queries are small (based on my test results, I
>> >> >>>>>> >> > > >   think Druid had better response time for small
>> >> >>>>>> >> > > >   queries two years ago).
>> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to
>> >> >>>>>> >> > > >   use K8s or a public cloud platform as your deployment
>> >> >>>>>> >> > > >   platform.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
>> >> >>>>>> >> > > > could be better, like:
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
>> >> >>>>>> >> > > >   have a more exact-match/fine-grained index for
>> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
>> >> >>>>>> >> > > > - User-friendly UI for modeling.
>> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website
>> >> >>>>>> >> > > >   did not show that it supports ODBC well).
>> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> >>>>>> >> > > > I hope this helps, and you are free to share your
>> >> >>>>>> >> > > > opinion.
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > ------------------------
>> >> >>>>>> >> > > > With warm regard
>> >> >>>>>> >> > > > Xiaoxiang Yu
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>> >> >>>>>> <na...@vnpay.vn.invalid>
>> >> >>>>>> >> > > wrote:
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > > >> Dear Xiaoxiang,
>> >> >>>>>> >> > > >> Sirs/Madams,
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> May I post my boss's question:
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>> >> >>>>>> >> > > >> compared to Pinot and Druid?
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> Please kindly let me know
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >> Thank you very much and best regards
>> >> >>>>>> >> > > >>
>> >> >>>>>> >> > > >
>> >> >>>>>> >> > >
>> >> >>>>>> >> >
>> >> >>>>>> >>
>> >> >>>>>> >
>> >> >>>>>>
>> >> >>>>>
>> >>
>> >
>>
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
Since 2018 there have been a lot of new features and code refactoring.
If you like, you can share your PPT with me privately, and I can
give some comments.

Here are some references on the advantages of Kylin since 2018:
- https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
- https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
- https://kylin.apache.org/5.0/docs/development/roadmap

------------------------
With warm regard
Xiaoxiang Yu



On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
> in my team.
>
> I found this article and would like you to update me on the advantages of
> Kylin since 2018 (especially with version 5 to be released).
>
> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>
> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>
> > Thank you very much for your prompt response. I still have several
> > questions and will seek your help later.
> >
> > Best regards and have a good day
> >
> >
> >
> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> >> Done. Github branch changed to kylin5.
> >>
> >> ------------------------
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >>
> >>
> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>
> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> wrote:
> >> >
> >> >> Thank you Xiaoxiang, please update me when you have changed your
> >> >> default branch. In case people are impressed by the numbers, I hope
> >> >> to turn this situation around.
> >> >>
> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >>
> >> >>> The default branch is for 4.X, which is a maintenance branch; the
> >> >>> active branch is kylin5.
> >> >>> I will change the default branch to kylin5 later.
> >> >>>
> >> >>> ------------------------
> >> >>> With warm regard
> >> >>> Xiaoxiang Yu
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> >>> wrote:
> >> >>>
> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >>>>
> >> >>>> Can you see the attached photo?
> >> >>>>
> >> >>>> My boss asked why Druid commits code regularly but Kylin has had no
> >> >>>> commits since July.
> >> >>>>
> >> >>>>
> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >>>>
> >> >>>>> I think so.
> >> >>>>>
> >> >>>>> Response time is not the only factor in making a decision. Kylin
> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> >> >>>>> model, and Kylin can guarantee reasonable query latency.
> >> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
> >> >>>>>
> >> >>>>> By the way, Youzan and Kyligence combine them together to provide
> >> >>>>> unified data analytics services for their customers.
> >> >>>>>
> >> >>>>> ------------------------
> >> >>>>> With warm regard
> >> >>>>> Xiaoxiang Yu
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid
> >
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> Hi Xiaoxiang, thank you
> >> >>>>>>
> >> >>>>>> If my client uses a cloud computing service like GCP or AWS, which
> >> >>>>>> will cost more: Kylin's precalculation or ClickHouse? (In Kylin's
> >> >>>>>> case, my thinking is that the query is executed once and the
> >> >>>>>> result is stored in the cube to be used many times, so Kylin uses
> >> >>>>>> less cloud computation. Is that true?)
> >> >>>>>>
> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org>
> >> wrote:
> >> >>>>>>
> >> >>>>>> > Following text is part of an article(
> >> >>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>>
> >>
> ===============================================================================
> >> >>>>>> >
> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
> >> >>>>>> > because of its precomputation technology, for example when the
> >> >>>>>> > join, group by, and where clauses in the SQL are relatively
> >> >>>>>> > fixed. The larger the data volume, the more obvious the
> >> >>>>>> > advantages of using Kylin. Kylin is particularly advantageous
> >> >>>>>> > for deduplication (count distinct), Top N, and Percentile, and
> >> >>>>>> > it is used in a large number of scenarios, such as dashboards,
> >> >>>>>> > all kinds of reports, large-screen displays, traffic statistics,
> >> >>>>>> > and user behavior analysis. Meituan, Aurora, Shell Housing, etc.
> >> >>>>>> > use Kylin to build their data service platforms, serving
> >> >>>>>> > millions to tens of millions of queries per day, and most of the
> >> >>>>>> > queries complete within 2 - 3 seconds. There is no better
> >> >>>>>> > alternative for such a high-concurrency scenario.
> >> >>>>>> >
> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
> >> >>>>>> > power and is more suitable when queries are more flexible, or
> >> >>>>>> > when detail-level queries are needed at low concurrency. Typical
> >> >>>>>> > scenarios include user-label filtering over very many columns
> >> >>>>>> > with arbitrarily combined where conditions, and complex ad hoc
> >> >>>>>> > queries at low concurrency. If the amount of data and traffic is
> >> >>>>>> > large, you need to deploy a distributed ClickHouse cluster,
> >> >>>>>> > which is a bigger challenge for operation and maintenance.
> >> >>>>>> >
> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> >> >>>>>> > resource-efficient to compute them on the fly. Since the number
> >> >>>>>> > of queries is small, even if each query consumes a lot of
> >> >>>>>> > computational resources, it is still cost-effective overall. If
> >> >>>>>> > some queries have a fixed pattern and the query volume is large,
> >> >>>>>> > Kylin is the better fit: the large upfront cost of computing and
> >> >>>>>> > saving the results is amortized over every query, so it is the
> >> >>>>>> > most economical option.
> >> >>>>>> >
> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > ------------------------
> >> >>>>>> > With warm regard
> >> >>>>>> > Xiaoxiang Yu
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
> <namdd@vnpay.vn.invalid
> >> >
> >> >>>>>> wrote:
> >> >>>>>> >
> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
> >> >>>>>> >> That's great.
> >> >>>>>> >>
> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse offered
> >> >>>>>> >> us the speed of calculating 8 billion rows in milliseconds,
> >> >>>>>> >> which is faster than my demonstration (I used Kylin to
> >> >>>>>> >> calculate 1 billion rows in 2.9 seconds).
> >> >>>>>> >>
> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
> >> >>>>>> >> so that I can defend my demonstration?
> >> >>>>>> >>
> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org>
> >> >>>>>> wrote:
> >> >>>>>> >>
> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
> >> >>>>>> >> > reason here is that kylin has lag time due to model update
> >> >>>>>> >> > of new segment build, is that correct?"
> >> >>>>>> >> >
> >> >>>>>> >> > You are correct.
> >> >>>>>> >> >
> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >> >>>>>> >> > combination of ... "
> >> >>>>>> >> >
> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> >> >>>>>> >> > completed but not released), which can reduce the time lag
> >> >>>>>> >> > to about 3 minutes (that is my estimate, but I am quite
> >> >>>>>> >> > certain about it).
> >> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that
> >> >>>>>> >> > periodically does micro-batch aggregation and persistence.
> >> >>>>>> >> > The price is that you need to run and monitor a long-running
> >> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
> >> >>>>>> >> > knowledge of it.
> >> >>>>>> >> >
> >> >>>>>> >> > I am curious: what is the maximum time lag your customers
> >> >>>>>> >> > can tolerate?
> >> >>>>>> >> > Personally, I guess minute-level time lag is OK for most
> >> >>>>>> >> > cases.
> >> >>>>>> >> >
> >> >>>>>> >> > ------------------------
> >> >>>>>> >> > With warm regard
> >> >>>>>> >> > Xiaoxiang Yu
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy
> >> >>>>>> <na...@vnpay.vn.invalid>
> >> >>>>>> >> wrote:
> >> >>>>>> >> >
> >> >>>>>> >> > > Druid is better in
> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > >
> >> >>>>>> >> > > ==========================
> >> >>>>>> >> > >
> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >>>>>> >> > >
> >> >>>>>> >> > > In this important scenario of realtime analytics, the
> >> >>>>>> >> > > reason here is that kylin has lag time due to model update
> >> >>>>>> >> > > of new segment build, is that correct?
> >> >>>>>> >> > >
> >> >>>>>> >> > > If that is true, then can you suggest a work-around of
> >> >>>>>> >> > > combination of:
> >> >>>>>> >> > >
> >> >>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
> >> >>>>>> >> > > realtime capability?
> >> >>>>>> >> > >
> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update)
> >> >>>>>> >> > > and integrate it with (time-lag kylin cube).
> >> >>>>>> >> > >
> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <
> >> xxyu@apache.org>
> >> >>>>>> wrote:
> >> >>>>>> >> > >
> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't
> >> >>>>>> >> > > > know much about how Druid has changed in these two
> >> >>>>>> >> > > > years. New features that I know of are: new UI, fully
> >> >>>>>> >> > > > on K8s, etc.).
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > Here are some cases where you should consider using
> >> >>>>>> >> > > > Druid rather than Kylin at the moment (comparing Kylin
> >> >>>>>> >> > > > 5.0-beta with the Druid I used two years ago):
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > > > - Most queries are small (based on my test results, I
> >> >>>>>> >> > > >   think Druid had better response time for small
> >> >>>>>> >> > > >   queries two years ago).
> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to
> >> >>>>>> >> > > >   use K8s or a public cloud platform as your deployment
> >> >>>>>> >> > > >   platform.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin
> >> >>>>>> >> > > > could be better, like:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can
> >> >>>>>> >> > > >   have a more exact-match/fine-grained index for
> >> >>>>>> >> > > >   queries containing different `Group By dimensions`.
> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website
> >> >>>>>> >> > > >   did not show that it supports ODBC well).
> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >>>>>> >> > > > I hope this helps, and you are free to share your
> >> >>>>>> >> > > > opinion.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > ------------------------
> >> >>>>>> >> > > > With warm regard
> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
> >> >>>>>> <na...@vnpay.vn.invalid>
> >> >>>>>> >> > > wrote:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >>>>>> >> > > >> Sirs/Madams,
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> May I post my boss's question:
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
> >> >>>>>> >> > > >> compared to Pinot and Druid?
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Please kindly let me know
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Thank you very much and best regards
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >
> >> >>>>>> >> > >
> >> >>>>>> >> >
> >> >>>>>> >>
> >> >>>>>> >
> >> >>>>>>
> >> >>>>>
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
Since 2018 there have been a lot of new features and code refactoring.
If you like, you can share your PPT with me privately, and I can
give some comments.

Here are some references on the advantages of Kylin since 2018:
- https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
- https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
- https://kylin.apache.org/5.0/docs/development/roadmap

------------------------
With warm regard
Xiaoxiang Yu



On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
> in my team.
>
> I found this article and would like you to update me on the advantages of
> Kylin since 2018 (especially with version 5 to be released).
>
> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
>
> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>
> > Thank you very much for your prompt response. I still have several
> > questions and will seek your help later.
> >
> > Best regards and have a good day
> >
> >
> >
> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> >> Done. Github branch changed to kylin5.
> >>
> >> ------------------------
> >> With warm regard
> >> Xiaoxiang Yu
> >>
> >>
> >>
> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>
> >> > A JIRA ticket has been opened, waiting for INFRA :
> >> > https://issues.apache.org/jira/browse/INFRA-25238 .
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> wrote:
> >> >
> >> >> Thank you Xiaoxiang, please update me when you have changed your
> >> >> default branch. In case people are impressed by the numbers, I hope
> >> >> to turn this situation around.
> >> >>
> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >>
> >> >>> The default branch is for 4.X, which is a maintenance branch; the
> >> >>> active branch is kylin5.
> >> >>> I will change the default branch to kylin5 later.
> >> >>>
> >> >>> ------------------------
> >> >>> With warm regard
> >> >>> Xiaoxiang Yu
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> >>> wrote:
> >> >>>
> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >>>>
> >> >>>> Can you see the attached photo?
> >> >>>>
> >> >>>> My boss asked why Druid commits code regularly but Kylin has had no
> >> >>>> commits since July.
> >> >>>>
> >> >>>>
> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
> >> >>>>
> >> >>>>> I think so.
> >> >>>>>
> >> >>>>> Response time is not the only factor in making a decision. Kylin
> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> >> >>>>> model, and Kylin can guarantee reasonable query latency.
> >> >>>>> ClickHouse will be quicker in an ad hoc query scenario.
> >> >>>>>
> >> >>>>> By the way, Youzan and Kyligence combine them together to provide
> >> >>>>> unified data analytics services for their customers.
> >> >>>>>
> >> >>>>> ------------------------
> >> >>>>> With warm regard
> >> >>>>> Xiaoxiang Yu
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid
> >
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> Hi Xiaoxiang, thank you
> >> >>>>>>
> >> >>>>>> If my client uses a cloud computing service like GCP or AWS, which
> >> >>>>>> will cost more: Kylin's precalculation or ClickHouse? (In Kylin's
> >> >>>>>> case, my thinking is that the query is executed once and the
> >> >>>>>> result is stored in the cube to be used many times, so Kylin uses
> >> >>>>>> less cloud computation. Is that true?)
> >> >>>>>>
> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org>
> >> wrote:
> >> >>>>>>
> >> >>>>>> > Following text is part of an article(
> >> >>>>>> > https://zhuanlan.zhihu.com/p/343394287) .
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>>
> >>
> ===============================================================================
> >> >>>>>> >
> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns
> >> >>>>>> > because of its precomputation technology, for example when the
> >> >>>>>> > join, group by, and where clauses in the SQL are relatively
> >> >>>>>> > fixed. The larger the data volume, the more obvious the
> >> >>>>>> > advantages of using Kylin. Kylin is particularly advantageous
> >> >>>>>> > for deduplication (count distinct), Top N, and Percentile, and
> >> >>>>>> > it is used in a large number of scenarios, such as dashboards,
> >> >>>>>> > all kinds of reports, large-screen displays, traffic statistics,
> >> >>>>>> > and user behavior analysis. Meituan, Aurora, Shell Housing, etc.
> >> >>>>>> > use Kylin to build their data service platforms, serving
> >> >>>>>> > millions to tens of millions of queries per day, and most of the
> >> >>>>>> > queries complete within 2 - 3 seconds. There is no better
> >> >>>>>> > alternative for such a high-concurrency scenario.
> >> >>>>>> >
> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing
> >> >>>>>> > power and is more suitable when queries are more flexible, or
> >> >>>>>> > when detail-level queries are needed at low concurrency. Typical
> >> >>>>>> > scenarios include user-label filtering over very many columns
> >> >>>>>> > with arbitrarily combined where conditions, and complex ad hoc
> >> >>>>>> > queries at low concurrency. If the amount of data and traffic is
> >> >>>>>> > large, you need to deploy a distributed ClickHouse cluster,
> >> >>>>>> > which is a bigger challenge for operation and maintenance.
> >> >>>>>> >
> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> >> >>>>>> > resource-efficient to compute them on the fly. Since the number
> >> >>>>>> > of queries is small, even if each query consumes a lot of
> >> >>>>>> > computational resources, it is still cost-effective overall. If
> >> >>>>>> > some queries have a fixed pattern and the query volume is large,
> >> >>>>>> > Kylin is the better fit: the large upfront cost of computing and
> >> >>>>>> > saving the results is amortized over every query, so it is the
> >> >>>>>> > most economical option.
> >> >>>>>> >
> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > ------------------------
> >> >>>>>> > With warm regard
> >> >>>>>> > Xiaoxiang Yu
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> >
> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy
> <namdd@vnpay.vn.invalid
> >> >
> >> >>>>>> wrote:
> >> >>>>>> >
> >> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature.
> >> >>>>>> >> That's great.
> >> >>>>>> >>
> >> >>>>>> >> This morning my team faced a new challenge: ClickHouse offered
> >> >>>>>> >> us the speed of calculating 8 billion rows in milliseconds,
> >> >>>>>> >> which is faster than my demonstration (I used Kylin to
> >> >>>>>> >> calculate 1 billion rows in 2.9 seconds).
> >> >>>>>> >>
> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse
> >> >>>>>> >> so that I can defend my demonstration?
> >> >>>>>> >>
> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org>
> >> >>>>>> wrote:
> >> >>>>>> >>
> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the
> >> >>>>>> >> > reason here is that kylin has lag time due to model update
> >> >>>>>> >> > of new segment build, is that correct?"
> >> >>>>>> >> >
> >> >>>>>> >> > You are correct.
> >> >>>>>> >> >
> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >> >>>>>> >> > combination of ... "
> >> >>>>>> >> >
> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is
> >> >>>>>> >> > completed but not released), which can reduce the time lag
> >> >>>>>> >> > to about 3 minutes (that is my estimate, but I am quite
> >> >>>>>> >> > certain about it).
> >> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that
> >> >>>>>> >> > periodically does micro-batch aggregation and persistence.
> >> >>>>>> >> > The price is that you need to run and monitor a long-running
> >> >>>>>> >> > job. This feature is based on Spark Streaming, so you need
> >> >>>>>> >> > knowledge of it.
> >> >>>>>> >> >
> >> >>>>>> >> > I am curious: what is the maximum time lag your customers
> >> >>>>>> >> > can tolerate?
> >> >>>>>> >> > Personally, I guess minute-level time lag is OK for most
> >> >>>>>> >> > cases.
> >> >>>>>> >> >
> >> >>>>>> >> > ------------------------
> >> >>>>>> >> > With warm regard
> >> >>>>>> >> > Xiaoxiang Yu
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> >
> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> >>>>>> >> > wrote:
> >> >>>>>> >> >
> >> >>>>>> >> > > Druid is better in
> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > >
> >> >>>>>> >> > > ==========================
> >> >>>>>> >> > >
> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >>>>>> >> > >
> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason here
> >> >>>>>> >> > > is that kylin has lag time due to model update of new segment
> >> >>>>>> >> > > build, is that correct?
> >> >>>>>> >> > >
> >> >>>>>> >> > > If that is true, then can you suggest a work-around that combines:
> >> >>>>>> >> > >
> >> >>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to provide
> >> >>>>>> >> > > realtime capability?
> >> >>>>>> >> > >
> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
> >> >>>>>> >> > > integrate it with (time - lag kylin cube).
> >> >>>>>> >> > >
> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org>
> >> >>>>>> >> > > wrote:
> >> >>>>>> >> > >
> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
> >> >>>>>> >> > > > about how Druid has changed in these two years. New features
> >> >>>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
> >> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
> >> >>>>>> >> > > > which I used two years ago):
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
> >> >>>>>> >> > > >   had better response time for small queries two years ago).
> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use K8s or
> >> >>>>>> >> > > >   a public cloud platform as your deployment platform.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> >> >>>>>> >> > > > better, like:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
> >> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
> >> >>>>>> >> > > >   different `Group By dimensions`.
> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
> >> >>>>>> >> > > >   show that it supports ODBC well).
> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > ------------------------
> >> >>>>>> >> > > > With warm regard
> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >
> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> >>>>>> >> > > > wrote:
> >> >>>>>> >> > > >
> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >>>>>> >> > > >> Sirs/Madams,
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> May I post my boss's question:
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> >> >>>>>> >> > > >> Pinot and Druid?
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Please kindly let me know
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >> Thank you very much and best regards
> >> >>>>>> >> > > >>
> >> >>>>>> >> > > >
> >> >>>>>> >> > >
> >> >>>>>> >> >
> >> >>>>>> >>
> >> >>>>>> >
> >> >>>>>>
> >> >>>>>
> >>
> >
>
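
The work-around asked about in the quoted thread, combining a time-lagged Kylin cube with a real-time DB update, can be pictured as adding the rows that arrived after the last segment build on top of the pre-aggregated result. The sketch below is only an illustration under stated assumptions: merge_counts, cube_rows, recent_events and cube_built_until are hypothetical names, not Kylin APIs, and nothing in the thread says Kylin works this way.

```python
from collections import defaultdict
from datetime import datetime


def merge_counts(cube_rows, recent_events, cube_built_until):
    """Combine pre-aggregated counts with events newer than the last build.

    cube_rows:        dict mapping a dimension value to a count that already
                      covers every event before cube_built_until (assumption).
    recent_events:    iterable of (timestamp, dimension value) pairs read from
                      a real-time store (hypothetical).
    cube_built_until: datetime marking the end of the last built segment.
    """
    merged = defaultdict(int, cube_rows)  # copy, so the cube result is untouched
    for ts, key in recent_events:
        if ts >= cube_built_until:  # skip events the cube already includes
            merged[key] += 1
    return dict(merged)


cube = {"VN": 100, "US": 40}
boundary = datetime(2023, 12, 4, 12, 0)
events = [
    (datetime(2023, 12, 4, 11, 59), "VN"),  # already counted in the cube
    (datetime(2023, 12, 4, 12, 1), "VN"),
    (datetime(2023, 12, 4, 12, 2), "JP"),
]
print(merge_counts(cube, events, boundary))
```

This only works for additive measures such as COUNT or SUM. Measures like count distinct cannot be merged by simple addition, so they would need a different approach.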

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Hi Xiaoxiang, tomorrow my team holds its main presentation comparing Kylin
and Druid.

I found this article and would like you to update me on how Kylin's
advantages have changed from 2018 until now (especially with version 5
about to be released).

Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
<https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>

On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Thank you very much for your prompt response, I still have several
> questions to seek for your help later.
>
> Best regards and have a good day
>
>
>
> On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>
>> Done. Github branch changed to kylin5.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>> > A JIRA ticket has been opened, waiting for INFRA :
>> > https://issues.apache.org/jira/browse/INFRA-25238 .
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> > wrote:
>> >
>> >> Thank you Xiaoxiang, please update me when you have changed your
>> >> default branch. If people are impressed by the numbers, I hope to
>> >> turn this situation around.
>> >>
>> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >>
>> >>> The default branch is for 4.X, which is a maintained branch; the
>> >>> active branch is kylin5.
>> >>> I will change the default branch to kylin5 later.
>> >>>
>> >>> ------------------------
>> >>> With warm regard
>> >>> Xiaoxiang Yu
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >>> wrote:
>> >>>
>> >>>> Hi Xiaoxiang, Sirs / Madams
>> >>>>
>> >>>> Can you see the attached photo?
>> >>>>
>> >>>> My boss asked why Druid commits code regularly but Kylin has not
>> >>>> had any commits since July.
>> >>>>
>> >>>>
>> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>> >>>>
>> >>>>> I think so.
>> >>>>>
>> >>>>> Response time is not the only factor in making a decision. Kylin could
>> >>>>> be cheaper when the query pattern suits the Kylin model, and Kylin
>> >>>>> can guarantee reasonable query latency. ClickHouse will be quicker in
>> >>>>> an ad hoc query scenario.
>> >>>>>
>> >>>>> By the way, Youzan and Kyligence combine them to provide
>> >>>>> unified data analytics services for their customers.
>> >>>>>
>> >>>>> ------------------------
>> >>>>> With warm regard
>> >>>>> Xiaoxiang Yu
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hi Xiaoxiang, thank you
>> >>>>>>
>> >>>>>> If my client uses a cloud computing service like GCP or AWS, which
>> >>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse?
>> >>>>>> (In the case of Kylin, my thought is that the query execution is done
>> >>>>>> once and stored in the cube to be used many times, so Kylin uses less
>> >>>>>> cloud computation. Is that true?)
>> >>>>>>
>> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >>>>>>
>> >>>>>> > The following text is part of an article
>> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >>>>>> >
>> >>>>>> > ===============================================================================
>> >>>>>> >
>> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
>> >>>>>> > of its pre-calculation technology, for example when the join, group
>> >>>>>> > by, and where-condition patterns in SQL are relatively fixed. The
>> >>>>>> > larger the data volume is, the more obvious the advantages of using
>> >>>>>> > Kylin are. In particular, Kylin's advantages in deduplication (count
>> >>>>>> > distinct), Top N, Percentile and similar scenarios are especially
>> >>>>>> > large, and it is used in many scenarios, such as dashboards, all
>> >>>>>> > kinds of reports, large-screen displays, traffic statistics, and user
>> >>>>>> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to
>> >>>>>> > build their data service platforms, serving millions to tens of
>> >>>>>> > millions of queries per day, and most of the queries complete within
>> >>>>>> > 2 - 3 seconds. There is no better alternative for such a
>> >>>>>> > high-concurrency scenario.
>> >>>>>> >
>> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
>> >>>>>> > and is more suitable when query requests are more flexible, or when
>> >>>>>> > there is a need for detail-level queries with low concurrency.
>> >>>>>> > Scenarios include: user-label filtering where very many columns and
>> >>>>>> > where conditions are combined arbitrarily, complex ad hoc queries
>> >>>>>> > with low concurrency, and so on. If the amount of data and access is
>> >>>>>> > large, you need to deploy a distributed ClickHouse cluster, which is
>> >>>>>> > a bigger challenge for operation and maintenance.
>> >>>>>> >
>> >>>>>> > If some queries are very flexible but infrequent, it is more
>> >>>>>> > resource-efficient to compute them at query time. Since the number
>> >>>>>> > of queries is small, even if each query consumes a lot of
>> >>>>>> > computational resources, it is still cost-effective overall. If some
>> >>>>>> > queries have a fixed pattern and the query volume is large, Kylin is
>> >>>>>> > more suitable: by spending large computational resources once to
>> >>>>>> > save the results, the upfront cost is amortized over every query,
>> >>>>>> > so it is the most economical.
>> >>>>>> >
>> >>>>>> > --- Translated with DeepL.com (free version)
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > ------------------------
>> >>>>>> > With warm regard
>> >>>>>> > Xiaoxiang Yu
>> >>>>>> >
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid>
>> >>>>>> > wrote:
>> >>>>>> >
>> >>>>>> >> Thank you Xiaoxiang for the near real time streaming feature.
>> >>>>>> >> That's great.
>> >>>>>> >>
>> >>>>>> >> This morning there has been a new challenge to my team: ClickHouse
>> >>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>> >>>>>> >> which is faster than my demonstration (I used Kylin to calculate
>> >>>>>> >> 1 billion rows in 2.9 seconds).
>> >>>>>> >>
>> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>> >>>>>> >> that I can defend my demonstration?
>> >>>>>> >>
>> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org>
>> >>>>>> >> wrote:
>> >>>>>> >>
>> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>> >>>>>> >> > here is that kylin has lag time due to model update of new segment
>> >>>>>> >> > build, is that correct?"
>> >>>>>> >> >
>> >>>>>> >> > You are correct.
>> >>>>>> >> >
>> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>> >>>>>> >> > combination of ... "
>> >>>>>> >> >
>> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>> >>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
>> >>>>>> >> > (that is my estimate, but I am quite certain about it).
>> >>>>>> >> > NRT stands for 'near real-time': it runs a job that periodically
>> >>>>>> >> > does micro-batch aggregation and persistence. The price is that you
>> >>>>>> >> > need to run and monitor a long-running job. This feature is based
>> >>>>>> >> > on Spark Streaming, so you need knowledge of it.
>> >>>>>> >> >
>> >>>>>> >> > I am curious: what is the maximum time lag your customers
>> >>>>>> >> > can tolerate?
>> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>> >>>>>> >> >
>> >>>>>> >> > ------------------------
>> >>>>>> >> > With warm regard
>> >>>>>> >> > Xiaoxiang Yu
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> >
>> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >>>>>> >> > wrote:
>> >>>>>> >> >
>> >>>>>> >> > > Druid is better in
>> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
>> >>>>>> >> > >
>> >>>>>> >> > > ==========================
>> >>>>>> >> > >
>> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>> >>>>>> >> > >
>> >>>>>> >> > > In this important scenario of realtime analytics, the reason here
>> >>>>>> >> > > is that kylin has lag time due to model update of new segment
>> >>>>>> >> > > build, is that correct?
>> >>>>>> >> > >
>> >>>>>> >> > > If that is true, then can you suggest a work-around that combines:
>> >>>>>> >> > >
>> >>>>>> >> > > (time - lag kylin cube) + (realtime DB update) to provide
>> >>>>>> >> > > realtime capability?
>> >>>>>> >> > >
>> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>> >>>>>> >> > > integrate it with (time - lag kylin cube).
>> >>>>>> >> > >
>> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org>
>> >>>>>> >> > > wrote:
>> >>>>>> >> > >
>> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>> >>>>>> >> > > > about how Druid has changed in these two years. New features
>> >>>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
>> >>>>>> >> > > >
>> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
>> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
>> >>>>>> >> > > > which I used two years ago):
>> >>>>>> >> > > >
>> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
>> >>>>>> >> > > >   had better response time for small queries two years ago).
>> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use K8s or
>> >>>>>> >> > > >   a public cloud platform as your deployment platform.
>> >>>>>> >> > > >
>> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>> >>>>>> >> > > > better, like:
>> >>>>>> >> > > >
>> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
>> >>>>>> >> > > >   different `Group By dimensions`.
>> >>>>>> >> > > > - User-friendly UI for modeling.
>> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>> >>>>>> >> > > >   show that it supports ODBC well).
>> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >>>>>> >> > > >
>> >>>>>> >> > > >
>> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
>> >>>>>> >> > > >
>> >>>>>> >> > > > ------------------------
>> >>>>>> >> > > > With warm regard
>> >>>>>> >> > > > Xiaoxiang Yu
>> >>>>>> >> > > >
>> >>>>>> >> > > >
>> >>>>>> >> > > >
>> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >>>>>> >> > > > wrote:
>> >>>>>> >> > > >
>> >>>>>> >> > > >> Dear Xiaoxiang,
>> >>>>>> >> > > >> Sirs/Madams,
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> May I post my boss's question:
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>> >>>>>> >> > > >> Pinot and Druid?
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> Please kindly let me know
>> >>>>>> >> > > >>
>> >>>>>> >> > > >> Thank you very much and best regards
>> >>>>>> >> > > >>
>> >>>>>> >> > > >
>> >>>>>> >> > >
>> >>>>>> >> >
>> >>>>>> >>
>> >>>>>> >
>> >>>>>>
>> >>>>>
>>
>
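
The cost argument in the quoted article (an upfront build cost amortized over many queries, versus paying the full computation for every query) can be made concrete with a small break-even calculation. This is a minimal sketch under stated assumptions: the function names and every cost figure are hypothetical illustrations, not measurements of Kylin or ClickHouse.

```python
import math
from typing import Optional


def total_cost_precomputed(build_cost: float, per_query_cost: float,
                           n_queries: int) -> float:
    """Upfront build cost plus a small cost per query served from stored results."""
    return build_cost + per_query_cost * n_queries


def total_cost_on_demand(per_query_cost: float, n_queries: int) -> float:
    """Every query pays the full computation cost at query time."""
    return per_query_cost * n_queries


def break_even_queries(build_cost: float, precomputed_query_cost: float,
                       on_demand_query_cost: float) -> Optional[int]:
    """Smallest query count at which pre-computation is strictly cheaper.

    Returns None when a pre-computed query is not cheaper than an on-demand
    one, because the build cost can then never be recovered.
    """
    saving = on_demand_query_cost - precomputed_query_cost
    if saving <= 0:
        return None
    return math.floor(build_cost / saving) + 1


# Hypothetical cost units: the build costs 1000, a query on stored results
# costs 0.5, and the same query computed on demand costs 2.5.
n = break_even_queries(1000.0, 0.5, 2.5)
print(n)
print(total_cost_precomputed(1000.0, 0.5, n), total_cost_on_demand(2.5, n))
```

With few queries the on-demand side wins because the build cost is never recovered; with a large, fixed-pattern query volume the pre-computed side wins, which is the trade-off described in the quoted text.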

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Thank you very much for your prompt response. I still have several
questions and will ask for your help with them later.

Best regards and have a good day



On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <xx...@apache.org> wrote:

> Done. The GitHub default branch has been changed to kylin5.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > A JIRA ticket has been opened and is waiting for INFRA:
> > https://issues.apache.org/jira/browse/INFRA-25238
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> >> Thank you Xiaoxiang, please update me when you have changed the default
> >> branch. If people are impressed by the numbers, then I hope this will
> >> turn the situation around.
> >>
> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>
> >>> The default branch is for 4.X, which is in maintenance; the active
> >>> development branch is kylin5.
> >>> I will change the default branch to kylin5 later.
> >>>
> >>> ------------------------
> >>> With warm regard
> >>> Xiaoxiang Yu
> >>>
> >>>
> >>>
> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >>> wrote:
> >>>
> >>>> Hi Xiaoxiang, Sirs / Madams
> >>>>
> >>>> Can you see the attached photo?
> >>>>
> >>>> My boss asked why Druid has regular code commits while Kylin has had
> >>>> none since July.
> >>>>
> >>>>
> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
> >>>>
> >>>>> I think so.
> >>>>>
> >>>>> Response time is not the only factor in making a decision. Kylin could
> >>>>> be cheaper when the query pattern is suitable for the Kylin model, and
> >>>>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
> >>>>> in an ad hoc query scenario.
> >>>>>
> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >>>>> unified data analytics services for their customers.
> >>>>>
> >>>>> ------------------------
> >>>>> With warm regard
> >>>>> Xiaoxiang Yu
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Xiaoxiang, thank you
> >>>>>>
> >>>>>> In case my client uses a cloud computing service like GCP or AWS, which
> >>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse? (In
> >>>>>> the case of Kylin, my thought is that the query execution is done once
> >>>>>> and stored in the cube to be used many times, so Kylin uses less cloud
> >>>>>> computation. Is that true?)
> >>>>>>
> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>>>>>
> >>>>>> > The following text is part of an article
> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >>>>>> >
> >>>>>> > ===============================================================================
> >>>>>> >
> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
> >>>>>> > of its precomputation technology, for example when the join, group by,
> >>>>>> > and where-condition patterns in the SQL are relatively fixed. The
> >>>>>> > larger the data volume, the more obvious the advantages of using
> >>>>>> > Kylin. Kylin's advantages are especially large for deduplication
> >>>>>> > (count distinct), Top N, and Percentile, and it is used in a large
> >>>>>> > number of scenarios, such as dashboards, all kinds of reports,
> >>>>>> > large-screen displays, traffic statistics, and user behavior
> >>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build
> >>>>>> > their data service platforms, serving millions to tens of millions of
> >>>>>> > queries per day, and most of the queries complete within 2 - 3
> >>>>>> > seconds. There is no better alternative for such a high-concurrency
> >>>>>> > scenario.
> >>>>>> >
> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
> >>>>>> > and is more suitable when query requests are more flexible, or when
> >>>>>> > detail-level queries are needed at low concurrency. Scenarios include
> >>>>>> > user-label filtering over very many columns with arbitrarily combined
> >>>>>> > where conditions, complex ad hoc queries at low concurrency, and so
> >>>>>> > on. If the data volume and access volume are large, you need to
> >>>>>> > deploy a distributed ClickHouse cluster, which is a greater challenge
> >>>>>> > for operation and maintenance.
> >>>>>> >
> >>>>>> > If some queries are very flexible but infrequent, it is more
> >>>>>> > resource-efficient to compute them on the fly. Since the number of
> >>>>>> > queries is small, even if each query consumes a lot of computational
> >>>>>> > resources, it is still cost-effective overall. If some queries have a
> >>>>>> > fixed pattern and the query volume is large, Kylin is the better fit:
> >>>>>> > by spending large computational resources up front to save the
> >>>>>> > results, the upfront computational cost is amortized over every
> >>>>>> > query, so it is the most economical.
> >>>>>> >
> >>>>>> > --- Translated with DeepL.com (free version)
> >>>>>> >
> >>>>>> >
> >>>>>> > ------------------------
> >>>>>> > With warm regard
> >>>>>> > Xiaoxiang Yu
> >>>>>> >
> >>>>>> >
> >>>>>> >
> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid> wrote:
> >>>>>> >
> >>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's
> >>>>>> >> great.
> >>>>>> >>
> >>>>>> >> This morning there was a new challenge for my team: ClickHouse offered
> >>>>>> >> us the speed of calculating 8 billion rows in milliseconds, which is
> >>>>>> >> faster than my demonstration (I used Kylin to calculate 1 billion rows
> >>>>>> >> in 2.9 seconds).
> >>>>>> >>
> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
> >>>>>> >> that I can defend my demonstration?
> >>>>>> >>
> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>>>>> >>
> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
> >>>>>> >> > here is that kylin has lag time due to model update of new segment
> >>>>>> >> > build, is that correct?"
> >>>>>> >> >
> >>>>>> >> > You are correct.
> >>>>>> >> >
> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >>>>>> >> > combination of ... "
> >>>>>> >> >
> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
> >>>>>> >> > not released), which can reduce the time lag to about 3 minutes (that
> >>>>>> >> > is my estimate, but I am quite certain about it).
> >>>>>> >> > NRT stands for 'near real-time'; it runs a job that does micro-batch
> >>>>>> >> > aggregation and persistence periodically. The price is that you need
> >>>>>> >> > to run and monitor a long-running job. This feature is based on Spark
> >>>>>> >> > Streaming, so you need knowledge of it.
> >>>>>> >> >
> >>>>>> >> > I am curious: what is the maximum time lag your customers can
> >>>>>> >> > tolerate?
> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
> >>>>>> >> >
> >>>>>> >> > ------------------------
> >>>>>> >> > With warm regard
> >>>>>> >> > Xiaoxiang Yu
> >>>>>> >> >
> >>>>>> >> >
> >>>>>> >> >
> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >> >
> >>>>>> >> > > Druid is better in
> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >>>>>> >> > >
> >>>>>> >> > > ==========================
> >>>>>> >> > >
> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >>>>>> >> > >
> >>>>>> >> > > In this important scenario of realtime analytics, the reason here is
> >>>>>> >> > > that Kylin has lag time due to the model update of a new segment
> >>>>>> >> > > build, is that correct?
> >>>>>> >> > >
> >>>>>> >> > > If that is true, then can you suggest a work-around combining:
> >>>>>> >> > >
> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
> >>>>>> >> > > realtime capability?
> >>>>>> >> > >
> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
> >>>>>> >> > > integrate it with (time-lag Kylin cube).
> >>>>>> >> > >
> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xxyu@apache.org> wrote:
> >>>>>> >> > >
> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
> >>>>>> >> > > > about how Druid has changed in these two years. New features that
> >>>>>> >> > > > I know of are: new UI, fully on K8s, etc.).
> >>>>>> >> > > >
> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
> >>>>>> >> > > > which I used two years ago):
> >>>>>> >> > > >
> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
> >>>>>> >> > > >   had better response time for small queries two years ago).
> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use a
> >>>>>> >> > > >   K8S/public cloud platform as your deployment platform.
> >>>>>> >> > > >
> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> >>>>>> >> > > > better, like:
> >>>>>> >> > > >
> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more
> >>>>>> >> > > >   exact-match/fine-grained index for queries containing different
> >>>>>> >> > > >   `Group By dimensions`.
> >>>>>> >> > > > - User-friendly UI for modeling.
> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >>>>>> >> > > > - ODBC driver for different BI tools (its website did not show that
> >>>>>> >> > > >   it supports ODBC well).
> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >>>>>> >> > > >
> >>>>>> >> > > >
> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >>>>>> >> > > > Hope to help you, or you are free to share your opinion.
> >>>>>> >> > > >
> >>>>>> >> > > > ------------------------
> >>>>>> >> > > > With warm regard
> >>>>>> >> > > > Xiaoxiang Yu
> >>>>>> >> > > >
> >>>>>> >> > > >
> >>>>>> >> > > >
> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >>>>>> >> > > >
> >>>>>> >> > > >> Dear Xiaoxiang,
> >>>>>> >> > > >> Sirs/Madams,
> >>>>>> >> > > >>
> >>>>>> >> > > >> May I post my boss's question:
> >>>>>> >> > > >>
> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> >>>>>> >> > > >> Pinot and Druid?
> >>>>>> >> > > >>
> >>>>>> >> > > >> Please kindly let me know
> >>>>>> >> > > >>
> >>>>>> >> > > >> Thank you very much and best regards
> >>>>>> >> > > >>
> >>>>>> >> > > >
> >>>>>> >> > >
> >>>>>> >> >
> >>>>>> >>
> >>>>>> >
> >>>>>>
> >>>>>
>
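
The cost argument quoted above (build once, then amortize the build cost
over many queries) can be made concrete with a small Python sketch. All
numbers and names here (`build_cost`, `lookup_cost`, `scan_cost`, and the
three functions) are hypothetical cost units chosen for illustration; they
are not measurements of Kylin or ClickHouse.

```python
import math


def cost_per_query_precomputed(build_cost, lookup_cost, num_queries):
    """Average cost per query when results are built once, then looked up."""
    return build_cost / num_queries + lookup_cost


def cost_per_query_on_the_fly(scan_cost):
    """Cost per query when every query scans the raw data."""
    return scan_cost


def break_even_queries(build_cost, lookup_cost, scan_cost):
    """Smallest query count at which precomputation costs no more in total.

    Solves build_cost + n * lookup_cost <= n * scan_cost for n.
    Returns None when a lookup is not cheaper than a scan, because the
    build cost can then never be recovered.
    """
    if scan_cost <= lookup_cost:
        return None
    return math.ceil(build_cost / (scan_cost - lookup_cost))


# Assumed cost units: building costs 1000, a lookup 1, a raw scan 11.
n = break_even_queries(build_cost=1000, lookup_cost=1, scan_cost=11)
```

With these assumed figures, precomputation starts to pay off from the
break-even count `n` onward; below it, computing each query on the fly is
cheaper, which matches the "flexible but infrequent" case in the quoted
article.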

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
Done. The GitHub default branch has been changed to kylin5.

------------------------
With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:

> A JIRA ticket has been opened and is waiting for INFRA:
> https://issues.apache.org/jira/browse/INFRA-25238
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Thank you Xiaoxiang, please update me when you have changed the default
>> branch. If people are impressed by the numbers, then I hope this will
>> turn the situation around.
>>
>> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>>> The default branch is for 4.X, which is in maintenance; the active
>>> development branch is kylin5.
>>> I will change the default branch to kylin5 later.
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>>
>>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> wrote:
>>>
>>>> Hi Xiaoxiang, Sirs / Madams
>>>>
>>>> Can you see the attached photo?
>>>>
>>>> My boss asked why Druid has regular code commits while Kylin has had
>>>> none since July.
>>>>
>>>>
>>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>>>>
>>>>> I think so.
>>>>>
>>>>> Response time is not the only factor in making a decision. Kylin could
>>>>> be cheaper when the query pattern is suitable for the Kylin model, and
>>>>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
>>>>> in an ad hoc query scenario.
>>>>>
>>>>> By the way, Youzan and Kyligence combine the two to provide
>>>>> unified data analytics services for their customers.
>>>>>
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi Xiaoxiang, thank you
>>>>>>
>>>>>> In case my client uses a cloud computing service like GCP or AWS, which
>>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse? (In
>>>>>> the case of Kylin, my thought is that the query execution is done once
>>>>>> and stored in the cube to be used many times, so Kylin uses less cloud
>>>>>> computation. Is that true?)
>>>>>>
>>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>>>>
>>>>>> > The following text is part of an article
>>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>>>>> >
>>>>>> > ===============================================================================
>>>>>> >
>>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
>>>>>> > of its precomputation technology, for example when the join, group by,
>>>>>> > and where-condition patterns in the SQL are relatively fixed. The
>>>>>> > larger the data volume, the more obvious the advantages of using
>>>>>> > Kylin. Kylin's advantages are especially large for deduplication
>>>>>> > (count distinct), Top N, and Percentile, and it is used in a large
>>>>>> > number of scenarios, such as dashboards, all kinds of reports,
>>>>>> > large-screen displays, traffic statistics, and user behavior
>>>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build
>>>>>> > their data service platforms, serving millions to tens of millions of
>>>>>> > queries per day, and most of the queries complete within 2 - 3
>>>>>> > seconds. There is no better alternative for such a high-concurrency
>>>>>> > scenario.
>>>>>> >
>>>>>> > ClickHouse, because of its MPP architecture, has high computing power
>>>>>> > and is more suitable when query requests are more flexible, or when
>>>>>> > detail-level queries are needed at low concurrency. Scenarios include
>>>>>> > user-label filtering over very many columns with arbitrarily combined
>>>>>> > where conditions, complex ad hoc queries at low concurrency, and so
>>>>>> > on. If the data volume and access volume are large, you need to
>>>>>> > deploy a distributed ClickHouse cluster, which is a greater challenge
>>>>>> > for operation and maintenance.
>>>>>> >
>>>>>> > If some queries are very flexible but infrequent, it is more
>>>>>> > resource-efficient to compute them on the fly. Since the number of
>>>>>> > queries is small, even if each query consumes a lot of computational
>>>>>> > resources, it is still cost-effective overall. If some queries have a
>>>>>> > fixed pattern and the query volume is large, Kylin is the better fit:
>>>>>> > by spending large computational resources up front to save the
>>>>>> > results, the upfront computational cost is amortized over every
>>>>>> > query, so it is the most economical.
>>>>>> >
>>>>>> > --- Translated with DeepL.com (free version)
>>>>>> >
>>>>>> >
>>>>>> > ------------------------
>>>>>> > With warm regard
>>>>>> > Xiaoxiang Yu
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>> >
>>>>>> >> Thank you Xiaoxiang for the near real-time streaming feature. That's
>>>>>> >> great.
>>>>>> >>
>>>>>> >> This morning there was a new challenge for my team: ClickHouse offered
>>>>>> >> us the speed of calculating 8 billion rows in milliseconds, which is
>>>>>> >> faster than my demonstration (I used Kylin to calculate 1 billion rows
>>>>>> >> in 2.9 seconds).
>>>>>> >>
>>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>>>>> >> that I can defend my demonstration?
>>>>>> >>
>>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>>>> >>
>>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>>>>>> >> > here is that kylin has lag time due to model update of new segment
>>>>>> >> > build, is that correct?"
>>>>>> >> >
>>>>>> >> > You are correct.
>>>>>> >> >
>>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>>>>>> >> > combination of ... "
>>>>>> >> >
>>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
>>>>>> >> > not released), which can reduce the time lag to about 3 minutes (that
>>>>>> >> > is my estimate, but I am quite certain about it).
>>>>>> >> > NRT stands for 'near real-time'; it runs a job that does micro-batch
>>>>>> >> > aggregation and persistence periodically. The price is that you need
>>>>>> >> > to run and monitor a long-running job. This feature is based on Spark
>>>>>> >> > Streaming, so you need knowledge of it.
>>>>>> >> >
>>>>>> >> > I am curious: what is the maximum time lag your customers can
>>>>>> >> > tolerate?
>>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>>>>> >> >
>>>>>> >> > ------------------------
>>>>>> >> > With warm regard
>>>>>> >> > Xiaoxiang Yu
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>> >> >
>>>>>> >> > > Druid is better in
>>>>>> >> > > - Have a real-time datasource like Kafka etc.
>>>>>> >> > >
>>>>>> >> > > ==========================
>>>>>> >> > >
>>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>>>>>> >> > >
>>>>>> >> > > In this important scenario of realtime analytics, the reason here is
>>>>>> >> > > that Kylin has lag time due to the model update of a new segment
>>>>>> >> > > build, is that correct?
>>>>>> >> > >
>>>>>> >> > > If that is true, then can you suggest a work-around combining:
>>>>>> >> > >
>>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide
>>>>>> >> > > realtime capability?
>>>>>> >> > >
>>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>>>>> >> > > integrate it with (time-lag Kylin cube).
>>>>>> >> > >
>>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
>>>>>> wrote:
>>>>>> >> > >
>>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>>>>>> >> > > > about how Druid has changed in these two years. New features
>>>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
>>>>>> >> > > >
>>>>>> >> > > > Here are some cases where you should consider using Druid
>>>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta with
>>>>>> >> > > > the Druid I used two years ago):
>>>>>> >> > > >
>>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>>>>>> >> > > > - Most queries are small (based on my test results, I think
>>>>>> >> > > >   Druid had better response time for small queries two years
>>>>>> >> > > >   ago).
>>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use K8s
>>>>>> >> > > >   or a public cloud as your deployment platform.
>>>>>> >> > > >
>>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>>>>> >> > > > better, like:
>>>>>> >> > > >
>>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>>>>>> >> > > >   more exact-match/fine-grained index for queries containing
>>>>>> >> > > >   different `Group By dimensions`.
>>>>>> >> > > > - User-friendly UI for modeling.
>>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>>>>>> >> > > >   show that it supports ODBC well).
>>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>>>>> >> > > >
>>>>>> >> > > >
>>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>>>>> >> > > > I hope this helps; feel free to share your opinion.
>>>>>> >> > > >
>>>>>> >> > > > ------------------------
>>>>>> >> > > > With warm regard
>>>>>> >> > > > Xiaoxiang Yu
>>>>>> >> > > >
>>>>>> >> > > >
>>>>>> >> > > >
>>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>> >> > > >
>>>>>> >> > > >> Dear Xiaoxiang,
>>>>>> >> > > >> Sirs/Madams,
>>>>>> >> > > >>
>>>>>> >> > > >> May I post my boss's question:
>>>>>> >> > > >>
>>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>>>>>> >> > > >> compared to Pinot and Druid?
>>>>>> >> > > >>
>>>>>> >> > > >> Please kindly let me know
>>>>>> >> > > >>
>>>>>> >> > > >> Thank you very much and best regards
>>>>>> >> > > >>
>>>>>> >> > > >
>>>>>> >> > >
>>>>>> >> >
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
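
A minimal sketch of the micro-batch idea mentioned in the quoted reply above ("it will run a job and do micro-batch aggregation and persistence periodically"). This is not Kylin's NRT implementation, which according to the thread is based on Spark Streaming; it is plain Python meant only to show why the time lag is tied to the batch interval. The names Event, micro_batches, aggregate and run, the sample events, and the 180-second interval (chosen to echo the "about 3 minutes" estimate) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (epoch_seconds, dimension_value, measure)
Event = Tuple[int, str, float]


def micro_batches(events: Iterable[Event], interval_s: int) -> Dict[int, List[Event]]:
    """Group events into fixed windows of interval_s seconds, keyed by window start."""
    batches: Dict[int, List[Event]] = defaultdict(list)
    for ts, dim, value in events:
        window_start = ts - (ts % interval_s)
        batches[window_start].append((ts, dim, value))
    return dict(batches)


def aggregate(batch: List[Event]) -> Dict[str, float]:
    """Pre-aggregate one micro-batch: SUM(measure) GROUP BY dimension."""
    totals: Dict[str, float] = defaultdict(float)
    for _, dim, value in batch:
        totals[dim] += value
    return dict(totals)


def run(events: Iterable[Event], interval_s: int = 180) -> Dict[int, Dict[str, float]]:
    """Aggregate each window once and persist it (here: keep it in a dict)."""
    store: Dict[int, Dict[str, float]] = {}
    for window_start, batch in sorted(micro_batches(events, interval_s).items()):
        store[window_start] = aggregate(batch)
    return store


if __name__ == "__main__":
    sample = [(0, "a", 1.0), (100, "a", 2.0), (179, "b", 5.0), (180, "a", 4.0)]
    print(run(sample))
```

In a design like this, an event only becomes visible to queries after its window has been closed and aggregated, which is where a lag on the order of the batch interval would come from.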

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
Done. The default GitHub branch has been changed to kylin5.

------------------------
With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <xx...@apache.org> wrote:

> A JIRA ticket has been opened and is waiting for INFRA:
> https://issues.apache.org/jira/browse/INFRA-25238
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Thank you Xiaoxiang, please update me when you have changed your default
>> branch. If people are judging by these numbers, then I hope this will
>> turn the situation around.
>>
>> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>>> The default branch is for 4.X, which is a maintenance branch; the active
>>> branch is kylin5.
>>> I will change the default branch to kylin5 later.
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>>
>>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> wrote:
>>>
>>>> Hi Xiaoxiang, Sirs / Madams
>>>>
>>>> Can you see the attached photo?
>>>>
>>>> My boss asked why Druid has code committed regularly but Kylin has had
>>>> no commits since July.
>>>>
>>>>
>>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>>>>
>>>>> I think so.
>>>>>
>>>>> Response time is not the only factor in making a decision. Kylin could
>>>>> be cheaper when the query pattern suits the Kylin model, and Kylin can
>>>>> guarantee reasonable query latency. ClickHouse will be quicker in an
>>>>> ad hoc query scenario.
>>>>>
>>>>> By the way, Youzan and Kyligence combine the two to provide
>>>>> unified data analytics services for their customers.
>>>>>
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi Xiaoxiang, thank you
>>>>>>
>>>>>> If my client uses a cloud computing service like GCP or AWS, which
>>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse?
>>>>>> (In the case of Kylin, my thinking is that the query execution is
>>>>>> done once and stored in the cube to be used many times, so Kylin
>>>>>> uses less cloud computation. Is that true?)
>>>>>>
>>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>>>>
>>>>>> > The following text is part of an article
>>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > ===============================================================================
>>>>>> >
>>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>>>>>> > because of its precomputation technology, for example when the
>>>>>> > join, group by, and where-condition patterns in the SQL are
>>>>>> > relatively fixed. The larger the data volume, the more obvious the
>>>>>> > advantages of using Kylin. In particular, Kylin's advantages in
>>>>>> > deduplication (count distinct), Top N, Percentile, and similar
>>>>>> > scenarios are especially large, and it is used in a large number of
>>>>>> > scenarios such as dashboards, all kinds of reports, large-screen
>>>>>> > displays, traffic statistics, and user behavior analysis. Meituan,
>>>>>> > Aurora, Shell Housing, etc. use Kylin to build their data service
>>>>>> > platforms, serving millions to tens of millions of queries per
>>>>>> > day, and most of the queries complete within 2-3 seconds. There
>>>>>> > is no better alternative for such a high-concurrency scenario.
>>>>>> >
>>>>>> > ClickHouse, because of its MPP architecture, has high computing
>>>>>> > power and is more suitable when query requests are more flexible,
>>>>>> > or when detail-level queries are needed at low concurrency.
>>>>>> > Scenarios include: user-label filtering where very many columns
>>>>>> > and where conditions are combined arbitrarily, complex ad hoc
>>>>>> > queries without much concurrency, and so on. If the data volume
>>>>>> > and access volume are large, you need to deploy a distributed
>>>>>> > ClickHouse cluster, which is a bigger challenge for operations and
>>>>>> > maintenance.
>>>>>> >
>>>>>> > If some queries are very flexible but infrequent, it is more
>>>>>> > resource-efficient to compute them on the fly. Since the number of
>>>>>> > queries is small, even if each query consumes a lot of
>>>>>> > computational resources, it is still cost-effective overall. If
>>>>>> > some queries have a fixed pattern and the query volume is large,
>>>>>> > Kylin is more suitable: large computational resources are spent
>>>>>> > once to compute and save the results, and that upfront cost is
>>>>>> > amortized over every query, so it is the most economical.
>>>>>> >
>>>>>> > --- Translated with DeepL.com (free version)
>>>>>> >
>>>>>> >
>>>>>> > ------------------------
>>>>>> > With warm regard
>>>>>> > Xiaoxiang Yu
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>> >
>>>>>> >> Thank you Xiaoxiang for the near real time streaming feature.
>>>>>> >> That's great.
>>>>>> >>
>>>>>> >> This morning there has been a new challenge to my team: ClickHouse
>>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>>>>>> >> which is faster than my demonstration (I used Kylin to calculate
>>>>>> >> 1 billion rows in 2.9 seconds).
>>>>>> >>
>>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>>>>> >> that I can defend my demonstration?
>>>>>> >>
>>>>>
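
The amortization argument in the quoted article above (spend a large build cost once, then serve many cheap queries) can be written as a small cost model. This is only a sketch under stated assumptions: the function names total_cost and break_even_queries and every number below are hypothetical, and none of them are measurements of Kylin or ClickHouse.

```python
import math
from typing import Optional


def total_cost(build_cost: float, per_query_cost: float, queries: int) -> float:
    """Total cost = one-off build cost + cost of answering every query."""
    return build_cost + per_query_cost * queries


def break_even_queries(build_cost: float,
                       precomputed_query_cost: float,
                       adhoc_query_cost: float) -> Optional[int]:
    """Smallest query count at which precomputation is strictly cheaper.

    Returns None when a precomputed query is not cheaper than an ad hoc one,
    because the build cost can then never be recovered.
    """
    saving_per_query = adhoc_query_cost - precomputed_query_cost
    if saving_per_query <= 0:
        return None
    # build + p*n < a*n  <=>  n > build / (a - p)
    return math.floor(build_cost / saving_per_query) + 1


if __name__ == "__main__":
    # Hypothetical cost units, chosen only to make the arithmetic easy to follow.
    build, cheap, expensive = 100.0, 1.0, 5.0
    n = break_even_queries(build, cheap, expensive)
    print("break-even after", n, "queries")
    print("precomputed:", total_cost(build, cheap, n))
    print("ad hoc:     ", total_cost(0.0, expensive, n))
```

Below the break-even count the ad hoc engine is cheaper overall; above it the precomputed approach wins, which matches the article's split between "flexible but infrequent" and "fixed pattern, large volume" workloads.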

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
A JIRA ticket has been opened and is waiting for INFRA:
https://issues.apache.org/jira/browse/INFRA-25238
------------------------
With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang, please update me when you have changed your default
> branch. If people are judging by these numbers, then I hope this will
> turn the situation around.
>
> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:
>
>> The default branch is for 4.X, which is a maintenance branch; the active
>> branch is kylin5.
>> I will change the default branch to kylin5 later.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>
>>> Hi Xiaoxiang, Sirs / Madams
>>>
>>> Can you see the attached photo?
>>>
>>> My boss asked why Druid has code committed regularly but Kylin has had
>>> no commits since July.
>>>
>>>
>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>>>
>>>> I think so.
>>>>
>>>> Response time is not the only factor in making a decision. Kylin could
>>>> be cheaper when the query pattern suits the Kylin model, and Kylin can
>>>> guarantee reasonable query latency. ClickHouse will be quicker in an
>>>> ad hoc query scenario.
>>>>
>>>> By the way, Youzan and Kyligence combine the two to provide
>>>> unified data analytics services for their customers.
>>>>
>>>> ------------------------
>>>> With warm regard
>>>> Xiaoxiang Yu
>>>>
>>>>
>>>>
>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>> wrote:
>>>>
>>>>> Hi Xiaoxiang, thank you
>>>>>
>>>>> If my client uses a cloud computing service like GCP or AWS, which
>>>>> will cost more: the precalculation feature of Kylin, or ClickHouse?
>>>>> (In the case of Kylin, my thinking is that the query execution is
>>>>> done once and stored in the cube to be used many times, so Kylin
>>>>> uses less cloud computation. Is that true?)
>>>>>
>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>>>
>>>>> > The following text is part of an article
>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>>>> >
>>>>> >
>>>>> >
>>>>> > ===============================================================================
>>>>> >
>>>>> > Kylin is suitable for aggregation queries with fixed patterns
>>>>> > because of its precomputation technology, for example when the
>>>>> > join, group by, and where-condition patterns in the SQL are
>>>>> > relatively fixed. The larger the data volume, the more obvious the
>>>>> > advantages of using Kylin. In particular, Kylin's advantages in
>>>>> > deduplication (count distinct), Top N, Percentile, and similar
>>>>> > scenarios are especially large, and it is used in a large number of
>>>>> > scenarios such as dashboards, all kinds of reports, large-screen
>>>>> > displays, traffic statistics, and user behavior analysis. Meituan,
>>>>> > Aurora, Shell Housing, etc. use Kylin to build their data service
>>>>> > platforms, serving millions to tens of millions of queries per
>>>>> > day, and most of the queries complete within 2-3 seconds. There
>>>>> > is no better alternative for such a high-concurrency scenario.
>>>>> >
>>>>> > ClickHouse, because of its MPP architecture, has high computing
>>>>> > power and is more suitable when query requests are more flexible,
>>>>> > or when detail-level queries are needed at low concurrency.
>>>>> > Scenarios include: user-label filtering where very many columns
>>>>> > and where conditions are combined arbitrarily, complex ad hoc
>>>>> > queries without much concurrency, and so on. If the data volume
>>>>> > and access volume are large, you need to deploy a distributed
>>>>> > ClickHouse cluster, which is a bigger challenge for operations and
>>>>> > maintenance.
>>>>> >
>>>>> > If some queries are very flexible but infrequent, it is more
>>>>> > resource-efficient to compute them on the fly. Since the number of
>>>>> > queries is small, even if each query consumes a lot of
>>>>> > computational resources, it is still cost-effective overall. If
>>>>> > some queries have a fixed pattern and the query volume is large,
>>>>> > Kylin is more suitable: large computational resources are spent
>>>>> > once to compute and save the results, and that upfront cost is
>>>>> > amortized over every query, so it is the most economical.
>>>>> >
>>>>> > --- Translated with DeepL.com (free version)
>>>>> >
>>>>> >
>>>>> > ------------------------
>>>>> > With warm regard
>>>>> > Xiaoxiang Yu
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>> >
>>>>> >> Thank you Xiaoxiang for the near real time streaming feature. That's
>>>>> >> great.
>>>>> >>
>>>>> >> This morning there has been a new challenge to my team: ClickHouse
>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>>>>> >> which is faster than my demonstration (I used Kylin to calculate
>>>>> >> 1 billion rows in 2.9 seconds).
>>>>> >>
>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>>>> >> that I can defend my demonstration?
>>>>> >>
>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org>
>>>>> wrote:
>>>>> >>
>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>>>>> >> > here is that kylin has lag time due to model update of new
>>>>> >> > segment build, is that correct?"
>>>>> >> >
>>>>> >> > You are correct.
>>>>> >> >
>>>>> >> > 2. "If that is true, then can you suggest a work-around of
>>>>> >> > combination of ... "
>>>>> >> >
>>>>> >> > Kylin is planning to introduce NRT streaming (coding is complete
>>>>> >> > but not yet released), which can bring the time lag down to about
>>>>> >> > 3 minutes (that is my estimate, but I am quite certain about it).
>>>>> >> > NRT stands for 'near real-time': it runs a job that periodically
>>>>> >> > does micro-batch aggregation and persistence. The price is that
>>>>> >> > you need to run and monitor a long-running job. This feature is
>>>>> >> > based on Spark Streaming, so you need some knowledge of it.
>>>>> >> >
>>>>> >> > I am curious: what is the maximum time lag your customers can
>>>>> >> > tolerate? Personally, I guess a minute-level time lag is OK for
>>>>> >> > most cases.
>>>>> >> >
>>>>> >> > ------------------------
>>>>> >> > With warm regard
>>>>> >> > Xiaoxiang Yu
>>>>> >> >
>>>>> >> >
>>>>> >> >
>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>> >> >
>>>>> >> > > Druid is better in
>>>>> >> > > - Have a real-time datasource like Kafka etc.
>>>>> >> > >
>>>>> >> > > ==========================
>>>>> >> > >
>>>>> >> > > Hi Xiaoxiang, thank you for your response.
>>>>> >> > >
>>>>> >> > > In this important scenario of realtime analytics, the reason here
>>>>> >> > > is that kylin has lag time due to model update of new segment
>>>>> >> > > build, is that correct?
>>>>> >> > >
>>>>> >> > > If that is true, then can you suggest a work-around of
>>>>> >> > > combination of:
>>>>> >> > >
>>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
>>>>> >> > > realtime capability?
>>>>> >> > >
>>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>>>> >> > > integrate it with (time-lag kylin cube).
>>>>> >> > >
>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
>>>>> wrote:
>>>>> >> > >
>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>>>>> >> > > > about how Druid has changed in these two years. New features
>>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
>>>>> >> > > >
>>>>> >> > > > Here are some cases in which you should consider using Druid
>>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta with
>>>>> >> > > > the Druid I used two years ago):
>>>>> >> > > >
>>>>> >> > > > - Have a real-time datasource like Kafka etc.
>>>>> >> > > > - Most queries are small (based on my test results, I think
>>>>> >> > > >   Druid had better response time for small queries two years
>>>>> >> > > >   ago).
>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use the
>>>>> >> > > >   K8S/public cloud platform as your deployment platform.
>>>>> >> > > >
>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>>>> >> > > > better, like:
>>>>> >> > > >
>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>>>>> >> > > >   more exact-match/fine-grained Index for queries containing
>>>>> >> > > >   different `Group By dimensions`.
>>>>> >> > > > - User-friendly UI for modeling.
>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>>>>> >> > > >   show that it supports ODBC well).
>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>>>> >> > > >
>>>>> >> > > >
>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>>>> >> > > > I hope this helps, and feel free to share your opinion.
>>>>> >> > > >
>>>>> >> > > > ------------------------
>>>>> >> > > > With warm regard
>>>>> >> > > > Xiaoxiang Yu
>>>>> >> > > >
>>>>> >> > > >
>>>>> >> > > >
>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>>>>> <na...@vnpay.vn.invalid>
>>>>> >> > > wrote:
>>>>> >> > > >
>>>>> >> > > >> Dear Xiaoxiang,
>>>>> >> > > >> Sirs/Madams,
>>>>> >> > > >>
>>>>> >> > > >> May I post my boss's question:
>>>>> >> > > >>
>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>>>>> >> > > >> compared to Pinot and Druid?
>>>>> >> > > >>
>>>>> >> > > >> Please kindly let me know
>>>>> >> > > >>
>>>>> >> > > >> Thank you very much and best regards
>>>>> >> > > >>
>>>>> >> > > >
>>>>> >> > >
>>>>> >> >
>>>>> >>
>>>>> >
>>>>>
>>>>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Thank you Xiaoxiang, please update me when you have changed your default
branch. If people are impressed by the numbers, then I hope to turn
this situation around.

On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <xx...@apache.org> wrote:

> The default branch is for 4.X, which is a maintenance branch; the active
> branch is kylin5.
> I will change the default branch to kylin5 later.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hi Xiaoxiang, Sirs / Madams
>>
>> Can you see the attached photo?
>>
>> My boss asked why Druid has code committed regularly but Kylin has not
>> had any commits since July.
>>
>>
>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>>> I think so.
>>>
>>> Response time is not the only factor in making a decision. Kylin could
>>> be cheaper when the query pattern is suitable for the Kylin model, and
>>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
>>> in an ad hoc query scenario.
>>>
>>> By the way, Youzan and Kyligence combine them together to provide
>>> unified data analytics services for their customers.
>>>
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>>
>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> wrote:
>>>
>>>> Hi Xiaoxiang, thank you
>>>>
>>>> In case my client uses a cloud computing service like GCP or AWS, which
>>>> will cost more: the precalculation feature of Kylin, or ClickHouse? (In
>>>> the case of Kylin, my thinking is that the query execution is done once
>>>> and stored in the cube to be used many times, so Kylin uses less cloud
>>>> computation. Is that true?)
>>>>
>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>>
>>>> > The following text is part of an article
>>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>>> >
>>>> >
>>>> >
>>>> ===============================================================================
>>>> >
>>>> > Kylin is suitable for aggregation queries with fixed patterns because
>>>> > of its precomputation technology, for example when the join, group
>>>> > by, and where-condition patterns in SQL are relatively fixed. The
>>>> > larger the data volume is, the more obvious the advantages of using
>>>> > Kylin are. Kylin's advantages are especially large for deduplication
>>>> > (count distinct), Top N, Percentile, and similar scenarios, and it is
>>>> > used in a large number of scenarios, such as dashboards, all kinds of
>>>> > reports, large-screen displays, traffic statistics, and user behavior
>>>> > analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build
>>>> > their data service platforms, serving millions to tens of millions of
>>>> > queries per day, and most of the queries can be completed within
>>>> > 2 - 3 seconds. There is no better alternative for such a
>>>> > high-concurrency scenario.
>>>> >
>>>> > ClickHouse, because of its MPP architecture, has high computing power
>>>> > and is more suitable when query requests are more flexible, or when
>>>> > there is a need for detailed queries with low concurrency. Scenarios
>>>> > include user label filtering over very many columns with arbitrarily
>>>> > combined where conditions, complex ad hoc queries at low concurrency,
>>>> > and so on. If the amount of data and access is large, you need to
>>>> > deploy a distributed ClickHouse cluster, which is a greater challenge
>>>> > for operation and maintenance.
>>>> >
>>>> > If some queries are very flexible but infrequent, it is more
>>>> > resource-efficient to compute them on the fly. Since the number of
>>>> > queries is small, even if each query consumes a lot of computational
>>>> > resources, it is still cost-effective overall. If some queries have a
>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
>>>> > by spending large computational resources up front to save the
>>>> > results, the upfront computational cost can be amortized over each
>>>> > query, so it is the most economical.
>>>> >
>>>> > --- Translated with DeepL.com (free version)
>>>> >
>>>> >
>>>> > ------------------------
>>>> > With warm regard
>>>> > Xiaoxiang Yu
>>>> >
>>>> >
>>>> >
>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>> wrote:
>>>> >
>>>> >> Thank you Xiaoxiang for the near real time streaming feature. That's
>>>> >> great.
>>>> >>
>>>> >> This morning there has been a new challenge to my team: ClickHouse
>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>>>> >> which is faster than my demonstration (I used Kylin to calculate
>>>> >> 1 billion rows in 2.9 seconds).
>>>> >>
>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>>> >> that I can defend my demonstration?
>>>> >>
>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>> >>
>>>> >> > 1. "In this important scenario of realtime analytics, the reason
>>>> >> > here is that kylin has lag time due to model update of new segment
>>>> >> > build, is that correct?"
>>>> >> >
>>>> >> > You are correct.
>>>> >> >
>>>> >> > 2. "If that is true, then can you suggest a work-around of
>>>> >> > combination of ... "
>>>> >> >
>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>>>> >> > but not released), which can reduce the time lag to about 3 minutes
>>>> >> > (that is my estimate, but I am quite certain about it).
>>>> >> > NRT stands for 'near real-time'; it will run a job that does
>>>> >> > micro-batch aggregation and persistence periodically. The price is
>>>> >> > that you need to run and monitor a long-running job. This feature
>>>> >> > is based on Spark Streaming, so you need knowledge of it.
>>>> >> >
>>>> >> > I am curious: what is the maximum time lag your customers can
>>>> >> > tolerate?
>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>>> >> >
>>>> >> > ------------------------
>>>> >> > With warm regard
>>>> >> > Xiaoxiang Yu
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <namdd@vnpay.vn.invalid
>>>> >
>>>> >> wrote:
>>>> >> >
>>>> >> > > Druid is better in
>>>> >> > > - Have a real-time datasource like Kafka etc.
>>>> >> > >
>>>> >> > > ==========================
>>>> >> > >
>>>> >> > > Hi Xiaoxiang, thank you for your response.
>>>> >> > >
>>>> >> > > In this important scenario of realtime analytics, the reason here
>>>> >> > > is that kylin has lag time due to model update of new segment
>>>> >> > > build, is that correct?
>>>> >> > >
>>>> >> > > If that is true, then can you suggest a work-around of
>>>> >> > > combination of:
>>>> >> > >
>>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
>>>> >> > > realtime capability?
>>>> >> > >
>>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>>> >> > > integrate it with (time-lag kylin cube).
>>>> >> > >
>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
>>>> wrote:
>>>> >> > >
>>>> >> > > > I researched and tested Druid two years ago (I don't know much
>>>> >> > > > about how Druid has changed in these two years. New features
>>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
>>>> >> > > >
>>>> >> > > > Here are some cases in which you should consider using Druid
>>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta with
>>>> >> > > > the Druid I used two years ago):
>>>> >> > > >
>>>> >> > > > - Have a real-time datasource like Kafka etc.
>>>> >> > > > - Most queries are small (based on my test results, I think
>>>> >> > > >   Druid had better response time for small queries two years
>>>> >> > > >   ago).
>>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use the
>>>> >> > > >   K8S/public cloud platform as your deployment platform.
>>>> >> > > >
>>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>>> >> > > > better, like:
>>>> >> > > >
>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>>>> >> > > >   more exact-match/fine-grained Index for queries containing
>>>> >> > > >   different `Group By dimensions`.
>>>> >> > > > - User-friendly UI for modeling.
>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>>>> >> > > >   show that it supports ODBC well).
>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>>> >> > > >
>>>> >> > > >
>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>>> >> > > > I hope this helps, and feel free to share your opinion.
>>>> >> > > >
>>>> >> > > > ------------------------
>>>> >> > > > With warm regard
>>>> >> > > > Xiaoxiang Yu
>>>> >> > > >
>>>> >> > > >
>>>> >> > > >
>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>>>> <na...@vnpay.vn.invalid>
>>>> >> > > wrote:
>>>> >> > > >
>>>> >> > > >> Dear Xiaoxiang,
>>>> >> > > >> Sirs/Madams,
>>>> >> > > >>
>>>> >> > > >> May I post my boss's question:
>>>> >> > > >>
>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin
>>>> >> > > >> compared to Pinot and Druid?
>>>> >> > > >>
>>>> >> > > >> Please kindly let me know
>>>> >> > > >>
>>>> >> > > >> Thank you very much and best regards
>>>> >> > > >>
>>>> >> > > >
>>>> >> > >
>>>> >> >
>>>> >>
>>>> >
>>>>
>>>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
The default branch is for 4.X, which is a maintenance branch; the active
branch is kylin5.
I will change the default branch to kylin5 later.

------------------------
With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, Sirs / Madams
>
> Can you see the attached photo?
>
> My boss asked why Druid has code committed regularly but Kylin has not
> had any commits since July.
>
>
> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>
>> I think so.
>>
>> Response time is not the only factor in making a decision. Kylin could
>> be cheaper when the query pattern is suitable for the Kylin model, and
>> Kylin can guarantee reasonable query latency. ClickHouse will be quicker
>> in an ad hoc query scenario.
>>
>> By the way, Youzan and Kyligence combine them together to provide
>> unified data analytics services for their customers.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>
>>> Hi Xiaoxiang, thank you
>>>
>>> In case my client uses a cloud computing service like GCP or AWS, which
>>> will cost more: the precalculation feature of Kylin, or ClickHouse? (In
>>> the case of Kylin, my thinking is that the query execution is done once
>>> and stored in the cube to be used many times, so Kylin uses less cloud
>>> computation. Is that true?)
>>>
>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>
>>> > The following text is part of an article
>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>> >
>>> >
>>> >
>>> ===============================================================================
>>> >
>>> > Kylin is suitable for aggregation queries with fixed patterns because
>>> > of its precomputation technology, for example when the join, group by,
>>> > and where-condition patterns in SQL are relatively fixed. The larger
>>> > the data volume is, the more obvious the advantages of using Kylin
>>> > are. Kylin's advantages are especially large for deduplication (count
>>> > distinct), Top N, Percentile, and similar scenarios, and it is used in
>>> > a large number of scenarios, such as dashboards, all kinds of reports,
>>> > large-screen displays, traffic statistics, and user behavior analysis.
>>> > Meituan, Aurora, Shell Housing, etc. use Kylin to build their data
>>> > service platforms, serving millions to tens of millions of queries per
>>> > day, and most of the queries can be completed within 2 - 3 seconds.
>>> > There is no better alternative for such a high-concurrency scenario.
>>> >
>>> > ClickHouse, because of its MPP architecture, has high computing power
>>> > and is more suitable when query requests are more flexible, or when
>>> > there is a need for detailed queries with low concurrency. Scenarios
>>> > include user label filtering over very many columns with arbitrarily
>>> > combined where conditions, complex ad hoc queries at low concurrency,
>>> > and so on. If the amount of data and access is large, you need to
>>> > deploy a distributed ClickHouse cluster, which is a greater challenge
>>> > for operation and maintenance.
>>> >
>>> > If some queries are very flexible but infrequent, it is more
>>> > resource-efficient to compute them on the fly. Since the number of
>>> > queries is small, even if each query consumes a lot of computational
>>> > resources, it is still cost-effective overall. If some queries have a
>>> > fixed pattern and the query volume is large, Kylin is more suitable:
>>> > by spending large computational resources up front to save the
>>> > results, the upfront computational cost can be amortized over each
>>> > query, so it is the most economical.
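
The amortization argument in the quoted article can be illustrated with a
toy cost model. This is only a rough sketch: the function names and the
integer cost units are made up for illustration and are not measurements
of Kylin or ClickHouse.

```python
def precompute_total(build_cost, cost_per_query, n_queries):
    """Total cost when results are built once and then served cheaply."""
    return build_cost + cost_per_query * n_queries


def on_the_fly_total(cost_per_query, n_queries):
    """Total cost when every query is computed from the raw data."""
    return cost_per_query * n_queries


def break_even_queries(build_cost, cheap_query_cost, raw_query_cost):
    """Smallest query count at which precomputation is strictly cheaper.

    Assumes integer cost units. Returns None when a precomputed query is
    not cheaper than a raw one, because the build cost is then never
    recovered.
    """
    saving_per_query = raw_query_cost - cheap_query_cost
    if saving_per_query <= 0:
        return None
    # Need build_cost < saving_per_query * n, i.e. n > build_cost / saving.
    return build_cost // saving_per_query + 1


if __name__ == "__main__":
    # Hypothetical numbers: building costs 1000 units, a precomputed query
    # costs 1 unit, and the same query on raw data costs 21 units.
    print(break_even_queries(build_cost=1000, cheap_query_cost=1, raw_query_cost=21))
```

With these made-up numbers the two approaches cost the same at 50 queries,
and precomputation is cheaper from the 51st query onward. Below that
volume, computing on the fly costs less, which matches the point of the
article about infrequent queries.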
>>> >
>>> > --- Translated with DeepL.com (free version)
>>> >
>>> >
>>> > ------------------------
>>> > With warm regard
>>> > Xiaoxiang Yu
>>> >
>>> >
>>> >
>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> wrote:
>>> >
>>> >> Thank you Xiaoxiang for the near real time streaming feature. That's
>>> >> great.
>>> >>
>>> >> This morning there has been a new challenge to my team: ClickHouse
>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
>>> >> which is faster than my demonstration (I used Kylin to calculate
>>> >> 1 billion rows in 2.9 seconds).
>>> >>
>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
>>> >> that I can defend my demonstration?
>>> >>
>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>> >>
>>> >> > 1. "In this important scenario of realtime analytics, the reason
>>> >> > here is that kylin has lag time due to model update of new segment
>>> >> > build, is that correct?"
>>> >> >
>>> >> > You are correct.
>>> >> >
>>> >> > 2. "If that is true, then can you suggest a work-around of
>>> >> > combination of ... "
>>> >> >
>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
>>> >> > but not released), which can reduce the time lag to about 3 minutes
>>> >> > (that is my estimate, but I am quite certain about it).
>>> >> > NRT stands for 'near real-time'; it will run a job that does
>>> >> > micro-batch aggregation and persistence periodically. The price is
>>> >> > that you need to run and monitor a long-running job. This feature
>>> >> > is based on Spark Streaming, so you need knowledge of it.
>>> >> >
>>> >> > I am curious: what is the maximum time lag your customers can
>>> >> > tolerate?
>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>> >> >
>>> >> > ------------------------
>>> >> > With warm regard
>>> >> > Xiaoxiang Yu
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> >> wrote:
>>> >> >
>>> >> > > Druid is better in
>>> >> > > - Have a real-time datasource like Kafka etc.
>>> >> > >
>>> >> > > ==========================
>>> >> > >
>>> >> > > Hi Xiaoxiang, thank you for your response.
>>> >> > >
>>> >> > > In this important scenario of realtime analytics, the reason here
>>> >> > > is that kylin has lag time due to model update of new segment
>>> >> > > build, is that correct?
>>> >> > >
>>> >> > > If that is true, then can you suggest a work-around of
>>> >> > > combination of:
>>> >> > >
>>> >> > > (time-lag kylin cube) + (realtime DB update) to provide
>>> >> > > realtime capability?
>>> >> > >
>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>> >> > > integrate it with (time-lag kylin cube).
>>> >> > >
>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
>>> wrote:
>>> >> > >
>>> >> > > > I researched and tested Druid two years ago (I don't know much
>>> >> > > > about how Druid has changed in these two years. New features
>>> >> > > > that I know of are: new UI, fully on K8s, etc.).
>>> >> > > >
>>> >> > > > Here are some cases in which you should consider using Druid
>>> >> > > > rather than Kylin at the moment (comparing Kylin 5.0-beta with
>>> >> > > > the Druid I used two years ago):
>>> >> > > >
>>> >> > > > - Have a real-time datasource like Kafka etc.
>>> >> > > > - Most queries are small (based on my test results, I think
>>> >> > > >   Druid had better response time for small queries two years
>>> >> > > >   ago).
>>> >> > > > - Don't know how to optimize Spark/Hadoop, and want to use the
>>> >> > > >   K8S/public cloud platform as your deployment platform.
>>> >> > > >
>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>> >> > > > better, like:
>>> >> > > >
>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>>> >> > > >   more exact-match/fine-grained Index for queries containing
>>> >> > > >   different `Group By dimensions`.
>>> >> > > > - User-friendly UI for modeling.
>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
>>> >> > > >   show that it supports ODBC well).
>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>> >> > > >
>>> >> > > >
>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>> >> > > > I hope this helps, and feel free to share your opinion.
>>> >> > > >
>>> >> > > > ------------------------
>>> >> > > > With warm regard
>>> >> > > > Xiaoxiang Yu
>>> >> > > >
>>> >> > > >
>>> >> > > >
>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>>> <na...@vnpay.vn.invalid>
>>> >> > > wrote:
>>> >> > > >
>>> >> > > >> Dear Xiaoxiang,
>>> >> > > >> Sirs/Madams,
>>> >> > > >>
>>> >> > > >> May I post my boss's question:
>>> >> > > >>
>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
>>> to
>>> >> > Pinot
>>> >> > > >> and
>>> >> > > >> Druid?
>>> >> > > >>
>>> >> > > >> Please kindly let me know
>>> >> > > >>
>>> >> > > >> Thank you very much and best regards
>>> >> > > >>
>>> >> > > >
>>> >> > >
>>> >> >
>>> >>
>>> >
>>>
>>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
The default branch is for 4.X, which is in maintenance mode; active
development happens on the kylin5 branch.
I will change the default branch to kylin5 later.

------------------------
With warm regard
Xiaoxiang Yu



On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, Sirs / Madams
>
> Can you see the attached photo?
>
> My boss asked why Druid receives commits regularly while Kylin has had no
> commits since July.
>
>
> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:
>
>> I think so.
>>
>> Response time is not the only factor in the decision. Kylin could be
>> cheaper when the query pattern suits the Kylin model, and Kylin can
>> guarantee reasonable query latency. ClickHouse will be quicker in an ad hoc
>> query scenario.
>>
>> By the way, Youzan and Kyligence combine Kylin and ClickHouse to provide
>> unified data analytics services for their customers.
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>>
>>
>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>
>>> Hi Xiaoxiang, thank you
>>>
>>> In case my client uses a cloud computing service like GCP or AWS, which
>>> will cost more: Kylin's precalculation or ClickHouse? (In the case of
>>> Kylin, my thinking is that the query execution is done once and stored in
>>> the cube to be reused many times, so Kylin uses less cloud computation.
>>> Is that true?)
>>>
>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>>
>>> > The following text is part of an article
>>> > (https://zhuanlan.zhihu.com/p/343394287).
>>> >
>>> >
>>> >
>>> ===============================================================================
>>> >
>>> > Kylin is suitable for aggregation queries with fixed patterns because of
>>> > its pre-calculation technology, for example when the join, group by, and
>>> > where-condition patterns in SQL are relatively fixed. The larger the data
>>> > volume, the more obvious Kylin's advantages are. Kylin is particularly
>>> > strong at deduplication (count distinct), Top N, Percentile and similar
>>> > scenarios, and it is widely used for dashboards, all kinds of reports,
>>> > large-screen displays, traffic statistics, and user behavior analysis.
>>> > Meituan, Aurora, Shell Housing, etc. use Kylin to build their data service
>>> > platforms, serving millions to tens of millions of queries per day, and
>>> > most of the queries complete within 2 - 3 seconds. There is no better
>>> > alternative for such a high-concurrency scenario.
>>> >
>>> > ClickHouse, because of its MPP architecture, has high computing power and
>>> > is more suitable when query requests are more flexible, or when
>>> > detail-level queries are needed at low concurrency. Scenarios include user
>>> > label filtering over very many columns with arbitrarily combined where
>>> > conditions, complex ad hoc queries with low concurrency, and so on. If the
>>> > amount of data and access is large, you need to deploy a distributed
>>> > ClickHouse cluster, which is more challenging to operate and maintain.
>>> >
>>> > If some queries are very flexible but infrequent, it is more
>>> > resource-efficient to compute them on the fly. Since the number of queries
>>> > is small, even if each query consumes a lot of computational resources, it
>>> > is still cost-effective overall. If some queries have a fixed pattern and
>>> > the query volume is large, Kylin is more suitable: a large amount of
>>> > computation is spent once to precompute and save the results, and that
>>> > upfront cost is amortized over every query, so it is the most economical.
>>> >
>>> > --- Translated with DeepL.com (free version)
>>> >
>>> >
>>> > ------------------------
>>> > With warm regard
>>> > Xiaoxiang Yu
>>> >
>>> >
>>> >
>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> wrote:
>>> >
>>> >> Thank you Xiaoxiang for the near-real-time streaming feature. That's
>>> >> great.
>>> >>
>>> >> This morning my team faced a new challenge: ClickHouse offered us the
>>> >> speed of calculating 8 billion rows in milliseconds, which is faster
>>> >> than my demonstration (I used Kylin to calculate 1 billion rows in 2.9
>>> >> seconds).
>>> >>
>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so that
>>> >> I can defend my demonstration?
>>> >>
>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>> >>
>>> >> > 1. "In this important scenario of realtime analytics, the reason
>>> >> > here is that kylin has lag time due to model update of new segment
>>> >> > build, is that correct?"
>>> >> >
>>> >> > You are correct.
>>> >> >
>>> >> > 2. "If that is true, then can you suggest a work-around of
>>> >> > combination of ... "
>>> >> >
>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
>>> >> > not released), which can reduce the time lag to about 3 minutes (that
>>> >> > is my estimate, but I am quite certain about it).
>>> >> > NRT stands for 'near real-time'; it runs a job that periodically does
>>> >> > micro-batch aggregation and persistence. The price is that you need to
>>> >> > run and monitor a long-running job. This feature is based on Spark
>>> >> > Streaming, so you need knowledge of it.
>>> >> >
>>> >> > I am curious: what is the maximum time lag your customers can tolerate?
>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>>> >> >
>>> >> > ------------------------
>>> >> > With warm regard
>>> >> > Xiaoxiang Yu
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>> >> wrote:
>>> >> >
>>> >> > > Druid is better in
>>> >> > > - Have a real-time datasource like Kafka etc.
>>> >> > >
>>> >> > > ==========================
>>> >> > >
>>> >> > > Hi Xiaoxiang, thank you for your response.
>>> >> > >
>>> >> > > In this important scenario of realtime analytics, the reason here
>>> >> > > is that Kylin has lag time due to model update of new segment build,
>>> >> > > is that correct?
>>> >> > >
>>> >> > > If that is true, then can you suggest a work-around that combines:
>>> >> > >
>>> >> > > (time-lagged Kylin cube) + (realtime DB update) to provide
>>> >> > > realtime capability?
>>> >> > >
>>> >> > > IMO, the point here is to find that (realtime DB update) and
>>> >> > > integrate it with the (time-lagged Kylin cube).
>>> >> > >
>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
>>> wrote:
>>> >> > >
>>> >> > > > I researched and tested Druid two years ago(I don't know too
>>> much
>>> >> about
>>> >> > > >  the change of Druid in these two years. New features that I
>>> know
>>> >> are :
>>> >> > > > new UI, fully on K8s etc).
>>> >> > > >
>>> >> > > > Here are some cases you should consider using Druid other than
>>> Kylin
>>> >> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I
>>> >> used
>>> >> > two
>>> >> > > > years ago):
>>> >> > > >
>>> >> > > > - Have a real-time datasource like Kafka etc.
>>> >> > > > - Most queries are small(Based on my test result, I think Druid
>>> had
>>> >> > > better
>>> >> > > > response time for small queries two years ago.)
>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
>>> >> K8S/public
>>> >> > > >   cloud platform as your deployment platform.
>>> >> > > >
>>> >> > > > But I do think there are many scenarios in which Kylin could be
>>> >> better,
>>> >> > > > like:
>>> >> > > >
>>> >> > > > - Better performance for complex/big queries. Kylin can have a
>>> more
>>> >> > > > exact-match/fine-grained
>>> >> > > >   Index for queries containing different `Group By dimensions`.
>>> >> > > > - User-friendly UI for modeling.
>>> >> > > > - Support 'Join' better? (Not sure at the moment)
>>> >> > > > - ODBC driver for different BI.(its website did not show it
>>> supports
>>> >> > ODBC
>>> >> > > > well)
>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>>> >> > > >
>>> >> > > >
>>> >> > > > I don't know Pinot, so I have nothing to say about it.
>>> >> > > > Hope to help you, or you are free to share your opinion.
>>> >> > > >
>>> >> > > > ------------------------
>>> >> > > > With warm regard
>>> >> > > > Xiaoxiang Yu
>>> >> > > >
>>> >> > > >
>>> >> > > >
>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>>> <na...@vnpay.vn.invalid>
>>> >> > > wrote:
>>> >> > > >
>>> >> > > >> Dear Xiaoxiang,
>>> >> > > >> Sirs/Madams,
>>> >> > > >>
>>> >> > > >> May I post my boss's question:
>>> >> > > >>
>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
>>> to
>>> >> > Pinot
>>> >> > > >> and
>>> >> > > >> Druid?
>>> >> > > >>
>>> >> > > >> Please kindly let me know
>>> >> > > >>
>>> >> > > >> Thank you very much and best regards
>>> >> > > >>
>>> >> > > >
>>> >> > >
>>> >> >
>>> >>
>>> >
>>>
>>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Hi Xiaoxiang, Sirs / Madams

Can you see the attached photo?

My boss asked why Druid receives commits regularly while Kylin has had no
commits since July.


On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <xx...@apache.org> wrote:

> I think so.
>
> Response time is not the only factor in the decision. Kylin could be
> cheaper when the query pattern suits the Kylin model, and Kylin can
> guarantee reasonable query latency. ClickHouse will be quicker in an ad hoc
> query scenario.
>
> By the way, Youzan and Kyligence combine Kylin and ClickHouse to provide
> unified data analytics services for their customers.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hi Xiaoxiang, thank you
>>
>> In case my client uses a cloud computing service like GCP or AWS, which
>> will cost more: Kylin's precalculation or ClickHouse? (In the case of
>> Kylin, my thinking is that the query execution is done once and stored in
>> the cube to be reused many times, so Kylin uses less cloud computation.
>> Is that true?)
>>
>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>> > The following text is part of an article
>> > (https://zhuanlan.zhihu.com/p/343394287).
>> >
>> >
>> >
>> ===============================================================================
>> >
>> > Kylin is suitable for aggregation queries with fixed patterns because of
>> > its pre-calculation technology, for example when the join, group by, and
>> > where-condition patterns in SQL are relatively fixed. The larger the data
>> > volume, the more obvious Kylin's advantages are. Kylin is particularly
>> > strong at deduplication (count distinct), Top N, Percentile and similar
>> > scenarios, and it is widely used for dashboards, all kinds of reports,
>> > large-screen displays, traffic statistics, and user behavior analysis.
>> > Meituan, Aurora, Shell Housing, etc. use Kylin to build their data service
>> > platforms, serving millions to tens of millions of queries per day, and
>> > most of the queries complete within 2 - 3 seconds. There is no better
>> > alternative for such a high-concurrency scenario.
>> >
>> > ClickHouse, because of its MPP architecture, has high computing power and
>> > is more suitable when query requests are more flexible, or when
>> > detail-level queries are needed at low concurrency. Scenarios include user
>> > label filtering over very many columns with arbitrarily combined where
>> > conditions, complex ad hoc queries with low concurrency, and so on. If the
>> > amount of data and access is large, you need to deploy a distributed
>> > ClickHouse cluster, which is more challenging to operate and maintain.
>> >
>> > If some queries are very flexible but infrequent, it is more
>> > resource-efficient to compute them on the fly. Since the number of queries
>> > is small, even if each query consumes a lot of computational resources, it
>> > is still cost-effective overall. If some queries have a fixed pattern and
>> > the query volume is large, Kylin is more suitable: a large amount of
>> > computation is spent once to precompute and save the results, and that
>> > upfront cost is amortized over every query, so it is the most economical.
>> >
>> > --- Translated with DeepL.com (free version)
>> >
>> >
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> wrote:
>> >
>> >> Thank you Xiaoxiang for the near-real-time streaming feature. That's
>> >> great.
>> >>
>> >> This morning my team faced a new challenge: ClickHouse offered us the
>> >> speed of calculating 8 billion rows in milliseconds, which is faster
>> >> than my demonstration (I used Kylin to calculate 1 billion rows in 2.9
>> >> seconds).
>> >>
>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so that
>> >> I can defend my demonstration?
>> >>
>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>> >>
>> >> > 1. "In this important scenario of realtime analytics, the reason
>> >> > here is that kylin has lag time due to model update of new segment
>> >> > build, is that correct?"
>> >> >
>> >> > You are correct.
>> >> >
>> >> > 2. "If that is true, then can you suggest a work-around of
>> >> > combination of ... "
>> >> >
>> >> > Kylin is planning to introduce NRT streaming (coding is completed but
>> >> > not released), which can reduce the time lag to about 3 minutes (that
>> >> > is my estimate, but I am quite certain about it).
>> >> > NRT stands for 'near real-time'; it runs a job that periodically does
>> >> > micro-batch aggregation and persistence. The price is that you need to
>> >> > run and monitor a long-running job. This feature is based on Spark
>> >> > Streaming, so you need knowledge of it.
>> >> >
>> >> > I am curious: what is the maximum time lag your customers can tolerate?
>> >> > Personally, I guess a minute-level time lag is OK for most cases.
>> >> >
>> >> > ------------------------
>> >> > With warm regard
>> >> > Xiaoxiang Yu
>> >> >
>> >> >
>> >> >
>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> >> wrote:
>> >> >
>> >> > > Druid is better in
>> >> > > - Have a real-time datasource like Kafka etc.
>> >> > >
>> >> > > ==========================
>> >> > >
>> >> > > Hi Xiaoxiang, thank you for your response.
>> >> > >
>> >> > > In this important scenario of realtime analytics, the reason here
>> >> > > is that Kylin has lag time due to model update of new segment build,
>> >> > > is that correct?
>> >> > >
>> >> > > If that is true, then can you suggest a work-around that combines:
>> >> > >
>> >> > > (time-lagged Kylin cube) + (realtime DB update) to provide
>> >> > > realtime capability?
>> >> > >
>> >> > > IMO, the point here is to find that (realtime DB update) and
>> >> > > integrate it with the (time-lagged Kylin cube).
>> >> > >
>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
>> wrote:
>> >> > >
>> >> > > > I researched and tested Druid two years ago(I don't know too much
>> >> about
>> >> > > >  the change of Druid in these two years. New features that I know
>> >> are :
>> >> > > > new UI, fully on K8s etc).
>> >> > > >
>> >> > > > Here are some cases you should consider using Druid other than
>> Kylin
>> >> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I
>> >> used
>> >> > two
>> >> > > > years ago):
>> >> > > >
>> >> > > > - Have a real-time datasource like Kafka etc.
>> >> > > > - Most queries are small(Based on my test result, I think Druid
>> had
>> >> > > better
>> >> > > > response time for small queries two years ago.)
>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
>> >> K8S/public
>> >> > > >   cloud platform as your deployment platform.
>> >> > > >
>> >> > > > But I do think there are many scenarios in which Kylin could be
>> >> better,
>> >> > > > like:
>> >> > > >
>> >> > > > - Better performance for complex/big queries. Kylin can have a
>> more
>> >> > > > exact-match/fine-grained
>> >> > > >   Index for queries containing different `Group By dimensions`.
>> >> > > > - User-friendly UI for modeling.
>> >> > > > - Support 'Join' better? (Not sure at the moment)
>> >> > > > - ODBC driver for different BI.(its website did not show it
>> supports
>> >> > ODBC
>> >> > > > well)
>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> >> > > >
>> >> > > >
>> >> > > > I don't know Pinot, so I have nothing to say about it.
>> >> > > > Hope to help you, or you are free to share your opinion.
>> >> > > >
>> >> > > > ------------------------
>> >> > > > With warm regard
>> >> > > > Xiaoxiang Yu
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy
>> <na...@vnpay.vn.invalid>
>> >> > > wrote:
>> >> > > >
>> >> > > >> Dear Xiaoxiang,
>> >> > > >> Sirs/Madams,
>> >> > > >>
>> >> > > >> May I post my boss's question:
>> >> > > >>
>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
>> to
>> >> > Pinot
>> >> > > >> and
>> >> > > >> Druid?
>> >> > > >>
>> >> > > >> Please kindly let me know
>> >> > > >>
>> >> > > >> Thank you very much and best regards
>> >> > > >>
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>>
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
I think so.

Response time is not the only factor in the decision. Kylin could be
cheaper when the query pattern suits the Kylin model, and Kylin can
guarantee reasonable query latency. ClickHouse will be quicker in an ad hoc
query scenario.
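
The cost argument can be made concrete with a small back-of-the-envelope
sketch. It is only an illustration: the names CostModel, build_cost,
precomputed_query and adhoc_query are hypothetical, the numbers are made-up
compute units rather than measurements of Kylin or ClickHouse, and real
cloud bills also include storage and index refresh costs that are ignored
here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    """Hypothetical cost figures, in arbitrary compute units."""
    build_cost: float         # one-off cost of precomputing the results
    precomputed_query: float  # cost of one query served from precomputed results
    adhoc_query: float        # cost of one query computed on the fly from raw data


def total_precomputed(m: CostModel, n_queries: int) -> float:
    """Total cost when results are precomputed once and then reused."""
    return m.build_cost + n_queries * m.precomputed_query


def total_adhoc(m: CostModel, n_queries: int) -> float:
    """Total cost when every query is computed on the fly."""
    return n_queries * m.adhoc_query


def break_even_queries(m: CostModel) -> Optional[int]:
    """Smallest query count at which precomputation is strictly cheaper.

    Returns None when a precomputed query is not cheaper than an ad hoc one,
    because the upfront build cost can then never be recovered.
    """
    saving_per_query = m.adhoc_query - m.precomputed_query
    if saving_per_query <= 0:
        return None
    return math.floor(m.build_cost / saving_per_query) + 1


if __name__ == "__main__":
    # Made-up example values, not benchmark results.
    model = CostModel(build_cost=1000.0, precomputed_query=0.5, adhoc_query=10.0)
    n = break_even_queries(model)
    print("break-even query count:", n)
    print("precomputed total:", total_precomputed(model, n))
    print("ad hoc total:", total_adhoc(model, n))
```

Below the break-even count the on-the-fly approach is cheaper; above it the
upfront build cost is amortized and precomputation wins, which is the
situation where the query pattern suits the Kylin model.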

By the way, Youzan and Kyligence combine Kylin and ClickHouse to provide
unified data analytics services for their customers.

------------------------
With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, thank you
>
> In case my client uses a cloud computing service like GCP or AWS, which
> will cost more: Kylin's precalculation or ClickHouse? (In the case of
> Kylin, my thinking is that the query execution is done once and stored in
> the cube to be reused many times, so Kylin uses less cloud computation.
> Is that true?)
>
> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > The following text is part of an article
> > (https://zhuanlan.zhihu.com/p/343394287).
> >
> >
> >
> ===============================================================================
> >
> > Kylin is suitable for aggregation queries with fixed patterns because of
> > its pre-calculation technology, for example when the join, group by, and
> > where-condition patterns in SQL are relatively fixed. The larger the data
> > volume, the more obvious Kylin's advantages are. Kylin is particularly
> > strong at deduplication (count distinct), Top N, Percentile and similar
> > scenarios, and it is widely used for dashboards, all kinds of reports,
> > large-screen displays, traffic statistics, and user behavior analysis.
> > Meituan, Aurora, Shell Housing, etc. use Kylin to build their data service
> > platforms, serving millions to tens of millions of queries per day, and
> > most of the queries complete within 2 - 3 seconds. There is no better
> > alternative for such a high-concurrency scenario.
> >
> > ClickHouse, because of its MPP architecture, has high computing power and
> > is more suitable when query requests are more flexible, or when
> > detail-level queries are needed at low concurrency. Scenarios include user
> > label filtering over very many columns with arbitrarily combined where
> > conditions, complex ad hoc queries with low concurrency, and so on. If the
> > amount of data and access is large, you need to deploy a distributed
> > ClickHouse cluster, which is more challenging to operate and maintain.
> >
> > If some queries are very flexible but infrequent, it is more
> > resource-efficient to compute them on the fly. Since the number of queries
> > is small, even if each query consumes a lot of computational resources, it
> > is still cost-effective overall. If some queries have a fixed pattern and
> > the query volume is large, Kylin is more suitable: a large amount of
> > computation is spent once to precompute and save the results, and that
> > upfront cost is amortized over every query, so it is the most economical.
> >
> > --- Translated with DeepL.com (free version)
> >
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> wrote:
> >
> >> Thank you Xiaoxiang for the near real time streaming feature. That's
> >> great.
> >>
> >> This morning there has been a new challenge to my team: clickhouse
> offered
> >> us the speed of calculating 8 billion rows in millisecond which is
> faster
> >> than my demonstration (I used Kylin to do calculating 1 billion rows in
> >> 2.9
> >> seconds)
> >>
> >> Can you briefly suggest the advantages of kylin over clickhouse so that
> I
> >> can defend my demonstration.
> >>
> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>
> >> > 1. "In this important scenario of realtime analytics, the reason here
> is
> >> > that
> >> > kylin has lag time due to model update of new segment build, is that
> >> > correct?"
> >> >
> >> > You are correct.
> >> >
> >> > 2. "If that is true, then can you suggest a work-around of combination
> >> of
> >> > ... "
> >> >
> >> > Kylin is planning to introduce NRT streaming(coding is completed but
> not
> >> > released),
> >> > which can make the time-lag to about 3 minutes(that is my estimation
> >> but I
> >> > am
> >> > quite certain about it).
> >> > NRT stands for 'near real-time', it will run a job and do micro-batch
> >> > aggregation and persistence periodically. The price is that you need
> to
> >> run
> >> > and monitor a long-running
> >> >  job. This feature is based on Spark Streaming, so you need knowledge
> of
> >> > it.
> >> >
> >> > I am curious about what is the maximum time-lag your customers
> >> > can tolerate?
> >> > Personally, I guess minute level time-lag is ok for most cases.
> >> >
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> wrote:
> >> >
> >> > > Druid is better in
> >> > > - Have a real-time datasource like Kafka etc.
> >> > >
> >> > > ==========================
> >> > >
> >> > > Hi Xiaoxiang, thank you for your response.
> >> > >
> >> > > In this important scenario of realtime alalytics, the reason here is
> >> that
> >> > > kylin has lag time due to model update of new segment build, is that
> >> > > correct?
> >> > >
> >> > > If that is true, then can you suggest a work-around of combination
> of
> >> :
> >> > >
> >> > > (time - lag kylin cube) + (realtime DB update) to provide
> >> > > realtime capability ?
> >> > >
> >> > > IMO, the point here is to find that (realtime DB update) and
> >> integrate it
> >> > > with (time - lag kylin cube).
> >> > >
> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
> wrote:
> >> > >
> >> > > > I researched and tested Druid two years ago(I don't know too much
> >> about
> >> > > >  the change of Druid in these two years. New features that I know
> >> are :
> >> > > > new UI, fully on K8s etc).
> >> > > >
> >> > > > Here are some cases you should consider using Druid other than
> Kylin
> >> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I
> >> used
> >> > two
> >> > > > years ago):
> >> > > >
> >> > > > - Have a real-time datasource like Kafka etc.
> >> > > > - Most queries are small(Based on my test result, I think Druid
> had
> >> > > better
> >> > > > response time for small queries two years ago.)
> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
> >> K8S/public
> >> > > >   cloud platform as your deployment platform.
> >> > > >
> >> > > > But I do think there are many scenarios in which Kylin could be
> >> better,
> >> > > > like:
> >> > > >
> >> > > > - Better performance for complex/big queries. Kylin can have a
> more
> >> > > > exact-match/fine-grained
> >> > > >   Index for queries containing different `Group By dimensions`.
> >> > > > - User-friendly UI for modeling.
> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> > > > - ODBC driver for different BI.(its website did not show it
> supports
> >> > ODBC
> >> > > > well)
> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> > > >
> >> > > >
> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> > > > Hope to help you, or you are free to share your opinion.
> >> > > >
> >> > > > ------------------------
> >> > > > With warm regard
> >> > > > Xiaoxiang Yu
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid
> >
> >> > > wrote:
> >> > > >
> >> > > >> Dear Xiaoxiang,
> >> > > >> Sirs/Madams,
> >> > > >>
> >> > > >> May I post my boss's question:
> >> > > >>
> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> >> > Pinot
> >> > > >> and
> >> > > >> Druid?
> >> > > >>
> >> > > >> Please kindly let me know
> >> > > >>
> >> > > >> Thank you very much and best regards
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
I think so.

Response time is not the only factor in making a decision. Kylin could be
cheaper when the query pattern suits the Kylin model, and Kylin can still
guarantee reasonable query latency. ClickHouse will be quicker in an ad hoc
query scenario.

By the way, Youzan and Kyligence combine the two to provide unified data
analytics services for their customers.

------------------------
With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, thank you
>
> If my client uses a cloud computing service like GCP or AWS, which will
> cost more: the precalculation feature of Kylin, or ClickHouse? (In the case
> of Kylin, my thinking is that the query execution is done once and stored
> in the cube to be reused many times, so Kylin uses less cloud computation.
> Is that true?)
>
> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > Following text is part of an article(
> > https://zhuanlan.zhihu.com/p/343394287) .
> >
> >
> >
> ===============================================================================
> >
> > Kylin is suitable for aggregation queries with fixed modes because of its
> > pre-calculated technology, for example, join, group by, and where
> condition
> > modes in SQL are relatively fixed, etc. The larger the data volume is,
> the
> > more obvious the advantages of using Kylin are; in particular, Kylin is
> > particularly advantageous in the scenarios of de-emphasis (count
> distinct),
> > Top N, and Percentile. In particular, Kylin's advantages in de-weighting
> > (count distinct), Top N, Percentile and other scenarios are especially
> > huge, and it is used in a large number of scenarios, such as Dashboard,
> all
> > kinds of reports, large-screen display, traffic statistics, and user
> > behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to
> build
> > their data service platforms, providing millions to tens of millions of
> > queries per day, and most of the queries can be completed within 2 - 3
> > seconds. There is no better alternative for such a high concurrency
> > scenario.
> >
> > ClickHouse, because of its MPP architecture, has high computing power and
> > is more suitable when the query request is more flexible, or when there
> is
> > a need for detailed queries with low concurrency. Scenarios include: very
> > many columns and where conditions are arbitrarily combined with the user
> > label filtering, not a large amount of concurrency of complex on-the-spot
> > query and so on. If the amount of data and access is large, you need to
> > deploy a distributed ClickHouse cluster, which is a higher challenge for
> > operation and maintenance.
> >
> > If some queries are very flexible but infrequent, it is more
> > resource-efficient to use now-computing. Since the number of queries is
> > small, even if each query consumes a lot of computational resources, it
> is
> > still cost-effective overall. If some queries have a fixed pattern and
> the
> > query volume is large, it is more suitable for Kylin, because the query
> > volume is large, and by using large computational resources to save the
> > results, the upfront computational cost can be amortized over each query,
> > so it is the most economical.
> >
> > --- Translated with DeepL.com (free version)
> >
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> wrote:
> >
> >> Thank you Xiaoxiang for the near real time streaming feature. That's
> >> great.
> >>
> >> This morning there has been a new challenge to my team: clickhouse
> offered
> >> us the speed of calculating 8 billion rows in millisecond which is
> faster
> >> than my demonstration (I used Kylin to do calculating 1 billion rows in
> >> 2.9
> >> seconds)
> >>
> >> Can you briefly suggest the advantages of kylin over clickhouse so that
> I
> >> can defend my demonstration.
> >>
> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >>
> >> > 1. "In this important scenario of realtime analytics, the reason here
> is
> >> > that
> >> > kylin has lag time due to model update of new segment build, is that
> >> > correct?"
> >> >
> >> > You are correct.
> >> >
> >> > 2. "If that is true, then can you suggest a work-around of combination
> >> of
> >> > ... "
> >> >
> >> > Kylin is planning to introduce NRT streaming(coding is completed but
> not
> >> > released),
> >> > which can make the time-lag to about 3 minutes(that is my estimation
> >> but I
> >> > am
> >> > quite certain about it).
> >> > NRT stands for 'near real-time', it will run a job and do micro-batch
> >> > aggregation and persistence periodically. The price is that you need
> to
> >> run
> >> > and monitor a long-running
> >> >  job. This feature is based on Spark Streaming, so you need knowledge
> of
> >> > it.
> >> >
> >> > I am curious about what is the maximum time-lag your customers
> >> > can tolerate?
> >> > Personally, I guess minute level time-lag is ok for most cases.
> >> >
> >> > ------------------------
> >> > With warm regard
> >> > Xiaoxiang Yu
> >> >
> >> >
> >> >
> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> >> wrote:
> >> >
> >> > > Druid is better in
> >> > > - Have a real-time datasource like Kafka etc.
> >> > >
> >> > > ==========================
> >> > >
> >> > > Hi Xiaoxiang, thank you for your response.
> >> > >
> >> > > In this important scenario of realtime alalytics, the reason here is
> >> that
> >> > > kylin has lag time due to model update of new segment build, is that
> >> > > correct?
> >> > >
> >> > > If that is true, then can you suggest a work-around of combination
> of
> >> :
> >> > >
> >> > > (time - lag kylin cube) + (realtime DB update) to provide
> >> > > realtime capability ?
> >> > >
> >> > > IMO, the point here is to find that (realtime DB update) and
> >> integrate it
> >> > > with (time - lag kylin cube).
> >> > >
> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org>
> wrote:
> >> > >
> >> > > > I researched and tested Druid two years ago(I don't know too much
> >> about
> >> > > >  the change of Druid in these two years. New features that I know
> >> are :
> >> > > > new UI, fully on K8s etc).
> >> > > >
> >> > > > Here are some cases you should consider using Druid other than
> Kylin
> >> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I
> >> used
> >> > two
> >> > > > years ago):
> >> > > >
> >> > > > - Have a real-time datasource like Kafka etc.
> >> > > > - Most queries are small(Based on my test result, I think Druid
> had
> >> > > better
> >> > > > response time for small queries two years ago.)
> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
> >> K8S/public
> >> > > >   cloud platform as your deployment platform.
> >> > > >
> >> > > > But I do think there are many scenarios in which Kylin could be
> >> better,
> >> > > > like:
> >> > > >
> >> > > > - Better performance for complex/big queries. Kylin can have a
> more
> >> > > > exact-match/fine-grained
> >> > > >   Index for queries containing different `Group By dimensions`.
> >> > > > - User-friendly UI for modeling.
> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> > > > - ODBC driver for different BI.(its website did not show it
> supports
> >> > ODBC
> >> > > > well)
> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> > > >
> >> > > >
> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> > > > Hope to help you, or you are free to share your opinion.
> >> > > >
> >> > > > ------------------------
> >> > > > With warm regard
> >> > > > Xiaoxiang Yu
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <namdd@vnpay.vn.invalid
> >
> >> > > wrote:
> >> > > >
> >> > > >> Dear Xiaoxiang,
> >> > > >> Sirs/Madams,
> >> > > >>
> >> > > >> May I post my boss's question:
> >> > > >>
> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> >> > Pinot
> >> > > >> and
> >> > > >> Druid?
> >> > > >>
> >> > > >> Please kindly let me know
> >> > > >>
> >> > > >> Thank you very much and best regards
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Hi Xiaoxiang, thank you

If my client uses a cloud computing service like GCP or AWS, which will
cost more: the precalculation feature of Kylin, or ClickHouse? (In the case
of Kylin, my thinking is that the query execution is done once and stored
in the cube to be reused many times, so Kylin uses less cloud computation.
Is that true?)

On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> Following text is part of an article(
> https://zhuanlan.zhihu.com/p/343394287) .
>
>
> ===============================================================================
>
> Kylin is suitable for aggregation queries with fixed modes because of its
> pre-calculated technology, for example, join, group by, and where condition
> modes in SQL are relatively fixed, etc. The larger the data volume is, the
> more obvious the advantages of using Kylin are; in particular, Kylin is
> particularly advantageous in the scenarios of de-emphasis (count distinct),
> Top N, and Percentile. In particular, Kylin's advantages in de-weighting
> (count distinct), Top N, Percentile and other scenarios are especially
> huge, and it is used in a large number of scenarios, such as Dashboard, all
> kinds of reports, large-screen display, traffic statistics, and user
> behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build
> their data service platforms, providing millions to tens of millions of
> queries per day, and most of the queries can be completed within 2 - 3
> seconds. There is no better alternative for such a high concurrency
> scenario.
>
> ClickHouse, because of its MPP architecture, has high computing power and
> is more suitable when the query request is more flexible, or when there is
> a need for detailed queries with low concurrency. Scenarios include: very
> many columns and where conditions are arbitrarily combined with the user
> label filtering, not a large amount of concurrency of complex on-the-spot
> query and so on. If the amount of data and access is large, you need to
> deploy a distributed ClickHouse cluster, which is a higher challenge for
> operation and maintenance.
>
> If some queries are very flexible but infrequent, it is more
> resource-efficient to use now-computing. Since the number of queries is
> small, even if each query consumes a lot of computational resources, it is
> still cost-effective overall. If some queries have a fixed pattern and the
> query volume is large, it is more suitable for Kylin, because the query
> volume is large, and by using large computational resources to save the
> results, the upfront computational cost can be amortized over each query,
> so it is the most economical.
>
> --- Translated with DeepL.com (free version)
>
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Thank you Xiaoxiang for the near real time streaming feature. That's
>> great.
>>
>> This morning there has been a new challenge to my team: clickhouse offered
>> us the speed of calculating 8 billion rows in millisecond which is faster
>> than my demonstration (I used Kylin to do calculating 1 billion rows in
>> 2.9
>> seconds)
>>
>> Can you briefly suggest the advantages of kylin over clickhouse so that I
>> can defend my demonstration.
>>
>> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>>
>> > 1. "In this important scenario of realtime analytics, the reason here is
>> > that
>> > kylin has lag time due to model update of new segment build, is that
>> > correct?"
>> >
>> > You are correct.
>> >
>> > 2. "If that is true, then can you suggest a work-around of combination
>> of
>> > ... "
>> >
>> > Kylin is planning to introduce NRT streaming(coding is completed but not
>> > released),
>> > which can make the time-lag to about 3 minutes(that is my estimation
>> but I
>> > am
>> > quite certain about it).
>> > NRT stands for 'near real-time', it will run a job and do micro-batch
>> > aggregation and persistence periodically. The price is that you need to
>> run
>> > and monitor a long-running
>> >  job. This feature is based on Spark Streaming, so you need knowledge of
>> > it.
>> >
>> > I am curious about what is the maximum time-lag your customers
>> > can tolerate?
>> > Personally, I guess minute level time-lag is ok for most cases.
>> >
>> > ------------------------
>> > With warm regard
>> > Xiaoxiang Yu
>> >
>> >
>> >
>> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> wrote:
>> >
>> > > Druid is better in
>> > > - Have a real-time datasource like Kafka etc.
>> > >
>> > > ==========================
>> > >
>> > > Hi Xiaoxiang, thank you for your response.
>> > >
>> > > In this important scenario of realtime alalytics, the reason here is
>> that
>> > > kylin has lag time due to model update of new segment build, is that
>> > > correct?
>> > >
>> > > If that is true, then can you suggest a work-around of combination of
>> :
>> > >
>> > > (time - lag kylin cube) + (realtime DB update) to provide
>> > > realtime capability ?
>> > >
>> > > IMO, the point here is to find that (realtime DB update) and
>> integrate it
>> > > with (time - lag kylin cube).
>> > >
>> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>> > >
>> > > > I researched and tested Druid two years ago(I don't know too much
>> about
>> > > >  the change of Druid in these two years. New features that I know
>> are :
>> > > > new UI, fully on K8s etc).
>> > > >
>> > > > Here are some cases you should consider using Druid other than Kylin
>> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I
>> used
>> > two
>> > > > years ago):
>> > > >
>> > > > - Have a real-time datasource like Kafka etc.
>> > > > - Most queries are small(Based on my test result, I think Druid had
>> > > better
>> > > > response time for small queries two years ago.)
>> > > > - Don't know how to optimize Spark/Hadoop, want to use the
>> K8S/public
>> > > >   cloud platform as your deployment platform.
>> > > >
>> > > > But I do think there are many scenarios in which Kylin could be
>> better,
>> > > > like:
>> > > >
>> > > > - Better performance for complex/big queries. Kylin can have a more
>> > > > exact-match/fine-grained
>> > > >   Index for queries containing different `Group By dimensions`.
>> > > > - User-friendly UI for modeling.
>> > > > - Support 'Join' better? (Not sure at the moment)
>> > > > - ODBC driver for different BI.(its website did not show it supports
>> > ODBC
>> > > > well)
>> > > > - Looks like Kylin supports ANSI SQL better than Druid.
>> > > >
>> > > >
>> > > > I don't know Pinot, so I have nothing to say about it.
>> > > > Hope to help you, or you are free to share your opinion.
>> > > >
>> > > > ------------------------
>> > > > With warm regard
>> > > > Xiaoxiang Yu
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>> > > wrote:
>> > > >
>> > > >> Dear Xiaoxiang,
>> > > >> Sirs/Madams,
>> > > >>
>> > > >> May I post my boss's question:
>> > > >>
>> > > >> What are the pros and cons of the OLAP platform Kylin compared to
>> > Pinot
>> > > >> and
>> > > >> Druid?
>> > > >>
>> > > >> Please kindly let me know
>> > > >>
>> > > >> Thank you very much and best regards
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
The following text is part of an article
(https://zhuanlan.zhihu.com/p/343394287).

===============================================================================

Kylin is suitable for aggregation queries with fixed patterns because of its
precomputation technology, for example when the join, group by, and where
condition patterns in the SQL are relatively fixed. The larger the data
volume, the more obvious the advantages of using Kylin. Kylin's advantages
are especially large for deduplication (count distinct), Top N, Percentile,
and similar scenarios, and it is widely used for dashboards, all kinds of
reports, large-screen displays, traffic statistics, and user behavior
analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build their data
service platforms, serving millions to tens of millions of queries per day,
and most of the queries complete within 2 - 3 seconds. There is no better
alternative for such a high-concurrency scenario.
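
To make the precomputation idea above concrete, here is a toy sketch in
Python. It is only an illustration of "aggregate once, answer many queries by
lookup"; it is not how Kylin actually builds or stores its indexes, and the
table, column names, and helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical fact table: one dict per raw row.
rows = [
    {"city": "Hanoi", "day": "2023-12-01", "user": "u1", "amount": 10},
    {"city": "Hanoi", "day": "2023-12-01", "user": "u2", "amount": 5},
    {"city": "Hanoi", "day": "2023-12-02", "user": "u1", "amount": 7},
    {"city": "Hue", "day": "2023-12-01", "user": "u3", "amount": 3},
]


def build_aggregate(rows, dims):
    """Scan the raw rows once and pre-aggregate them by the given dimensions."""
    agg = defaultdict(lambda: {"sum_amount": 0, "users": set()})
    for r in rows:
        key = tuple(r[d] for d in dims)
        agg[key]["sum_amount"] += r["amount"]
        agg[key]["users"].add(r["user"])  # kept so count distinct stays exact
    return dict(agg)


def query(agg, key):
    """Answer SUM(amount) and COUNT(DISTINCT user) by lookup, without a rescan."""
    cell = agg[key]
    return cell["sum_amount"], len(cell["users"])


# Built once (the expensive step), then reused by every query grouped by city.
by_city = build_aggregate(rows, ("city",))
print(query(by_city, ("Hanoi",)))
print(query(by_city, ("Hue",)))
```

The build step touches every raw row, while each query only reads one
pre-aggregated cell. This works only when the group by dimensions of the
query match what was pre-aggregated, which is the "fixed pattern" condition
described above.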

ClickHouse, because of its MPP architecture, has high computing power and
is more suitable when the query requests are more flexible, or when there is
a need for detail-level queries with low concurrency. Typical scenarios
include user-tag filtering, where there are very many columns and the where
conditions are combined arbitrarily, and complex ad hoc queries with low
concurrency. If the data volume and access volume are large, you need to
deploy a distributed ClickHouse cluster, which makes operation and
maintenance more challenging.

If some queries are very flexible but infrequent, it is more
resource-efficient to compute them on the fly. Since the number of queries
is small, even if each query consumes a lot of computational resources, it
is still cost-effective overall. If some queries have a fixed pattern and
the query volume is large, Kylin is more suitable: a large amount of
computation is spent up front to build and save the results, and that
upfront cost is amortized over the many queries, so it is the most
economical option.
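
The amortization argument above can be sketched as a simple break-even
calculation. This is a minimal sketch under assumed numbers: the cost units,
the figures, and the function names are hypothetical and are not
measurements of Kylin or ClickHouse.

```python
import math


def total_cost(upfront, per_query, n_queries):
    """Total cost of an approach: one-off build cost plus cost of all queries."""
    return upfront + per_query * n_queries


def break_even_queries(precompute_upfront, precompute_per_query, adhoc_per_query):
    """Smallest query count at which precomputation is strictly cheaper.

    Returns None if precomputed queries are not cheaper per query, because
    the upfront cost can then never be recovered.
    """
    saving = adhoc_per_query - precompute_per_query
    if saving <= 0:
        return None
    # upfront + p * n < a * n  is equivalent to  n > upfront / (a - p)
    return math.floor(precompute_upfront / saving) + 1


# Assumed costs in arbitrary units: building costs 1000, a precomputed query
# costs 1, and the same query computed on the fly costs 11.
n = break_even_queries(1000, 1, 11)
print(n)
print(total_cost(1000, 1, n), total_cost(0, 11, n))
```

Below the break-even count, computing on the fly is cheaper; above it, the
precomputed approach wins, and the gap widens as query volume grows.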

--- Translated with DeepL.com (free version)


------------------------
With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang for the near real time streaming feature. That's great.
>
> This morning my team faced a new challenge: ClickHouse offered us the
> speed of calculating over 8 billion rows in milliseconds, which is faster
> than my demonstration (I used Kylin to calculate over 1 billion rows in
> 2.9 seconds).
>
> Can you briefly suggest the advantages of Kylin over ClickHouse so that I
> can defend my demonstration?
>
> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > 1. "In this important scenario of realtime analytics, the reason here is
> > that kylin has lag time due to model update of new segment build, is
> > that correct?"
> >
> > You are correct.
> >
> > 2. "If that is true, then can you suggest a work-around of combination of
> > ... "
> >
> > Kylin is planning to introduce NRT streaming (coding is completed but not
> > released), which can reduce the time lag to about 3 minutes (that is my
> > estimation, but I am quite certain about it).
> > NRT stands for 'near real-time'; it will run a job that periodically does
> > micro-batch aggregation and persistence. The price is that you need to
> > run and monitor a long-running job. This feature is based on Spark
> > Streaming, so you need knowledge of it.
> >
> > I am curious: what is the maximum time lag your customers can tolerate?
> > Personally, I guess a minute-level time lag is OK for most cases.
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> wrote:
> >
> > > Druid is better in
> > > - Have a real-time datasource like Kafka etc.
> > >
> > > ==========================
> > >
> > > Hi Xiaoxiang, thank you for your response.
> > >
> > > In this important scenario of realtime alalytics, the reason here is
> that
> > > kylin has lag time due to model update of new segment build, is that
> > > correct?
> > >
> > > If that is true, then can you suggest a work-around of combination of :
> > >
> > > (time - lag kylin cube) + (realtime DB update) to provide
> > > realtime capability ?
> > >
> > > IMO, the point here is to find that (realtime DB update) and integrate
> it
> > > with (time - lag kylin cube).
> > >
> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> > >
> > > > I researched and tested Druid two years ago(I don't know too much
> about
> > > >  the change of Druid in these two years. New features that I know
> are :
> > > > new UI, fully on K8s etc).
> > > >
> > > > Here are some cases you should consider using Druid other than Kylin
> > > > at the moment (using Kylin 5.0-beta to compare the Druid which I used
> > two
> > > > years ago):
> > > >
> > > > - Have a real-time datasource like Kafka etc.
> > > > - Most queries are small(Based on my test result, I think Druid had
> > > better
> > > > response time for small queries two years ago.)
> > > > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> > > >   cloud platform as your deployment platform.
> > > >
> > > > But I do think there are many scenarios in which Kylin could be
> better,
> > > > like:
> > > >
> > > > - Better performance for complex/big queries. Kylin can have a more
> > > > exact-match/fine-grained
> > > >   Index for queries containing different `Group By dimensions`.
> > > > - User-friendly UI for modeling.
> > > > - Support 'Join' better? (Not sure at the moment)
> > > > - ODBC driver for different BI.(its website did not show it supports
> > ODBC
> > > > well)
> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> > > >
> > > >
> > > > I don't know Pinot, so I have nothing to say about it.
> > > > Hope to help you, or you are free to share your opinion.
> > > >
> > > > ------------------------
> > > > With warm regard
> > > > Xiaoxiang Yu
> > > >
> > > >
> > > >
> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > > wrote:
> > > >
> > > >> Dear Xiaoxiang,
> > > >> Sirs/Madams,
> > > >>
> > > >> May I post my boss's question:
> > > >>
> > > >> What are the pros and cons of the OLAP platform Kylin compared to
> > Pinot
> > > >> and
> > > >> Druid?
> > > >>
> > > >> Please kindly let me know
> > > >>
> > > >> Thank you very much and best regards
> > > >>
> > > >
> > >
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy <na...@vnpay.vn.INVALID>.
Thank you Xiaoxiang for the near real time streaming feature. That's great.

This morning my team faced a new challenge: ClickHouse was presented to us
as calculating over 8 billion rows in milliseconds, which is faster than my
demonstration (I used Kylin to calculate over 1 billion rows in 2.9
seconds).

Can you briefly suggest the advantages of Kylin over ClickHouse so that I
can defend my demonstration?

On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> 1. "In this important scenario of realtime analytics, the reason here is
> that
> kylin has lag time due to model update of new segment build, is that
> correct?"
>
> You are correct.
>
> 2. "If that is true, then can you suggest a work-around of combination of
> ... "
>
> Kylin is planning to introduce NRT streaming(coding is completed but not
> released),
> which can make the time-lag to about 3 minutes(that is my estimation but I
> am
> quite certain about it).
> NRT stands for 'near real-time', it will run a job and do micro-batch
> aggregation and persistence periodically. The price is that you need to run
> and monitor a long-running
>  job. This feature is based on Spark Streaming, so you need knowledge of
> it.
>
> I am curious about what is the maximum time-lag your customers
> can tolerate?
> Personally, I guess minute level time-lag is ok for most cases.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Druid is better in
> > - Have a real-time datasource like Kafka etc.
> >
> > ==========================
> >
> > Hi Xiaoxiang, thank you for your response.
> >
> > In this important scenario of realtime alalytics, the reason here is that
> > kylin has lag time due to model update of new segment build, is that
> > correct?
> >
> > If that is true, then can you suggest a work-around of combination of :
> >
> > (time - lag kylin cube) + (realtime DB update) to provide
> > realtime capability ?
> >
> > IMO, the point here is to find that (realtime DB update) and integrate it
> > with (time - lag kylin cube).
> >
> > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org> wrote:
> >
> > > I researched and tested Druid two years ago(I don't know too much about
> > >  the change of Druid in these two years. New features that I know are :
> > > new UI, fully on K8s etc).
> > >
> > > Here are some cases you should consider using Druid other than Kylin
> > > at the moment (using Kylin 5.0-beta to compare the Druid which I used
> two
> > > years ago):
> > >
> > > - Have a real-time datasource like Kafka etc.
> > > - Most queries are small(Based on my test result, I think Druid had
> > better
> > > response time for small queries two years ago.)
> > > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> > >   cloud platform as your deployment platform.
> > >
> > > But I do think there are many scenarios in which Kylin could be better,
> > > like:
> > >
> > > - Better performance for complex/big queries. Kylin can have a more
> > > exact-match/fine-grained
> > >   Index for queries containing different `Group By dimensions`.
> > > - User-friendly UI for modeling.
> > > - Support 'Join' better? (Not sure at the moment)
> > > - ODBC driver for different BI.(its website did not show it supports
> ODBC
> > > well)
> > > - Looks like Kylin supports ANSI SQL better than Druid.
> > >
> > >
> > > I don't know Pinot, so I have nothing to say about it.
> > > Hope to help you, or you are free to share your opinion.
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > wrote:
> > >
> > >> Dear Xiaoxiang,
> > >> Sirs/Madams,
> > >>
> > >> May I post my boss's question:
> > >>
> > >> What are the pros and cons of the OLAP platform Kylin compared to
> Pinot
> > >> and
> > >> Druid?
> > >>
> > >> Please kindly let me know
> > >>
> > >> Thank you very much and best regards
> > >>
> > >
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
1. "In this important scenario of realtime analytics, the reason here is
that kylin has lag time due to model update of new segment build, is that
correct?"

You are correct.

2. "If that is true, then can you suggest a work-around of combination of
... "

Kylin is planning to introduce NRT streaming (coding is complete but not
yet released), which can reduce the time lag to about 3 minutes (that is my
estimate, but I am quite certain about it).
NRT stands for 'near real-time': it runs a job that periodically performs
micro-batch aggregation and persistence. The price is that you need to run
and monitor a long-running job. This feature is based on Spark Streaming,
so you need some knowledge of it.
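
To illustrate what micro-batch aggregation means here, below is a minimal
sketch in plain Python. It is not Kylin's or Spark Streaming's API; the
function name, the event layout (timestamp in seconds, key, value) and the
sample data are all assumptions made only for illustration.

```python
from collections import defaultdict


def micro_batch_aggregate(events, batch_seconds):
    """Group (timestamp, key, value) events into fixed time windows and
    sum the values per key inside each window."""
    batches = defaultdict(lambda: defaultdict(int))
    for timestamp, key, value in events:
        # Every event falls into the window that starts at the nearest
        # lower multiple of batch_seconds.
        window_start = (timestamp // batch_seconds) * batch_seconds
        batches[window_start][key] += value
    return {start: dict(sums) for start, sums in sorted(batches.items())}


# Hypothetical events with a 3-minute (180 s) batch interval.
events = [
    (0, "vn", 10),
    (59, "vn", 5),
    (120, "us", 7),
    (179, "vn", 1),
    (180, "us", 2),
]
print(micro_batch_aggregate(events, batch_seconds=180))
```

A window's result only becomes queryable after the window closes and its
aggregate is persisted, which is why the time lag is roughly the batch
interval plus the processing time.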

What is the maximum time lag your customers can tolerate?
Personally, I guess a minute-level time lag is OK for most cases.

------------------------
With warm regard
Xiaoxiang Yu



On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Druid is better in
> - Have a real-time datasource like Kafka etc.
>
> ==========================
>
> Hi Xiaoxiang, thank you for your response.
>
> In this important scenario of realtime alalytics, the reason here is that
> kylin has lag time due to model update of new segment build, is that
> correct?
>
> If that is true, then can you suggest a work-around of combination of :
>
> (time - lag kylin cube) + (realtime DB update) to provide
> realtime capability ?
>
> IMO, the point here is to find that (realtime DB update) and integrate it
> with (time - lag kylin cube).
>
> On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org> wrote:
>
> > I researched and tested Druid two years ago(I don't know too much about
> >  the change of Druid in these two years. New features that I know are :
> > new UI, fully on K8s etc).
> >
> > Here are some cases you should consider using Druid other than Kylin
> > at the moment (using Kylin 5.0-beta to compare the Druid which I used two
> > years ago):
> >
> > - Have a real-time datasource like Kafka etc.
> > - Most queries are small(Based on my test result, I think Druid had
> better
> > response time for small queries two years ago.)
> > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
> >   cloud platform as your deployment platform.
> >
> > But I do think there are many scenarios in which Kylin could be better,
> > like:
> >
> > - Better performance for complex/big queries. Kylin can have a more
> > exact-match/fine-grained
> >   Index for queries containing different `Group By dimensions`.
> > - User-friendly UI for modeling.
> > - Support 'Join' better? (Not sure at the moment)
> > - ODBC driver for different BI.(its website did not show it supports ODBC
> > well)
> > - Looks like Kylin supports ANSI SQL better than Druid.
> >
> >
> > I don't know Pinot, so I have nothing to say about it.
> > Hope to help you, or you are free to share your opinion.
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> wrote:
> >
> >> Dear Xiaoxiang,
> >> Sirs/Madams,
> >>
> >> May I post my boss's question:
> >>
> >> What are the pros and cons of the OLAP platform Kylin compared to Pinot
> >> and
> >> Druid?
> >>
> >> Please kindly let me know
> >>
> >> Thank you very much and best regards
> >>
> >
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Nam Đỗ Duy via user <us...@kylin.apache.org>.
Druid is better in
- Have a real-time datasource like Kafka etc.

==========================

Hi Xiaoxiang, thank you for your response.

In this important scenario of real-time analytics, is the reason that Kylin
lags because the model is only updated when a new segment is built?

If that is true, can you suggest a workaround that combines:

(time-lagged Kylin cube) + (real-time DB update)

to provide real-time capability?

IMO, the point is to find that (real-time DB update) component and
integrate it with the (time-lagged Kylin cube).
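
The combination asked about above can be pictured with a minimal Python
sketch. It is only an illustration of the idea, not an existing Kylin
feature: the function name, the watermark concept and the sample data are
assumptions.

```python
def hybrid_total(cube_totals, cube_watermark, recent_events):
    """Merge precomputed totals with raw events the cube has not seen yet.

    cube_totals    -- per-key sums covering all events with
                      timestamp < cube_watermark (the time-lagged cube)
    recent_events  -- (timestamp, key, value) rows read from the real-time
                      store
    """
    merged = dict(cube_totals)
    for timestamp, key, value in recent_events:
        # Events before the watermark are already counted in the cube,
        # so they are skipped to avoid double counting.
        if timestamp >= cube_watermark:
            merged[key] = merged.get(key, 0) + value
    return merged


# Hypothetical data: the cube covers everything before t=1000.
cube = {"vn": 100}
recent = [(999, "vn", 5), (1000, "vn", 2), (1001, "us", 3)]
print(hybrid_total(cube, cube_watermark=1000, recent_events=recent))
```

The hard part in practice is keeping the watermark consistent between the
two stores, so that no event is counted twice or dropped.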

On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <xx...@apache.org> wrote:

> I researched and tested Druid two years ago(I don't know too much about
>  the change of Druid in these two years. New features that I know are :
> new UI, fully on K8s etc).
>
> Here are some cases you should consider using Druid other than Kylin
> at the moment (using Kylin 5.0-beta to compare the Druid which I used two
> years ago):
>
> - Have a real-time datasource like Kafka etc.
> - Most queries are small(Based on my test result, I think Druid had better
> response time for small queries two years ago.)
> - Don't know how to optimize Spark/Hadoop, want to use the K8S/public
>   cloud platform as your deployment platform.
>
> But I do think there are many scenarios in which Kylin could be better,
> like:
>
> - Better performance for complex/big queries. Kylin can have a more
> exact-match/fine-grained
>   Index for queries containing different `Group By dimensions`.
> - User-friendly UI for modeling.
> - Support 'Join' better? (Not sure at the moment)
> - ODBC driver for different BI.(its website did not show it supports ODBC
> well)
> - Looks like Kylin supports ANSI SQL better than Druid.
>
>
> I don't know Pinot, so I have nothing to say about it.
> Hope to help you, or you are free to share your opinion.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Dear Xiaoxiang,
>> Sirs/Madams,
>>
>> May I post my boss's question:
>>
>> What are the pros and cons of the OLAP platform Kylin compared to Pinot
>> and
>> Druid?
>>
>> Please kindly let me know
>>
>> Thank you very much and best regards
>>
>

Re: Pinot/Kylin/Druid quick comparision

Posted by Xiaoxiang Yu <xx...@apache.org>.
I researched and tested Druid two years ago (I don't know much about how
Druid has changed in those two years; the new features I know of are a new
UI, running fully on K8s, etc.).

Here are some cases where you should consider using Druid rather than Kylin
at the moment (comparing Kylin 5.0-beta with the Druid I used two years
ago):

- You have a real-time data source such as Kafka.
- Most queries are small (based on my test results, I think Druid had
  better response times for small queries two years ago).
- You don't know how to optimize Spark/Hadoop and want to use K8s or a
  public cloud platform as your deployment platform.

But I do think there are many scenarios in which Kylin could be better,
such as:

- Better performance for complex/big queries. Kylin can have a more
  exact-match/fine-grained index for queries containing different
  `Group By` dimensions.
- User-friendly UI for modeling.
- Better 'Join' support? (Not sure at the moment.)
- ODBC driver for different BI tools (Druid's website did not show that it
  supports ODBC well).
- Kylin appears to support ANSI SQL better than Druid.
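
To show what an exact-match index for a set of `Group By` dimensions buys,
here is a minimal Python sketch. It is not Kylin's actual index-selection
code; the table, the dimension names and the helper functions are all
hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table.
ROWS = [
    {"country": "VN", "device": "ios", "day": "2023-12-01", "amount": 10},
    {"country": "VN", "device": "android", "day": "2023-12-01", "amount": 20},
    {"country": "US", "device": "ios", "day": "2023-12-02", "amount": 5},
]


def build_index(rows, dims):
    """Precompute SUM(amount) grouped by the given dimensions."""
    index = defaultdict(int)
    for row in rows:
        index[tuple(row[d] for d in dims)] += row["amount"]
    return dict(index)


def pick_index(indexes, group_by):
    """Pick the index with the fewest dimensions that still covers the query."""
    candidates = [dims for dims in indexes if set(group_by) <= set(dims)]
    if not candidates:
        return None
    return min(candidates, key=len)


def query(indexes, group_by):
    """Answer SUM(amount) GROUP BY group_by from a precomputed index."""
    dims = pick_index(indexes, group_by)
    if dims is None:
        raise LookupError("no precomputed index covers this query")
    positions = [dims.index(d) for d in group_by]
    result = defaultdict(int)
    # Roll the chosen index up to the requested dimensions; for an exact
    # match this is a plain lookup with no further aggregation.
    for key, total in indexes[dims].items():
        result[tuple(key[p] for p in positions)] += total
    return dict(result)


INDEXES = {
    ("country",): build_index(ROWS, ("country",)),
    ("country", "device"): build_index(ROWS, ("country", "device")),
}
print(query(INDEXES, ["country"]))
print(query(INDEXES, ["device"]))
```

The more closely an index matches the query's dimensions, the fewer
precomputed rows have to be scanned and re-aggregated at query time.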


I don't know Pinot, so I have nothing to say about it.
I hope this helps; feel free to share your own opinion.

------------------------
With warm regard
Xiaoxiang Yu



On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Dear Xiaoxiang,
> Sirs/Madams,
>
> May I post my boss's question:
>
> What are the pros and cons of the OLAP platform Kylin compared to Pinot and
> Druid?
>
> Please kindly let me know
>
> Thank you very much and best regards
>
