
Posted to dev@drill.apache.org by moon soo Lee <le...@gmail.com> on 2012/09/14 09:44:53 UTC

contributing to drill

Hi Drillers.

My name is Leemoonsoo. I'm very excited to find this interesting project,
and I'd very much like to contribute in any way I can.

I'm chief architect and developer at NFLabs, a company that has a
Hadoop-based big data processing solution with customers in production.
Recently I was working on a concept similar to Drill.

I've read the mailing list and some reading material related to it.

My question is: how will Drill be different from SenseiDB
(http://senseidb.com/)? SenseiDB is also a scalable, near-real-time system
that supports SQL-like queries (without a nested data model).

I hope I can make a real contribution.

Thanks.

--
Leemoonsoo

Re: contributing to drill

Posted by moon soo Lee <le...@gmail.com>.

Thanks, Min, very helpful.

However, SenseiDB can be extended with Sensei-MapReduce (yes, map-reduce).
Group by on multiple fields is possible through Sensei-MapReduce
(http://senseidb.github.com/sensei/map-reduce.html).

As far as I know, the whole data set (or the filtered data) can be processed
through Sensei-MapReduce. Since map-reduce can implement a wide range of
algorithms, I find it hard to think of SenseiDB as limited.

Thanks.

-----
Leemoonsoo
moon@nflabs.com



On Thu, Sep 20, 2012 at 3:57 PM, Min Zhou <co...@gmail.com> wrote:

> We are using SenseiDB in production. Generally speaking, SenseiDB
> has a slightly different scope of application from Drill.
>
> SenseiDB is suited to retrieving records according to boolean expressions,
> like "select row from tbl where a and b or c or (not d)", and it can do
> some aggregation as well. But it's not applicable to group-by aggregations:
> it can only do facet aggregations, which are not real group-by aggregations
> and are much more limited in functionality. You can't group by two or more
> fields, and you can't group by expressions. Furthermore, facets must be
> built before a query starts. In short, the applicable scope of SenseiDB
> is limited by its full-text index.
>
> Drill could cover those scenarios because it simply brute-force scans
> the whole columnar data rather than seeking data through indices. For the
> same reason, we can expect Drill to be slower than SenseiDB in some areas.
>
> Regards,
> Min
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
>

Re: contributing to drill

Posted by Min Zhou <co...@gmail.com>.

We are using SenseiDB in production. Generally speaking, SenseiDB
has a slightly different scope of application from Drill.

SenseiDB is suited to retrieving records according to boolean expressions,
like "select row from tbl where a and b or c or (not d)", and it can do
some aggregation as well. But it's not applicable to group-by aggregations:
it can only do facet aggregations, which are not real group-by aggregations
and are much more limited in functionality. You can't group by two or more
fields, and you can't group by expressions. Furthermore, facets must be
built before a query starts. In short, the applicable scope of SenseiDB
is limited by its full-text index.

Drill could cover those scenarios because it simply brute-force scans
the whole columnar data rather than seeking data through indices. For the
same reason, we can expect Drill to be slower than SenseiDB in some areas.

Regards,
Min



On Sat, Sep 15, 2012 at 11:31 AM, Ted Dunning <te...@gmail.com> wrote:

> Actually, it would be quite plausible to write an input scanner for Drill
> that reads Lucene indexes (which is what Sensei uses).
>
> At that point, you will probably get better performance from Drill if lazy
> record assembly is done well.
>
> But this is all premature since sensei is pretty mature and Drill is
> definitely not.

-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
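
Min's distinction between facet counts and a real GROUP BY can be sketched in Python on hypothetical in-memory rows. The field names, FACET_FIELDS, facet_counts and group_by are all made up for illustration; this is not SenseiDB's or Drill's API. A facet here is a per-value count over one field declared ahead of time, while the general GROUP BY keys on any function of a row.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows; the field names are illustrative only.
rows = [
    {"country": "KR", "device": "mobile", "latency_ms": 120},
    {"country": "KR", "device": "desktop", "latency_ms": 80},
    {"country": "US", "device": "mobile", "latency_ms": 200},
    {"country": "KR", "device": "mobile", "latency_ms": 90},
]

# Facet fields must be declared up front (in an index-backed system they
# would also have to be built before any query runs).
FACET_FIELDS = {"country", "device"}


def facet_counts(rows, field):
    """Facet-style aggregation: count rows per value of one predeclared field."""
    if field not in FACET_FIELDS:
        raise KeyError(f"no facet declared for {field!r}")
    return Counter(r[field] for r in rows)


def group_by(rows, key_fn, agg_fn):
    """General GROUP BY: the key is any function of the row (several fields
    or an expression), and agg_fn is applied to each group's rows."""
    groups = defaultdict(list)
    for r in rows:
        groups[key_fn(r)].append(r)
    return {k: agg_fn(g) for k, g in groups.items()}


if __name__ == "__main__":
    print(facet_counts(rows, "country"))
    # Group by two fields: not expressible as a single facet.
    print(group_by(rows, lambda r: (r["country"], r["device"]), len))
    # Group by an expression: 100 ms latency buckets.
    print(group_by(rows, lambda r: r["latency_ms"] // 100, len))
```

Grouping by the (country, device) pair or by a latency bucket has no single-facet equivalent in this sketch, which is the gap Min describes.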

Re: contributing to drill

Posted by Ted Dunning <te...@gmail.com>.

Actually, it would be quite plausible to write an input scanner for Drill
that reads Lucene indexes (which is what Sensei uses).

At that point, you will probably get better performance from Drill if lazy
record assembly is done well.

But this is all premature since sensei is pretty mature and Drill is
definitely not.
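
Ted's remark about lazy record assembly can be illustrated with a minimal Python sketch, assuming a flat, hypothetical in-memory columnar table (COLUMNS, scan_eager and scan_lazy are illustrative names, not Drill's reader or any Lucene API, and nested records are not modeled). The eager version builds every full record before filtering; the lazy version reads the filter column first and materializes only the projected columns of matching rows.

```python
# Hypothetical column-major table: row i is spread across index i of each column.
COLUMNS = {
    "user": ["a", "b", "c", "d"],
    "country": ["KR", "US", "KR", "JP"],
    "clicks": [3, 10, 7, 1],
    "payload": ["blob0", "blob1", "blob2", "blob3"],  # stands in for wide data
}


def scan_eager(columns, filter_col, predicate, project):
    """Assemble every full record first, then filter and project."""
    n = len(columns[filter_col])
    records = [{name: col[i] for name, col in columns.items()} for i in range(n)]
    return [{c: r[c] for c in project} for r in records if predicate(r[filter_col])]


def scan_lazy(columns, filter_col, predicate, project):
    """Read only the filter column first; assemble records only for matching
    rows and only for the projected columns."""
    matches = [i for i, v in enumerate(columns[filter_col]) if predicate(v)]
    return [{c: columns[c][i] for c in project} for i in matches]


if __name__ == "__main__":
    is_kr = lambda country: country == "KR"
    print(scan_lazy(COLUMNS, "country", is_kr, ["user", "clicks"]))
```

On this toy table the eager scan reads all 16 cells, while the lazy scan reads 8 (the 4 filter values plus 2 projected columns for each of the 2 matches); the gap grows with wide columns such as payload.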

On Fri, Sep 14, 2012 at 5:16 PM, moon soo Lee <le...@gmail.com> wrote:

> Thanks for the links.
>
> I understand the difference: Drill uses columnar file representations,
> while SenseiDB uses distributed indexes.
>
> Suppose a user (who does not know the internal architecture) uses both
> systems. Will this user experience differences between Drill and SenseiDB?
> If so, what will they be?

Re: contributing to drill

Posted by azuryy yu <az...@gmail.com>.

Hi,

I think this is a good question. For users who have no knowledge of the
architecture of Drill or SenseiDB and just want to run some queries, I think
there is no difference.

But for other operations, such as updates and data loading, they are very
different, at least I think so. By the way, on this mailing list we have
never discussed updates and data loading until now. I don't think Drill
supports insert or update, only batch data loading.



On Sat, Sep 15, 2012 at 8:16 AM, moon soo Lee <le...@gmail.com> wrote:

> Thanks for the links.
>
> I understand the difference: Drill uses columnar file representations,
> while SenseiDB uses distributed indexes.
>
> Suppose a user (who does not know the internal architecture) uses both
> systems. Will this user experience differences between Drill and SenseiDB?
> If so, what will they be?

Re: contributing to drill

Posted by moon soo Lee <le...@gmail.com>.

Thanks for the links.

I understand the difference: Drill uses columnar file representations,
while SenseiDB uses distributed indexes.

Suppose a user (who does not know the internal architecture) uses both
systems. Will this user experience differences between Drill and SenseiDB?
If so, what will they be?



On Fri, Sep 14, 2012 at 11:37 PM, Ted Dunning <te...@gmail.com> wrote:

> Senseidb is a database and uses indexes in a distributed way.
>
> Drill plans to do full table scans across data, especially data with nested
> columnar file representations.
>
> See this page for some links:
> https://github.com/ApacheDrill/Brainstorm/wiki/Apache-Drill-Links

Re: contributing to drill

Posted by Ted Dunning <te...@gmail.com>.

Senseidb is a database and uses indexes in a distributed way.

Drill plans to do full table scans across data, especially data with nested
columnar file representations.

See this page for some links:
https://github.com/ApacheDrill/Brainstorm/wiki/Apache-Drill-Links
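
The index-versus-scan contrast drawn here (and exercised by Min's example predicate "where a and b or c or (not d)") can be shown with a minimal Python sketch. DOCS, build_index and the two query functions are hypothetical; a real Lucene index stores postings very differently. The index answers the boolean query with set operations over posting lists, while the scan evaluates the predicate on every document.

```python
# Hypothetical documents: each id maps to the set of terms indexed for it.
DOCS = {
    1: {"a", "b"},
    2: {"a", "c"},
    3: {"b", "c", "d"},
    4: {"d"},
}


def build_index(docs):
    """Inverted index: term -> set of document ids containing it."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def query_with_index(index, all_ids):
    """(a AND b) OR c OR (NOT d), answered with set operations on postings."""
    def postings(term):
        return index.get(term, set())

    return (postings("a") & postings("b")) | postings("c") | (all_ids - postings("d"))


def query_with_scan(docs):
    """Same predicate, evaluated against every document in turn."""
    return {
        doc_id
        for doc_id, terms in docs.items()
        if ("a" in terms and "b" in terms) or "c" in terms or "d" not in terms
    }


if __name__ == "__main__":
    print(query_with_index(build_index(DOCS), set(DOCS)))
    print(query_with_scan(DOCS))
```

Both return {1, 2, 3} here. Note that NOT d still needs the full set of document ids even with an index, and the index only helps for predicates over indexed terms, whereas a scan can evaluate any expression at the cost of reading everything.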

On Fri, Sep 14, 2012 at 12:44 AM, moon soo Lee <le...@gmail.com> wrote:

> Hi Drillers.
>
> My name is Leemoonsoo. I'm very excited to find this interesting project,
> and I'd very much like to contribute in any way I can.
>
> I'm chief architect and developer at NFLabs, a company that has a
> Hadoop-based big data processing solution with customers in production.
> Recently I was working on a concept similar to Drill.
>
> I've read the mailing list and some reading material related to it.
>
> My question is: how will Drill be different from SenseiDB
> (http://senseidb.com/)? SenseiDB is also a scalable, near-real-time system
> that supports SQL-like queries (without a nested data model).
>
> I hope I can make a real contribution.
>
> Thanks.
>
> --
> Leemoonsoo
>

Re: contributing to drill

Posted by Ted Dunning <te...@gmail.com>.

Check out these links:
https://github.com/ApacheDrill/Brainstorm/wiki/Apache-Drill-Links

We will be posting more materials from meetups there until we get the
actual apache site up (which should happen shortly).

On Fri, Sep 14, 2012 at 6:24 AM, NAVEEN MAANJU <naveen.maanju.apache@gmail.com> wrote:

> Hi Leemoonsoo,
>
> I am also interested in contributing to this project. I am very new to
> Hadoop and all the related stuff. Can you please suggest any prerequisite
> learning so that I can start contributing efficiently?
>
> Thanks,
> N M

Re: contributing to drill

Posted by NAVEEN MAANJU <na...@gmail.com>.

Hi Leemoonsoo,

I am also interested in contributing to this project. I am very new to
Hadoop and all the related stuff. Can you please suggest any prerequisite
learning so that I can start contributing efficiently?

Thanks,
N M

On Fri, Sep 14, 2012 at 12:44 AM, moon soo Lee <le...@gmail.com> wrote:

> Hi Drillers.
>
> My name is Leemoonsoo. I'm very excited to find this interesting project,
> and I'd very much like to contribute in any way I can.
>
> I'm chief architect and developer at NFLabs, a company that has a
> Hadoop-based big data processing solution with customers in production.
> Recently I was working on a concept similar to Drill.
>
> I've read the mailing list and some reading material related to it.
>
> My question is: how will Drill be different from SenseiDB
> (http://senseidb.com/)? SenseiDB is also a scalable, near-real-time system
> that supports SQL-like queries (without a nested data model).
>
> I hope I can make a real contribution.
>
> Thanks.
>
> --
> Leemoonsoo
>