You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Alexei Scherbakov <al...@gmail.com> on 2017/01/11 17:29:26 UTC

Allow distributed SQL query execution over explicit set of partitions

Guys,

I've just submitted a PR for
https://issues.apache.org/jira/browse/IGNITE-4523.

Please review API changes while waiting for TC results.

-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

Vladimir, do you have any idea how specific partition settings will work
for queries over replicated caches, when every node will have full data
set. ?

The query will only be parallelized on nodes defined by passed partitions,
or something else ?



2017-01-19 17:27 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

> 1. OK.
>
> 2. Agreed. In future we might split query execution between nodes, but for
> now query is routed to random node in grid,
>
> 3. OK, let's mark getter/setter as deprecated.
>
> 4. Query must be executed locally only for defined partitions. Currently
> this setting is ignored for local queries.
>
> 5. I have the same understanding. Distributed joins will ignore the
> setting.
> This is not implemented yet..
>
>
> 2017-01-19 15:39 GMT+03:00 Sergi Vladykin <se...@gmail.com>:
>
>> Agree, lets remove everything related to partition ranges. Looks like
>> unnecessary complication.
>>
>> Sergi
>>
>> 2017-01-19 10:01 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
>>
>> > Several side notes about API.
>> >
>> > 1) I would avoid ranges even in this form.for the sake of simplicity.
>> > Ignite do not have any notion of "partition range" in affinity API, so
>> I do
>> > not understand how users are going to work on ranges unless they have
>> some
>> > very special custom affinity function, which is rather unlikely case.
>> >
>> > 2) The fact that this property is ignored in REPLICATED cache is
>> confusing.
>> > REPLICATED cache still divides partitions into primaries and backups.
>> If I
>> > have very large data set and want to execute some query, I would
>> definitely
>> > expect that Ignite will take advantage of distributed computing and
>> spread
>> > the load between nodes. I understand that currently SQL queries do not
>> work
>> > this way, but this is clear disadvantage for certain scenarios, which we
>> > may improve in future. I would remove this paragraph from docs.
>> >
>> > 3) We already have ScanQuery.partition getter/setter. We need to make
>> sure
>> > that they are "merged" somehow. For instance, we may deprecate two
>> methods
>> > in ScanQuery class, and advise users to use Query.partitions, with
>> > clarification - only single partition is supported for ScanQuery at the
>> > moment.
>> >
>> > 4) What should happen if "partitions" are defined and "local" flag is
>> set?
>> >
>> > As per distributed joins - how are we going to execute them when
>> partitions
>> > are set explicitly? As far as I understand, partitions should apply
>> only to
>> > map step and only for the cache query was created from, This way
>> > distributed join execution should effectively ignore partitions?
>> >
>> > Vladimir.
>> >
>> >
>> > On Thu, Jan 19, 2017 at 1:04 AM, Alexei Scherbakov <
>> > alexey.scherbakoff@gmail.com> wrote:
>> >
>> > > I mean distributed joins.
>> > >
>> > > 2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <
>> > alexey.scherbakoff@gmail.com>
>> > > :
>> > >
>> > > > Guys,
>> > > >
>> > > > I've finished adding API changes and implemented proper nodes
>> routing.
>> > > >
>> > > > Currently it doesn't work with distributed queries.But I think this
>> > > > feature should be compatible with it.
>> > > >
>> > > > Could anyone take a look at current branch state while I'm looking
>> > deeper
>> > > > into dsitributed queries code?
>> > > >
>> > > > Issue: https://issues.apache.org/jira/browse/IGNITE-4523
>> > > > PR: https://github.com/apache/ignite/pull/1418
>> > > >
>> > > >
>> > > >
>> > > > 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <
>> > > alexey.scherbakoff@gmail.com
>> > > > >:
>> > > >
>> > > >> OK, let's do it this way.
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <
>> sergi.vladykin@gmail.com>:
>> > > >>
>> > > >>> Internally we still use int[] when we send partitions (see
>> > > >>> GridH2QueryRequest.parts). It looks like we only do more work with
>> > > >>> PartitionSet.
>> > > >>>
>> > > >>> I like the idea of bitset for partitions, but
>> > > >>>
>> > > >>> 1. We have to change internals first to use it, otherwise the
>> > > >>> optimization
>> > > >>> makes no sense.
>> > > >>> 2. We will need to have a method SqlQuery.setPartitions(int...
>> parts)
>> > > for
>> > > >>> usability reasons anyways.
>> > > >>>
>> > > >>> Thus I suggest for now to go the straightforward way with int[]
>> and
>> > > >>> create
>> > > >>> a separate ticket describing the optimization with bitset.
>> > > >>>
>> > > >>> Sergi
>> > > >>>
>> > > >>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
>> > > >>> alexey.scherbakoff@gmail.com>:
>> > > >>>
>> > > >>> > PartitionSet hides internal implementation of int array.
>> > > >>> >
>> > > >>> > This allows as to efficiently represent contiguous range of
>> > > partitions
>> > > >>> and
>> > > >>> > defines clear API for ordered iteration over partitions and
>> > > containment
>> > > >>> > check.
>> > > >>> >
>> > > >>> > Even better to go with compressed bitmap, as I mentioned in
>> ticket
>> > > >>> comment.
>> > > >>> > This will allow us to minimize heap footprint for this object.
>> > > >>> >
>> > > >>> > Moreover, it will be useful to create reusable compressed bitmap
>> > > >>> > implementation in Ignite and use it in other cases, on example,
>> for
>> > > >>> > replacing H2's IntArray and Set<Integer>.
>> > > >>> >
>> > > >>> > Should I create a ticket for this ?
>> > > >>> >
>> > > >>> > .
>> > > >>> >
>> > > >>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <
>> dsetrakyan@apache.org
>> > >:
>> > > >>> >
>> > > >>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
>> > > >>> > sergi.vladykin@gmail.com>
>> > > >>> > > wrote:
>> > > >>> > >
>> > > >>> > > > I looked at the code. The PartitionSet concept looks
>> > > >>> overengineered to
>> > > >>> > > me,
>> > > >>> > > > why wouldn't we just go with int[]?
>> > > >>> > > >
>> > > >>> > >
>> > > >>> > > Agree.
>> > > >>> > >
>> > > >>> > >
>> > > >>> > > >
>> > > >>> > > > Sergi
>> > > >>> > > >
>> > > >>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
>> > > >>> > > alexey.scherbakoff@gmail.com
>> > > >>> > > > >:
>> > > >>> > > >
>> > > >>> > > > > Done.
>> > > >>> > > > >
>> > > >>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
>> > > >>> dsetrakyan@apache.org
>> > > >>> > >:
>> > > >>> > > > >
>> > > >>> > > > > > Alexey,
>> > > >>> > > > > >
>> > > >>> > > > > > I am not sure I am seeing the API changes documented in
>> the
>> > > >>> ticket.
>> > > >>> > > Can
>> > > >>> > > > > you
>> > > >>> > > > > > please either document them or add GIT links for the new
>> > > >>> classes?
>> > > >>> > > > > >
>> > > >>> > > > > > D.
>> > > >>> > > > > >
>> > > >>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
>> > > >>> > > > > > alexey.scherbakoff@gmail.com> wrote:
>> > > >>> > > > > >
>> > > >>> > > > > > > Guys,
>> > > >>> > > > > > >
>> > > >>> > > > > > > I've just submitted a PR for
>> > > >>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
>> > > >>> > > > > > >
>> > > >>> > > > > > > Please review API changes while waiting for TC
>> results.
>> > > >>> > > > > > >
>> > > >>> > > > > > > --
>> > > >>> > > > > > >
>> > > >>> > > > > > > Best regards,
>> > > >>> > > > > > > Alexei Scherbakov
>> > > >>> > > > > > >
>> > > >>> > > > > >
>> > > >>> > > > >
>> > > >>> > > > >
>> > > >>> > > > >
>> > > >>> > > > > --
>> > > >>> > > > >
>> > > >>> > > > > Best regards,
>> > > >>> > > > > Alexei Scherbakov
>> > > >>> > > > >
>> > > >>> > > >
>> > > >>> > >
>> > > >>> >
>> > > >>> >
>> > > >>> >
>> > > >>> > --
>> > > >>> >
>> > > >>> > Best regards,
>> > > >>> > Alexei Scherbakov
>> > > >>> >
>> > > >>>
>> > > >>
>> > > >>
>> > > >>
>> > > >> --
>> > > >>
>> > > >> Best regards,
>> > > >> Alexei Scherbakov
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > >
>> > > > Best regards,
>> > > > Alexei Scherbakov
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > Best regards,
>> > > Alexei Scherbakov
>> > >
>> >
>>
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Mon, Jan 23, 2017 at 11:16 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Dmitriy,
>
> This still can make sense for some scenarios, because we could limit number
> of initial map requests reducing overall query overhead.
>
> Are you still sure we need to throw an exception ?
>

The outcome and the resulting behavior needs to be absolutely clear to our
users. If we can't provide any sort of guarantee here, I would disallow it
altogether.


>
> 2017-01-23 1:49 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > On Sun, Jan 22, 2017 at 5:06 AM, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com> wrote:
> >
> > > Yes, it will be possible because distributed joins are executed using
> > > broadcast queries.
> > >
> > >
> > In this case why even bother supporting non-collocated joins? We need to
> > throw an exception in this case.
> >
> > 2017-01-22 15:49 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > >
> > > > On Sun, Jan 22, 2017 at 4:46 AM, Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com> wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > ScanQueries currently support only one partition. I will extend it
> to
> > > > > support multiple partitions.
> > > > >
> > > > > For distributed joins partitions will only be applied on "map"
> query
> > > > step.
> > > > >
> > > >
> > > > Will it still be possible to get data from the partitions that were
> not
> > > > specified?
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

Dmitriy,

This still can make sense for some scenarios, because we could limit number
of initial map requests reducing overall query overhead.

Are you still sure we need to throw an exception ?

2017-01-23 1:49 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Sun, Jan 22, 2017 at 5:06 AM, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > Yes, it will be possible because distributed joins are executed using
> > broadcast queries.
> >
> >
> In this case why even bother supporting non-collocated joins? We need to
> throw an exception in this case.
>
> 2017-01-22 15:49 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> > > On Sun, Jan 22, 2017 at 4:46 AM, Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com> wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > ScanQueries currently support only one partition. I will extend it to
> > > > support multiple partitions.
> > > >
> > > > For distributed joins partitions will only be applied on "map" query
> > > step.
> > > >
> > >
> > > Will it still be possible to get data from the partitions that were not
> > > specified?
> > >
> >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Sun, Jan 22, 2017 at 5:06 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Yes, it will be possible because distributed joins are executed using
> broadcast queries.
>
>
In this case why even bother supporting non-collocated joins? We need to
throw an exception in this case.

2017-01-22 15:49 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > On Sun, Jan 22, 2017 at 4:46 AM, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com> wrote:
> >
> > > Dmitriy,
> > >
> > > ScanQueries currently support only one partition. I will extend it to
> > > support multiple partitions.
> > >
> > > For distributed joins partitions will only be applied on "map" query
> > step.
> > >
> >
> > Will it still be possible to get data from the partitions that were not
> > specified?
> >
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

Yes, it will be possible because distributed joins are executed using
broadcast queries.

2017-01-22 15:49 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Sun, Jan 22, 2017 at 4:46 AM, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > Dmitriy,
> >
> > ScanQueries currently support only one partition. I will extend it to
> > support multiple partitions.
> >
> > For distributed joins partitions will only be applied on "map" query
> step.
> >
>
> Will it still be possible to get data from the partitions that were not
> specified?
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Sun, Jan 22, 2017 at 4:46 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Dmitriy,
>
> ScanQueries currently support only one partition. I will extend it to
> support multiple partitions.
>
> For distributed joins partitions will only be applied on "map" query step.
>

Will it still be possible to get data from the partitions that were not
specified?

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

Dmitriy,

ScanQueries currently support only one partition. I will extend it to
support multiple partitions.

For distributed joins partitions will only be applied on "map" query step.

2017-01-21 21:19 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Sat, Jan 21, 2017 at 1:53 AM, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > >> > > > 5. I have the same understanding. Distributed joins will ignore
> > the
> > >> > > > setting.
> > >> > > > This is not implemented yet..
> > >> > > >
> > >> > >
> > >> > > And again, this will be very confusing to users. Any chance we can
> > >> throw
> > >> > an
> > >> > > exception with a proper error message here?
> > >> > >
> > >> >
> > >> > I hope to make it working too. But first I need a review of current
> PR
> > >> > state to understand whether I'm moving in right direction or not.
> > >> >
> > >>
> > >> What behavior are you proposing to implement?
> > >>
> > >> Alexey, I have noticed in your comments that you are adding this
> support
> > >> only for the SQL queries. Why not make it consistent across all the
> > >> queries?
> > >>
> > >
> > > Initially I had no such intentions, because I do not use other query
> > types.
> > > But if the community has the need of this, why not.
> > > I'll start working on it next week.
> > >
> >
>
> Alexey, I think ScanQueries already had this support. If you are
> deprecating the old methods, then you need to move this logic to the new
> methods.
>
> Also, what behavior are you implementing for the distributed joins?
>
> D.
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Sat, Jan 21, 2017 at 1:53 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> >> > > > 5. I have the same understanding. Distributed joins will ignore
> the
> >> > > > setting.
> >> > > > This is not implemented yet..
> >> > > >
> >> > >
> >> > > And again, this will be very confusing to users. Any chance we can
> >> throw
> >> > an
> >> > > exception with a proper error message here?
> >> > >
> >> >
> >> > I hope to make it working too. But first I need a review of current PR
> >> > state to understand whether I'm moving in right direction or not.
> >> >
> >>
> >> What behavior are you proposing to implement?
> >>
> >> Alexey, I have noticed in your comments that you are adding this support
> >> only for the SQL queries. Why not make it consistent across all the
> >> queries?
> >>
> >
> > Initially I had no such intentions, because I do not use other query
> types.
> > But if the community has the need of this, why not.
> > I'll start working on it next week.
> >
>

Alexey, I think ScanQueries already had this support. If you are
deprecating the old methods, then you need to move this logic to the new
methods.

Also, what behavior are you implementing for the distributed joins?

D.

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

BTW, I still need SQL queries part review.

2017-01-21 12:52 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

>
>
> 2017-01-20 20:31 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
>> On Fri, Jan 20, 2017 at 7:00 AM, Alexei Scherbakov <
>> alexey.scherbakoff@gmail.com> wrote:
>>
>> > Dmitriy,
>> >
>> > Honestly, I don't understand your issues with a reviewing, because I've
>> > provided PR link in the JIRA ticket.
>> >
>> > Just open it in browser and enjoy :)
>> >
>> > Nevertheless, I've added short description for API changes in the JIRA
>> > ticket.
>> >
>>
>> Well, as you see, once you have added the changes to the ticket, you have
>> immediately received more comments from the community. Let us all make it
>> a
>> habit to document all API changes directly in the ticket at all times.
>>
>>
> OK, I got it. Probably this should be emphasised on contributions page.
>
>
>>
>> >
>> > Other comments are below.
>> >
>> > 2017-01-19 20:35 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>> >
>> > >
>> > > 4. Query must be executed locally only for defined partitions.
>> Currently
>> > > > this setting is ignored for local queries.
>> > > >
>> > >
>> > > This is again the wrong behavior. We should not "ignore" anything.
>> Let's
>> > > throw an exception with a correct error message.
>> > >
>> >
>> > It's already working.
>> >
>>
>> What is the behavior that you have implemented?
>>
>
> Local query result set will containt data only from specified partitions.
>
>
>>
>>
>> > >
>> > >
>> > > > 5. I have the same understanding. Distributed joins will ignore the
>> > > > setting.
>> > > > This is not implemented yet..
>> > > >
>> > >
>> > > And again, this will be very confusing to users. Any chance we can
>> throw
>> > an
>> > > exception with a proper error message here?
>> > >
>> >
>> > I hope to make it working too. But first I need a review of current PR
>> > state to understand whether I'm moving in right direction or not.
>> >
>>
>> What behavior are you proposing to implement?
>>
>> Alexey, I have noticed in your comments that you are adding this support
>> only for the SQL queries. Why not make it consistent across all the
>> queries?
>>
>
> Initially I had no such intentions, because I do not use other query types.
> But if the community has the need of this, why not.
> I'll start working on it next week.
>
>
>>
>> D.
>>
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

2017-01-20 20:31 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Fri, Jan 20, 2017 at 7:00 AM, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > Dmitriy,
> >
> > Honestly, I don't understand your issues with a reviewing, because I've
> > provided PR link in the JIRA ticket.
> >
> > Just open it in browser and enjoy :)
> >
> > Nevertheless, I've added short description for API changes in the JIRA
> > ticket.
> >
>
> Well, as you see, once you have added the changes to the ticket, you have
> immediately received more comments from the community. Let us all make it a
> habit to document all API changes directly in the ticket at all times.
>
>
OK, I got it. Probably this should be emphasised on contributions page.


>
> >
> > Other comments are below.
> >
> > 2017-01-19 20:35 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> > >
> > > 4. Query must be executed locally only for defined partitions.
> Currently
> > > > this setting is ignored for local queries.
> > > >
> > >
> > > This is again the wrong behavior. We should not "ignore" anything.
> Let's
> > > throw an exception with a correct error message.
> > >
> >
> > It's already working.
> >
>
> What is the behavior that you have implemented?
>

Local query result set will containt data only from specified partitions.


>
>
> > >
> > >
> > > > 5. I have the same understanding. Distributed joins will ignore the
> > > > setting.
> > > > This is not implemented yet..
> > > >
> > >
> > > And again, this will be very confusing to users. Any chance we can
> throw
> > an
> > > exception with a proper error message here?
> > >
> >
> > I hope to make it working too. But first I need a review of current PR
> > state to understand whether I'm moving in right direction or not.
> >
>
> What behavior are you proposing to implement?
>
> Alexey, I have noticed in your comments that you are adding this support
> only for the SQL queries. Why not make it consistent across all the
> queries?
>

Initially I had no such intentions, because I do not use other query types.
But if the community has the need of this, why not.
I'll start working on it next week.


>
> D.
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Fri, Jan 20, 2017 at 7:00 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Dmitriy,
>
> Honestly, I don't understand your issues with a reviewing, because I've
> provided PR link in the JIRA ticket.
>
> Just open it in browser and enjoy :)
>
> Nevertheless, I've added short description for API changes in the JIRA
> ticket.
>

Well, as you see, once you have added the changes to the ticket, you have
immediately received more comments from the community. Let us all make it a
habit to document all API changes directly in the ticket at all times.


>
> Other comments are below.
>
> 2017-01-19 20:35 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> >
> > 4. Query must be executed locally only for defined partitions. Currently
> > > this setting is ignored for local queries.
> > >
> >
> > This is again the wrong behavior. We should not "ignore" anything. Let's
> > throw an exception with a correct error message.
> >
>
> It's already working.
>

What is the behavior that you have implemented?


> >
> >
> > > 5. I have the same understanding. Distributed joins will ignore the
> > > setting.
> > > This is not implemented yet..
> > >
> >
> > And again, this will be very confusing to users. Any chance we can throw
> an
> > exception with a proper error message here?
> >
>
> I hope to make it working too. But first I need a review of current PR
> state to understand whether I'm moving in right direction or not.
>

What behavior are you proposing to implement?

Alexey, I have noticed in your comments that you are adding this support
only for the SQL queries. Why not make it consistent across all the queries?

D.

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

Dmitriy,

Honestly, I don't understand your issues with a reviewing, because I've
provided PR link in the JIRA ticket.

Just open it in browser and enjoy :)

Nevertheless, I've added short description for API changes in the JIRA
ticket.

Other comments are below.

2017-01-19 20:35 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Alexey S,
>
> My comments are below. I would like to ask you again to provide all the API
> changes explicitly in the Jira ticket without asking the community to
> download the whole GIT repo.
>
> D.
>
> On Thu, Jan 19, 2017 at 6:27 AM, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > 1. OK.
> >
>
> Agree.
>
>
> >
> > 2. Agreed. In future we might split query execution between nodes, but
> for
> > now query is routed to random node in grid,
> >
>
> If you are talking about REPLICATED caches, then sending query to a random
> node when a user explicitly specifies the partitions is just deceiving. I
> would throw an exception in case if a user specifies partitions for
> REPLICATED caches.
>
> Exception is added.



>
> >
> > 3. OK, let's mark getter/setter as deprecated.
> >
>
> I still do not know the proposed new API and why we are deprecating methods
> on the old API. I have asked many times to post all the API changes in the
> ticket.
>
> Alexey S., can you please do it, so the community members can review them
> without installing the project on their mobile devices?
>

Description is added to the JIRA ticket.


>
> 4. Query must be executed locally only for defined partitions. Currently
> > this setting is ignored for local queries.
> >
>
> This is again the wrong behavior. We should not "ignore" anything. Let's
> throw an exception with a correct error message.
>

It's already working.


>
>
> > 5. I have the same understanding. Distributed joins will ignore the
> > setting.
> > This is not implemented yet..
> >
>
> And again, this will be very confusing to users. Any chance we can throw an
> exception with a proper error message here?
>

I hope to make it working too. But first I need a review of current PR
state to understand whether I'm moving in right direction or not.


>
>
> >
> >
> > 2017-01-19 15:39 GMT+03:00 Sergi Vladykin <se...@gmail.com>:
> >
> > > Agree, lets remove everything related to partition ranges. Looks like
> > > unnecessary complication.
> > >
> > > Sergi
> > >
> > > 2017-01-19 10:01 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
> > >
> > > > Several side notes about API.
> > > >
> > > > 1) I would avoid ranges even in this form.for the sake of simplicity.
> > > > Ignite do not have any notion of "partition range" in affinity API,
> so
> > I
> > > do
> > > > not understand how users are going to work on ranges unless they have
> > > some
> > > > very special custom affinity function, which is rather unlikely case.
> > > >
> > > > 2) The fact that this property is ignored in REPLICATED cache is
> > > confusing.
> > > > REPLICATED cache still divides partitions into primaries and backups.
> > If
> > > I
> > > > have very large data set and want to execute some query, I would
> > > definitely
> > > > expect that Ignite will take advantage of distributed computing and
> > > spread
> > > > the load between nodes. I understand that currently SQL queries do
> not
> > > work
> > > > this way, but this is clear disadvantage for certain scenarios, which
> > we
> > > > may improve in future. I would remove this paragraph from docs.
> > > >
> > > > 3) We already have ScanQuery.partition getter/setter. We need to make
> > > sure
> > > > that they are "merged" somehow. For instance, we may deprecate two
> > > methods
> > > > in ScanQuery class, and advise users to use Query.partitions, with
> > > > clarification - only single partition is supported for ScanQuery at
> the
> > > > moment.
> > > >
> > > > 4) What should happen if "partitions" are defined and "local" flag is
> > > set?
> > > >
> > > > As per distributed joins - how are we going to execute them when
> > > partitions
> > > > are set explicitly? As far as I understand, partitions should apply
> > only
> > > to
> > > > map step and only for the cache query was created from, This way
> > > > distributed join execution should effectively ignore partitions?
> > > >
> > > > Vladimir.
> > > >
> > > >
> > > > On Thu, Jan 19, 2017 at 1:04 AM, Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com> wrote:
> > > >
> > > > > I mean distributed joins.
> > > > >
> > > > > 2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com>
> > > > > :
> > > > >
> > > > > > Guys,
> > > > > >
> > > > > > I've finished adding API changes and implemented proper nodes
> > > routing.
> > > > > >
> > > > > > Currently it doesn't work with distributed queries.But I think
> this
> > > > > > feature should be compatible with it.
> > > > > >
> > > > > > Could anyone take a look at current branch state while I'm
> looking
> > > > deeper
> > > > > > into dsitributed queries code?
> > > > > >
> > > > > > Issue: https://issues.apache.org/jira/browse/IGNITE-4523
> > > > > > PR: https://github.com/apache/ignite/pull/1418
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com
> > > > > > >:
> > > > > >
> > > > > >> OK, let's do it this way.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <
> > sergi.vladykin@gmail.com
> > > >:
> > > > > >>
> > > > > >>> Internally we still use int[] when we send partitions (see
> > > > > >>> GridH2QueryRequest.parts). It looks like we only do more work
> > with
> > > > > >>> PartitionSet.
> > > > > >>>
> > > > > >>> I like the idea of bitset for partitions, but
> > > > > >>>
> > > > > >>> 1. We have to change internals first to use it, otherwise the
> > > > > >>> optimization
> > > > > >>> makes no sense.
> > > > > >>> 2. We will need to have a method SqlQuery.setPartitions(int...
> > > parts)
> > > > > for
> > > > > >>> usability reasons anyways.
> > > > > >>>
> > > > > >>> Thus I suggest for now to go the straightforward way with int[]
> > and
> > > > > >>> create
> > > > > >>> a separate ticket describing the optimization with bitset.
> > > > > >>>
> > > > > >>> Sergi
> > > > > >>>
> > > > > >>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
> > > > > >>> alexey.scherbakoff@gmail.com>:
> > > > > >>>
> > > > > >>> > PartitionSet hides internal implementation of int array.
> > > > > >>> >
> > > > > >>> > This allows as to efficiently represent contiguous range of
> > > > > partitions
> > > > > >>> and
> > > > > >>> > defines clear API for ordered iteration over partitions and
> > > > > containment
> > > > > >>> > check.
> > > > > >>> >
> > > > > >>> > Even better to go with compressed bitmap, as I mentioned in
> > > ticket
> > > > > >>> comment.
> > > > > >>> > This will allow us to minimize heap footprint for this
> object.
> > > > > >>> >
> > > > > >>> > Moreover, it will be useful to create reusable compressed
> > bitmap
> > > > > >>> > implementation in Ignite and use it in other cases, on
> example,
> > > for
> > > > > >>> > replacing H2's IntArray and Set<Integer>.
> > > > > >>> >
> > > > > >>> > Should I create a ticket for this ?
> > > > > >>> >
> > > > > >>> > .
> > > > > >>> >
> > > > > >>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <
> > > dsetrakyan@apache.org
> > > > >:
> > > > > >>> >
> > > > > >>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
> > > > > >>> > sergi.vladykin@gmail.com>
> > > > > >>> > > wrote:
> > > > > >>> > >
> > > > > >>> > > > I looked at the code. The PartitionSet concept looks
> > > > > >>> overengineered to
> > > > > >>> > > me,
> > > > > >>> > > > why wouldn't we just go with int[]?
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> > > Agree.
> > > > > >>> > >
> > > > > >>> > >
> > > > > >>> > > >
> > > > > >>> > > > Sergi
> > > > > >>> > > >
> > > > > >>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> > > > > >>> > > alexey.scherbakoff@gmail.com
> > > > > >>> > > > >:
> > > > > >>> > > >
> > > > > >>> > > > > Done.
> > > > > >>> > > > >
> > > > > >>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
> > > > > >>> dsetrakyan@apache.org
> > > > > >>> > >:
> > > > > >>> > > > >
> > > > > >>> > > > > > Alexey,
> > > > > >>> > > > > >
> > > > > >>> > > > > > I am not sure I am seeing the API changes documented
> in
> > > the
> > > > > >>> ticket.
> > > > > >>> > > Can
> > > > > >>> > > > > you
> > > > > >>> > > > > > please either document them or add GIT links for the
> > new
> > > > > >>> classes?
> > > > > >>> > > > > >
> > > > > >>> > > > > > D.
> > > > > >>> > > > > >
> > > > > >>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > > > > >>> > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > >>> > > > > >
> > > > > >>> > > > > > > Guys,
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > I've just submitted a PR for
> > > > > >>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > Please review API changes while waiting for TC
> > results.
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > --
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > Best regards,
> > > > > >>> > > > > > > Alexei Scherbakov
> > > > > >>> > > > > > >
> > > > > >>> > > > > >
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > > --
> > > > > >>> > > > >
> > > > > >>> > > > > Best regards,
> > > > > >>> > > > > Alexei Scherbakov
> > > > > >>> > > > >
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > --
> > > > > >>> >
> > > > > >>> > Best regards,
> > > > > >>> > Alexei Scherbakov
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Alexei Scherbakov
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Alexey S,

My comments are below. I would like to ask you again to provide all the API
changes explicitly in the Jira ticket without asking the community to
download the whole GIT repo.

D.

On Thu, Jan 19, 2017 at 6:27 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> 1. OK.
>

Agree.


>
> 2. Agreed. In future we might split query execution between nodes, but for
> now query is routed to random node in grid,
>

If you are talking about REPLICATED caches, then sending query to a random
node when a user explicitly specifies the partitions is just deceiving. I
would throw an exception in case if a user specifies partitions for
REPLICATED caches.


>
> 3. OK, let's mark getter/setter as deprecated.
>

I still do not know the proposed new API and why we are deprecating methods
on the old API. I have asked many times to post all the API changes in the
ticket.

Alexey S., can you please do it, so the community members can review them
without installing the project on their mobile devices?

4. Query must be executed locally only for defined partitions. Currently
> this setting is ignored for local queries.
>

This is again the wrong behavior. We should not "ignore" anything. Let's
throw an exception with a correct error message.


> 5. I have the same understanding. Distributed joins will ignore the
> setting.
> This is not implemented yet..
>

And again, this will be very confusing to users. Any chance we can throw an
exception with a proper error message here?


>
>
> 2017-01-19 15:39 GMT+03:00 Sergi Vladykin <se...@gmail.com>:
>
> > Agree, lets remove everything related to partition ranges. Looks like
> > unnecessary complication.
> >
> > Sergi
> >
> > 2017-01-19 10:01 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
> >
> > > Several side notes about API.
> > >
> > > 1) I would avoid ranges even in this form.for the sake of simplicity.
> > > Ignite do not have any notion of "partition range" in affinity API, so
> I
> > do
> > > not understand how users are going to work on ranges unless they have
> > some
> > > very special custom affinity function, which is rather unlikely case.
> > >
> > > 2) The fact that this property is ignored in REPLICATED cache is
> > confusing.
> > > REPLICATED cache still divides partitions into primaries and backups.
> If
> > I
> > > have very large data set and want to execute some query, I would
> > definitely
> > > expect that Ignite will take advantage of distributed computing and
> > spread
> > > the load between nodes. I understand that currently SQL queries do not
> > work
> > > this way, but this is clear disadvantage for certain scenarios, which
> we
> > > may improve in future. I would remove this paragraph from docs.
> > >
> > > 3) We already have ScanQuery.partition getter/setter. We need to make
> > sure
> > > that they are "merged" somehow. For instance, we may deprecate two
> > methods
> > > in ScanQuery class, and advise users to use Query.partitions, with
> > > clarification - only single partition is supported for ScanQuery at the
> > > moment.
> > >
> > > 4) What should happen if "partitions" are defined and "local" flag is
> > set?
> > >
> > > As per distributed joins - how are we going to execute them when
> > partitions
> > > are set explicitly? As far as I understand, partitions should apply
> only
> > to
> > > map step and only for the cache query was created from, This way
> > > distributed join execution should effectively ignore partitions?
> > >
> > > Vladimir.
> > >
> > >
> > > On Thu, Jan 19, 2017 at 1:04 AM, Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com> wrote:
> > >
> > > > I mean distributed joins.
> > > >
> > > > 2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com>
> > > > :
> > > >
> > > > > Guys,
> > > > >
> > > > > I've finished adding API changes and implemented proper nodes
> > routing.
> > > > >
> > > > > Currently it doesn't work with distributed queries.But I think this
> > > > > feature should be compatible with it.
> > > > >
> > > > > Could anyone take a look at current branch state while I'm looking
> > > deeper
> > > > > into dsitributed queries code?
> > > > >
> > > > > Issue: https://issues.apache.org/jira/browse/IGNITE-4523
> > > > > PR: https://github.com/apache/ignite/pull/1418
> > > > >
> > > > >
> > > > >
> > > > > 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com
> > > > > >:
> > > > >
> > > > >> OK, let's do it this way.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <
> sergi.vladykin@gmail.com
> > >:
> > > > >>
> > > > >>> Internally we still use int[] when we send partitions (see
> > > > >>> GridH2QueryRequest.parts). It looks like we only do more work
> with
> > > > >>> PartitionSet.
> > > > >>>
> > > > >>> I like the idea of bitset for partitions, but
> > > > >>>
> > > > >>> 1. We have to change internals first to use it, otherwise the
> > > > >>> optimization
> > > > >>> makes no sense.
> > > > >>> 2. We will need to have a method SqlQuery.setPartitions(int...
> > parts)
> > > > for
> > > > >>> usability reasons anyways.
> > > > >>>
> > > > >>> Thus I suggest for now to go the straightforward way with int[]
> and
> > > > >>> create
> > > > >>> a separate ticket describing the optimization with bitset.
> > > > >>>
> > > > >>> Sergi
> > > > >>>
> > > > >>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
> > > > >>> alexey.scherbakoff@gmail.com>:
> > > > >>>
> > > > >>> > PartitionSet hides internal implementation of int array.
> > > > >>> >
> > > > >>> > This allows as to efficiently represent contiguous range of
> > > > partitions
> > > > >>> and
> > > > >>> > defines clear API for ordered iteration over partitions and
> > > > containment
> > > > >>> > check.
> > > > >>> >
> > > > >>> > Even better to go with compressed bitmap, as I mentioned in
> > ticket
> > > > >>> comment.
> > > > >>> > This will allow us to minimize heap footprint for this object.
> > > > >>> >
> > > > >>> > Moreover, it will be useful to create reusable compressed
> bitmap
> > > > >>> > implementation in Ignite and use it in other cases, on example,
> > for
> > > > >>> > replacing H2's IntArray and Set<Integer>.
> > > > >>> >
> > > > >>> > Should I create a ticket for this ?
> > > > >>> >
> > > > >>> > .
> > > > >>> >
> > > > >>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <
> > dsetrakyan@apache.org
> > > >:
> > > > >>> >
> > > > >>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
> > > > >>> > sergi.vladykin@gmail.com>
> > > > >>> > > wrote:
> > > > >>> > >
> > > > >>> > > > I looked at the code. The PartitionSet concept looks
> > > > >>> overengineered to
> > > > >>> > > me,
> > > > >>> > > > why wouldn't we just go with int[]?
> > > > >>> > > >
> > > > >>> > >
> > > > >>> > > Agree.
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > >
> > > > >>> > > > Sergi
> > > > >>> > > >
> > > > >>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> > > > >>> > > alexey.scherbakoff@gmail.com
> > > > >>> > > > >:
> > > > >>> > > >
> > > > >>> > > > > Done.
> > > > >>> > > > >
> > > > >>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
> > > > >>> dsetrakyan@apache.org
> > > > >>> > >:
> > > > >>> > > > >
> > > > >>> > > > > > Alexey,
> > > > >>> > > > > >
> > > > >>> > > > > > I am not sure I am seeing the API changes documented in
> > the
> > > > >>> ticket.
> > > > >>> > > Can
> > > > >>> > > > > you
> > > > >>> > > > > > please either document them or add GIT links for the
> new
> > > > >>> classes?
> > > > >>> > > > > >
> > > > >>> > > > > > D.
> > > > >>> > > > > >
> > > > >>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > > > >>> > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >>> > > > > >
> > > > >>> > > > > > > Guys,
> > > > >>> > > > > > >
> > > > >>> > > > > > > I've just submitted a PR for
> > > > >>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > > > >>> > > > > > >
> > > > >>> > > > > > > Please review API changes while waiting for TC
> results.
> > > > >>> > > > > > >
> > > > >>> > > > > > > --
> > > > >>> > > > > > >
> > > > >>> > > > > > > Best regards,
> > > > >>> > > > > > > Alexei Scherbakov
> > > > >>> > > > > > >
> > > > >>> > > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > >
> > > > >>> > > > > --
> > > > >>> > > > >
> > > > >>> > > > > Best regards,
> > > > >>> > > > > Alexei Scherbakov
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> > --
> > > > >>> >
> > > > >>> > Best regards,
> > > > >>> > Alexei Scherbakov
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> Best regards,
> > > > >> Alexei Scherbakov
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> >
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

1. OK.

2. Agreed. In future we might split query execution between nodes, but for
now query is routed to random node in grid,

3. OK, let's mark getter/setter as deprecated.

4. Query must be executed locally only for defined partitions. Currently
this setting is ignored for local queries.

5. I have the same understanding. Distributed joins will ignore the setting.
This is not implemented yet..


2017-01-19 15:39 GMT+03:00 Sergi Vladykin <se...@gmail.com>:

> Agree, lets remove everything related to partition ranges. Looks like
> unnecessary complication.
>
> Sergi
>
> 2017-01-19 10:01 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:
>
> > Several side notes about API.
> >
> > 1) I would avoid ranges even in this form.for the sake of simplicity.
> > Ignite do not have any notion of "partition range" in affinity API, so I
> do
> > not understand how users are going to work on ranges unless they have
> some
> > very special custom affinity function, which is rather unlikely case.
> >
> > 2) The fact that this property is ignored in REPLICATED cache is
> confusing.
> > REPLICATED cache still divides partitions into primaries and backups. If
> I
> > have very large data set and want to execute some query, I would
> definitely
> > expect that Ignite will take advantage of distributed computing and
> spread
> > the load between nodes. I understand that currently SQL queries do not
> work
> > this way, but this is clear disadvantage for certain scenarios, which we
> > may improve in future. I would remove this paragraph from docs.
> >
> > 3) We already have ScanQuery.partition getter/setter. We need to make
> sure
> > that they are "merged" somehow. For instance, we may deprecate two
> methods
> > in ScanQuery class, and advise users to use Query.partitions, with
> > clarification - only single partition is supported for ScanQuery at the
> > moment.
> >
> > 4) What should happen if "partitions" are defined and "local" flag is
> set?
> >
> > As per distributed joins - how are we going to execute them when
> partitions
> > are set explicitly? As far as I understand, partitions should apply only
> to
> > map step and only for the cache query was created from, This way
> > distributed join execution should effectively ignore partitions?
> >
> > Vladimir.
> >
> >
> > On Thu, Jan 19, 2017 at 1:04 AM, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com> wrote:
> >
> > > I mean distributed joins.
> > >
> > > 2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com>
> > > :
> > >
> > > > Guys,
> > > >
> > > > I've finished adding API changes and implemented proper nodes
> routing.
> > > >
> > > > Currently it doesn't work with distributed queries.But I think this
> > > > feature should be compatible with it.
> > > >
> > > > Could anyone take a look at current branch state while I'm looking
> > deeper
> > > > into dsitributed queries code?
> > > >
> > > > Issue: https://issues.apache.org/jira/browse/IGNITE-4523
> > > > PR: https://github.com/apache/ignite/pull/1418
> > > >
> > > >
> > > >
> > > > 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com
> > > > >:
> > > >
> > > >> OK, let's do it this way.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <sergi.vladykin@gmail.com
> >:
> > > >>
> > > >>> Internally we still use int[] when we send partitions (see
> > > >>> GridH2QueryRequest.parts). It looks like we only do more work with
> > > >>> PartitionSet.
> > > >>>
> > > >>> I like the idea of bitset for partitions, but
> > > >>>
> > > >>> 1. We have to change internals first to use it, otherwise the
> > > >>> optimization
> > > >>> makes no sense.
> > > >>> 2. We will need to have a method SqlQuery.setPartitions(int...
> parts)
> > > for
> > > >>> usability reasons anyways.
> > > >>>
> > > >>> Thus I suggest for now to go the straightforward way with int[] and
> > > >>> create
> > > >>> a separate ticket describing the optimization with bitset.
> > > >>>
> > > >>> Sergi
> > > >>>
> > > >>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
> > > >>> alexey.scherbakoff@gmail.com>:
> > > >>>
> > > >>> > PartitionSet hides internal implementation of int array.
> > > >>> >
> > > >>> > This allows as to efficiently represent contiguous range of
> > > partitions
> > > >>> and
> > > >>> > defines clear API for ordered iteration over partitions and
> > > containment
> > > >>> > check.
> > > >>> >
> > > >>> > Even better to go with compressed bitmap, as I mentioned in
> ticket
> > > >>> comment.
> > > >>> > This will allow us to minimize heap footprint for this object.
> > > >>> >
> > > >>> > Moreover, it will be useful to create reusable compressed bitmap
> > > >>> > implementation in Ignite and use it in other cases, on example,
> for
> > > >>> > replacing H2's IntArray and Set<Integer>.
> > > >>> >
> > > >>> > Should I create a ticket for this ?
> > > >>> >
> > > >>> > .
> > > >>> >
> > > >>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <
> dsetrakyan@apache.org
> > >:
> > > >>> >
> > > >>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
> > > >>> > sergi.vladykin@gmail.com>
> > > >>> > > wrote:
> > > >>> > >
> > > >>> > > > I looked at the code. The PartitionSet concept looks
> > > >>> overengineered to
> > > >>> > > me,
> > > >>> > > > why wouldn't we just go with int[]?
> > > >>> > > >
> > > >>> > >
> > > >>> > > Agree.
> > > >>> > >
> > > >>> > >
> > > >>> > > >
> > > >>> > > > Sergi
> > > >>> > > >
> > > >>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> > > >>> > > alexey.scherbakoff@gmail.com
> > > >>> > > > >:
> > > >>> > > >
> > > >>> > > > > Done.
> > > >>> > > > >
> > > >>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
> > > >>> dsetrakyan@apache.org
> > > >>> > >:
> > > >>> > > > >
> > > >>> > > > > > Alexey,
> > > >>> > > > > >
> > > >>> > > > > > I am not sure I am seeing the API changes documented in
> the
> > > >>> ticket.
> > > >>> > > Can
> > > >>> > > > > you
> > > >>> > > > > > please either document them or add GIT links for the new
> > > >>> classes?
> > > >>> > > > > >
> > > >>> > > > > > D.
> > > >>> > > > > >
> > > >>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > > >>> > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > >>> > > > > >
> > > >>> > > > > > > Guys,
> > > >>> > > > > > >
> > > >>> > > > > > > I've just submitted a PR for
> > > >>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > > >>> > > > > > >
> > > >>> > > > > > > Please review API changes while waiting for TC results.
> > > >>> > > > > > >
> > > >>> > > > > > > --
> > > >>> > > > > > >
> > > >>> > > > > > > Best regards,
> > > >>> > > > > > > Alexei Scherbakov
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > --
> > > >>> > > > >
> > > >>> > > > > Best regards,
> > > >>> > > > > Alexei Scherbakov
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > --
> > > >>> >
> > > >>> > Best regards,
> > > >>> > Alexei Scherbakov
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Best regards,
> > > >> Alexei Scherbakov
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Sergi Vladykin <se...@gmail.com>.

Agree, lets remove everything related to partition ranges. Looks like
unnecessary complication.

Sergi

2017-01-19 10:01 GMT+03:00 Vladimir Ozerov <vo...@gridgain.com>:

> Several side notes about API.
>
> 1) I would avoid ranges even in this form.for the sake of simplicity.
> Ignite do not have any notion of "partition range" in affinity API, so I do
> not understand how users are going to work on ranges unless they have some
> very special custom affinity function, which is rather unlikely case.
>
> 2) The fact that this property is ignored in REPLICATED cache is confusing.
> REPLICATED cache still divides partitions into primaries and backups. If I
> have very large data set and want to execute some query, I would definitely
> expect that Ignite will take advantage of distributed computing and spread
> the load between nodes. I understand that currently SQL queries do not work
> this way, but this is clear disadvantage for certain scenarios, which we
> may improve in future. I would remove this paragraph from docs.
>
> 3) We already have ScanQuery.partition getter/setter. We need to make sure
> that they are "merged" somehow. For instance, we may deprecate two methods
> in ScanQuery class, and advise users to use Query.partitions, with
> clarification - only single partition is supported for ScanQuery at the
> moment.
>
> 4) What should happen if "partitions" are defined and "local" flag is set?
>
> As per distributed joins - how are we going to execute them when partitions
> are set explicitly? As far as I understand, partitions should apply only to
> map step and only for the cache query was created from, This way
> distributed join execution should effectively ignore partitions?
>
> Vladimir.
>
>
> On Thu, Jan 19, 2017 at 1:04 AM, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > I mean distributed joins.
> >
> > 2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <
> alexey.scherbakoff@gmail.com>
> > :
> >
> > > Guys,
> > >
> > > I've finished adding API changes and implemented proper nodes routing.
> > >
> > > Currently it doesn't work with distributed queries.But I think this
> > > feature should be compatible with it.
> > >
> > > Could anyone take a look at current branch state while I'm looking
> deeper
> > > into dsitributed queries code?
> > >
> > > Issue: https://issues.apache.org/jira/browse/IGNITE-4523
> > > PR: https://github.com/apache/ignite/pull/1418
> > >
> > >
> > >
> > > 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com
> > > >:
> > >
> > >> OK, let's do it this way.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <se...@gmail.com>:
> > >>
> > >>> Internally we still use int[] when we send partitions (see
> > >>> GridH2QueryRequest.parts). It looks like we only do more work with
> > >>> PartitionSet.
> > >>>
> > >>> I like the idea of bitset for partitions, but
> > >>>
> > >>> 1. We have to change internals first to use it, otherwise the
> > >>> optimization
> > >>> makes no sense.
> > >>> 2. We will need to have a method SqlQuery.setPartitions(int... parts)
> > for
> > >>> usability reasons anyways.
> > >>>
> > >>> Thus I suggest for now to go the straightforward way with int[] and
> > >>> create
> > >>> a separate ticket describing the optimization with bitset.
> > >>>
> > >>> Sergi
> > >>>
> > >>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
> > >>> alexey.scherbakoff@gmail.com>:
> > >>>
> > >>> > PartitionSet hides internal implementation of int array.
> > >>> >
> > >>> > This allows as to efficiently represent contiguous range of
> > partitions
> > >>> and
> > >>> > defines clear API for ordered iteration over partitions and
> > containment
> > >>> > check.
> > >>> >
> > >>> > Even better to go with compressed bitmap, as I mentioned in ticket
> > >>> comment.
> > >>> > This will allow us to minimize heap footprint for this object.
> > >>> >
> > >>> > Moreover, it will be useful to create reusable compressed bitmap
> > >>> > implementation in Ignite and use it in other cases, on example, for
> > >>> > replacing H2's IntArray and Set<Integer>.
> > >>> >
> > >>> > Should I create a ticket for this ?
> > >>> >
> > >>> > .
> > >>> >
> > >>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org
> >:
> > >>> >
> > >>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
> > >>> > sergi.vladykin@gmail.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > > > I looked at the code. The PartitionSet concept looks
> > >>> overengineered to
> > >>> > > me,
> > >>> > > > why wouldn't we just go with int[]?
> > >>> > > >
> > >>> > >
> > >>> > > Agree.
> > >>> > >
> > >>> > >
> > >>> > > >
> > >>> > > > Sergi
> > >>> > > >
> > >>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> > >>> > > alexey.scherbakoff@gmail.com
> > >>> > > > >:
> > >>> > > >
> > >>> > > > > Done.
> > >>> > > > >
> > >>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
> > >>> dsetrakyan@apache.org
> > >>> > >:
> > >>> > > > >
> > >>> > > > > > Alexey,
> > >>> > > > > >
> > >>> > > > > > I am not sure I am seeing the API changes documented in the
> > >>> ticket.
> > >>> > > Can
> > >>> > > > > you
> > >>> > > > > > please either document them or add GIT links for the new
> > >>> classes?
> > >>> > > > > >
> > >>> > > > > > D.
> > >>> > > > > >
> > >>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > >>> > > > > > alexey.scherbakoff@gmail.com> wrote:
> > >>> > > > > >
> > >>> > > > > > > Guys,
> > >>> > > > > > >
> > >>> > > > > > > I've just submitted a PR for
> > >>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > >>> > > > > > >
> > >>> > > > > > > Please review API changes while waiting for TC results.
> > >>> > > > > > >
> > >>> > > > > > > --
> > >>> > > > > > >
> > >>> > > > > > > Best regards,
> > >>> > > > > > > Alexei Scherbakov
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > --
> > >>> > > > >
> > >>> > > > > Best regards,
> > >>> > > > > Alexei Scherbakov
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> >
> > >>> > Best regards,
> > >>> > Alexei Scherbakov
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Best regards,
> > >> Alexei Scherbakov
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Several side notes about API.

1) I would avoid ranges even in this form.for the sake of simplicity.
Ignite do not have any notion of "partition range" in affinity API, so I do
not understand how users are going to work on ranges unless they have some
very special custom affinity function, which is rather unlikely case.

2) The fact that this property is ignored in REPLICATED cache is confusing.
REPLICATED cache still divides partitions into primaries and backups. If I
have very large data set and want to execute some query, I would definitely
expect that Ignite will take advantage of distributed computing and spread
the load between nodes. I understand that currently SQL queries do not work
this way, but this is clear disadvantage for certain scenarios, which we
may improve in future. I would remove this paragraph from docs.

3) We already have ScanQuery.partition getter/setter. We need to make sure
that they are "merged" somehow. For instance, we may deprecate two methods
in ScanQuery class, and advise users to use Query.partitions, with
clarification - only single partition is supported for ScanQuery at the
moment.

4) What should happen if "partitions" are defined and "local" flag is set?

As per distributed joins - how are we going to execute them when partitions
are set explicitly? As far as I understand, partitions should apply only to
map step and only for the cache query was created from, This way
distributed join execution should effectively ignore partitions?

Vladimir.


On Thu, Jan 19, 2017 at 1:04 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> I mean distributed joins.
>
> 2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <al...@gmail.com>
> :
>
> > Guys,
> >
> > I've finished adding API changes and implemented proper nodes routing.
> >
> > Currently it doesn't work with distributed queries.But I think this
> > feature should be compatible with it.
> >
> > Could anyone take a look at current branch state while I'm looking deeper
> > into dsitributed queries code?
> >
> > Issue: https://issues.apache.org/jira/browse/IGNITE-4523
> > PR: https://github.com/apache/ignite/pull/1418
> >
> >
> >
> > 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <
> alexey.scherbakoff@gmail.com
> > >:
> >
> >> OK, let's do it this way.
> >>
> >>
> >>
> >>
> >>
> >> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <se...@gmail.com>:
> >>
> >>> Internally we still use int[] when we send partitions (see
> >>> GridH2QueryRequest.parts). It looks like we only do more work with
> >>> PartitionSet.
> >>>
> >>> I like the idea of bitset for partitions, but
> >>>
> >>> 1. We have to change internals first to use it, otherwise the
> >>> optimization
> >>> makes no sense.
> >>> 2. We will need to have a method SqlQuery.setPartitions(int... parts)
> for
> >>> usability reasons anyways.
> >>>
> >>> Thus I suggest for now to go the straightforward way with int[] and
> >>> create
> >>> a separate ticket describing the optimization with bitset.
> >>>
> >>> Sergi
> >>>
> >>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
> >>> alexey.scherbakoff@gmail.com>:
> >>>
> >>> > PartitionSet hides internal implementation of int array.
> >>> >
> >>> > This allows as to efficiently represent contiguous range of
> partitions
> >>> and
> >>> > defines clear API for ordered iteration over partitions and
> containment
> >>> > check.
> >>> >
> >>> > Even better to go with compressed bitmap, as I mentioned in ticket
> >>> comment.
> >>> > This will allow us to minimize heap footprint for this object.
> >>> >
> >>> > Moreover, it will be useful to create reusable compressed bitmap
> >>> > implementation in Ignite and use it in other cases, on example, for
> >>> > replacing H2's IntArray and Set<Integer>.
> >>> >
> >>> > Should I create a ticket for this ?
> >>> >
> >>> > .
> >>> >
> >>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >>> >
> >>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
> >>> > sergi.vladykin@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > > > I looked at the code. The PartitionSet concept looks
> >>> overengineered to
> >>> > > me,
> >>> > > > why wouldn't we just go with int[]?
> >>> > > >
> >>> > >
> >>> > > Agree.
> >>> > >
> >>> > >
> >>> > > >
> >>> > > > Sergi
> >>> > > >
> >>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> >>> > > alexey.scherbakoff@gmail.com
> >>> > > > >:
> >>> > > >
> >>> > > > > Done.
> >>> > > > >
> >>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
> >>> dsetrakyan@apache.org
> >>> > >:
> >>> > > > >
> >>> > > > > > Alexey,
> >>> > > > > >
> >>> > > > > > I am not sure I am seeing the API changes documented in the
> >>> ticket.
> >>> > > Can
> >>> > > > > you
> >>> > > > > > please either document them or add GIT links for the new
> >>> classes?
> >>> > > > > >
> >>> > > > > > D.
> >>> > > > > >
> >>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> >>> > > > > > alexey.scherbakoff@gmail.com> wrote:
> >>> > > > > >
> >>> > > > > > > Guys,
> >>> > > > > > >
> >>> > > > > > > I've just submitted a PR for
> >>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> >>> > > > > > >
> >>> > > > > > > Please review API changes while waiting for TC results.
> >>> > > > > > >
> >>> > > > > > > --
> >>> > > > > > >
> >>> > > > > > > Best regards,
> >>> > > > > > > Alexei Scherbakov
> >>> > > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > --
> >>> > > > >
> >>> > > > > Best regards,
> >>> > > > > Alexei Scherbakov
> >>> > > > >
> >>> > > >
> >>> > >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> >
> >>> > Best regards,
> >>> > Alexei Scherbakov
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Best regards,
> >> Alexei Scherbakov
> >>
> >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

I mean distributed joins.

2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

> Guys,
>
> I've finished adding API changes and implemented proper nodes routing.
>
> Currently it doesn't work with distributed queries.But I think this
> feature should be compatible with it.
>
> Could anyone take a look at current branch state while I'm looking deeper
> into dsitributed queries code?
>
> Issue: https://issues.apache.org/jira/browse/IGNITE-4523
> PR: https://github.com/apache/ignite/pull/1418
>
>
>
> 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <alexey.scherbakoff@gmail.com
> >:
>
>> OK, let's do it this way.
>>
>>
>>
>>
>>
>> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <se...@gmail.com>:
>>
>>> Internally we still use int[] when we send partitions (see
>>> GridH2QueryRequest.parts). It looks like we only do more work with
>>> PartitionSet.
>>>
>>> I like the idea of bitset for partitions, but
>>>
>>> 1. We have to change internals first to use it, otherwise the
>>> optimization
>>> makes no sense.
>>> 2. We will need to have a method SqlQuery.setPartitions(int... parts) for
>>> usability reasons anyways.
>>>
>>> Thus I suggest for now to go the straightforward way with int[] and
>>> create
>>> a separate ticket describing the optimization with bitset.
>>>
>>> Sergi
>>>
>>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
>>> alexey.scherbakoff@gmail.com>:
>>>
>>> > PartitionSet hides internal implementation of int array.
>>> >
>>> > This allows as to efficiently represent contiguous range of partitions
>>> and
>>> > defines clear API for ordered iteration over partitions and containment
>>> > check.
>>> >
>>> > Even better to go with compressed bitmap, as I mentioned in ticket
>>> comment.
>>> > This will allow us to minimize heap footprint for this object.
>>> >
>>> > Moreover, it will be useful to create reusable compressed bitmap
>>> > implementation in Ignite and use it in other cases, on example, for
>>> > replacing H2's IntArray and Set<Integer>.
>>> >
>>> > Should I create a ticket for this ?
>>> >
>>> > .
>>> >
>>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>>> >
>>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
>>> > sergi.vladykin@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > I looked at the code. The PartitionSet concept looks
>>> overengineered to
>>> > > me,
>>> > > > why wouldn't we just go with int[]?
>>> > > >
>>> > >
>>> > > Agree.
>>> > >
>>> > >
>>> > > >
>>> > > > Sergi
>>> > > >
>>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
>>> > > alexey.scherbakoff@gmail.com
>>> > > > >:
>>> > > >
>>> > > > > Done.
>>> > > > >
>>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
>>> dsetrakyan@apache.org
>>> > >:
>>> > > > >
>>> > > > > > Alexey,
>>> > > > > >
>>> > > > > > I am not sure I am seeing the API changes documented in the
>>> ticket.
>>> > > Can
>>> > > > > you
>>> > > > > > please either document them or add GIT links for the new
>>> classes?
>>> > > > > >
>>> > > > > > D.
>>> > > > > >
>>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
>>> > > > > > alexey.scherbakoff@gmail.com> wrote:
>>> > > > > >
>>> > > > > > > Guys,
>>> > > > > > >
>>> > > > > > > I've just submitted a PR for
>>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
>>> > > > > > >
>>> > > > > > > Please review API changes while waiting for TC results.
>>> > > > > > >
>>> > > > > > > --
>>> > > > > > >
>>> > > > > > > Best regards,
>>> > > > > > > Alexei Scherbakov
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > --
>>> > > > >
>>> > > > > Best regards,
>>> > > > > Alexei Scherbakov
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Best regards,
>>> > Alexei Scherbakov
>>> >
>>>
>>
>>
>>
>> --
>>
>> Best regards,
>> Alexei Scherbakov
>>
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

Guys,

I've finished adding API changes and implemented proper nodes routing.

Currently it doesn't work with distributed queries.But I think this feature
should be compatible with it.

Could anyone take a look at current branch state while I'm looking deeper
into dsitributed queries code?

Issue: https://issues.apache.org/jira/browse/IGNITE-4523
PR: https://github.com/apache/ignite/pull/1418



2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

> OK, let's do it this way.
>
>
>
>
>
> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <se...@gmail.com>:
>
>> Internally we still use int[] when we send partitions (see
>> GridH2QueryRequest.parts). It looks like we only do more work with
>> PartitionSet.
>>
>> I like the idea of bitset for partitions, but
>>
>> 1. We have to change internals first to use it, otherwise the optimization
>> makes no sense.
>> 2. We will need to have a method SqlQuery.setPartitions(int... parts) for
>> usability reasons anyways.
>>
>> Thus I suggest for now to go the straightforward way with int[] and create
>> a separate ticket describing the optimization with bitset.
>>
>> Sergi
>>
>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
>> alexey.scherbakoff@gmail.com>:
>>
>> > PartitionSet hides internal implementation of int array.
>> >
>> > This allows as to efficiently represent contiguous range of partitions
>> and
>> > defines clear API for ordered iteration over partitions and containment
>> > check.
>> >
>> > Even better to go with compressed bitmap, as I mentioned in ticket
>> comment.
>> > This will allow us to minimize heap footprint for this object.
>> >
>> > Moreover, it will be useful to create reusable compressed bitmap
>> > implementation in Ignite and use it in other cases, on example, for
>> > replacing H2's IntArray and Set<Integer>.
>> >
>> > Should I create a ticket for this ?
>> >
>> > .
>> >
>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>> >
>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
>> > sergi.vladykin@gmail.com>
>> > > wrote:
>> > >
>> > > > I looked at the code. The PartitionSet concept looks overengineered
>> to
>> > > me,
>> > > > why wouldn't we just go with int[]?
>> > > >
>> > >
>> > > Agree.
>> > >
>> > >
>> > > >
>> > > > Sergi
>> > > >
>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
>> > > alexey.scherbakoff@gmail.com
>> > > > >:
>> > > >
>> > > > > Done.
>> > > > >
>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
>> dsetrakyan@apache.org
>> > >:
>> > > > >
>> > > > > > Alexey,
>> > > > > >
>> > > > > > I am not sure I am seeing the API changes documented in the
>> ticket.
>> > > Can
>> > > > > you
>> > > > > > please either document them or add GIT links for the new
>> classes?
>> > > > > >
>> > > > > > D.
>> > > > > >
>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
>> > > > > > alexey.scherbakoff@gmail.com> wrote:
>> > > > > >
>> > > > > > > Guys,
>> > > > > > >
>> > > > > > > I've just submitted a PR for
>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
>> > > > > > >
>> > > > > > > Please review API changes while waiting for TC results.
>> > > > > > >
>> > > > > > > --
>> > > > > > >
>> > > > > > > Best regards,
>> > > > > > > Alexei Scherbakov
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > > Best regards,
>> > > > > Alexei Scherbakov
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> >
>> > Best regards,
>> > Alexei Scherbakov
>> >
>>
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

OK, let's do it this way.





2017-01-13 13:27 GMT+03:00 Sergi Vladykin <se...@gmail.com>:

> Internally we still use int[] when we send partitions (see
> GridH2QueryRequest.parts). It looks like we only do more work with
> PartitionSet.
>
> I like the idea of bitset for partitions, but
>
> 1. We have to change internals first to use it, otherwise the optimization
> makes no sense.
> 2. We will need to have a method SqlQuery.setPartitions(int... parts) for
> usability reasons anyways.
>
> Thus I suggest for now to go the straightforward way with int[] and create
> a separate ticket describing the optimization with bitset.
>
> Sergi
>
> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <alexey.scherbakoff@gmail.com
> >:
>
> > PartitionSet hides internal implementation of int array.
> >
> > This allows as to efficiently represent contiguous range of partitions
> and
> > defines clear API for ordered iteration over partitions and containment
> > check.
> >
> > Even better to go with compressed bitmap, as I mentioned in ticket
> comment.
> > This will allow us to minimize heap footprint for this object.
> >
> > Moreover, it will be useful to create reusable compressed bitmap
> > implementation in Ignite and use it in other cases, on example, for
> > replacing H2's IntArray and Set<Integer>.
> >
> > Should I create a ticket for this ?
> >
> > .
> >
> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
> > sergi.vladykin@gmail.com>
> > > wrote:
> > >
> > > > I looked at the code. The PartitionSet concept looks overengineered
> to
> > > me,
> > > > why wouldn't we just go with int[]?
> > > >
> > >
> > > Agree.
> > >
> > >
> > > >
> > > > Sergi
> > > >
> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com
> > > > >:
> > > >
> > > > > Done.
> > > > >
> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <
> dsetrakyan@apache.org
> > >:
> > > > >
> > > > > > Alexey,
> > > > > >
> > > > > > I am not sure I am seeing the API changes documented in the
> ticket.
> > > Can
> > > > > you
> > > > > > please either document them or add GIT links for the new classes?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > > >
> > > > > > > Guys,
> > > > > > >
> > > > > > > I've just submitted a PR for
> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > > > > > >
> > > > > > > Please review API changes while waiting for TC results.
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Alexei Scherbakov
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Sergi Vladykin <se...@gmail.com>.

Internally we still use int[] when we send partitions (see
GridH2QueryRequest.parts). It looks like we only do more work with
PartitionSet.

I like the idea of bitset for partitions, but

1. We have to change internals first to use it, otherwise the optimization
makes no sense.
2. We will need to have a method SqlQuery.setPartitions(int... parts) for
usability reasons anyways.

Thus I suggest for now to go the straightforward way with int[] and create
a separate ticket describing the optimization with bitset.

Sergi

2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

> PartitionSet hides internal implementation of int array.
>
> This allows as to efficiently represent contiguous range of partitions and
> defines clear API for ordered iteration over partitions and containment
> check.
>
> Even better to go with compressed bitmap, as I mentioned in ticket comment.
> This will allow us to minimize heap footprint for this object.
>
> Moreover, it will be useful to create reusable compressed bitmap
> implementation in Ignite and use it in other cases, on example, for
> replacing H2's IntArray and Set<Integer>.
>
> Should I create a ticket for this ?
>
> .
>
> 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <
> sergi.vladykin@gmail.com>
> > wrote:
> >
> > > I looked at the code. The PartitionSet concept looks overengineered to
> > me,
> > > why wouldn't we just go with int[]?
> > >
> >
> > Agree.
> >
> >
> > >
> > > Sergi
> > >
> > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com
> > > >:
> > >
> > > > Done.
> > > >
> > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org
> >:
> > > >
> > > > > Alexey,
> > > > >
> > > > > I am not sure I am seeing the API changes documented in the ticket.
> > Can
> > > > you
> > > > > please either document them or add GIT links for the new classes?
> > > > >
> > > > > D.
> > > > >
> > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > > > > alexey.scherbakoff@gmail.com> wrote:
> > > > >
> > > > > > Guys,
> > > > > >
> > > > > > I've just submitted a PR for
> > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > > > > >
> > > > > > Please review API changes while waiting for TC results.
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Alexei Scherbakov
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> >
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

PartitionSet hides internal implementation of int array.

This allows as to efficiently represent contiguous range of partitions and
defines clear API for ordered iteration over partitions and containment
check.

Even better to go with compressed bitmap, as I mentioned in ticket comment.
This will allow us to minimize heap footprint for this object.

Moreover, it will be useful to create reusable compressed bitmap
implementation in Ignite and use it in other cases, on example, for
replacing H2's IntArray and Set<Integer>.

Should I create a ticket for this ?

.

2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <se...@gmail.com>
> wrote:
>
> > I looked at the code. The PartitionSet concept looks overengineered to
> me,
> > why wouldn't we just go with int[]?
> >
>
> Agree.
>
>
> >
> > Sergi
> >
> > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <
> alexey.scherbakoff@gmail.com
> > >:
> >
> > > Done.
> > >
> > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> > >
> > > > Alexey,
> > > >
> > > > I am not sure I am seeing the API changes documented in the ticket.
> Can
> > > you
> > > > please either document them or add GIT links for the new classes?
> > > >
> > > > D.
> > > >
> > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > > > alexey.scherbakoff@gmail.com> wrote:
> > > >
> > > > > Guys,
> > > > >
> > > > > I've just submitted a PR for
> > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > > > >
> > > > > Please review API changes while waiting for TC results.
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Alexei Scherbakov
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>

-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin <se...@gmail.com>
wrote:

> I looked at the code. The PartitionSet concept looks overengineered to me,
> why wouldn't we just go with int[]?
>

Agree.


>
> Sergi
>
> 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <alexey.scherbakoff@gmail.com
> >:
>
> > Done.
> >
> > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> > > Alexey,
> > >
> > > I am not sure I am seeing the API changes documented in the ticket. Can
> > you
> > > please either document them or add GIT links for the new classes?
> > >
> > > D.
> > >
> > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > > alexey.scherbakoff@gmail.com> wrote:
> > >
> > > > Guys,
> > > >
> > > > I've just submitted a PR for
> > > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > > >
> > > > Please review API changes while waiting for TC results.
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Alexei Scherbakov
> > > >
> > >
> >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Sergi Vladykin <se...@gmail.com>.

I looked at the code. The PartitionSet concept looks overengineered to me,
why wouldn't we just go with int[]?

Sergi

2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

> Done.
>
> 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
>
> > Alexey,
> >
> > I am not sure I am seeing the API changes documented in the ticket. Can
> you
> > please either document them or add GIT links for the new classes?
> >
> > D.
> >
> > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> > alexey.scherbakoff@gmail.com> wrote:
> >
> > > Guys,
> > >
> > > I've just submitted a PR for
> > > https://issues.apache.org/jira/browse/IGNITE-4523.
> > >
> > > Please review API changes while waiting for TC results.
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Alexei Scherbakov <al...@gmail.com>.

Done.

2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:

> Alexey,
>
> I am not sure I am seeing the API changes documented in the ticket. Can you
> please either document them or add GIT links for the new classes?
>
> D.
>
> On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
> alexey.scherbakoff@gmail.com> wrote:
>
> > Guys,
> >
> > I've just submitted a PR for
> > https://issues.apache.org/jira/browse/IGNITE-4523.
> >
> > Please review API changes while waiting for TC results.
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>



-- 

Best regards,
Alexei Scherbakov

Re: Allow distributed SQL query execution over explicit set of partitions

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Alexey,

I am not sure I am seeing the API changes documented in the ticket. Can you
please either document them or add GIT links for the new classes?

D.

On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Guys,
>
> I've just submitted a PR for
> https://issues.apache.org/jira/browse/IGNITE-4523.
>
> Please review API changes while waiting for TC results.
>
> --
>
> Best regards,
> Alexei Scherbakov
>