You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Semyon Boikov <sb...@apache.org> on 2016/06/30 16:59:01 UTC
Remaining distributed joins issues

Hi,

We get back to 'distributed join' feature. I resurrected branch
'ignite-1232', did brief review of changes, added more tests and analysed
some failures. Here are issues identified so far, *Sergi*, could you please
comment:

- it seems current code does not properly handle case when custom
AffinityKeyMapper is set. As I understand when custom AffinityKeyMapper is
used, then it is not possible to reason about affinity key field?

- incorrect result when join partitioned and replicated caches. Schema is:
Account -> Person -> Organization. Person is REPLICATED cache.
Plan for this join query 'from Organization o, Person p, Account a where
p.orgId = o._key and p._key = a.personId':
SELECT
    O.NAME AS __C0,
    P._KEY AS __C1,
    P.NAME AS __C2,
    A.NAME AS __C3
FROM "org".ORGANIZATIONO
    /* "org".ORGANIZATION.__SCAN_ */
INNER JOIN "person".PERSON P
    /* *batched:broadcast* "person".PERSON.__SCAN_ */
    ON 1=1
    /* WHERE P.ORGID = O._KEY
    */
INNER JOIN "acc".ACCOUNT A
    /* batched:broadcast "acc".ACCOUNT.__SCAN_ */
    ON 1=1
WHERE (P.ORGID = O._KEY)
    AND (P._KEY = A.PERSONID)

Look like here distributed join is erroneously used for
'ORGANIZATION/PERSON' join, regual local join can be used since PERSON in
replicated cache. As I understand DistributedLookupBatch should never be
used to get data from index on REPLICATED cache? I added minimal test to
reproduce this: IgniteCacheJoinPartitionedAndReplicatedTest.

- I noticed that a lot of data can be transferred when join condition does
not use index. For example there are Person[ogrName] and Organization[name]
caches (there are no indexes on orgName, name), try join 'Organization o,
Person p where o.name=p.name'.If on some node there are 5 Organizations
then DistributedLookupBatch.addSearchRows(SearchRow firstRow, SearchRow
lastRow) will be called 5 times with parameters firstRow=null, lastRow=null
(which means 'get all rows from index'). As result GridH2IndexRangeRequest
created by this DistributedLookupBatch will contain 5 identical
GridH2RowRangeBounds[first=null, last=null] and related
GridH2IndexRangeResponse will contain 5 identical GridH2RowRanges
containing all index rows. It seems this should be somehow optimized?
I added test with such model and query - IgniteCacheJoinNoIndexTest (test
pass, but can be used to debug query requests/response).

- another observation related to
DistributedLookupBatch.addSearchRows(SearchRow firstRow, SearchRow
lastRow): it is possible that this method will be called multiple times
with the same parameters (for example when 'join Person, Organization where
p.ordId=o.id', then 'addSearchRows' can be called for the same 'orgId'
multiple times). Does it make sense to try optimize this?

- our regular sql queries benchmarks show performance drop ~6%, I'll try to
investigate

Thanks!