You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by Vikash Talanki <vi...@gmail.com> on 2016/10/21 11:27:08 UTC

Avoid Full Scan of RHS table in Left Outer Join

Hi All,

I am running a query that has multiple(4) Left outer joins. I have 1 main
table which is around 400K records with 750 columns. There are 2 more
tables and I'll call them as code tables.
My query has 3 left outer joins of main table to same code table1 and 1
left outer join of main table to code table 2.
My code table 2 is a huge table. Almost ~310 million records.

Here is a sample query:(I cannot post my original query. So I created a
pseudo one)
Select mt.col1, mt.col2,...... from Main Table mt
left outer join code_table_1 ct11 on mt.col4=ct11.col1
left outer join code_table_1 ct12 on mt.col5=ct12.col2
left outer join code_table_1 ct13 on mt.col6=ct13.col3
left outer join code_table_2 ct2 on mt.col7=ct2.col1
where <condition 1>, <condition 2>..........

When I explain on this query, its running 4 parallel left joins with full
scans on code_table_1 and code_table_2.
Since I didn't mention anything, it is doing a hash join and trying to load
the code tables(RHSs) into memory. I am getting the cache limit issue here
since my code_table_2 is almost 9 GB and it cannot load into cache.

When I give it a hint of USE_SORT_MERGE_JOIN to avoid hash joins, it still
does the full scan of both code tables but returns me a result in 2.5 secs.
there is no cache exception here as it is not loading tables into memory.

Here are my questions now:
1. Is it possible to avoid full scans on RHS tables in Left outer joins?
From the query plan, I understood that the where filters are applied before
the joins which reduces my number records by huge margin. Then why should
it still do a full scan on huge RHS table? My intention here is not to
avoid loading into memory, I dont want a full scan to take place on a huge
310M table. The only reason I used USE_SORT_MERGE_JOIN is to make my query
run successfully.

2. On using USE_SORT_MERGE_JOIN, I understood it does a sort on both tables
in the join and then merges them. is sorting on a huge table not a costly
operation? I got my result in 2.5 secs which I didnt expect as it needs to
sort a huge table.

3. Why am I not able to take advantage with NO_CHILD_PARENT_OPTIMIZATION
here? on using this hint it still shows be full scans. my col7 in Main
table is a foreign key to col1 primary key in code table 2.

It would be really helpful if anyone can help me understand answers for
these questions. Sorry for a very big post.

-- 
Regards...

Vikash Talanki,
+1 408.203.2151