You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Peter Edike <pe...@interswitchgroup.com> on 2018/08/06 16:00:53 UTC
This query is so slow
Hello all,
The following query takes ages to complete in drill and more often that not, fails
select * from dfs.`/iswdata/rj/201805` where unique_key in
(
select unique_key from dfs.`/iswdata/rj/201805` group by unique_key having count(unique_key) > 1
)
limit 40
Please what can I do to improve the performance of this query
Kind Regards
________________________________
Peter Edike
Senior Software Engineer
Research and Development, ENG
Engineering
[cid:image001.png@01D42DA7.101866D0]
Office NO:
Mobile NO:
Email: peter.edike@interswitchgroup.com<ma...@interswitchgroup.com>
Interswitch Limited
1648C Oko-Awo Street, Victoria Island Lagos
Customer Contact Centre 0700-9065000
? http://www.interswitchgroup.com<http://www.interswitchgroup.com/>
[cid:image003.png@01D42DA7.101866D0]<https://www.quickteller.com/loan-request>
This e-mail and all attachments transmitted with it remain the property of Interswitch Limited , the information contained herein are private confidential and intended solely for the use of the addressee. If you have received this e-mail in error, kindly notify the sender. If you are not the addressee, you should not disseminate, distribute or copy this e-mail. Kindly notify Interswitch immediately by email if you have received this email in error and delete this email and any attachment from your system Emails cannot be guaranteed to be secure or error free as the message and any attachments could be intercepted, corrupted, lost, delayed, incomplete or amended. the contents of this email or its attachments have been scanned for all viruses and all reasonable measures have been taken to ensure that no viruses are present. Interswitch Limited and its subsidiaries do not accept liability for damage caused by this email or any attachments.
Re: This query is so slow
Posted by Nitin Pawar <ni...@gmail.com>.
Depending how many unique keys are there it can cause issues
can you try following query with tweaking settings around hash_join
select * from dfs.`/iswdata/rj/201805` t1 join (select unique_key,
count(unique_key) as total from dfs.`/iswdata/rj/201805` group by
unique_key)t on t1.unique_key = t.unique_key where t.total > 1
On Tue, Aug 7, 2018 at 9:23 PM, Peter Edike <
peter.edike@interswitchgroup.com> wrote:
> Did an explain plan on the query since the query never completes
> successfully (I can't see the profile because of that), Here is the result
> of the explain plan query
>
> "head":{
> "version":1,
> "generator":{
> "type":"ExplainHandler",
> "info":""
> },
> "type":"APACHE_DRILL_PHYSICAL",
> "options":[
>
> ],
> "queue":0,
> "hasResourcePlan":false,
> "resultMode":"EXEC"
> },
> "graph":[
> {
> "pop":"parquet-scan",
> "@id":327683,
> "userName":"mapr",
> "entries":[
> {
> "path":"maprfs:///iswdata/rj"
> }
> ],
> "storage":{
> "type":"file",
> "enabled":true,
> "connection":"maprfs:///",
> "config":null,
> "workspaces":{
> "root":{
> "location":"/",
> "writable":false,
> "defaultInputFormat":null,
> "allowAccessOutsideWorkspace":false
> },
> "tmp":{
> "location":"/tmp",
> "writable":true,
> "defaultInputFormat":null,
> "allowAccessOutsideWorkspace":false
> }
> },
> "formats":{
> "psv":{
> "type":"text",
> "extensions":[
> "tbl"
> ],
> "delimiter":"|"
> },
> "csv":{
> "type":"text",
> "extensions":[
> "csv"
> ],
> "delimiter":","
> },
> "tsv":{
> "type":"text",
> "extensions":[
> "tsv"
> ],
> "delimiter":"\t"
> },
> "parquet":{
> "type":"parquet"
> },
> "json":{
> "type":"json",
> "extensions":[
> "json"
> ]
> },
> "maprdb":{
> "type":"maprdb"
> }
> }
> },
> "format":{
> "type":"parquet"
> },
> "columns":[
> "`unique_key`"
> ],
> "selectionRoot":"maprfs:/iswdata/rj",
> "filter":"true",
> "fileSet":[
> "/iswdata/rj/d43b262e-b5bf-4b70-b891-03e5403186aa.parquet",
> "/iswdata/rj/7a3a0be9-4ab7-42e7-b5a4-da15ae6cbd8e.parquet",
> "/iswdata/rj/f4ecfe34-0a96-4582-bc0e-569486015bc2.parquet",
> "/iswdata/rj/438b1548-89fb-4eb4-8499-982cebcff80b.parquet",
> "/iswdata/rj/07a60a9e-46f0-4274-82c8-4ea46aaf10bb.parquet",
> "/iswdata/rj/57eb07ef-63b4-4b43-83df-719bcf10e364.parquet",
> "/iswdata/rj/8248c70a-3579-4166-a4ce-707db8e4960c.parquet",
> "/iswdata/rj/a018b4d7-6891-48e6-958a-8239ee0c0d64.parquet",
> "/iswdata/rj/8ff9f8fd-8631-4283-b3ce-e2f90a89e5bc.parquet"
> ],
> "cost":4.73824196E8
> },
> {
> "pop":"hash-aggregate",
> "@id":327682,
> "child":327683,
> "cardinality":1.0,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "aggPhase":"PHASE_1of2",
> "groupByExprs":[
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> }
> ],
> "aggrExprs":[
> {
> "ref":"`$f1`",
> "expr":"count(`unique_key`) "
> }
> ],
> "cost":2.36912098E8
> },
> {
> "pop":"project",
> "@id":327681,
> "exprs":[
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> },
> {
> "ref":"`$f1`",
> "expr":"`$f1`"
> },
> {
> "ref":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
> "expr":"hash32asdouble(`unique_key`, 1301011) "
> }
> ],
> "child":327682,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E7
> },
> {
> "pop":"unordered-mux-exchange",
> "@id":196609,
> "child":327681,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E7
> },
> {
> "pop":"hash-to-random-exchange",
> "@id":65548,
> "child":196609,
> "expr":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E7
> },
> {
> "pop":"project",
> "@id":65547,
> "exprs":[
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> },
> {
> "ref":"`$f1`",
> "expr":"`$f1`"
> }
> ],
> "child":65548,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E7
> },
> {
> "pop":"hash-aggregate",
> "@id":65546,
> "child":65547,
> "cardinality":1.0,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "aggPhase":"PHASE_2of2",
> "groupByExprs":[
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> }
> ],
> "aggrExprs":[
> {
> "ref":"`$f1`",
> "expr":"$sum0(`$f1`) "
> }
> ],
> "cost":2.36912098E7
> },
> {
> "pop":"filter",
> "@id":65545,
> "child":65546,
> "expr":"greater_than(`$f1`, 1) ",
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":2369120.98
> },
> {
> "pop":"selection-vector-remover",
> "@id":65544,
> "child":65545,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":2369120.98
> },
> {
> "pop":"project",
> "@id":65542,
> "exprs":[
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> }
> ],
> "child":65544,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":2369120.98
> },
> {
> "pop":"project",
> "@id":65540,
> "exprs":[
> {
> "ref":"`unique_key0`",
> "expr":"`unique_key`"
> }
> ],
> "child":65542,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":2369120.98
> },
> {
> "pop":"parquet-scan",
> "@id":262147,
> "userName":"mapr",
> "entries":[
> {
> "path":"maprfs:///iswdata/rj"
> }
> ],
> "storage":{
> "type":"file",
> "enabled":true,
> "connection":"maprfs:///",
> "config":null,
> "workspaces":{
> "root":{
> "location":"/",
> "writable":false,
> "defaultInputFormat":null,
> "allowAccessOutsideWorkspace":false
> },
> "tmp":{
> "location":"/tmp",
> "writable":true,
> "defaultInputFormat":null,
> "allowAccessOutsideWorkspace":false
> }
> },
> "formats":{
> "psv":{
> "type":"text",
> "extensions":[
> "tbl"
> ],
> "delimiter":"|"
> },
> "csv":{
> "type":"text",
> "extensions":[
> "csv"
> ],
> "delimiter":","
> },
> "tsv":{
> "type":"text",
> "extensions":[
> "tsv"
> ],
> "delimiter":"\t"
> },
> "parquet":{
> "type":"parquet"
> },
> "json":{
> "type":"json",
> "extensions":[
> "json"
> ]
> },
> "maprdb":{
> "type":"maprdb"
> }
> }
> },
> "format":{
> "type":"parquet"
> },
> "columns":[
> "`**`"
> ],
> "selectionRoot":"maprfs:/iswdata/rj",
> "filter":"true",
> "fileSet":[
> "/iswdata/rj/d43b262e-b5bf-4b70-b891-03e5403186aa.parquet",
> "/iswdata/rj/7a3a0be9-4ab7-42e7-b5a4-da15ae6cbd8e.parquet",
> "/iswdata/rj/f4ecfe34-0a96-4582-bc0e-569486015bc2.parquet",
> "/iswdata/rj/438b1548-89fb-4eb4-8499-982cebcff80b.parquet",
> "/iswdata/rj/07a60a9e-46f0-4274-82c8-4ea46aaf10bb.parquet",
> "/iswdata/rj/57eb07ef-63b4-4b43-83df-719bcf10e364.parquet",
> "/iswdata/rj/8248c70a-3579-4166-a4ce-707db8e4960c.parquet",
> "/iswdata/rj/a018b4d7-6891-48e6-958a-8239ee0c0d64.parquet",
> "/iswdata/rj/8ff9f8fd-8631-4283-b3ce-e2f90a89e5bc.parquet"
> ],
> "cost":4.73824196E8
> },
> {
> "pop":"project",
> "@id":262146,
> "exprs":[
> {
> "ref":"`T0¦¦**`",
> "expr":"`**`"
> },
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> }
> ],
> "child":262147,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E8
> },
> {
> "pop":"project",
> "@id":262145,
> "exprs":[
> {
> "ref":"`T0¦¦**`",
> "expr":"`T0¦¦**`"
> },
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> },
> {
> "ref":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
> "expr":"hash32asdouble(`unique_key`, 1301011) "
> }
> ],
> "child":262146,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E8
> },
> {
> "pop":"unordered-mux-exchange",
> "@id":131073,
> "child":262145,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E8
> },
> {
> "pop":"hash-to-random-exchange",
> "@id":65543,
> "child":131073,
> "expr":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E8
> },
> {
> "pop":"project",
> "@id":65541,
> "exprs":[
> {
> "ref":"`T0¦¦**`",
> "expr":"`T0¦¦**`"
> },
> {
> "ref":"`unique_key`",
> "expr":"`unique_key`"
> }
> ],
> "child":65543,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E8
> },
> {
> "pop":"hash-join",
> "@id":65539,
> "left":65541,
> "right":65540,
> "conditions":[
> {
> "relationship":"EQUALS",
> "left":"`unique_key`",
> "right":"`unique_key0`"
> }
> ],
> "joinType":"INNER",
> "isRowKeyJoin":false,
> "joinControl":0,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":4.73824196E8
> },
> {
> "pop":"limit",
> "@id":65538,
> "child":65539,
> "first":0,
> "last":40,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> },
> {
> "pop":"selection-vector-remover",
> "@id":65537,
> "child":65538,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> },
> {
> "pop":"union-exchange",
> "@id":5,
> "child":65537,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> },
> {
> "pop":"limit",
> "@id":4,
> "child":5,
> "first":0,
> "last":40,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> },
> {
> "pop":"selection-vector-remover",
> "@id":3,
> "child":4,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> },
> {
> "pop":"project",
> "@id":2,
> "exprs":[
> {
> "ref":"`T0¦¦**`",
> "expr":"`T0¦¦**`"
> }
> ],
> "child":3,
> "outputProj":true,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> },
> {
> "pop":"project",
> "@id":1,
> "exprs":[
> {
> "ref":"`**`",
> "expr":"`T0¦¦**`"
> }
> ],
> "child":2,
> "outputProj":false,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> },
> {
> "pop":"screen",
> "@id":0,
> "child":1,
> "initialAllocation":1000000,
> "maxAllocation":10000000000,
> "cost":40.0
> }
> ]
> }
>
> Please what exactly Am I looking for as to why it never completes. I can
> see outofmemory errors but then Drill is running with 16gb on each node and
> there are five nodes in the cluster
>
> Best regards,
> Peter Edike
>
> Senior Software Engineer
> Interswitch
>
> Tel. | Mobile. | IP Phone.
> Fax. | mailto:peter.edike@interswitchgroup.com | http://
>
> http://www.interswitchgroup.com
>
> InterswitchThis e-mail and all attachments transmitted with it remain the
> property of InterSwitch Limited , the information contained herein are
> private confidential and intended solely for the use of the addressee. If
> you have received this e-mail in error, kindly notify the sender. If you
> are not the addressee, you should not disseminate, distribute or copy this
> e-mail. Kindly notify InterSwitch immediately by email if you have received
> this email in error and delete this email and any attachment from your
> system Emails cannot be guaranteed to be secure or error free as the
> message and any attachments could be intercepted, corrupted, lost, delayed,
> incomplete or amended. the contents of this email or its attachments have
> been scanned for all viruses and all reasonable measures have been taken to
> ensure that no viruses are present. InterSwitch Limited and its
> subsidiaries do not accept liability for damage caused by this email or any
> attachments.
>
>
>
> This message has been marked as CONFIDENTIAL on Tuesday, August 7, 2018 @
> 4:54:16 PM
>
> -----Original Message-----
> From: Kunal Khatua <ku...@apache.org>
> Sent: Monday, August 6, 2018 6:41 PM
> To: user@drill.apache.org
> Subject: Re: This query is so slow
>
> Hi Peter
>
> What does the profile for the query indicate?
>
> Take a look at the operator overview. It will indicate which operator is
> using the most CPU cycles. If the average and max processing times vary
> wildly, it might be a problem of skew, where some fragments are doing a
> relatively excessive work.
> If that is the case, within that operator's profile segment, you can see
> the distribution of the fragments and see which ones are the long pole in
> your run.
>
> ~ KK
>
> On 8/6/2018 9:01:25 AM, Peter Edike <pe...@interswitchgroup.com>
> wrote:
> Hello all,
>
> The following query takes ages to complete in drill and more often that
> not, fails
>
> select * from dfs.`/iswdata/rj/201805` where unique_key in
> (
> select unique_key from dfs.`/iswdata/rj/201805` group by
> unique_key having count(unique_key) > 1
> )
> limit 40
>
> Please what can I do to improve the performance of this query
>
> Kind Regards
>
> Peter Edike
> Senior Software Engineer
> Research and Development, ENG
> Engineering
>
> Office NO:
> Mobile NO:
> Email: peter.edike@interswitchgroup.com [mailto:peter.edike@
> interswitchgroup.com]
> Interswitch Limited
> 1648C Oko-Awo Street, Victoria Island Lagos Customer Contact Centre
> 0700-9065000 ü http://www.interswitchgroup.com [
> http://www.interswitchgroup.com/]
> [https://www.quickteller.com/loan-request]
> This e-mail and all attachments transmitted with it remain the property
> of Interswitch Limited , the information contained herein are private
> confidential and intended solely for the use of the addressee. If you have
> received this e-mail in error, kindly notify the sender. If you are not the
> addressee, you should not disseminate, distribute or copy this e-mail.
> Kindly notify Interswitch immediately by email if you have received this
> email in error and delete this email and any attachment from your system
> Emails cannot be guaranteed to be secure or error free as the message and
> any attachments could be intercepted, corrupted, lost, delayed, incomplete
> or amended. the contents of this email or its attachments have been scanned
> for all viruses and all reasonable measures have been taken to ensure that
> no viruses are present. Interswitch Limited and its subsidiaries do not
> accept liability for damage caused by this email or any attachments.
>
--
Nitin Pawar
RE: This query is so slow
Posted by Peter Edike <pe...@interswitchgroup.com>.
Did an explain plan on the query since the query never completes successfully (I can't see the profile because of that), Here is the result of the explain plan query
"head":{
"version":1,
"generator":{
"type":"ExplainHandler",
"info":""
},
"type":"APACHE_DRILL_PHYSICAL",
"options":[
],
"queue":0,
"hasResourcePlan":false,
"resultMode":"EXEC"
},
"graph":[
{
"pop":"parquet-scan",
"@id":327683,
"userName":"mapr",
"entries":[
{
"path":"maprfs:///iswdata/rj"
}
],
"storage":{
"type":"file",
"enabled":true,
"connection":"maprfs:///",
"config":null,
"workspaces":{
"root":{
"location":"/",
"writable":false,
"defaultInputFormat":null,
"allowAccessOutsideWorkspace":false
},
"tmp":{
"location":"/tmp",
"writable":true,
"defaultInputFormat":null,
"allowAccessOutsideWorkspace":false
}
},
"formats":{
"psv":{
"type":"text",
"extensions":[
"tbl"
],
"delimiter":"|"
},
"csv":{
"type":"text",
"extensions":[
"csv"
],
"delimiter":","
},
"tsv":{
"type":"text",
"extensions":[
"tsv"
],
"delimiter":"\t"
},
"parquet":{
"type":"parquet"
},
"json":{
"type":"json",
"extensions":[
"json"
]
},
"maprdb":{
"type":"maprdb"
}
}
},
"format":{
"type":"parquet"
},
"columns":[
"`unique_key`"
],
"selectionRoot":"maprfs:/iswdata/rj",
"filter":"true",
"fileSet":[
"/iswdata/rj/d43b262e-b5bf-4b70-b891-03e5403186aa.parquet",
"/iswdata/rj/7a3a0be9-4ab7-42e7-b5a4-da15ae6cbd8e.parquet",
"/iswdata/rj/f4ecfe34-0a96-4582-bc0e-569486015bc2.parquet",
"/iswdata/rj/438b1548-89fb-4eb4-8499-982cebcff80b.parquet",
"/iswdata/rj/07a60a9e-46f0-4274-82c8-4ea46aaf10bb.parquet",
"/iswdata/rj/57eb07ef-63b4-4b43-83df-719bcf10e364.parquet",
"/iswdata/rj/8248c70a-3579-4166-a4ce-707db8e4960c.parquet",
"/iswdata/rj/a018b4d7-6891-48e6-958a-8239ee0c0d64.parquet",
"/iswdata/rj/8ff9f8fd-8631-4283-b3ce-e2f90a89e5bc.parquet"
],
"cost":4.73824196E8
},
{
"pop":"hash-aggregate",
"@id":327682,
"child":327683,
"cardinality":1.0,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"aggPhase":"PHASE_1of2",
"groupByExprs":[
{
"ref":"`unique_key`",
"expr":"`unique_key`"
}
],
"aggrExprs":[
{
"ref":"`$f1`",
"expr":"count(`unique_key`) "
}
],
"cost":2.36912098E8
},
{
"pop":"project",
"@id":327681,
"exprs":[
{
"ref":"`unique_key`",
"expr":"`unique_key`"
},
{
"ref":"`$f1`",
"expr":"`$f1`"
},
{
"ref":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
"expr":"hash32asdouble(`unique_key`, 1301011) "
}
],
"child":327682,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E7
},
{
"pop":"unordered-mux-exchange",
"@id":196609,
"child":327681,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E7
},
{
"pop":"hash-to-random-exchange",
"@id":65548,
"child":196609,
"expr":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E7
},
{
"pop":"project",
"@id":65547,
"exprs":[
{
"ref":"`unique_key`",
"expr":"`unique_key`"
},
{
"ref":"`$f1`",
"expr":"`$f1`"
}
],
"child":65548,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E7
},
{
"pop":"hash-aggregate",
"@id":65546,
"child":65547,
"cardinality":1.0,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"aggPhase":"PHASE_2of2",
"groupByExprs":[
{
"ref":"`unique_key`",
"expr":"`unique_key`"
}
],
"aggrExprs":[
{
"ref":"`$f1`",
"expr":"$sum0(`$f1`) "
}
],
"cost":2.36912098E7
},
{
"pop":"filter",
"@id":65545,
"child":65546,
"expr":"greater_than(`$f1`, 1) ",
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":2369120.98
},
{
"pop":"selection-vector-remover",
"@id":65544,
"child":65545,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":2369120.98
},
{
"pop":"project",
"@id":65542,
"exprs":[
{
"ref":"`unique_key`",
"expr":"`unique_key`"
}
],
"child":65544,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":2369120.98
},
{
"pop":"project",
"@id":65540,
"exprs":[
{
"ref":"`unique_key0`",
"expr":"`unique_key`"
}
],
"child":65542,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":2369120.98
},
{
"pop":"parquet-scan",
"@id":262147,
"userName":"mapr",
"entries":[
{
"path":"maprfs:///iswdata/rj"
}
],
"storage":{
"type":"file",
"enabled":true,
"connection":"maprfs:///",
"config":null,
"workspaces":{
"root":{
"location":"/",
"writable":false,
"defaultInputFormat":null,
"allowAccessOutsideWorkspace":false
},
"tmp":{
"location":"/tmp",
"writable":true,
"defaultInputFormat":null,
"allowAccessOutsideWorkspace":false
}
},
"formats":{
"psv":{
"type":"text",
"extensions":[
"tbl"
],
"delimiter":"|"
},
"csv":{
"type":"text",
"extensions":[
"csv"
],
"delimiter":","
},
"tsv":{
"type":"text",
"extensions":[
"tsv"
],
"delimiter":"\t"
},
"parquet":{
"type":"parquet"
},
"json":{
"type":"json",
"extensions":[
"json"
]
},
"maprdb":{
"type":"maprdb"
}
}
},
"format":{
"type":"parquet"
},
"columns":[
"`**`"
],
"selectionRoot":"maprfs:/iswdata/rj",
"filter":"true",
"fileSet":[
"/iswdata/rj/d43b262e-b5bf-4b70-b891-03e5403186aa.parquet",
"/iswdata/rj/7a3a0be9-4ab7-42e7-b5a4-da15ae6cbd8e.parquet",
"/iswdata/rj/f4ecfe34-0a96-4582-bc0e-569486015bc2.parquet",
"/iswdata/rj/438b1548-89fb-4eb4-8499-982cebcff80b.parquet",
"/iswdata/rj/07a60a9e-46f0-4274-82c8-4ea46aaf10bb.parquet",
"/iswdata/rj/57eb07ef-63b4-4b43-83df-719bcf10e364.parquet",
"/iswdata/rj/8248c70a-3579-4166-a4ce-707db8e4960c.parquet",
"/iswdata/rj/a018b4d7-6891-48e6-958a-8239ee0c0d64.parquet",
"/iswdata/rj/8ff9f8fd-8631-4283-b3ce-e2f90a89e5bc.parquet"
],
"cost":4.73824196E8
},
{
"pop":"project",
"@id":262146,
"exprs":[
{
"ref":"`T0¦¦**`",
"expr":"`**`"
},
{
"ref":"`unique_key`",
"expr":"`unique_key`"
}
],
"child":262147,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E8
},
{
"pop":"project",
"@id":262145,
"exprs":[
{
"ref":"`T0¦¦**`",
"expr":"`T0¦¦**`"
},
{
"ref":"`unique_key`",
"expr":"`unique_key`"
},
{
"ref":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
"expr":"hash32asdouble(`unique_key`, 1301011) "
}
],
"child":262146,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E8
},
{
"pop":"unordered-mux-exchange",
"@id":131073,
"child":262145,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E8
},
{
"pop":"hash-to-random-exchange",
"@id":65543,
"child":131073,
"expr":"`E_X_P_R_H_A_S_H_F_I_E_L_D`",
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E8
},
{
"pop":"project",
"@id":65541,
"exprs":[
{
"ref":"`T0¦¦**`",
"expr":"`T0¦¦**`"
},
{
"ref":"`unique_key`",
"expr":"`unique_key`"
}
],
"child":65543,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E8
},
{
"pop":"hash-join",
"@id":65539,
"left":65541,
"right":65540,
"conditions":[
{
"relationship":"EQUALS",
"left":"`unique_key`",
"right":"`unique_key0`"
}
],
"joinType":"INNER",
"isRowKeyJoin":false,
"joinControl":0,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":4.73824196E8
},
{
"pop":"limit",
"@id":65538,
"child":65539,
"first":0,
"last":40,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
},
{
"pop":"selection-vector-remover",
"@id":65537,
"child":65538,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
},
{
"pop":"union-exchange",
"@id":5,
"child":65537,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
},
{
"pop":"limit",
"@id":4,
"child":5,
"first":0,
"last":40,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
},
{
"pop":"selection-vector-remover",
"@id":3,
"child":4,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
},
{
"pop":"project",
"@id":2,
"exprs":[
{
"ref":"`T0¦¦**`",
"expr":"`T0¦¦**`"
}
],
"child":3,
"outputProj":true,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
},
{
"pop":"project",
"@id":1,
"exprs":[
{
"ref":"`**`",
"expr":"`T0¦¦**`"
}
],
"child":2,
"outputProj":false,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
},
{
"pop":"screen",
"@id":0,
"child":1,
"initialAllocation":1000000,
"maxAllocation":10000000000,
"cost":40.0
}
]
}
Please what exactly Am I looking for as to why it never completes. I can see outofmemory errors but then Drill is running with 16gb on each node and there are five nodes in the cluster
Best regards,
Peter Edike
Senior Software Engineer
Interswitch
Tel. | Mobile. | IP Phone.
Fax. | mailto:peter.edike@interswitchgroup.com | http://
http://www.interswitchgroup.com
InterswitchThis e-mail and all attachments transmitted with it remain the property of InterSwitch Limited , the information contained herein are private confidential and intended solely for the use of the addressee. If you have received this e-mail in error, kindly notify the sender. If you are not the addressee, you should not disseminate, distribute or copy this e-mail. Kindly notify InterSwitch immediately by email if you have received this email in error and delete this email and any attachment from your system Emails cannot be guaranteed to be secure or error free as the message and any attachments could be intercepted, corrupted, lost, delayed, incomplete or amended. the contents of this email or its attachments have been scanned for all viruses and all reasonable measures have been taken to ensure that no viruses are present. InterSwitch Limited and its subsidiaries do not accept liability for damage caused by this email or any attachments.
This message has been marked as CONFIDENTIAL on Tuesday, August 7, 2018 @ 4:54:16 PM
-----Original Message-----
From: Kunal Khatua <ku...@apache.org>
Sent: Monday, August 6, 2018 6:41 PM
To: user@drill.apache.org
Subject: Re: This query is so slow
Hi Peter
What does the profile for the query indicate?
Take a look at the operator overview. It will indicate which operator is using the most CPU cycles. If the average and max processing times vary wildly, it might be a problem of skew, where some fragments are doing a relatively excessive work.
If that is the case, within that operator's profile segment, you can see the distribution of the fragments and see which ones are the long pole in your run.
~ KK
On 8/6/2018 9:01:25 AM, Peter Edike <pe...@interswitchgroup.com> wrote:
Hello all,
The following query takes ages to complete in drill and more often that not, fails
select * from dfs.`/iswdata/rj/201805` where unique_key in
(
select unique_key from dfs.`/iswdata/rj/201805` group by unique_key having count(unique_key) > 1
)
limit 40
Please what can I do to improve the performance of this query
Kind Regards
Peter Edike
Senior Software Engineer
Research and Development, ENG
Engineering
Office NO:
Mobile NO:
Email: peter.edike@interswitchgroup.com [mailto:peter.edike@interswitchgroup.com]
Interswitch Limited
1648C Oko-Awo Street, Victoria Island Lagos Customer Contact Centre 0700-9065000 ü http://www.interswitchgroup.com [http://www.interswitchgroup.com/]
[https://www.quickteller.com/loan-request]
This e-mail and all attachments transmitted with it remain the property of Interswitch Limited , the information contained herein are private confidential and intended solely for the use of the addressee. If you have received this e-mail in error, kindly notify the sender. If you are not the addressee, you should not disseminate, distribute or copy this e-mail. Kindly notify Interswitch immediately by email if you have received this email in error and delete this email and any attachment from your system Emails cannot be guaranteed to be secure or error free as the message and any attachments could be intercepted, corrupted, lost, delayed, incomplete or amended. the contents of this email or its attachments have been scanned for all viruses and all reasonable measures have been taken to ensure that no viruses are present. Interswitch Limited and its subsidiaries do not accept liability for damage caused by this email or any attachments.
Re: This query is so slow
Posted by Kunal Khatua <ku...@apache.org>.
Hi Peter
What does the profile for the query indicate?
Take a look at the operator overview. It will indicate which operator is using the most CPU cycles. If the average and max processing times vary wildly, it might be a problem of skew, where some fragments are doing a relatively excessive work.
If that is the case, within that operator's profile segment, you can see the distribution of the fragments and see which ones are the long pole in your run.
~ KK
On 8/6/2018 9:01:25 AM, Peter Edike <pe...@interswitchgroup.com> wrote:
Hello all,
The following query takes ages to complete in drill and more often that not, fails
select * from dfs.`/iswdata/rj/201805` where unique_key in
(
select unique_key from dfs.`/iswdata/rj/201805` group by unique_key having count(unique_key) > 1
)
limit 40
Please what can I do to improve the performance of this query
Kind Regards
Peter Edike
Senior Software Engineer
Research and Development, ENG
Engineering
Office NO:
Mobile NO:
Email: peter.edike@interswitchgroup.com [mailto:peter.edike@interswitchgroup.com]
Interswitch Limited
1648C Oko-Awo Street, Victoria Island Lagos
Customer Contact Centre 0700-9065000
ü http://www.interswitchgroup.com [http://www.interswitchgroup.com/]
[https://www.quickteller.com/loan-request]
This e-mail and all attachments transmitted with it remain the property of Interswitch Limited , the information contained herein are private confidential and intended solely for the use of the addressee. If you have received this e-mail in error, kindly notify the sender. If you are not the addressee, you should not disseminate, distribute or copy this e-mail. Kindly notify Interswitch immediately by email if you have received this email in error and delete this email and any attachment from your system Emails cannot be guaranteed to be secure or error free as the message and any attachments could be intercepted, corrupted, lost, delayed, incomplete or amended. the contents of this email or its attachments have been scanned for all viruses and all reasonable measures have been taken to ensure that no viruses are present. Interswitch Limited and its subsidiaries do not accept liability for damage caused by this email or any attachments.