Posted to user@spark.apache.org by Andrew Melo <an...@gmail.com> on 2020/03/30 23:04:48 UTC
Optimizing LIMIT in DSv2
Hello,
Executing "SELECT Muon_Pt FROM rootDF LIMIT 10", where "rootDF" is a temp
view backed by a DSv2 reader, yields the attached plan [1]. It appears that
the initial stage is run over every partition in rootDF, even though each
partition has 200k rows (except the last partition, which holds the
remaining rows in a file), so a single partition would more than satisfy
the limit.
Is there some sort of hinting that can be done from the datasource side to
better inform the optimizer or, alternatively, am I missing an interface
among the pushdown filters that would let me avoid transferring and
decompressing unnecessary partitions?
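
To make the desired behavior concrete, here is a minimal sketch in plain
Python of the pruning described above. It is not a Spark or DSv2 API: the
function name "partitions_needed" and the example row counts are
hypothetical, and it assumes the reader knows each partition's row count
up front.

```python
from typing import List


def partitions_needed(row_counts: List[int], limit: int) -> List[int]:
    """Return indices of the leading partitions that together hold at
    least `limit` rows (or all partitions if there are fewer rows)."""
    needed = []
    remaining = limit
    for index, rows in enumerate(row_counts):
        if remaining <= 0:
            break  # the limit is already satisfied; skip the rest
        needed.append(index)
        remaining -= rows
    return needed


# Hypothetical layout: five full partitions of 200k rows plus a remainder.
row_counts = [200_000] * 5 + [1_234]
print(partitions_needed(row_counts, 10))  # prints [0]
```

With a layout like this, LIMIT 10 would only need to touch the first
partition rather than all six.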
Thanks!
Andrew
[1]
== Parsed Logical Plan ==
'GlobalLimit 10
+- 'LocalLimit 10
   +- 'Project ['Muon_pt]
      +- 'UnresolvedRelation `rootDF`
== Analyzed Logical Plan ==
Muon_pt: array<float>
GlobalLimit 10
+- LocalLimit 10
   +- Project [Muon_pt#119]
      +- SubqueryAlias `rootdf`
         +- RelationV2 root[run#0L, luminosityBlock#1L, event#2L,
CaloMET_phi#3, CaloMET_pt#4, CaloMET_sumEt#5, nElectron#6,
Electron_deltaEtaSC#7, Electron_dr03EcalRecHitSumEt#8,
Electron_dr03HcalDepth1TowerSumEt#9, Electron_dr03TkSumPt#10,
Electron_dxy#11, Electron_dxyErr#12, Electron_dz#13, Electron_dzErr#14,
Electron_eCorr#15, Electron_eInvMinusPInv#16, Electron_energyErr#17,
Electron_eta#18, Electron_hoe#19, Electron_ip3d#20, Electron_mass#21,
Electron_miniPFRelIso_all#22, Electron_miniPFRelIso_chg#23, ... 787 more
fields] (Options: [tree=Events,paths=["hdfs://cmshdfs/store/data"]])
== Optimized Logical Plan ==
GlobalLimit 10
+- LocalLimit 10
   +- Project [Muon_pt#119]
      +- RelationV2 root[run#0L, luminosityBlock#1L, event#2L,
CaloMET_phi#3, CaloMET_pt#4, CaloMET_sumEt#5, nElectron#6,
Electron_deltaEtaSC#7, Electron_dr03EcalRecHitSumEt#8,
Electron_dr03HcalDepth1TowerSumEt#9, Electron_dr03TkSumPt#10,
Electron_dxy#11, Electron_dxyErr#12, Electron_dz#13, Electron_dzErr#14,
Electron_eCorr#15, Electron_eInvMinusPInv#16, Electron_energyErr#17,
Electron_eta#18, Electron_hoe#19, Electron_ip3d#20, Electron_mass#21,
Electron_miniPFRelIso_all#22, Electron_miniPFRelIso_chg#23, ... 787 more
fields] (Options: [tree=Events,paths=["hdfs://cmshdfs/store/data"]])
== Physical Plan ==
CollectLimit 10
+- *(1) Project [Muon_pt#119]
   +- *(1) ScanV2 root[Muon_pt#119] (Options:
[tree=Events,paths=["hdfs://cmshdfs/store/data"]])