Posted to user@phoenix.apache.org by Dominic Egger <do...@scigility.com> on 2018/03/26 10:33:37 UTC

Write to Disk SQLLine vs Spark with secondary indexing

Hello Phoenix user group,
I have a query against a table of roughly 170 million rows that selects around 700k of them.
The query retrieves row-key fields, fields covered by a secondary index, and one field that
only exists in the base table; we also use index hinting. This runs very quickly in SQLLine
when dumping the results to a file (46 s). However, running the same query from Spark and
materializing the result in driver memory takes much longer (10 min). I suspect the
difference comes down to the index hint, but I cannot figure out how to get Spark to use
the correct index. Does anyone know how to do that? A sketch of the two approaches I have
in mind follows below.
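To make the question concrete, here is a rough Scala sketch of the two approaches I have in
mind. The table, index and column names and the ZooKeeper quorum are placeholders, and the
points noted in the comments are assumptions I have not verified:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("phoenix-index-hint").getOrCreate()

    // Approach 1: the phoenix-spark connector. Column pruning and filters
    // are pushed down, but I don't see any option for passing an index
    // hint, so Phoenix may still choose a full table scan.
    val viaConnector = spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "MY_TABLE")                 // placeholder
      .option("zkUrl", "zk1,zk2,zk3:2181")         // placeholder quorum
      .load()
      .select("PK_COL", "COVERED_COL", "UNCOVERED_COL")
      .filter("COVERED_COL = 'foo'")

    // Approach 2: the plain Spark JDBC source with the hint written into
    // a derived-table subquery. Untested: Spark substitutes the dbtable
    // value into its own SELECT, so I am assuming the Phoenix hint still
    // applies inside the derived table.
    val hintedQuery =
      """(SELECT /*+ INDEX(MY_TABLE MY_INDEX) */
        |        PK_COL, COVERED_COL, UNCOVERED_COL
        |   FROM MY_TABLE
        |  WHERE COVERED_COL = 'foo') t""".stripMargin

    val viaJdbc = spark.read
      .format("jdbc")
      .option("url", "jdbc:phoenix:zk1,zk2,zk3:2181")            // placeholder
      .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
      .option("dbtable", hintedQuery)
      .load()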

Looking at the IO usage and the HBase overview, I suspect the Spark approach ends up doing a
full table scan; the HBase read rate and the disk IO rate at least point in that direction.
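
For what it's worth, this is roughly how I have been checking which plan Phoenix chooses for
the hinted query over plain JDBC (same placeholder names and quorum as above); a plan line
reading FULL SCAN over the data table rather than a RANGE SCAN over the index would confirm
the table scan:

    import java.sql.DriverManager

    // Placeholder ZooKeeper quorum and schema names.
    val conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181")
    val rs = conn.createStatement().executeQuery(
      """EXPLAIN SELECT /*+ INDEX(MY_TABLE MY_INDEX) */
        |       PK_COL, COVERED_COL, UNCOVERED_COL
        |  FROM MY_TABLE
        | WHERE COVERED_COL = 'foo'""".stripMargin)
    while (rs.next()) println(rs.getString(1))  // print each plan line
    conn.close()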

Best Regards
Dominic Egger