You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@phoenix.apache.org by "Eisenhut, Roman" <ro...@tum.de> on 2017/11/25 14:25:38 UTC
Bulk loading into table vs view
Dear Phoenix-Team,
I did some test on bulk-loading data with the psql.py script in $PHOENIX_HOME/bin and the tpc-h data on my cluster with 1 master and 3 RS. I've found that it makes quite a difference whether you:
1. Create a table
2. Bulk load data into that table
Or
1. Create a table
2. Create a view
3. Bulk load data in the view
I was wondering where the overhead is coming from? (you can find my numbers below)
Additionally, I created a view over a table which was already filled and phoenix returned "No rows affected". At the same time I can't find a table in HBase that reflects the view, which makes me wonder whether views are actually materialized somewhere. As I'm quite interested in the view functionality of Phoenix, I was wondering whether someone can explain what is happening when a view is created?
Best regards,
Roman
psql.py -t X -d '|' X.csv, where X = table name
ID
TABLE
region
nation
supplier
customer
part
partsupp
orders
lineitem
5
25.00
10,000
150,000
200,000
800,000
1,500,000
6,001,215
1
0.068
0.11
2.959
18.789
27.881
107.03
164.853
1007.315
2
0.124
0.093
2.993
19.62
26.954
80.671
169.038
1039.294
3
0.07
0.092
2.795
20.745
29.036
76.855
177.765
1042.642
4
0.132
0.101
2.89
20.527
28.121
78.956
180.145
1019.047
5
0.072
0.116
3.334
27.494
28.891
75.455
166.668
1011.299
MIN
0.068
0.092
2.795
18.789
26.954
75.455
164.853
1007.315
MAX
0.132
0.116
3.334
27.494
29.036
107.03
180.145
1042.642
AVG
0.0932
0.1024
2.9942
21.435
28.1766
83.7934
171.6938
1023.919
psql.py -t X_VIEW -d '|' X.csv, where X = table name
ID
VIEW
region
nation
supplier
customer
part
partsupp
orders
lineitem
5
25.00
10,000
150,000
200,000
800,000
1,500,000
6,001,215
1
0.103
0.159
2.644
22.702
28.424
93.897
201.449
2
0.097
0.138
2.641
20.926
32.014
95.195
190.939
3
0.123
0.076
3.097
19.88
38.426
90.613
193.376
4
0.092
0.098
3.14
23.522
29.509
99.443
192.348
5
0.089
0.146
2.938
22.196
34.407
93.898
198.012
MIN
0.089
0.076
2.641
19.88
28.424
90.613
190.939
0
MAX
0.123
0.159
3.14
23.522
38.426
99.443
201.449
0
AVG
0.1008
0.1234
2.892
21.8452
32.556
94.6092
195.2248
#DIV/0!
Re: Bulk loading into table vs view
Posted by Sergey Soldatov <se...@gmail.com>.
Please take a look at https://phoenix.apache.org/views.html
All views are 'virtual' tables, so they don't have a dedicated physical
table and operates on top of the table that is specified in the view DDL.
Thanks,
Sergey
On Sat, Nov 25, 2017 at 6:25 AM, Eisenhut, Roman <ro...@tum.de>
wrote:
> Dear Phoenix-Team,
>
>
>
> I did some test on bulk-loading data with the psql.py script in
> $PHOENIX_HOME/bin and the tpc-h data on my cluster with 1 master and 3 RS.
> I’ve found that it makes quite a difference whether you:
>
> 1. Create a table
> 2. Bulk load data into that table
>
> Or
>
> 1. Create a table
> 2. Create a view
> 3. Bulk load data in the view
>
>
>
> I was wondering where the overhead is coming from? (you can find my
> numbers below)
>
>
>
> Additionally, I created a view over a table which was already filled and
> phoenix returned “No rows affected”. At the same time I can’t find a table
> in HBase that reflects the view, which makes me wonder whether views are
> actually materialized somewhere. As I’m quite interested in the view
> functionality of Phoenix, I was wondering whether someone can explain what
> is happening when a view is created?
>
>
>
> Best regards,
>
> Roman
>
>
>
>
>
> *psql.py -t X -d '|' X.csv, where X = table name*
>
> *ID*
>
> *TABLE*
>
> *region*
>
> *nation*
>
> *supplier*
>
> *customer*
>
> *part*
>
> *partsupp*
>
> *orders*
>
> *lineitem*
>
> *5*
>
> *25.00*
>
> *10,000*
>
> *150,000*
>
> *200,000*
>
> *800,000*
>
> *1,500,000*
>
> *6,001,215*
>
> *1*
>
> 0.068
>
> 0.11
>
> 2.959
>
> 18.789
>
> 27.881
>
> 107.03
>
> 164.853
>
> 1007.315
>
> *2*
>
> 0.124
>
> 0.093
>
> 2.993
>
> 19.62
>
> 26.954
>
> 80.671
>
> 169.038
>
> 1039.294
>
> *3*
>
> 0.07
>
> 0.092
>
> 2.795
>
> 20.745
>
> 29.036
>
> 76.855
>
> 177.765
>
> 1042.642
>
> *4*
>
> 0.132
>
> 0.101
>
> 2.89
>
> 20.527
>
> 28.121
>
> 78.956
>
> 180.145
>
> 1019.047
>
> *5*
>
> 0.072
>
> 0.116
>
> 3.334
>
> 27.494
>
> 28.891
>
> 75.455
>
> 166.668
>
> 1011.299
>
> *MIN*
>
> 0.068
>
> 0.092
>
> 2.795
>
> 18.789
>
> 26.954
>
> 75.455
>
> 164.853
>
> 1007.315
>
> *MAX*
>
> 0.132
>
> 0.116
>
> 3.334
>
> 27.494
>
> 29.036
>
> 107.03
>
> 180.145
>
> 1042.642
>
> *AVG*
>
> 0.0932
>
> 0.1024
>
> 2.9942
>
> 21.435
>
> 28.1766
>
> 83.7934
>
> 171.6938
>
> 1023.919
>
>
>
> *psql.py -t X_VIEW -d '|' X.csv, where X = table name*
>
> *ID*
>
> *VIEW*
>
> *region*
>
> *nation*
>
> *supplier*
>
> *customer*
>
> *part*
>
> *partsupp*
>
> *orders*
>
> *lineitem*
>
> *5*
>
> *25.00*
>
> *10,000*
>
> *150,000*
>
> *200,000*
>
> *800,000*
>
> *1,500,000*
>
> *6,001,215*
>
> *1*
>
> 0.103
>
> 0.159
>
> 2.644
>
> 22.702
>
> 28.424
>
> 93.897
>
> 201.449
>
>
>
> *2*
>
> 0.097
>
> 0.138
>
> 2.641
>
> 20.926
>
> 32.014
>
> 95.195
>
> 190.939
>
>
>
> *3*
>
> 0.123
>
> 0.076
>
> 3.097
>
> 19.88
>
> 38.426
>
> 90.613
>
> 193.376
>
>
>
> *4*
>
> 0.092
>
> 0.098
>
> 3.14
>
> 23.522
>
> 29.509
>
> 99.443
>
> 192.348
>
>
>
> *5*
>
> 0.089
>
> 0.146
>
> 2.938
>
> 22.196
>
> 34.407
>
> 93.898
>
> 198.012
>
>
>
> *MIN*
>
> 0.089
>
> 0.076
>
> 2.641
>
> 19.88
>
> 28.424
>
> 90.613
>
> 190.939
>
> 0
>
> *MAX*
>
> 0.123
>
> 0.159
>
> 3.14
>
> 23.522
>
> 38.426
>
> 99.443
>
> 201.449
>
> 0
>
> *AVG*
>
> 0.1008
>
> 0.1234
>
> 2.892
>
> 21.8452
>
> 32.556
>
> 94.6092
>
> 195.2248
>
> #DIV/0!
>
>
>