You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by "Eisenhut, Roman" <ro...@tum.de> on 2017/11/25 14:25:38 UTC

Bulk loading into table vs view

Dear Phoenix-Team,

I did some test on bulk-loading data with the psql.py script in $PHOENIX_HOME/bin and the tpc-h data on my cluster with 1 master and 3 RS. I've found that it makes quite a difference whether you:

  1.  Create a table
  2.  Bulk load data into that table
Or

  1.  Create a table
  2.  Create a view
  3.  Bulk load data in the view

I was wondering where the overhead is coming from? (you can find my numbers below)

Additionally, I created a view over a table which was already filled and phoenix returned "No rows affected". At the same time I can't find a table in HBase that reflects the view, which makes me wonder whether views are actually materialized somewhere. As I'm quite interested in the view functionality of Phoenix, I was wondering whether someone can explain what is happening when a view is created?

Best regards,
Roman



psql.py -t X -d '|' X.csv, where X = table name

ID

TABLE


region

nation

supplier

customer

part

partsupp

orders

lineitem


5

25.00

10,000

150,000

200,000

800,000

1,500,000

6,001,215


1

0.068

0.11

2.959

18.789

27.881

107.03

164.853

1007.315

2

0.124

0.093

2.993

19.62

26.954

80.671

169.038

1039.294

3

0.07

0.092

2.795

20.745

29.036

76.855

177.765

1042.642

4

0.132

0.101

2.89

20.527

28.121

78.956

180.145

1019.047

5

0.072

0.116

3.334

27.494

28.891

75.455

166.668

1011.299


MIN

0.068

0.092

2.795

18.789

26.954

75.455

164.853

1007.315

MAX

0.132

0.116

3.334

27.494

29.036

107.03

180.145

1042.642

AVG

0.0932

0.1024

2.9942

21.435

28.1766

83.7934

171.6938

1023.919



psql.py -t X_VIEW -d '|' X.csv, where X = table name

ID

VIEW


region

nation

supplier

customer

part

partsupp

orders

lineitem


5

25.00

10,000

150,000

200,000

800,000

1,500,000

6,001,215


1

0.103

0.159

2.644

22.702

28.424

93.897

201.449



2

0.097

0.138

2.641

20.926

32.014

95.195

190.939



3

0.123

0.076

3.097

19.88

38.426

90.613

193.376



4

0.092

0.098

3.14

23.522

29.509

99.443

192.348



5

0.089

0.146

2.938

22.196

34.407

93.898

198.012




MIN

0.089

0.076

2.641

19.88

28.424

90.613

190.939

0

MAX

0.123

0.159

3.14

23.522

38.426

99.443

201.449

0

AVG

0.1008

0.1234

2.892

21.8452

32.556

94.6092

195.2248

#DIV/0!

Re: Bulk loading into table vs view

Posted by Sergey Soldatov <se...@gmail.com>.

Please take a look at https://phoenix.apache.org/views.html
All views are 'virtual' tables, so they don't have a dedicated physical
table and operates on top of the table that is specified in the view DDL.

Thanks,
Sergey

On Sat, Nov 25, 2017 at 6:25 AM, Eisenhut, Roman <ro...@tum.de>
wrote:

> Dear Phoenix-Team,
>
>
>
> I did some test on bulk-loading data with the psql.py script in
> $PHOENIX_HOME/bin and the tpc-h data on my cluster with 1 master and 3 RS.
> I’ve found that it makes quite a difference whether you:
>
>    1. Create a table
>    2. Bulk load data into that table
>
> Or
>
>    1. Create a table
>    2. Create a view
>    3. Bulk load data in the view
>
>
>
> I was wondering where the overhead is coming from? (you can find my
> numbers below)
>
>
>
> Additionally, I created a view over a table which was already filled and
> phoenix returned “No rows affected”. At the same time I can’t find a table
> in HBase that reflects the view, which makes me wonder whether views are
> actually materialized somewhere. As I’m quite interested in the view
> functionality of Phoenix, I was wondering whether someone can explain what
> is happening when a view is created?
>
>
>
> Best regards,
>
> Roman
>
>
>
>
>
> *psql.py -t X -d '|' X.csv, where X = table name*
>
> *ID*
>
> *TABLE*
>
> *region*
>
> *nation*
>
> *supplier*
>
> *customer*
>
> *part*
>
> *partsupp*
>
> *orders*
>
> *lineitem*
>
> *5*
>
> *25.00*
>
> *10,000*
>
> *150,000*
>
> *200,000*
>
> *800,000*
>
> *1,500,000*
>
> *6,001,215*
>
> *1*
>
> 0.068
>
> 0.11
>
> 2.959
>
> 18.789
>
> 27.881
>
> 107.03
>
> 164.853
>
> 1007.315
>
> *2*
>
> 0.124
>
> 0.093
>
> 2.993
>
> 19.62
>
> 26.954
>
> 80.671
>
> 169.038
>
> 1039.294
>
> *3*
>
> 0.07
>
> 0.092
>
> 2.795
>
> 20.745
>
> 29.036
>
> 76.855
>
> 177.765
>
> 1042.642
>
> *4*
>
> 0.132
>
> 0.101
>
> 2.89
>
> 20.527
>
> 28.121
>
> 78.956
>
> 180.145
>
> 1019.047
>
> *5*
>
> 0.072
>
> 0.116
>
> 3.334
>
> 27.494
>
> 28.891
>
> 75.455
>
> 166.668
>
> 1011.299
>
> *MIN*
>
> 0.068
>
> 0.092
>
> 2.795
>
> 18.789
>
> 26.954
>
> 75.455
>
> 164.853
>
> 1007.315
>
> *MAX*
>
> 0.132
>
> 0.116
>
> 3.334
>
> 27.494
>
> 29.036
>
> 107.03
>
> 180.145
>
> 1042.642
>
> *AVG*
>
> 0.0932
>
> 0.1024
>
> 2.9942
>
> 21.435
>
> 28.1766
>
> 83.7934
>
> 171.6938
>
> 1023.919
>
>
>
> *psql.py -t X_VIEW -d '|' X.csv, where X = table name*
>
> *ID*
>
> *VIEW*
>
> *region*
>
> *nation*
>
> *supplier*
>
> *customer*
>
> *part*
>
> *partsupp*
>
> *orders*
>
> *lineitem*
>
> *5*
>
> *25.00*
>
> *10,000*
>
> *150,000*
>
> *200,000*
>
> *800,000*
>
> *1,500,000*
>
> *6,001,215*
>
> *1*
>
> 0.103
>
> 0.159
>
> 2.644
>
> 22.702
>
> 28.424
>
> 93.897
>
> 201.449
>
>
>
> *2*
>
> 0.097
>
> 0.138
>
> 2.641
>
> 20.926
>
> 32.014
>
> 95.195
>
> 190.939
>
>
>
> *3*
>
> 0.123
>
> 0.076
>
> 3.097
>
> 19.88
>
> 38.426
>
> 90.613
>
> 193.376
>
>
>
> *4*
>
> 0.092
>
> 0.098
>
> 3.14
>
> 23.522
>
> 29.509
>
> 99.443
>
> 192.348
>
>
>
> *5*
>
> 0.089
>
> 0.146
>
> 2.938
>
> 22.196
>
> 34.407
>
> 93.898
>
> 198.012
>
>
>
> *MIN*
>
> 0.089
>
> 0.076
>
> 2.641
>
> 19.88
>
> 28.424
>
> 90.613
>
> 190.939
>
> 0
>
> *MAX*
>
> 0.123
>
> 0.159
>
> 3.14
>
> 23.522
>
> 38.426
>
> 99.443
>
> 201.449
>
> 0
>
> *AVG*
>
> 0.1008
>
> 0.1234
>
> 2.892
>
> 21.8452
>
> 32.556
>
> 94.6092
>
> 195.2248
>
> #DIV/0!
>
>
>