Posted to dev@gora.apache.org by Sheriffo Ceesay <sn...@gmail.com> on 2019/07/14 23:33:13 UTC

Week 7 Report

Week seven report is available at
https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report

Basically, I am currently running workloads on HBase. I will continue to do
this for next week and probably the week after. More details are specified
in the report.

Please let me know if you have any questions.



*Sheriffo Ceesay*

Re: Week 7 Report

Posted by Sheriffo Ceesay <sn...@gmail.com>.
Hi Renato,

Thank you very much for your reply.

Please find my replies to your comments inline.

On Mon, Jul 22, 2019 at 8:18 AM Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Hey Sheriffo,
>
> Very nice work! I am sorry for the silence in the past weeks, but I
> have been swamped with things. I hope I can be of more help now.



> Anyway, regarding your progress reports, I have some questions:
> - Regarding using Google Cloud credits, did you see this? [1] Maybe
> that would also be something we could try, although I am not sure how
> compatible the required versions are. Maybe that could be an
> alternative key-value store besides the ones you have picked so far.
>
Yes, I saw this in the Google Cloud Console when I was setting up HBase,
but I was not sure about its compatibility with Gora, so I decided to use
a fresh installation of HBase. I will find time to have a look at it.


> - Regarding the exception when creating very large objects to be
> serialized, what about using arrays of records from Avro? Or maybe
> just arrays of primitive types? With that, we could increase the size
> of the value and have an extra knob to try in the benchmarks.
>

This problem is now resolved; it was due to improper configuration of
HBase. However, I think it will be interesting to use arrays of objects as
well and see whether there is any significant improvement.


> - Regarding the last report where you have some plots, could you
> explain what you are plotting? e.g., what is on the x-axis? aggregated
> number of inserted keys? or numbers of keys inserted at a particular
> point in time?
> Overall, very nice work Sheriffo! Thanks for all the good work!
>

In short: yes, it is the aggregated number of inserted keys.

Basically, YCSB provides a set of core workloads that are used to benchmark
any implemented datastore. These workloads comprise a mix of read, write,
and update operations. The six workloads are listed below.

   - Workload A: Update heavy workload: 50/50% Mix of Reads/Writes
   - Workload B: Read mostly workload: 95/5% Mix of Reads/Writes
   - Workload C: Read-only: 100% reads
   - Workload D: Read latest workload: More traffic on recent inserts
   - Workload E: Short ranges: Short range based queries
   - Workload F: Read-modify-write: Read, modify and update existing records
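For context, each YCSB core workload is defined by a small properties file. A minimal sketch of a Workload-A-style configuration (the parameter names follow YCSB's core workload properties; the counts here are illustrative, not the ones used in the benchmark runs):

```properties
# Illustrative YCSB core workload configuration (Workload A style)
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=100000        # records inserted during the load phase
operationcount=5000000    # total operations executed in the run phase
readproportion=0.5        # 50% reads
updateproportion=0.5      # 50% updates
requestdistribution=zipfian
```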


So in the first plot (Insert Operation), the x-axis represents the total
number of records inserted into MongoDB and HBase using Gora and the
native API (YCSB). The axis ranges from 100K to 10M.

The second and third plots represent the results of workload A and workload
B. The x-axis records the total combined operations. For example, if we set
workload A's operation count to 5 million, then the framework will execute
2.5 million reads and 2.5 million updates.
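As a rough sketch of that arithmetic (the function name is illustrative, not part of YCSB or Gora):

```python
def split_operations(operation_count: int, read_proportion: float,
                     update_proportion: float) -> dict:
    """Split a total YCSB operation count according to the workload mix."""
    return {
        "reads": int(operation_count * read_proportion),
        "updates": int(operation_count * update_proportion),
    }

# Workload A: 50/50 mix of reads and updates over 5 million operations
print(split_operations(5_000_000, 0.5, 0.5))
# → {'reads': 2500000, 'updates': 2500000}
```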

I hope that makes sense.




>
> Best,
>
> Renato M.
>
Best,
Sheriffo

>
>
> [1] https://cloud.google.com/bigtable/docs/hbase-bigtable
>
> On Sun, Jul 14, 2019 at 16:33, Sheriffo Ceesay
> (<sn...@gmail.com>) wrote:
> >
> > Week seven report is available at
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >
> > Basically, I am currently running workloads on HBase. I will continue to
> do this for next week and probably the week after. More details are
> specified in the report.
> >
> > Please let me know if you have any questions.
> >
> >
> > *Sheriffo Ceesay*
> >
>

Re: Week 7 Report

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.
Hey Sheriffo,

Very nice work! I am sorry for the silence in the past weeks, but I
have been swamped with things. I hope I can be of more help now.
Anyway, regarding your progress reports, I have some questions:
- Regarding using Google Cloud credits, did you see this? [1] Maybe
that would also be something we could try, although I am not sure how
compatible the required versions are. Maybe that could be an
alternative key-value store besides the ones you have picked so far.
- Regarding the exception when creating very large objects to be
serialized, what about using arrays of records from Avro? Or maybe
just arrays of primitive types? With that, we could increase the size
of the value and have an extra knob to try in the benchmarks.
- Regarding the last report where you have some plots, could you
explain what you are plotting? e.g., what is on the x-axis? aggregated
number of inserted keys? or numbers of keys inserted at a particular
point in time?
Overall, very nice work Sheriffo! Thanks for all the good work!


Best,

Renato M.


[1] https://cloud.google.com/bigtable/docs/hbase-bigtable

On Sun, Jul 14, 2019 at 16:33, Sheriffo Ceesay
(<sn...@gmail.com>) wrote:
>
> Week seven report is available at https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>
> Basically, I am currently running workloads on HBase. I will continue to do this for next week and probably the week after. More details are specified in the report.
>
> Please let me know if you have any questions.
>
>
> *Sheriffo Ceesay*
>