Posted to dev@gora.apache.org by Sheriffo Ceesay <sn...@gmail.com> on 2019/08/21 16:42:11 UTC

Final Report

All,

My draft final report is available at
https://cwiki.apache.org/confluence/display/GORA/Final+Report%3A+%5BGORA-532%5D+Benchmark+Module+For+Apache+Gora

We have until the 26th of this month to submit the report. Please let me
know if you have any comments to improve it.

Meanwhile, I will work on the documentation on how to run the benchmark
module and publish it on the Gora website.

Thank you.

**Sheriffo Ceesay**

Re: Final Report

Posted by Sheriffo Ceesay <sn...@gmail.com>.

Hi Renato,

See replies inline.

On Thu, Aug 22, 2019 at 5:52 PM Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Hey Sheriffo,
>
> Thanks for the report and all the work!
> Gora performing worse when inserting data in the HBase case can make
> sense, I think, because Gora still needs to serialize every data bean
> through Avro (maybe some caching? But Sheriffo also deactivated this
> with gora.hbasestore.hbase.client.autoflush.enabled=true), so I guess
> the rest of the time is just Gora serialization.
>

I agree with you.


> Now for the reads in HBase-native and HBase-Gora, are we sure we are
> getting the same granularity of objects? I mean because of the mapping
> Gora does (different column families per attribute), maybe we are
> fetching the attributes in a different way than HBase is doing, maybe
> Gora fetches only some column families whereas HBase fetches
> everything.
>

I have done some basic tests to verify this; see the testUpdate() method in
the GoraClientTest file. There, I insert some strings, retrieve them, and
verify that they match the expected values.

> Did you run any correctness tests to know that we are retrieving the
> correct results in both cases? Something like inserting an integer as
> part of the attributes, and then summing them when retrieved to check
> that the sum is what we expect.
>

Thanks for this. I have added a new test case called testCorrectness() to
handle the issue you have raised. The results I got are consistent with what
we are expecting.
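
For illustration only, the sum-based check suggested above can be sketched
as follows. This is a minimal sketch and not the actual testCorrectness()
in the benchmark module: the dict standing in for the data store, the field
names, and the function names are all hypothetical. A real check would write
and read through the Gora DataStore and the native HBase client instead.

```python
# Hypothetical stand-in for a data store: a plain dict keyed by row key.
# This is NOT the Gora or HBase API; it only illustrates the idea of the check.


def insert_records(store, count):
    """Insert `count` records, each carrying an integer attribute.

    Returns the sum of the integers that were written, which is the
    value we expect to recover when reading the records back.
    """
    expected_sum = 0
    for i in range(count):
        store[f"user{i}"] = {"field0": f"value{i}", "counter": i}
        expected_sum += i
    return expected_sum


def read_back_sum(store, count):
    """Read every record back and sum the integer attribute."""
    total = 0
    for i in range(count):
        record = store[f"user{i}"]
        total += record["counter"]
    return total


def check_correctness(store, count):
    """Return True when the retrieved sum matches the inserted sum."""
    expected = insert_records(store, count)
    actual = read_back_sum(store, count)
    return expected == actual
```

Running the same check once against each client (native and Gora) would show
whether both return every inserted integer, independently of how the
attributes are mapped to column families.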

>
> Best,
>
> Renato M.

Re: Final Report

Posted by Renato Marroquín Mogrovejo <re...@gmail.com>.

Hey Sheriffo,

Thanks for the report and all the work!
Gora performing worse when inserting data in the HBase case can make
sense, I think, because Gora still needs to serialize every data bean
through Avro (maybe some caching? But Sheriffo also deactivated this
with gora.hbasestore.hbase.client.autoflush.enabled=true), so I guess
the rest of the time is just Gora serialization.

Now for the reads in HBase-native and HBase-Gora, are we sure we are
getting the same granularity of objects? I mean, because of the mapping
Gora does (different column families per attribute), maybe we are
fetching the attributes in a different way than HBase is doing; maybe
Gora fetches only some column families whereas HBase fetches
everything.

Did you run any correctness tests to know that we are retrieving the
correct results in both cases? Something like inserting an integer as
part of the attributes, and then summing them when retrieved to check
that the sum is what we expect.

Best,

Renato M.

Re: Final Report

Posted by Sheriffo Ceesay <sn...@gmail.com>.

Hi Furkan,

Yes, it baffled me as well. I haven't applied any specific performance
optimisation to either of the setups, so I think these results may not be
final at this stage and need further investigation.

The only setting I changed for HBase with Apache Gora in the gora.properties
file is:

*gora.hbasestore.hbase.client.autoflush.enabled=true*
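
For illustration only: in the HBase client, autoflush generally means each
write is sent to the server as soon as it is issued instead of being held in
a client-side buffer and sent in batches. The toy model below is a sketch of
that trade-off under stated assumptions. It is not the HBase or Gora API; the
class, the record-count buffer limit, and the round-trip counter are all
hypothetical, and a real client may size and flush its buffer differently.

```python
class ToyWriter:
    """Toy model of a client-side write buffer (not the HBase or Gora API)."""

    def __init__(self, autoflush, buffer_limit=10):
        self.autoflush = autoflush        # True: send every write immediately
        self.buffer_limit = buffer_limit  # hypothetical limit, in records
        self.buffer = []
        self.stored = []                  # stands in for the server side
        self.round_trips = 0              # number of simulated network calls

    def put(self, record):
        """Queue a record; flush at once if autoflush is on or the buffer is full."""
        self.buffer.append(record)
        if self.autoflush or len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        """Send all buffered records in one simulated round trip."""
        if self.buffer:
            self.stored.extend(self.buffer)
            self.buffer.clear()
            self.round_trips += 1


def count_round_trips(autoflush, n_records):
    """Write n_records through a ToyWriter and return the round trips used."""
    writer = ToyWriter(autoflush)
    for i in range(n_records):
        writer.put(i)
    writer.flush()  # send whatever is still buffered at the end
    return writer.round_trips
```

In this model, writing with autoflush enabled costs one round trip per
record, while buffered writes cost roughly one round trip per full buffer.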

For the local HBase setup, I have followed the recommendations here [1] to
avoid any performance issues.

[1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098

Basically, the setups are fresh and simplified installations without any
major configuration for optimisation.

Thank you.


**Sheriffo Ceesay**


Re: Final Report

Posted by Furkan KAMACI <fu...@gmail.com>.

Hi Sheriffo,

Thanks for the updates!

By the way, I still wonder about the reason for the poor performance of the
HBase native implementation.

Kind Regards,
Furkan KAMACI

Re: Final Report

Posted by Sheriffo Ceesay <sn...@gmail.com>.

Hi Furkan,
Thanks for your feedback.

Please find replies to your comments inline.

On Wed, Aug 21, 2019 at 6:19 PM Furkan KAMACI <fu...@gmail.com>
wrote:

> Hi Sheriffo,
>
> Thanks for your great effort!
>
> 1) Could you separate the charts for HBase and MongoDB? The HBase charts
> overshadow the MongoDB ones.
>
Yes, this is now done. Can you please have a look?

>
> 2) Report says that:
>
> *"In this work, we have time to include only three gora data stores
> (MongoDB, HBase and CouchDB)"*
>
> However, you have not run this benchmark for CouchDB as far as I know?
>

Yes, you are right that it is not included in the benchmark results, but I
have included its implementation in the module. This includes
auto-generating the mapping and related files. Due to time constraints, there
was a bit of discussion as to which datastores to include in the preliminary
benchmarking, and we decided to include HBase and MongoDB. In the future, I
will work on adding more data stores and comparing their performance as well.

> 3) I don't think there is a need to add commit hashes and messages as an
> Appendix, especially if we consider that the hashes will change once the PR
> is merged into the codebase.
>

I saw this as a good tip in the email sent by the GSoC team, but I agree
with you and have now removed it.

>
> Kind Regards,
> Furkan KAMACI

Thank you.
Sheriffo.

Re: Final Report

Posted by Furkan KAMACI <fu...@gmail.com>.

Hi Sheriffo,

Thanks for your great effort!

1) Could you separate the charts for HBase and MongoDB? The HBase charts
overshadow the MongoDB ones.

2) Report says that:

*"In this work, we have time to include only three gora data stores
(MongoDB, HBase and CouchDB)"*

However, you have not run this benchmark for CouchDB as far as I know?

3) I don't think there is a need to add commit hashes and messages as an
Appendix, especially if we consider that the hashes will change once the PR
is merged into the codebase.

Kind Regards,
Furkan KAMACI
