Posted to dev@ofbiz.apache.org by Giulio Speri - MpStyle Srl <gi...@mpstyle.it> on 2019/04/11 23:46:28 UTC

Visit/Visitor specific client IPs tracking exclusion

Hello devs,

I'm writing to describe a problem that my company, MpStyle, faced with an
OFBiz installation running two active e-commerce sites for one of our
customers.
I am writing to the dev mailing list because I could not find any
reference to this kind of problem in the mailing list archives, and I
think that the solution we built could be an improvement to the OFBiz
visit/visitor tracking capabilities.

Briefly, this is the server architecture on which OFBiz is running: hosted
by a third-party supplier, there are two (virtual) machines where Apache
OFBiz 13.07.03 runs behind the Apache2 web server (so we have two web
fronts).
Two other machines host the database (MariaDB) and HAProxy, which acts as
the load balancer.
HAProxy is configured to perform its health checks on the backend servers
with an HTTP GET on the home page of one of the two sites.
Visit and Visitor tracking are enabled for BI and analytics purposes, so
we cannot turn them off.
Together, these two things caused the Visit and Visitor tables to grow
enormously (we counted about 19M Visit records and about 67M Visitor
records, 86% of them caused by the load balancer), since each HAProxy hit
stores a Visit and a Visitor record in the database (plus records of other
entities, such as ShoppingList, created by the <firstvisit> and
<preprocessor> events).
In the long run, this caused an overall performance degradation and longer
windows of web front unavailability during the day; needless to say, our
customer was not happy about this.

The difficult part of figuring out this problem was that we did not have
direct access to the HAProxy and DB machines to check their logs.

The solution we designed and implemented was to exclude specific IP
addresses from Visit/Visitor tracking (in our case, the HAProxy IP).
The Visit and Visitor records (along with the firstvisit and preprocessor
events) are created mainly in the ControlServlet class, using the
VisitHandler getVisit/getVisitor/getVisitId methods.

Our idea consists of reading one or more IP addresses to exclude from
tracking from a .properties file and checking them against the client IP
address of the incoming request.
If the client IP address is in the "exclusion list", its visit/visitor is
not persisted and the firstvisit events are not run for it either.
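
The following is a minimal Java sketch of such a check, under stated assumptions: the class name TrackingExclusionList and the property key visit.tracking.excluded.ips are hypothetical (they are not part of OFBiz), and the excluded addresses are assumed to be stored as a comma-separated list. In the servlet, the client address would typically come from request.getRemoteAddr(); note that this returns the address of the immediate peer, so if a proxy forwards requests over plain HTTP, the original client may only be visible in a header such as X-Forwarded-For.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not part of OFBiz): holds the client IPs whose
 * requests should not create Visit/Visitor records.
 */
final class TrackingExclusionList {

    // Hypothetical property key; a real patch would choose its own name.
    static final String EXCLUDED_IPS_KEY = "visit.tracking.excluded.ips";

    private final Set<String> excludedIps;

    private TrackingExclusionList(Set<String> excludedIps) {
        this.excludedIps = excludedIps;
    }

    /** Parses a comma-separated IP list; a missing key means no exclusions. */
    static TrackingExclusionList fromProperties(Properties props) {
        String raw = props.getProperty(EXCLUDED_IPS_KEY, "");
        Set<String> ips = Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        return new TrackingExclusionList(Collections.unmodifiableSet(ips));
    }

    /** True if Visit/Visitor persistence and firstvisit events should be skipped. */
    boolean isExcluded(String clientIp) {
        return clientIp != null && excludedIps.contains(clientIp.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // In a real deployment this would be loaded from a .properties file.
        props.load(new StringReader(EXCLUDED_IPS_KEY + "=10.0.0.5, 10.0.0.6"));
        TrackingExclusionList list = fromProperties(props);
        System.out.println(list.isExcluded("10.0.0.5"));     // true
        System.out.println(list.isExcluded("192.168.1.20")); // false
    }
}
```

Keeping the addresses in a Set makes each lookup constant-time, so the check adds negligible cost to every request.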

The idea is quite simple, but within a few days we noticed a meaningful
improvement in overall system performance and stability/availability.

This kind of exclusion could also be useful when we do not want to track
or register internal IP addresses (e.g., those used mainly for testing).

However, this solution should be combined with a service (a cron job or
an OFBiz scheduled job) that keeps the number of records in the tables
bounded (for example, keeping only the last month of visits/visitors); I
think that these two solutions together could do the job well.
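
The selection rule of such a purge job can be sketched in a few lines of Java. This is a hypothetical illustration, not an existing OFBiz service: the retention period is a constructor parameter (30 days approximates "the last month"), and VisitRow is a stand-in for a Visit record, assuming a visit's age is judged by the Visit fromDate field. A real job would more likely delete the matching rows through the entity engine with a single condition (for example, via the delegator's removeByCondition) rather than loading them into memory.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the retention rule a scheduled purge job could apply. */
final class VisitRetentionPolicy {

    /** Minimal stand-in for a Visit row: only the fields the rule needs. */
    static final class VisitRow {
        final String visitId;
        final Instant fromDate;

        VisitRow(String visitId, Instant fromDate) {
            this.visitId = visitId;
            this.fromDate = fromDate;
        }
    }

    private final int retentionDays;

    VisitRetentionPolicy(int retentionDays) {
        if (retentionDays <= 0) {
            throw new IllegalArgumentException("retentionDays must be positive");
        }
        this.retentionDays = retentionDays;
    }

    /** Visits that started strictly before this instant are eligible for deletion. */
    Instant cutoff(Instant now) {
        return now.minus(Duration.ofDays(retentionDays));
    }

    /** Ids a purge run would delete; a real job would issue one bulk delete instead. */
    List<String> idsToPurge(List<VisitRow> visits, Instant now) {
        Instant cutoff = cutoff(now);
        return visits.stream()
                .filter(v -> v.fromDate.isBefore(cutoff))
                .map(v -> v.visitId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2019-04-12T00:00:00Z");
        VisitRetentionPolicy policy = new VisitRetentionPolicy(30);
        List<VisitRow> visits = List.of(
                new VisitRow("V1", Instant.parse("2019-01-01T10:00:00Z")),
                new VisitRow("V2", Instant.parse("2019-04-10T10:00:00Z")));
        System.out.println(policy.cutoff(now));             // 2019-03-13T00:00:00Z
        System.out.println(policy.idsToPurge(visits, now)); // [V1]
    }
}
```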

I hope my explanation was clear enough, and I would be happy to hear what
you think about this.

Thank you all for your attention!

Regards,

Giulio

-- 
Giulio Speri


*Mp Style Srl*
via Antonio Meucci, 37
41019 Limidi di Soliera (MO)
T 059/684916
M 334/3779851

www.mpstyle.it

Re: Visit/Visitor specific client IPs tracking exclusion

Posted by Nicolas Malin <ni...@nereide.fr>.
Great, Giulio,

I will follow this and try to help as far as my time permits.

Nicolas

On 25/04/2019 11:22, Giulio Speri - MpStyle Srl wrote:

Re: Visit/Visitor specific client IPs tracking exclusion

Posted by Giulio Speri - MpStyle Srl <gi...@mpstyle.it>.
Hello everyone,

I have created a Jira task for this.

OFBIZ-10957 <https://issues.apache.org/jira/browse/OFBIZ-10957>


Giulio

On Sat, 13 Apr 2019 at 16:30, Giulio Speri - MpStyle Srl <giulio.speri@mpstyle.it> wrote:

> Hi Nicolas,
>
> thank you for the suggestion.
> For HaProxy in fact, we prepared a simple request "hacheck" used
> specifically for Health Checks; I tried to set the attributes "track-visit"
> and "track-serverhit" to "false" on the request-map and made some test
> (with the help JMeter) but they affect only "firstvisit" events and the
> tracking of the server hit.
> The problem is that records for Visit and Visitor are created in the
> *ControlServlet#doGet*, before the *RequestHandler#trackVisit* and
> *RequestHandler#trackStats* are executed.
>
> In particular, the trackVisit method, evaluated to check if
> *FIRST_VISIT_EVENTS* should be run, is executed in the
> *RequestHandler#doRequest* method, called in the ControlServlet after the
> setup of some initial request attributes and after these two statements:
>
> *VisitHandler.getVisitor(request, response);*
>
> and
>
> *String visitId = VisitHandler.getVisitId(session);*
>
> If the ip is not filtered before the statements above, the HaProxy that
> requests for its health check page with the track-visit and track-serverhit
> set to "false", will end up to store Visit/Visitor records anyway.
>
> I think that setting those two parameters on the check page should be used
> along with our solution, in order to achieve what we are looking for.
>
>
> @Arun Yes, your summary is accurate.
>
> Thank you and Regards,
> Giulio
>
>
>
>
> Il giorno ven 12 apr 2019 alle ore 09:19 Nicolas Malin <
> nicolas.malin@nereide.fr> ha scritto:
>
>> Hello,
>>
>> To manage own load balancer we use a dedicate uri like this :
>>
>>     <request-map uri="technicalValidation" track-serverhit="false"
>> track-visit="false">
>>
>> It would be help to redirect your monitoring traffic.
>>
>> Nicolas
>>
>> On 12/04/2019 08:49, Arun Patidar wrote:
>> > Hello Giulio,
>> >
>> > Thanks for the the detailed and clear message. My understanding with
>> your
>> > proposal is as below:
>> >
>> > 1) We should enable configuration settings to ignore visit entries for
>> > Internal IPs and known requests (like HaProxy/load balancer, monitoring
>> > requests) etc.
>> > 2) For large DB size due to visits and hits, we can use a separate Stats
>> > database for visit entity group.
>> > 3) Also, idea to purge old visits using a scheduled job is good, We can
>> set
>> > number of days configurable as per need.
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Arun Patidar
>> > www.hotwax.co
>> >
>> >
>> >
>> > On Fri, Apr 12, 2019 at 5:17 AM Giulio Speri - MpStyle Srl <
>> > giulio.speri@mpstyle.it> wrote:
>> >
>> >> Hello devs,
>> >>
>> >> I'm writing because I would like to explain a problem my company,
>> MpStyle,
>> >> faced with an OFBiz installation with two active ecommerce sites, for
>> one
>> >> of our customers.
>> >> I am writing this email to the dev mailing list, because I could not
>> find
>> >> any reference in the mailings to the kind of problem we faced, and I
>> think
>> >> that the solution we built, could be an improvement to the OFBiz
>> >> visit/visitor tracking capabilites.
>> >>
>> >> I shortly explain the server architecture on which OFBiz is running:
>> hosted
>> >> by a third party supplier, there are two (virtual) machines where
>> Apache
>> >> OFBiz 13.07.03 is running behind Apache2 web server (so we have two web
>> >> fronts).
>> >> On other two different machines there are the database (MariaDB) and
>> >> HaProxy has a load balancer.
>> >> HaProxy is configured to perform its Health Checks on the backend
>> servers
>> >> with a Http GET on the Home Page of one of the two sites.
>> >> Visit and Visitor tracking are enabled, for BI and analytics purposes,
>> so
>> >> we cannot turn them off.
>> >> These two combined things caused the Visit and Visitor tables to
>> explode in
>> >> dimensions (we counted about 19M records of Visit and about 67M of
>> >> Visitors, with the 86% of those caused by the load balancer), since
>> each
>> >> hit of the HaProxy store a Visit and a Visitor record on the db (plus
>> some
>> >> other record of other entities, like ShoppingList, due to <firstvisit>
>> and
>> >> <preprocessor> events).
>> >> A bad side effect of this situation, on the long run, is an overall
>> >> performance degradation, and an increase in webfront unavailability
>> time
>> >> windows during the day: it's not necessary to say that our customer
>> was not
>> >> so happy about this.
>> >>
>> >> The difficult part of figuring out this problem, was that we did not
>> have
>> >> direct access to HaProxy and DB machines, to check logs.
>> >>
>> >> The solution we thought and implemented, was to exclude from
>> Visit/Visitor
>> >> tracking specific IP addresses (for our case we were interested in
>> HaProxy
>> >> IP).
>> >> The Visit and Visitor records (along with firstvisit and preprocessor
>> >> events) are created mainly in the ControlServlet class, using
>> VisitHandler
>> >> getVisit/getVisitor/getVisitId methods.
>> >>
>> >> Our idea consist in reading from a .properties file one or more IP
>> >> addresses we would like to exclude from tracking and then check them
>> >> against the client ip address the request is coming from.
>> >> If the client ip address is in the "exclusion list", then do not
>> persist
>> >> visit/visitor and do not run firstvisit events neither for it.
>> >>
>> >> The idea is quite simple, but we noticed in few days, a meaningful
>> >> improvement in overall system performance and stability/availability.
>> >>
>> >> This kind of exclusion could be also useful in case we do not want to
>> track
>> >> or register internal IP addresses (ie: mainly used for testing).
>> >>
>> >> However this solution, should be integrated with a service (cron or
>> >> scheduled in ofbiz) that keeps the number of records in the tables
>> limited
>> >> (for example keep only the last month of visit/visitor); I think that
>> these
>> >> two solutions together, could do the job well.
>> >>
>> >> I hope my explanation was clear enough and I would be happy to know
>> what do
>> >> you think about this.
>> >>
>> >> Thank you all for the attention!
>> >>
>> >> Regards,
>> >>
>> >> Giulio
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Giulio Speri
>> >>
>> >>
>> >> *Mp Styl**e Srl*
>> >> via Antonio Meucci, 37
>> >> 41019 Limidi di Soliera (MO)
>> >> T 059/684916
>> >> M 334/3779851
>> >>
>> >> www.mpstyle.it
>> >>
>>
>
>
> --
> Giulio Speri
>
>
> *Mp Styl**e Srl*
> via Antonio Meucci, 37
> 41019 Limidi di Soliera (MO)
> T 059/684916
> M 334/3779851
>
> www.mpstyle.it
>
>
>

-- 
Giulio Speri


*Mp Styl**e Srl*
via Antonio Meucci, 37
41019 Limidi di Soliera (MO)
T 059/684916
M 334/3779851

www.mpstyle.it

Re: Visit/Visitor specific client IPs tracking exclusion

Posted by Giulio Speri - MpStyle Srl <gi...@mpstyle.it>.
Hi Nicolas,

Thank you for the suggestion.
In fact, for HAProxy we prepared a simple request, "hacheck", used
specifically for health checks. I tried setting the "track-visit" and
"track-serverhit" attributes to "false" on its request-map and ran some
tests (with the help of JMeter), but these attributes only affect the
"firstvisit" events and the tracking of the server hit.
The problem is that the Visit and Visitor records are created in
*ControlServlet#doGet*, before *RequestHandler#trackVisit* and
*RequestHandler#trackStats* are executed.

In particular, the trackVisit method, which is evaluated to decide whether
the *FIRST_VISIT_EVENTS* should run, is executed in the
*RequestHandler#doRequest* method, which ControlServlet calls after setting
up some initial request attributes and after these two statements:

    VisitHandler.getVisitor(request, response);

and

    String visitId = VisitHandler.getVisitId(session);

If the IP is not filtered before the statements above, each HAProxy request
for its health check page still ends up storing Visit/Visitor records, even
with track-visit and track-serverhit set to "false".

I think that setting those two attributes on the check page should be
combined with our solution to achieve what we are looking for.
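
The ordering problem can be illustrated with a toy Java model. It is not OFBiz code: TrackingFlowModel and RequestMapFlags are hypothetical names, and each tracking step is simply recorded as a string. The model follows the sequence described above, where Visitor/Visit creation happens before the request-map flags are consulted, so only a filter placed ahead of that step prevents the records. As a simplifying assumption, server hit tracking depends only on the track-serverhit flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Simplified, hypothetical model of the request flow described above.
 * It is NOT OFBiz code: it only records which tracking steps would run.
 */
final class TrackingFlowModel {

    /** Mirrors the request-map attributes track-visit and track-serverhit. */
    static final class RequestMapFlags {
        final boolean trackVisit;
        final boolean trackServerHit;

        RequestMapFlags(boolean trackVisit, boolean trackServerHit) {
            this.trackVisit = trackVisit;
            this.trackServerHit = trackServerHit;
        }
    }

    private final Set<String> excludedIps;

    TrackingFlowModel(Set<String> excludedIps) {
        this.excludedIps = excludedIps;
    }

    /** Returns the tracking steps that would run for one request, in order. */
    List<String> handle(String clientIp, RequestMapFlags flags) {
        List<String> steps = new ArrayList<>();
        // The IP filter must come first: Visitor/Visit creation happens in
        // ControlServlet#doGet, before the request-map flags are consulted.
        boolean excluded = clientIp != null && excludedIps.contains(clientIp);
        if (!excluded) {
            steps.add("create Visitor"); // stands in for VisitHandler.getVisitor(request, response)
            steps.add("create Visit");   // stands in for VisitHandler.getVisitId(session)
        }
        // Later, in RequestHandler#doRequest, the request-map flags are checked.
        if (!excluded && flags.trackVisit) {
            steps.add("run firstvisit events");
        }
        if (flags.trackServerHit) {
            steps.add("track server hit");
        }
        return steps;
    }

    public static void main(String[] args) {
        RequestMapFlags healthCheck = new RequestMapFlags(false, false);
        // Flags alone: the health check still creates Visitor and Visit records.
        System.out.println(new TrackingFlowModel(Set.of()).handle("10.0.0.5", healthCheck));
        // Flags plus IP filter: nothing is tracked.
        System.out.println(new TrackingFlowModel(Set.of("10.0.0.5")).handle("10.0.0.5", healthCheck));
    }
}
```

With the flags alone, the health check request still produces the two records; with the flags plus the IP filter, it produces none.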


@Arun Yes, your summary is accurate.

Thank you and Regards,
Giulio

On Fri, 12 Apr 2019 at 09:19, Nicolas Malin <nicolas.malin@nereide.fr> wrote:

Re: Visit/Visitor specific client IPs tracking exclusion

Posted by Nicolas Malin <ni...@nereide.fr>.
Hello,

To manage our own load balancer, we use a dedicated URI like this:

    <request-map uri="technicalValidation" track-serverhit="false" track-visit="false">
        ...
    </request-map>

It could help you redirect your monitoring traffic.

Nicolas

On 12/04/2019 08:49, Arun Patidar wrote:

Re: Visit/Visitor specific client IPs tracking exclusion

Posted by Arun Patidar <ar...@hotwax.co>.
Hello Giulio,

Thanks for the detailed and clear message. My understanding of your
proposal is as follows:

1) We should provide configuration settings to ignore visit entries for
internal IPs and known requests (such as HAProxy/load balancer and
monitoring requests).
2) To deal with the large DB size caused by visits and hits, we can use a
separate Stats database for the visit entity group.
3) Also, the idea of purging old visits with a scheduled job is good; we
can make the number of days configurable as needed.



--
Best Regards,
Arun Patidar
www.hotwax.co



On Fri, Apr 12, 2019 at 5:17 AM Giulio Speri - MpStyle Srl <
giulio.speri@mpstyle.it> wrote: