Posted to dev@eagle.apache.org by Daniel Zhou <Da...@dataguise.com> on 2016/02/06 00:46:32 UTC

Policy based on sensitive types stops working if there are too many sensitive items

Hi all,

Has anyone tested Eagle's performance in this situation?

1.  A large number of sensitive items (e.g., 2000) are stored in the HBase table "fileSensitivity".

2.  Policies are created based on sensitivity types.

I am asking because it seems that policies based on sensitivity types stop working if there are too many sensitive items (1700+).

Here is what I did:
At first I created about 20 HDFS policies based on 20 sensitivity types, such as "creditCard", "PhoneNumber", etc. At that point the table "fileSensitivity" held 10 sensitive entries, and alerts were triggered when I performed HDFS operations on those sensitive items.
Then I injected 1700 sensitive items into the table "fileSensitivity" by calling Eagle's API. After that, when I operated on sensitive items through the Hadoop terminal, alerts could NOT be triggered.
Note that at that time, policies based on attributes such as "src" and "dest" still worked.
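
For context, the injection was done through Eagle's entity REST API. Below is a minimal Java sketch of that kind of call; the endpoint, service name, and JSON field names are assumptions for illustration and may differ from the actual setup.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: posts one file-sensitivity entry to Eagle's generic entity API.
// Host/port, service name, and field names below are assumptions.
public class InjectSensitivity {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://localhost:9099/eagle-service/rest/entities"
                + "?serviceName=FileSensitivityService";          // assumed service name
        String body = "[{\"tags\":{\"site\":\"sandbox\",\"filedir\":\"/tmp/private\"},"
                + "\"sensitivityType\":\"Social_Security\"}]";    // assumed field names

        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());     // expect 200 on success
    }
}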

To fix that,
I deleted the table "fileSensitivity" and created a new one with the same name, this time injecting only 5 items. The 20 HDFS policies then started to work again.

So I'm wondering: is this a performance issue?

My cluster contains two machines, both:
CentOS 6, 4 cores, 15.58 GB RAM, 50 GB disk (20% used), HDP-2.2.9.0-3393<http://192.168.6.131:8080/>

Regards,
Daniel



Re: Policy based on sensitive types stops working if there are too many sensitive items

Posted by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com>.
If policies based on the fields "src"/"dst" are working, then there is no
reason that sensitivity-based policies won't work.
You have 2 machines; please find the policy-distribution log on both
machines (if there are multiple workers, we need the distributions for all
workers too).

sb.append("policyDistributionStats for " + policyGroupId + ", total: " + policyIds.size() + ", [");

We first need to look at how the policies are distributed, and figure out
the policy cost after that.
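
For reference, a rough sketch of how that summary line is presumably assembled, so you know what kind of line to grep for in the worker logs. Only the append call quoted above comes from the Eagle code; the group id, policy ids, and the closing bracket are assumptions.

import java.util.Arrays;
import java.util.List;

// Sketch only: reproduces the shape of the log line referenced above, e.g.
// "policyDistributionStats for alertExecutor_1, total: 2, [creditCardPolicy,phoneNumberPolicy]"
public class PolicyDistributionStatsSketch {
    public static void main(String[] args) {
        String policyGroupId = "alertExecutor_1";                 // assumed group id
        List<String> policyIds = Arrays.asList("creditCardPolicy", "phoneNumberPolicy");

        StringBuilder sb = new StringBuilder();
        sb.append("policyDistributionStats for " + policyGroupId + ", total: "
                + policyIds.size() + ", [");
        sb.append(String.join(",", policyIds));
        sb.append("]");
        System.out.println(sb.toString());
    }
}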

Thanks
Edward




RE: Policy based on sensitive types stops working if there are too many sensitive items

Posted by Daniel Zhou <Da...@dataguise.com>.
Here is an example:
Query:
from hdfsAuditLogEventStream[(str:regexp(sensitivityType,'.*Social_Security.*')==true)] select * insert into outputStream;
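
As a quick sanity check that the regex itself is cheap to evaluate, the same pattern can be tried against a single sensitivityType value in plain Java; the sample value below is made up.

// Standalone check of the pattern used in the policy above.
public class RegexCheck {
    public static void main(String[] args) {
        String sensitivityType = "US_Social_Security_Number";   // made-up sample value
        boolean match = sensitivityType.matches(".*Social_Security.*");
        System.out.println(match);                               // prints: true
    }
}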



Re: Policy based on sensitive types stops working if there are too many sensitive items

Posted by "Zhang, Edward (GDI Hadoop)" <yo...@ebay.com>.
Can you please show one policy? I suspected it was because the policy
itself is too complicated for the engine to parse and evaluate.

Thanks
Edward
