Posted to issues@metron.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/20 18:35:00 UTC

[jira] [Commented] (METRON-1364) Add an implementation of Robust PCA outlier detection

    [ https://issues.apache.org/jira/browse/METRON-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406842#comment-16406842 ] 

ASF GitHub Bot commented on METRON-1364:
----------------------------------------

Github user JonZeolla commented on the issue:

    https://github.com/apache/metron/pull/870
  
    Is this still alive?


> Add an implementation of Robust PCA outlier detection
> -----------------------------------------------------
>
>                 Key: METRON-1364
>                 URL: https://issues.apache.org/jira/browse/METRON-1364
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>            Priority: Major
>
> With short circuiting in Stellar, we have the opportunity to delve into more computationally intensive outlier detection techniques.  Generally these would be executed only if simpler outlier detection techniques indicated an outlier (e.g. statistical outlier tests).
> As the first one of these supported, I'd suggest a Robust PCA based technique similar to Netflix's Surus.  See https://medium.com/netflix-techblog/rad-outlier-detection-on-big-data-d6b0494371cc and https://metamarkets.com/2012/algorithmic-trendspotting-the-meaning-of-interesting/ for more detail.
> It should be noted that there are some caveats with this approach around sparsity and well-orderedness.
> Regarding sparsity, this outlier detection algorithm presumes dense output, which is not the case for data spanning profiles (e.g. the profiler does not write out data every period if no data was seen). To deal with this, I am suggesting a modification to the profiler to allow PROFILE_GET to return a default value.  That will be done in a separate JIRA.
> Regarding well-orderedness, this is an outlier detector for time series data, so it is sensitive to order to a certain extent.  Given its computational intensity, it is likely to be run on a sample of the data rather than the full series.  To that end, uniform sampling is not sensible here; the sample should instead be biased toward recent data.  Without this, you may get poor results from this outlier detector.  This sampler should be done in a separate JIRA, but I will ensure the infrastructure to add it is contributed in METRON-1350.
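
For readers following the two caveats in the issue description, here is a minimal Python sketch of both ideas: filling periods with no data using a default value, and drawing a sample biased toward recent points while preserving time order. It is not Metron, Stellar, or profiler code. The function names (`densify`, `recency_biased_sample`), the `half_life` parameter, and the exponential decay weighting are all assumptions chosen for illustration; the issue does not say how the eventual sampler is weighted. The sampling step uses the Efraimidis-Spirakis weighted sampling-without-replacement scheme, computed in log space.

```python
import heapq
import math
import random


def densify(sparse, periods, default=0.0):
    """Return one value per period, using `default` where no data was written.

    `sparse` is a hypothetical mapping of period -> value; it only stands in
    for the idea of a profile lookup that returns a default for empty periods.
    """
    return [sparse.get(p, default) for p in periods]


def recency_biased_sample(values, k, half_life, rng=None):
    """Sample k points without replacement, favouring recent ones.

    A point that is `age` steps old gets weight 0.5 ** (age / half_life)
    (an assumed weighting). Each point draws E ~ Exp(1) and the k smallest
    E / weight are kept, which is equivalent to Efraimidis-Spirakis A-Res.
    Scores are compared in log space to avoid underflow for old points.
    The result is returned in the original time order as (index, value).
    """
    rng = rng or random.Random()
    n = len(values)
    k = min(k, n)
    scored = []
    for i in range(n):
        age = n - 1 - i  # the last element is the most recent
        log_weight = -age * math.log(2) / half_life
        e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
        scored.append((math.log(e) - log_weight, i))
    chosen = sorted(i for _, i in heapq.nsmallest(k, scored))
    return [(i, values[i]) for i in chosen]


if __name__ == "__main__":
    # Periods 1 and 3 saw no data, so they are filled with the default.
    series = densify({0: 4.0, 2: 5.0, 4: 40.0}, range(5), default=0.0)
    print(series)
    print(recency_biased_sample(series, k=3, half_life=2.0,
                                rng=random.Random(7)))
```

Sorting the chosen indices before returning keeps the sample in time order, which matters because the detector is order-sensitive. A smaller `half_life` concentrates the sample more heavily on the most recent periods.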



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)