Posted to user@mahout.apache.org by Raghuveer <al...@yahoo.com.INVALID> on 2015/05/22 13:42:34 UTC

Analysing NDT data for POC

I am doing a POC with a dataset of the format (client_ip, timestamp, bytes_transferred), and the use case is: "Predict bytes_transferred for a particular client for a given timestamp". For example, the dataset has a record where client_ip 1.1.1.1 downloaded 234 bytes at timestamp 1432292516696. Similarly, let's say I have data for the morning of the 22nd, the evening of the 23rd and the afternoon of the 24th. To apply the use case, I want to pass the individual bytes_transferred values to a classifier that categorizes them into low_download (<2500), medium_download (>2500 and <5000) and high_download (>5000).
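
To make the three categories concrete, the thresholds above could be sketched in Python roughly as follows. This is only an illustration: the function name `categorize` is hypothetical, and because the ranges as stated do not say where exactly 2500 and 5000 fall, the sketch assumes 2500 counts as medium and 5000 as high.

```python
def categorize(bytes_transferred):
    """Map a byte count to one of the three download categories.

    Assumption: the boundary values 2500 and 5000 are placed in the
    higher category, since the stated ranges leave them unassigned.
    """
    if bytes_transferred < 2500:
        return "low_download"
    if bytes_transferred < 5000:
        return "medium_download"
    return "high_download"


# The example record: client 1.1.1.1 downloaded 234 bytes.
print(categorize(234))  # prints low_download
```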

How can I pass this dataset to a classification algorithm such as cBayes to categorize the client_ips based on timestamp?
Since the data file has 3 columns, should I pass the file as is to sequence file conversion and then to vectors, or is some pre-processing required? And since this is time series data, are there specific algorithms that can do the job?
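
As a rough sketch of the kind of pre-processing in question, the raw timestamp could be turned into coarser time features before any vector conversion. The example below is an assumption-laden illustration, not a Mahout recipe: it assumes the timestamps are epoch milliseconds interpreted in UTC, and the function name `time_features` and the feature names are hypothetical.

```python
from datetime import datetime, timezone


def time_features(timestamp_ms):
    """Derive simple time-of-day features from an epoch-millisecond timestamp.

    Assumption: the timestamp is milliseconds since the Unix epoch, read as UTC.
    """
    dt = datetime.fromtimestamp(timestamp_ms // 1000, tz=timezone.utc)
    return {
        "hour_of_day": dt.hour,         # 0-23
        "day_of_week": dt.weekday(),    # Monday = 0 ... Sunday = 6
    }


# The example record from the dataset description.
row = ("1.1.1.1", 1432292516696, 234)
client_ip, timestamp_ms, bytes_transferred = row
features = time_features(timestamp_ms)
print(client_ip, features, bytes_transferred)
```

Features like these would let a classifier see "morning on a Friday" rather than an opaque 13-digit number; whether that is sufficient for this use case is an open question.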

I need your help, kindly suggest.

thanks