Posted to user@predictionio.apache.org by Donald Szeto <do...@apache.org> on 2017/11/21 19:30:20 UTC

Re: Hardware Configuration for Binary Classification using PIO

Hi Sachin,

1. I would highly encourage you to adopt the template, then upgrade and
maintain it to track future PIO releases, if that is something you would
like to do. Otherwise, you may want to consider following
http://predictionio.apache.org/templates/classification/quickstart/
and see whether your use case fits it. Because that is an official
template, it will track the main PredictionIO release.

2. You should definitely have a dedicated Spark cluster if your input data
size is going to be much larger. Start with machines that have a
core-to-memory ratio of 1:2 to 1:4 (that is, 2 to 4 GB of RAM per core),
and scale out the cluster as needed to meet your training time requirement.
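
As a rough illustration of that ratio, here is a minimal Python sketch that
turns a core count into a RAM range. The function name memory_range_gb and
the example core counts are hypothetical, chosen only for illustration; this
is plain arithmetic on the 1:2 to 1:4 guideline, not a PredictionIO or Spark
API.

```python
def memory_range_gb(cores, low_ratio=2, high_ratio=4):
    """Return (min_gb, max_gb) of RAM for a machine with `cores` cores,
    assuming the 1:2 to 1:4 core-to-GB guideline (hypothetical helper)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return cores * low_ratio, cores * high_ratio


# Example core counts are assumptions, not recommendations.
for cores in (4, 8, 16):
    low_gb, high_gb = memory_range_gb(cores)
    print(f"{cores} cores: {low_gb} to {high_gb} GB RAM")
```

This only sizes individual machines. How many machines you need still
depends on your training time requirement.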

Regards,
Donald

On Thu, Oct 26, 2017 at 1:00 AM, Sachin Kamkar <sa...@gmail.com>
wrote:

> Hi Team,
>
> Firstly, if I am posting to the wrong group, please direct me to the right
> forum or mailing list. Thanks in advance.
>
> Problem: Binary Classification
> Number of Features: 10K - 20K
> Number of documents to be trained: 1 Million
> Model: https://github.com/EmergentOrder/template-scala-probabilistic-classifier-batch-lbfgs
> Recommended PIO version: 0.9.2
>
> I am new to PredictionIO. So far I have only done small predictions with
> ~100 features and a 10k training set, which I was able to run on a 2-core,
> 16 GB RAM server.
>
> Now that my actual dataset is much larger, I don't know where to even start
> in terms of configuration.
>
> I need 3 suggestions
>
>    - For my problem, have I chosen the correct model? As this model only
>    runs on 0.9.2 and with 0.12 being the latest, am I spending energy on the
>    wrong model?
>       - Should I consider changing the code to be compatible with 0.12?
>    - What is the hardware that I should choose?
>       - Should I have a dedicated Spark cluster? If yes, what config
>       should I start with?
>       - How much memory should I set for the driver and executor?
>    - How much time can I expect this training to take?
>
>
> With Regards,
>
>      Sachin
> ⚜KTBFFH⚜
>