You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bluemarlin.apache.org by GitBox <gi...@apache.org> on 2022/03/03 14:33:49 UTC

[GitHub] [incubator-bluemarlin] jimmylao commented on issue #52: [BLUEMARLIN-29] : For DIN-Lookalike model, training is very slow.

jimmylao commented on issue #52:
URL: https://github.com/apache/incubator-bluemarlin/issues/52#issuecomment-1058102582


   @Bimlesh759-AI Thanks for sharing.
   As shown in the table below, the ratios between positive and negative samples are 1.07 in training data and 2.66 in testing data. These are balanced (good) ratios.
   
   <html xmlns:v="urn:schemas-microsoft-com:vml"
   xmlns:o="urn:schemas-microsoft-com:office:office"
   xmlns:x="urn:schemas-microsoft-com:office:excel"
   xmlns="http://www.w3.org/TR/REC-html40">
   
   <head>
   
   <meta name=ProgId content=Excel.Sheet>
   <meta name=Generator content="Microsoft Excel 15">
   <link id=Main-File rel=Main-File
   href="file:///C:/Users/zlao/AppData/Local/Temp/msohtmlclip1/01/clip.htm">
   <link rel=File-List
   href="file:///C:/Users/zlao/AppData/Local/Temp/msohtmlclip1/01/clip_filelist.xml">
   <!--table
   	{mso-displayed-decimal-separator:"\.";
   	mso-displayed-thousand-separator:"\,";}
   @page
   	{margin:.75in .7in .75in .7in;
   	mso-header-margin:.3in;
   	mso-footer-margin:.3in;}
   tr
   	{mso-height-source:auto;}
   col
   	{mso-width-source:auto;}
   br
   	{mso-data-placement:same-cell;}
   td
   	{padding-top:1px;
   	padding-right:1px;
   	padding-left:1px;
   	mso-ignore:padding;
   	color:black;
   	font-size:11.0pt;
   	font-weight:400;
   	font-style:normal;
   	text-decoration:none;
   	font-family:Calibri, sans-serif;
   	mso-font-charset:0;
   	mso-number-format:General;
   	text-align:general;
   	vertical-align:bottom;
   	border:none;
   	mso-background-source:auto;
   	mso-pattern:auto;
   	mso-protection:locked visible;
   	white-space:nowrap;
   	mso-rotate:0;}
   .xl65
   	{text-align:center;}
   .xl66
   	{mso-number-format:Percent;}
   .xl67
   	{color:#24292F;
   	font-family:"Segoe UI", sans-serif;
   	mso-font-charset:0;
   	mso-number-format:"\#\,\#\#0";
   	text-align:center;}
   -->
   </head>
   
   <body link="#0563C1" vlink="#954F72">
   
     | positive # | negative # | pos vs neg ratio | %
   -- | -- | -- | -- | --
   train | 217,949,105 | 233,029,477 | 1.069192172 | 87.16%
   test | 18,131,220 | 48,307,204 | 2.66431073 | 12.84%
   
   
   
   </body>
   
   </html>
   
   
   One approach you may want to try is: keep all testing samples and randomly drop 50% of training samples, then train a model to see if there's significant drop in AUC for testing samples. (Since the ratio of testing samples is 13%-, it has much less contribution to training time)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@bluemarlin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org