Posted to users@flex.apache.org by bilbosax <wa...@comcast.net> on 2016/08/01 03:25:38 UTC

Re: Workers and Speed

Hi Justin.  I have never used the additional compiler arguments dialogue
before.  I profiled my app in Scout as you suggested, and when browsing
through the Session Info, it says that Advanced Telemetry is disabled, so I
don't know if I entered the additional compiler arguments correctly.  This
is what I have in the dialogue:

-locale en_US
-advanced-telemetry=true
-debug=false


Is my syntax wrong?



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13115.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Nemi <ne...@gmail.com>.
Using a plain Array first instead of an ArrayCollection can perform a lot
faster. For example, fill and manipulate the Array, then use it as the
source: new ArrayCollection(array); you can also work with
arrayCollection.source directly.

To enable telemetry on swf you can use  SWF Scout Enabler
<http://renaun.com/blog/2012/12/enable-advanced-telemetry-on-flex-or-old-swfs-with-swf-scount-enabler/>  

Scout can help you diagnose where you can do more code refactoring to gain
more performance.

Also, it is always good to find some AS3 and Flex performance
tuning/optimizing slides/checklists.
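Nemi's Array-vs-ArrayCollection point can be sketched outside Flash. Since the list's ActionScript won't run here, this is a JavaScript stand-in: `makeCollection`, `sumViaWrapper`, and `sumViaSource` are made-up names, and the wrapper only mimics the per-read getter dispatch that ArrayCollection.getItemAt() adds over raw `source` indexing.

```javascript
// A wrapper that routes every read through a getter -- as an
// ArrayCollection-style collection does -- costs more per access than
// indexing the backing Array directly.
function sumViaWrapper(collection) {
  let total = 0;
  for (let i = 0; i < collection.length; i++) {
    total += collection.getItemAt(i).value; // method dispatch + bounds check per read
  }
  return total;
}

function sumViaSource(collection) {
  const source = collection.source; // grab the backing Array once
  let total = 0;
  for (let i = 0; i < source.length; i++) {
    total += source[i].value; // plain indexed access
  }
  return total;
}

// Minimal stand-in for an ArrayCollection-like wrapper.
function makeCollection(source) {
  return {
    source,
    get length() { return source.length; },
    getItemAt(i) {
      if (i < 0 || i >= source.length) throw new RangeError("index out of range");
      return source[i];
    },
  };
}

const data = Array.from({ length: 1000 }, (_, i) => ({ value: i }));
const coll = makeCollection(data);
console.log(sumViaWrapper(coll) === sumViaSource(coll)); // true: same result, different cost
```

Both loops produce the same sum; the point is only that the second avoids a function call per element in the hot loop.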



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13353.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Well, I wanted to at least try to turn off updates on the arraycollection, to
work with the array inside of the arraycollection, and just see if I could
optimize the main app before I tried dividing it into Workers.  Without the
advanced telemetry, all I can really tell is that it takes 50 minutes to get
through one frame LOL, and that there is about 5 minutes of garbage
collection going on.  I can't see how much of the time is spent on my
conditional IF statements or the actual trigonometry.  Could Scout at least
help me figure that much out?  As I told Justin, I am working on Flash
Builder 4.5 and don't have the ability to turn on Advanced Telemetry with a
check box, so he said to add some things to my additional compiler
arguments.  Is my syntax right on this, because Scout is telling me that
Advanced Telemetry is disabled?

-locale en_US 
-advanced-telemetry=true 
-debug=false 



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13118.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Gary Yang <fl...@gmail.com>.
I would try Vector directly if possible; I think ArrayCollection and
ArrayList are designed for driving UI List/Grid components.

On Mon, Aug 1, 2016 at 10:13 PM, bilbosax <wa...@comcast.net> wrote:

> So, I finally got Scout to work with my program which turned out to be a
> pain(Scout would time out, saying it was out of memory during the hour that
> it took for the program to complete the calculations)
>
> The results are both interesting, and a little confusing. The main loop
> took
> 2560 seconds.  ObjectProxy.getProperty (mx utils) took 787 seconds.
> ListCollectionView.getProperty (mx.collections) took 713 seconds.  Garbage
> Collection took 184 seconds. ListCollectionView.getlength took 29
> seconds(this is easily fixed).  And the trig method I wrote only took 14
> seconds.
>
> This clearly shows that the math is not the bottleneck, getting information
> out of the arraycollection and objectproxy objects is the slowest process.
> I am addressing the arraycollection as myAC[1].someProperty in all of my
> comparisons and calculations as was suggested. I don't know if this is
> treating it more as an array or an array collection. I don't know if it
> would all process faster if I initially passed the arraycollection data off
> to a regular array to do all of the processing.  I need the objectproxies
> because my itemrenderers won't bind to my arraycollection otherwise.
>
> So the question now is, should I find an alternative to the ObjectProxies
> and try and optimize working with my arraycollection, or should I simply
> make another worker to chop down this processing time?  Math is not the
> bottleneck, getting the data together to do the math is the slow part.
>
>
>
> --
> View this message in context:
> http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13134.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/2/16, 10:57 PM, "Justin Mclean" <ju...@classsoftware.com> wrote:

>Hi,
>
>> Was just wondering. In Scout, if it says that something is taking say
>>150
>> sec, when you drill down into the object/function/event, the parts don't
>> ever add up to 150 seconds. So I thought that some things, like data
>>type
>> conversions, just weren't declared.
>
>If I understand you correctly I think the difference is the time it spent
>in the current function. (Which is displayed as self time in Scout.)
>
>So if function A (150ms)  calls function B (40ms) and function C (50ms)
>then it spent 60ms in code inside function A.

And also, I believe this is a sampling profiler so there will always be
some inaccuracy in the small numbers.  But which issues are the big issues
will be accurate.

-Alex


Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> Was just wondering. In Scout, if it says that something is taking say 150
> sec, when you drill down into the object/function/event, the parts don't
> ever add up to 150 seconds. So I thought that some things, like data type
> conversions, just weren't declared.

If I understand you correctly I think the difference is the time it spent in the current function. (Which is displayed as self time in Scout.)

So if function A (150ms) calls function B (40ms) and function C (50ms) then it spent 60ms in code inside function A.

Thanks,
Justin
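Justin's self-time arithmetic above can be written down as a one-line helper. This is an illustrative JavaScript sketch (Scout itself reports this, you don't compute it by hand); `selfTime` is a made-up name, and the numbers are the ones from his example.

```javascript
// "Self time" = a function's total time minus the time attributed to its
// callees. With A at 150ms calling B (40ms) and C (50ms), 60ms was spent
// in A's own code.
function selfTime(totalMs, calleeTimesMs) {
  return totalMs - calleeTimesMs.reduce((sum, t) => sum + t, 0);
}

console.log(selfTime(150, [40, 50])); // 60
```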



Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Was just wondering. In Scout, if it says that something is taking say 150
sec, when you drill down into the object/function/event, the parts don't
ever add up to 150 seconds. So I thought that some things, like data type
conversions, just weren't declared.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13160.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/2/16, 7:04 PM, "bilbosax" <wa...@comcast.net> wrote:

>What about changing datatypes, does this eat up a lot of time?  My
>database
>has some values that are typed as string but are actually numbers.  When I
>download them from the database, I assume that they are entered into the
>arraycollection as strings.  In my loops, I have to do math functions on
>them so I force them to a number type as such:
>
>var a:Number = Number(myArray[1].someProp)*10;
>
>When doing this a large number of times in big loops, will it eat up a
>significant amount of time or is type conversion pretty fast?

It would have shown up in the profiler if it was a big deal.  But again,
if you waste 10ms 100 times, you've wasted a full second.  Not that the
type conversion would take 10ms, just that if you have a lot of stuff, and
care about every second, even small stuff can make a difference.

-Alex
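Alex's point about repeated conversions can be sketched as follows. This is JavaScript rather than ActionScript, and the `records`/`someProp` data is made up; the idea is simply to pay the string-to-number conversion once per record up front instead of on every read inside a hot loop.

```javascript
// A string-typed column converted inside the loop is converted on every
// pass; converting the column once beforehand removes the repeated
// Number() calls from the hot path.
const records = [{ someProp: "12.5" }, { someProp: "7.25" }, { someProp: "30" }];

// Conversion inside the loop:
let slowSum = 0;
for (let i = 0; i < records.length; i++) {
  slowSum += Number(records[i].someProp) * 10; // converted on every pass
}

// One-time conversion, then numeric math only:
const nums = records.map(r => Number(r.someProp));
let fastSum = 0;
for (let i = 0; i < nums.length; i++) {
  fastSum += nums[i] * 10;
}

console.log(slowSum === fastSum); // true
```

The saving is small per iteration, which is exactly Alex's point: it only matters when the same values are re-read many times, as in an O(n²) comparison loop.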


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Awesome Tip!  Thanks!



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13194.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

Not looked at the code in detail yet. Probably not a big deal, but a small simple win is to move Math.PI/180 into a variable and do the calculation once rather than calculating it four times each pass around the loop.

> 																lon1 = tempArray[i].longitude*Math.PI/180;
> 																lon2 = tempArray[j].longitude*Math.PI/180;
> 																lat1 = tempArray[i].latitude*Math.PI/180;
> 																lat2 = tempArray[j].latitude*Math.PI/180;

Thanks,
Justin
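Justin's hoisting suggestion looks like this in practice. Sketched in JavaScript since ActionScript won't run here; `toRadians` and the sample points are made up, but the field names follow the posted code.

```javascript
// Compute the degrees-to-radians factor once, outside the loop, instead of
// evaluating Math.PI/180 four times per inner-loop iteration.
const DEG2RAD = Math.PI / 180; // hoisted constant

function toRadians(recA, recB) {
  return {
    lon1: recA.longitude * DEG2RAD,
    lon2: recB.longitude * DEG2RAD,
    lat1: recA.latitude * DEG2RAD,
    lat2: recB.latitude * DEG2RAD,
  };
}

const r = toRadians({ longitude: 180, latitude: 90 }, { longitude: -180, latitude: -90 });
console.log(Math.abs(r.lon1 - Math.PI) < 1e-12); // true: 180 degrees is pi radians
```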

Re: Workers and Speed

Posted by Javier Guerrero García <ja...@gmail.com>.
P.S. Couldn't status be just 0 or 1 instead of "Act"? You are doing 38k
three-character string comparisons instead of 38k numeric ones. Also,
shouldn't you check the status of j too, or doesn't it matter?

On Thu, Aug 4, 2016 at 1:43 AM, Javier Guerrero García <ja...@gmail.com>
wrote:

> Another quick one: since you're comparing points and distances and
> sin/cos/tans, does it make sense to compare points i=25 and j=50, and then
> again points i=50 and j=25? Would the calculations yield the same results?
> In that case, you could just halve your computation to (i<j) instead of
> (i!=j), and instead of getting rid of the diagonal of the matrix, get rid
> of the whole triangle below it (or just do something checking propTwelve if
> applies)
>
> Besides Justin recommendation, also a precalculated look-up table for
> sqrt, sin,cos and atan2 might come in handy if you can bear with the error
> tradeoff (they are REALLY expensive functions). Also try a*a directly
> instead of Math.Pow(a,2).
>
> Also, isn't "count" exactly the length of tempCompArray? Why keep track
> of it?
>
> Also, if you write the first three ifs as (if a && b && c), b && c are
> NEVER calculated if a is false, so you can skip 3 lines (and make your
> branch predictor happier :)
>
> Also, since you're referencing array[i] and array[j] a vast number of
> times, maybe is a good idea at the top of the loop:
>
> itemi=tempArray[i]
> itemj=temArray[j]
>
> hence having a direct memory reference instead of a double reference
> (address of tempArray + address of the ith element)
>
> Also, you use a few times "Math.abs(a-b)<=1". Try a one-liner function
> near1(a,b) { c=a-b; return c>=-1&&c<=1 }, I think this might be faster (and
> again you get rid of the IF involved in the ABS function).
>
> Well, 1:41AM here, have to sleep :)
>
> On Thu, Aug 4, 2016 at 12:21 AM, bilbosax <wa...@comcast.net> wrote:
>
>> Distances are selected by the user using a dropdownlist in the range of
>> .25
>> to 4 miles.  Distance is the key to the whole thing here and has to be
>> exact, I can't be off by 10ths of a mile, it needs to be within say, 20
>> feet.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13196.html
>> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>>
>
>

Re: Workers and Speed

Posted by Javier Guerrero García <ja...@gmail.com>.
Another quick one: since you're comparing points and distances and
sin/cos/tans, does it make sense to compare points i=25 and j=50, and then
again points i=50 and j=25? Would the calculations yield the same results?
In that case, you could just halve your computation to (i<j) instead of
(i!=j), and instead of getting rid of the diagonal of the matrix, get rid
of the whole triangle below it (or just do something checking propTwelve if
applies)

Besides Justin's recommendation, a precalculated look-up table for sqrt,
sin, cos and atan2 might come in handy if you can bear the error
tradeoff (they are REALLY expensive functions). Also try a*a directly
instead of Math.pow(a,2).

Also, isn't "count" exactly the length of tempCompArray? Why keep track of
it?

Also, if you write the first three ifs as (if a && b && c), b && c are
NEVER calculated if a is false, so you can skip 3 lines (and make your
branch predictor happier :)

Also, since you're referencing array[i] and array[j] a vast number of
times, maybe is a good idea at the top of the loop:

itemi=tempArray[i]
itemj=temArray[j]

hence having a direct memory reference instead of a double reference
(address of tempArray + address of the ith element)

Also, you use a few times "Math.abs(a-b)<=1". Try a one-liner function
near1(a,b) { c=a-b; return c>=-1&&c<=1 }, I think this might be faster (and
again you get rid of the IF involved in the ABS function).

Well, 1:41AM here, have to sleep :)
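The suggestions above can be combined into one loop shape. This is an illustrative JavaScript sketch, not the poster's actual code: `countNearPairs`, `near1`, and `propSix` stand in for the real identifiers. It shows the j > i halving, the cached element references, and the branch-light replacement for Math.abs (note the helper must difference the values, c = a - b).

```javascript
// Same truth table as Math.abs(a - b) <= 1, without the abs call.
function near1(a, b) {
  const c = a - b;
  return c >= -1 && c <= 1;
}

function countNearPairs(tempArray) {
  let pairs = 0;
  for (let i = 0; i < tempArray.length; i++) {
    const itemi = tempArray[i];                      // one lookup per outer pass
    for (let j = i + 1; j < tempArray.length; j++) { // j > i: each unordered pair once
      const itemj = tempArray[j];
      if (near1(itemi.propSix, itemj.propSix)) pairs++;
    }
  }
  return pairs;
}

const sample = [{ propSix: 1 }, { propSix: 2 }, { propSix: 4 }];
console.log(countNearPairs(sample)); // 1: only the (1, 2) pair is within 1
```

As the thread later discovers, the j > i halving is only valid when nothing downstream needs per-record results for both orderings of a pair.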

On Thu, Aug 4, 2016 at 12:21 AM, bilbosax <wa...@comcast.net> wrote:

> Distances are selected by the user using a dropdownlist in the range of .25
> to 4 miles.  Distance is the key to the whole thing here and has to be
> exact, I can't be off by 10ths of a mile, it needs to be within say, 20
> feet.
>
>
>
> --
> View this message in context:
> http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13196.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
By the way, your Math.PI/180 suggestion has us down to 2 minutes and 17
seconds, so thanks a lot!



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13197.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> Distances are selected by the user using a dropdownlist in the range of .25
> to 4 miles.

A 4-mile-square section of the earth is going to be fairly close to flat. A Pythagorean distance formula is very likely to give errors of < 20 feet over that range. A quick Google search suggests about 8 inches of curvature per mile, which is going to be less than real-life topographic features.

Justin
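Justin's claim can be checked numerically. The sketch below, in JavaScript with made-up test coordinates, compares the haversine formula from the posted code (earth radius 3961 miles) against a flat equirectangular approximation; over a few miles the two agree to well under 20 feet.

```javascript
const R = 3961; // earth radius in miles, as in the posted code
const DEG2RAD = Math.PI / 180;

// Great-circle (haversine) distance, matching the thread's formula.
function haversineMiles(lat1, lon1, lat2, lon2) {
  const p1 = lat1 * DEG2RAD, p2 = lat2 * DEG2RAD;
  const dlat = p2 - p1, dlon = (lon2 - lon1) * DEG2RAD;
  const a = Math.sin(dlat / 2) ** 2 +
            Math.cos(p1) * Math.cos(p2) * Math.sin(dlon / 2) ** 2;
  return 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Flat (equirectangular) approximation: Pythagoras on scaled coordinates.
function flatMiles(lat1, lon1, lat2, lon2) {
  const p1 = lat1 * DEG2RAD, p2 = lat2 * DEG2RAD;
  const x = (lon2 - lon1) * DEG2RAD * Math.cos((p1 + p2) / 2);
  const y = p2 - p1;
  return R * Math.sqrt(x * x + y * y);
}

// Two points roughly 3.6 miles apart near 40N:
const d1 = haversineMiles(40.0, -75.0, 40.05, -75.02);
const d2 = flatMiles(40.0, -75.0, 40.05, -75.02);
console.log(Math.abs(d1 - d2) < 20 / 5280); // true: difference under 20 feet
```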

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Distances are selected by the user using a dropdownlist in the range of .25
to 4 miles.  Distance is the key to the whole thing here and has to be
exact, I can't be off by 10ths of a mile, it needs to be within say, 20
feet.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13196.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

Just out of interest, how large is tempDistance (i.e. its range of values)? It looks like you're working out distances from long/lat, and for small values of tempDistance you can use approximations that mean just about all of the maths goes away.

Of course, if the math isn't taking up a lot of time then there may not be any point to that, so you'll need to check that first. One simple way would be to count how many times the math code in the innermost if is executed in a single run, and then profile that code x run times to see if it needs to be optimised.

Thanks,
Justin

Re: Workers and Speed

Posted by Harbs <ha...@gmail.com>.
BTW:

In terms of code styling, if you reverse all your tests, your if statements would not creep off the screen and it becomes much more legible:

				for (var i:int = 0; i < length; i++) {   /*  Loop through each of the records */
					if (tempArray[i].status != "Act") { /*  Test to see if current record status is ACTIVE */
						continue;
					}
					for (var j:int = 0; j < length; j++){  /*  Compare to Each of the other Records*/
						if (i == j) {continue;}	/*  Make sure you are not comparing a record to itself */
						if (tempArray[i].propOne != tempArray[j].propOne) { continue; }
						if (tempArray[i].propTwo != tempArray[j].propTwo) { continue; }
						if (tempArray[i].propThree != tempArray[j].propThree) { continue; }
						if (Math.abs(tempArray[i].propFour - tempArray[j].propFour) > 10) { continue; }
						if (tempArray[i].propFive != tempArray[j].propFive) { continue;}
						if (Math.abs(tempArray[i].propSix - tempArray[j].propSix) > 1){ continue;}
						if (Math.abs(tempArray[i].propSeven - tempArray[j].propSeven) > 1) { continue;}
						if (Math.abs(((tempArray[i].propEight - tempArray[j].propEight)/tempArray[i].propEight))<0.10) {
							//do your stuff

On Aug 4, 2016, at 12:35 AM, bilbosax <wa...@comcast.net> wrote:

> I have resisted actually posting the code because it has been suggested to me
> that the app should be looked at for a patent, and this calculation process
> is at the heart of the idea.  So I have reduced the variables down to just
> generic names so as not to give my idea away.  I really liked your idea
> of combining all of the conditionals to one line using && operators, but in
> the end it was no faster.  Two minutes and forty seconds to complete.  I
> believe this is because, in my code, I have placed the conditionals in an
> order that is most likely to throw out a record quickly if it is not a good
> comparator, whereas your idea forces all of the conditionals to HAVE to be
> considered for EVERY record(at least I think that it does).  Here is a
> simplified version of the code:
> 
> protected function recalculateHandler(event:Event):void
> {
>     processingPU.removeEventListener("popUpOpened", recalculateHandler);
> 
>     var count:int = 0;
>     var average:Number = 0.0;
>     var someProp:Number = 0.0;
>     var lon1:Number;
>     var lon2:Number;
>     var lat1:Number;
>     var lat2:Number;
>     var dlon:Number;
>     var dlat:Number;
>     var a:Number;
>     var c:Number;
>     var d:Number;
>     var tempDistance:Number = (distance.selectedIndex*.25)+.25;
>     var length:int = mainArrayCollection.length;
>     var tempArray:Array = speedArrayCollection.source;
>     compArrayCollection = new ArrayCollection();
>     tempCompArray = new Array();
> 
>     for (var i:int = 0; i < length; i++) {  /* Loop through each of the records */
>         if (tempArray[i].status == "Act") {  /* Test whether the current record's status is ACTIVE */
>             for (var j:int = 0; j < length; j++) {  /* Compare to each of the other records */
>                 if (i != j) {  /* Make sure you are not comparing a record to itself */
>                     if (tempArray[i].propOne == tempArray[j].propOne) {
>                         if (tempArray[i].propTwo == tempArray[j].propTwo) {
>                             if (tempArray[i].propThree == tempArray[j].propThree) {
>                                 if (Math.abs(tempArray[i].propFour - tempArray[j].propFour) <= 10) {
>                                     if (tempArray[i].propFive == tempArray[j].propFive) {
>                                         if (Math.abs(tempArray[i].propSix - tempArray[j].propSix) <= 1) {
>                                             if (Math.abs(tempArray[i].propSeven - tempArray[j].propSeven) <= 1) {
>                                                 if (Math.abs((tempArray[i].propEight - tempArray[j].propEight)/tempArray[i].propEight) < 0.10) {
>                                                     lon1 = tempArray[i].longitude*Math.PI/180;
>                                                     lon2 = tempArray[j].longitude*Math.PI/180;
>                                                     lat1 = tempArray[i].latitude*Math.PI/180;
>                                                     lat2 = tempArray[j].latitude*Math.PI/180;
>                                                     dlon = lon2 - lon1;
>                                                     dlat = lat2 - lat1;
>                                                     a = Math.pow(Math.sin(dlat/2), 2) + (Math.cos(lat1)*Math.cos(lat2)*Math.pow(Math.sin(dlon/2), 2));
>                                                     c = 2*Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
>                                                     d = 3961 * c;
> 
>                                                     if (d <= tempDistance) {
>                                                         count = count + 1;
>                                                         someProp = someProp + Number(tempArray[j].propEleven);
>                                                         tempCompArray.push({/* push about 26 properties from one array to a comparison array, plus some new values */});
>                                                         tempArray[i].propTwelve = true;
>                                                     }
>                                                 }
>                                             }
>                                         }
>                                     }
>                                 }
>                             }
>                         }
>                     }
>                 }
>             }
> 
>             if (count != 0) {  /* Populate data if there is actually data to be updated */
>                 average = someProp/Number(count);
>                 tempArray[i].propThirteen = count;
>                 tempArray[i].Prop14 = average;
> 
>                 if (average == 0.0) {
>                     tempArray[i].propFourteen = 0.0;
>                 } else {
>                     tempArray[i].propFourteen = (Number(tempArray[i].propTen) - average)/average*100.0;
>                 }
> 
>                 tempArray[i].propFifteen = tempArray[i].propFourteen - tempArray[i].propFifteen;
>             }
> 
>             count = 0;
>             average = 0.0;
>             someProp = 0.0;
>         }
>     }
> 
>     PopUpManager.removePopUp(processingPU);
>     compArrayCollection.source = tempCompArray;
>     mainArrayCollection.source = tempArray;
> }
> 
> 
> 
> --
> View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13192.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I just wanted to give an update because I am so excited. I have never
programmed in C before, but on the suggestion of Justin and others, I
converted my program to C and ran the processing engine and the results are
in - 25 seconds!!! That's over 4x faster than the actionscript version. I am
still going to write my mobile version in AIR, but now it will download
pre-processed data calculated by my C application. I'll be able to process
the entire USA in 5 hours every night. Wicked psyched!!



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13347.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Alex, you truly are a genius.  That is what it turned out to be.  I figured
it out about two hours after my rant.  I had a conditional statement that
was not backwards compatible with the new distance calculation model we
created.  For instance, say I was searching for squirrels in a 1 mile radius
from a target squirrel.  My target squirrel has to be a happy squirrel, but
I don't care if the squirrels within a mile of the target squirrel are
happy.  As I got towards the end of the dataset, I found fewer and fewer
happy squirrels left ahead in the loop, and then realized that I need to
know the distances of the unhappy squirrels behind in the search, and those
were never calculated.  So in the end, if I am planning to only go through
half of the distance searches, I now have to calculate the distance between
every object and not just the ones that meet my conditional criteria.  In
other words, the solution turned out to be slower than just originally
crunching through all the data.  About 5 minutes compared to about 2
minutes.

A quick question about speed.  I am thinking of porting my application to an
iPad since I have never done it before.  If my desktop can do the
calculations in about 2 minutes, how long should I expect an iPad to be able
to do the same work? 5 min, 15 min, an hour? 



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13244.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

Hard to know without seeing the code. One way to make sure the program is behaving correctly is to use unit tests: you make a change, run the tests, and if everything is the same you know you've not introduced an error. That combined with version control can be a very effective way of programming, but that's perhaps a little off topic.

If you're doing the floating point calculations slightly differently it may just be rounding errors. There was at least one calculation in there that would be sensitive to that. Comparing a complex floating point calculation against an exact value is always going to cause some issues.

Take one of the properties that shows up in one set and not the other, see which condition fails, and compare the values it produces.

Thanks,
Justin

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.
Also keep in mind that the new logic runs fewer and fewer tests as you
near the end.  Item1 is compared against all items, but item(N-1) isn't
compared at all since it was already compared by all the previous items.
If your sparse-array logic, or some other calculation, is expecting more
sparse-array entries from the last entries back towards previous entries,
that might throw off the results.

-Alex

On 8/6/16, 4:57 PM, "Javier Guerrero García" <ja...@gmail.com> wrote:

>1. Are you sure the second result set is the wrong one? I mean, could
>those
>9 different results be just duplicates?
>
>2. Did you apply Justin simplification on using Pythagorean distances
>instead of euclidean distances on the second version? 4.000000000001 miles
>is greater than 4.0 miles (and hence discarded), and that could happen for
>a very small number of results (5%).
>
>3. Go back to version 1, change one thing at a time, and check results on
>each change to see if they are both congruent.
>
>On Sat, Aug 6, 2016 at 11:19 PM, bilbosax <wa...@comcast.net> wrote:
>
>> I probably should quit whining on here about this issue, but sometimes I
>> get
>> so baffled that you just want someone to share in the misery :)
>>
>> So, I have the two different versions of my program.  One that just
>>chugs
>> through all of the numbers in a little over 2 min.  The other calculates
>> half of the distances and writes the data to a sparse array, and then
>> calculates my equations, all in about 1 minute and 40 seconds.
>>
>> I want to use the faster version, but the data sets that I get out of
>>the
>> two programs are just slightly different.  This is where I am baffled.
>>I
>> exported my data to CSV files so that I could compare them in Excel,
>>and in
>> 95% of the records, the two programs calculate exactly all the same
>> answers,
>> all the totals and sums and averages.  But in about 200 of 38K records,
>>it
>> just goes off the rails and finds no objects within the calculated
>> distance,
>> while the original program will find like 9.  It seems to happen more
>> frequently towards the end of the dataset.  But I find this baffling.
>>How
>> does the logic work 95% of the time, and not the other 5%?  I wonder if
>> the sparse array is unstable or unreliable, but it always returns the
>> exact same, but faulty, data.  Ugh.  End of rant.
>>
>>
>>
>> --
>> View this message in context: http://apache-flex-users.
>> 2333346.n4.nabble.com/Workers-and-Speed-tp13098p13238.html
>> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>>


Re: Workers and Speed

Posted by Javier Guerrero García <ja...@gmail.com>.
1. Are you sure the second result set is the wrong one? I mean, could those
9 different results be just duplicates?

2. Did you apply Justin's simplification of using Pythagorean distances
instead of haversine distances in the second version? 4.000000000001 miles
is greater than 4.0 miles (and hence discarded), and that could happen for
a very small number of results (5%).

3. Go back to version 1, change one thing at a time, and check results on
each change to see if they are both congruent.
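Javier's second point is a classic floating-point boundary effect, sketched below in JavaScript. The epsilon value and function names are illustrative choices, not from the thread: a rewritten formula can land a pair a hair over the cutoff and silently drop it, so comparing against the threshold with a small tolerance keeps the two versions in agreement at the boundary.

```javascript
const EPS = 1e-9; // tolerance in miles; an arbitrary illustrative choice

function withinStrict(d, limit) {
  return d <= limit; // drops a pair at 4.000000000001 vs a 4.0 cutoff
}

function withinTolerant(d, limit) {
  return d <= limit + EPS; // keeps pairs that only differ by rounding noise
}

const d = 4 + 1e-12; // just over the cutoff due to a rounding difference
console.log(withinStrict(d, 4));   // false: the pair is discarded
console.log(withinTolerant(d, 4)); // true: the pair is kept
```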

On Sat, Aug 6, 2016 at 11:19 PM, bilbosax <wa...@comcast.net> wrote:

> I probably should quit whining on here about this issue, but sometimes I
> get
> so baffled that you just want someone to share in the misery :)
>
> So, I have the two different versions of my program.  One that just chugs
> through all of the numbers in a little over 2 min.  The other calculates
> half of the distances and writes the data to a sparse array, and then
> calculates my equations, all in about 1 minute and 40 seconds.
>
> I want to use the faster version, but the data sets that I get out of the
> two programs are just slightly different.  This is where I am baffled.  I
> exported my data to CSV files so that I could compare them in Excel, and in
> 95% of the records, the two programs calculate exactly all the same
> answers,
> all the totals and sums and averages.  But in about 200 of 38K records, it
> just goes off the rails and finds no objects within the calculated
> distance,
> while the original program will find like 9.  It seems to happen more
> frequently towards the end of the dataset.  But I find this baffling.  How
> does the logic work 95% of the time, and not the other 5%?  I wonder if the
> sparse array is unstable or unreliable, but it always returns the exact
> same, but faulty, data.  Ugh.  End of rant.
>
>
>
> --
> View this message in context: http://apache-flex-users.
> 2333346.n4.nabble.com/Workers-and-Speed-tp13098p13238.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I probably should quit whining on here about this issue, but sometimes I get
so baffled that you just want someone to share in the misery :)

So, I have the two different versions of my program.  One that just chugs
through all of the numbers in a little over 2 min.  The other calculates
half of the distances and writes the data to a sparse array, and then
calculates my equations, all in about 1 minute and 40 seconds.

I want to use the faster version, but the data sets that I get out of the
two programs are just slightly different.  This is where I am baffled.  I
exported my data to CSV files so that I could compare them in Excel, and in
95% of the records, the two programs calculate exactly all the same answers,
all the totals and sums and averages.  But in about 200 of 38K records, it
just goes off the rails and finds no objects within the calculated distance,
while the original program will find like 9.  It seems to happen more
frequently towards the end of the dataset.  But I find this baffling.  How
does the logic work 95% of the time, and not the other 5%?  I wonder if the
sparse array is unstable or unreliable, but it always returns the exact
same, but faulty, data.  Ugh.  End of rant.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13238.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/5/16, 10:24 PM, "bilbosax" <wa...@comcast.net> wrote:

>>  You might want to make sure the sparse array doesn't have some
>unintended side effect.
>
>What did you mean by this?  When I run my program the original way, doing
>ALL of the calculations, I get a certain number of matches.
>
>When I am running the program doing half of the calculations and
>storing
>them in a sparse array, I am not getting the same number of matches.  I
>have
>been staring at this all day and don't see any errors in my code, so I
>thought that I would ask you what you meant by unintended side effects.

I meant that the sparse array might use so much memory or cause more
garbage collection than expected, but sure, this is an unintended side
effect as well.
It might be worth going back to your original code, change the loop to do
half the calculations and see what happens.  This assumes that you really
don't need to separately calculate A vs B and B vs A.

-Alex


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
>  You might want to make sure the sparse array doesn't have some 
unintended side effect. 

What did you mean by this?  When I run my program the original way, doing
ALL of the calculations, I get a certain number of matches.

When I am running the program doing half of the calculations and storing
them in a sparse array, I am not getting the same number of matches.  I have
been staring at this all day and don't see any errors in my code, so I
thought that I would ask you what you meant by unintended side effects.

Thanks



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13234.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/5/16, 6:55 PM, "bilbosax" <wa...@comcast.net> wrote:

>Alex Harui wrote
>> IMO, the new loop is constructed in a way that it will only test a vs b
>> and never b vs a, so there is no need to store things for the b vs a
>>test.
>
>Yes, but the point that I am trying to make is that I can only calculate
>the
>test A sums and averages against all the other records at this point.
>All I
>know about the Test B at this point is that it is a certain distance from
>Test A, but what about the distance between Test B and all of the other
>records and all the sums and averages that I want to keep for Test B?  I
>don't have them at the time.  So we are cutting the number of distance
>calculations in half, but have to go through them again so that the
>sums/averages for each and every record can be ascertained against all the
>other records. 

I think a key question is what these "sums and averages" are used for.  If
you must compute A against all other items in the database and then B
against all other items, then you simply have to do the work, although you
could store the results in A and B and look them up by computing which
record is holding the cached results.  Then you wouldn't need a sparse
array: the item with the lowest "index" holds the comparison based on the
new looping logic.

But it sounds like you need to do the "sums and averages" only when
two items meet some criteria, not in the computation of which items meet
the criteria.  If that's true, then you first want to find the few pairs of
items that need computation and then crank the data.  If you need some of
this math to determine which two items to compare, that would go into the
hash function.

>
>I definitely like the hash idea and want to learn more about it.  Do you
>have a book or any links that you recommend to learn a lot about hash
>functions?
>

I don't have any good resources.  This is stuff from my undergrad days
over 30 years ago.  It's been fun trying to recall it.  As you can see
from Wikipedia, mathematical functions can be used to process data into
groups for many useful purposes.  Having the right data and the right
functions is the key.  IMO, there are relatively few "new" problems these
days.  Most things have an analogy that has been solved before.  If you
can figure out a good analogous problem we can discuss it here without
messing up your company's IP.

-Alex


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Alex Harui wrote
> IMO, the new loop is constructed in a way that it will only test a vs b
> and never b vs a, so there is no need to store things for the b vs a test.

Yes, but the point that I am trying to make is that I can only calculate the
test A sums and averages against all the other records at this point.  All I
know about the Test B at this point is that it is a certain distance from
Test A, but what about the distance between Test B and all of the other
records and all the sums and averages that I want to keep for Test B?  I
don't have them at the time.  So we are cutting the number of distance
calculations in half, but have to go through them again so that the
sums/averages for each and every record can be ascertained against all the
other records. 

I definitely like the hash idea and want to learn more about it.  Do you
have a book or any links that you recommend to learn a lot about hash
functions?



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13232.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Javier Guerrero García <ja...@gmail.com>.
Following your example, how many "shapes" are there in the problem?
Couldn't you just store them separately first:

points["marbles"]=array(...)
points["squares"]=array(...)

so you don't have to loop through 38k records every time you have a marble
just to find all other marbles? Also, sorting your DB query by the shape
should help to improve the speed, so once the "shape[i]==shape[j]"
condition is met, all X following record will also meet that condition, and
after those X records, no one will.

You can do a first loop over the 38k records to prepare those "shape"
arrays, and then proceed with the big loop, but for each itemA, the inner
loop just iterates over all the items in points[itemA.shape], not over the
whole 38k records.

If all that sounds good and works, consider whether adding more dimensions
to the first array (points[shape][color], for instance) would also help.
Add as many dimensions as you need, and there you have your hash function :)
(a dictionary is essentially a hash table).
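A rough sketch of that first grouping pass (untyped so it reads the same in
AS3 or JavaScript; the shape property and id values are placeholders for
whatever your real grouping criterion is):

```javascript
// One pass over all records to group them by shape; afterwards each
// record only loops over points[record.shape] instead of all 38k rows.
function groupByShape(records) {
    var points = {};
    for (var i = 0; i < records.length; i++) {
        var shape = records[i].shape;
        if (!points[shape]) {
            points[shape] = [];
        }
        points[shape].push(records[i]);
    }
    return points;
}

var points = groupByShape([
    { id: 1, shape: "marble" },
    { id: 2, shape: "square" },
    { id: 3, shape: "marble" }
]);
// points["marble"] now holds records 1 and 3, points["square"] record 2
```

The big loop's inner loop then iterates only over points[itemA.shape].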

On Sat, Aug 6, 2016 at 12:38 AM, bilbosax <wa...@comcast.net> wrote:

> I wish sometimes that we could actually talk because typing can become
> cumbersome when trying to convey ideas.  But basically imagine that I have
> a
> yellow marble laying on a map, and I want to know how many blue marbles lay
> within a mile of that marble.  I go through all the conditionals to make
> sure that it is a round blue marble, and if it is, I calculate the distance
> between the yellow and blue marble, and if it is less than a mile, I record
> the distance to a sparse array.  Because of your suggestions, I now
> also add it to the sparse array in 2 places, because if I know the distance
> between the yellow and blue marble, I also know the distance from the blue
> to the yellow marble.  So now I am doing half the number of distance
> calculations, but have added the overhead of placing and getting
> information
> from a new array to be evaluated later.  None of the averages, medians, and
> other calculations that I need to do can be done at this time because we
> have made the sacrifice of increasing speed.
>
> Now I go to the next record, but this time it is a red square and it is
> looking for all of the green squares within a mile of it.  So I HAVE to go
> through all of the records again with respect to what the new record
> specifies.
>
> Now, once all distances have been calculated, I can go back through the
> sparse array and if a distance has been recorded, calculate the averages
> and
> sums for that particular record.
>
> So, yes, we could probably get it closer to a minute if distance
> calculations were all we had to do, but a lot of numbers have to be
> calculated, and they  have to be calculated with respect to the target
> record.  Just because we are cutting down the number of distance
> calculations does not change the fact that other numbers have to be
> calculated in addition to the distance for every record.
>
>
>
> --
> View this message in context: http://apache-flex-users.
> 2333346.n4.nabble.com/Workers-and-Speed-tp13098p13230.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.
Yeah, email can be painful at times, but it makes a good record for others
to maybe use some day.

IMO, if your old loop worked, then without changing anything else, the new
loop that does half the compares should work as well in half the time.
Then separately, there is whether the sparse array speeds anything up.
IMO, the new loop is constructed in a way that it will only test a vs b
and never b vs a, so there is no need to store things for the b vs a test.

Regarding finding things within a certain distance, I would definitely
suggest figuring out a hash-and-bucket algorithm.   The bucket size would
be 1 mile.  I think if you sort into two sets of buckets (x and y) then
testing for near neighbors should be straightforward.  The whole goal of
this algorithm is to do as few tests as possible.  By sorting the bucket
names in each dimension, you should be able to only test your bucket and
the one bucket one mile away on either side if it exists.  IMO, that will
result in the fewest tests run.
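A minimal sketch of that hash-and-bucket idea (untyped AS3/JS; x and y are
assumed to already be in miles from some origin, which real lat/long data
would need converting to first):

```javascript
// Hash every point into a 1-mile grid cell; neighbors within a mile can
// only live in the point's own cell or one of the 8 surrounding cells.
function cellKey(x, y) {
    return Math.floor(x) + "_" + Math.floor(y);
}

function buildGrid(pts) {
    var grid = {};
    for (var i = 0; i < pts.length; i++) {
        var key = cellKey(pts[i].x, pts[i].y);
        if (!grid[key]) {
            grid[key] = [];
        }
        grid[key].push(pts[i]);
    }
    return grid;
}

// Collect the candidates from the 3x3 block of cells around p; an exact
// distance test still runs afterwards, but only on this short list.
function candidatesNear(grid, p) {
    var result = [];
    var cx = Math.floor(p.x);
    var cy = Math.floor(p.y);
    for (var dx = -1; dx <= 1; dx++) {
        for (var dy = -1; dy <= 1; dy++) {
            var bucket = grid[(cx + dx) + "_" + (cy + dy)];
            if (bucket) {
                result = result.concat(bucket);
            }
        }
    }
    return result;
}
```

Note the candidate list includes the query point itself if it was added to
the grid, so the exact distance test should also skip self-matches.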

-Alex

On 8/5/16, 3:38 PM, "bilbosax" <wa...@comcast.net> wrote:

>I wish sometimes that we could actually talk because typing can become
>cumbersome when trying to convey ideas.  But basically imagine that I
>have a
>yellow marble laying on a map, and I want to know how many blue marbles
>lay
>within a mile of that marble.  I go through all the conditionals to make
>sure that it is a round blue marble, and if it is, I calculate the
>distance
>between the yellow and blue marble, and if it is less than a mile, I
>record
>the distance to a sparse array.  Because of your suggestions, I now
>also add it to the sparse array in 2 places, because if I know the
>distance
>between the yellow and blue marble, I also know the distance from the blue
>to the yellow marble.  So now I am doing half the number of distance
>calculations, but have added the overhead of placing and getting
>information
>from a new array to be evaluated later.  None of the averages, medians,
>and
>other calculations that I need to do can be done at this time because we
>have made the sacrifice of increasing speed.
>
>Now I go to the next record, but this time it is a red square and it is
>looking for all of the green squares within a mile of it.  So I HAVE to go
>through all of the records again with respect to what the new record
>specifies.
>
>Now, once all distances have been calculated, I can go back through the
>sparse array and if a distance has been recorded, calculate the averages
>and
>sums for that particular record.
>
>So, yes, we could probably get it closer to a minute if distance
>calculations were all we had to do, but a lot of numbers have to be
>calculated, and they  have to be calculated with respect to the target
>record.  Just because we are cutting down the number of distance
>calculations does not change the fact that other numbers have to be
>calculated in addition to the distance for every record.
>
>
>
>--
>View this message in context:
>http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p1
>3230.html
>Sent from the Apache Flex Users mailing list archive at Nabble.com.


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I wish sometimes that we could actually talk because typing can become
cumbersome when trying to convey ideas.  But basically imagine that I have a
yellow marble laying on a map, and I want to know how many blue marbles lay
within a mile of that marble.  I go through all the conditionals to make
sure that it is a round blue marble, and if it is, I calculate the distance
between the yellow and blue marble, and if it is less than a mile, I record
the distance to a sparse array.  Because of your suggestions, I now
also add it to the sparse array in 2 places, because if I know the distance
between the yellow and blue marble, I also know the distance from the blue
to the yellow marble.  So now I am doing half the number of distance
calculations, but have added the overhead of placing and getting information
from a new array to be evaluated later.  None of the averages, medians, and
other calculations that I need to do can be done at this time because we
have made the sacrifice of increasing speed.

Now I go to the next record, but this time it is a red square and it is
looking for all of the green squares within a mile of it.  So I HAVE to go
through all of the records again with respect to what the new record
specifies.

Now, once all distances have been calculated, I can go back through the
sparse array and if a distance has been recorded, calculate the averages and
sums for that particular record.

So, yes, we could probably get it closer to a minute if distance
calculations were all we had to do, but a lot of numbers have to be
calculated, and they  have to be calculated with respect to the target
record.  Just because we are cutting down the number of distance
calculations does not change the fact that other numbers have to be
calculated in addition to the distance for every record.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13230.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Javier Guerrero García <ja...@gmail.com>.
Agree with Alex :) You should be under the minute barrier, since you're
doing just half the job (unless of course getting the results from the DB
takes 1 minute :)

On Fri, Aug 5, 2016 at 6:21 AM, Alex Harui <ah...@adobe.com> wrote:

> Sounds like you made two major changes.  Did you do them separately and
> measure the effects of each change?  In theory, cutting out half the loops
> should have cut the time in half and I thought you were already under 3
> minutes.  You might want to make sure the sparse array doesn't have some
> unintended side effect.
>
> -Alex
>
> On 8/4/16, 5:42 PM, "bilbosax" <wa...@comcast.net> wrote:
>
> >WooHoo! 1 minute and 45 seconds!  I can live with that.  Taking out half
> >of
> >the calculations and storing the data in a sparse array, and then looping
> >the sparse array knocked off another 20-30 seconds!  Thanks for all of the
> >help guys, I am glad this community is here and I have learned a lot from
> >all of you!
> >
> >...now, I am going to have to read up about hashing :)
> >
> >
> >
> >--
> >View this message in context:
> >http://apache-flex-users.2333346.n4.nabble.com/Workers-
> and-Speed-tp13098p1
> >3225.html
> >Sent from the Apache Flex Users mailing list archive at Nabble.com.
>
>

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.
Sounds like you made two major changes.  Did you do them separately and
measure the effects of each change?  In theory, cutting out half the loops
should have cut the time in half and I thought you were already under 3
minutes.  You might want to make sure the sparse array doesn't have some
unintended side effect.

-Alex

On 8/4/16, 5:42 PM, "bilbosax" <wa...@comcast.net> wrote:

>WooHoo! 1 minute and 45 seconds!  I can live with that.  Taking out half
>of
>the calculations and storing the data in a sparse array, and then looping
>the sparse array knocked off another 20-30 seconds!  Thanks for all of the
>help guys, I am glad this community is here and I have learned a lot from
>all of you!
>
>...now, I am going to have to read up about hashing :)
>
>
>
>--
>View this message in context:
>http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p1
>3225.html
>Sent from the Apache Flex Users mailing list archive at Nabble.com.


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
WooHoo! 1 minute and 45 seconds!  I can live with that.  Taking out half of
the calculations and storing the data in a sparse array, and then looping
the sparse array knocked off another 20-30 seconds!  Thanks for all of the
help guys, I am glad this community is here and I have learned a lot from
all of you!

...now, I am going to have to read up about hashing :)



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13225.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Ooops.  I get it.  It is giving me the position of the data.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13224.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
So, I don't think this will turn out faster because of the mechanics involved
and am only doing it to learn more about Flex, but I have taken yours and
Alex's suggestions and made adjustments so that I am only doing half of the
comparisons and storing the distance data in a sparsely populated array.  I
am then going to run through this array to do my sums, averages, medians,
etc.  I am having some trouble with the sparsely populated array.  I put a
distance number in the array and then immediately trace the array entry and
I get a nice Number with a lot of decimal places.  But when I iterate
through the sparse array to get the data back out later on, I am getting
back integer numbers that are way too big and have no decimal places.  Since
I already traced what I put into the array, I know the data is stored
properly, so it has to be the way I am trying to get it back out.  Here is
how I am iterating through the sparse array. tableArray is the sparse array
containing distance values.

for (var z:int = 0; z < length; z++) {
    // for..in iterates the keys (indices) of the sparse row, not the
    // stored values; tableArray[z][myProperty] holds the actual distance
    for (var myProperty:String in tableArray[z]) {
        trace(z, myProperty);
    }
}



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13223.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/3/16, 10:33 PM, "bilbosax" <wa...@comcast.net> wrote:
>
>Strange thing though, the results are not very consistent.  I can run the
>program one time and get 2 min and 5 sec, close the program and open it
>again and get 2 min and 30 seconds.  What gives?

There are dozens of factors in the run time.  The garbage collection is
opportunistic (it doesn't run on a schedule) and even things like network
and mouse events can affect when the GC will run.  Plus, the CPUs are
being shared with other apps on your system.  Wait till you see the run
time change when your virus scanner starts up.

When I run performance tests, I try to run 5 runs or more and look for
convergence.

HTH,
-Alex


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Yes I did, and that actually shaved a little time off the total.  The best
time that I have gotten so far was about 2 min and 5 sec.  That and the
Pythagorean theorem were great suggestions, and one actually helped.

Strange thing though, the results are not very consistent.  I can run the
program one time and get 2 min and 5 sec, close the program and open it
again and get 2 min and 30 seconds.  What gives?



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13214.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> As far as the Pythagorean theorem was concerned, the results were accurate
> enough but saved no noticeable time.  In other words, the actual math is not
> the real culprit here with the speed in the grand scheme of things.

I expected it to show something but that just goes to show you why profiling is important. :-)

Have you tried replacing tempArray[i] with tempI and tempArray[j] with tempJ, after assigning them to local variables? Again I think that will help but profile it to be sure.
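i.e. something along these lines (untyped sketch; the shape and color
properties are placeholders for your real comparison chain):

```javascript
// Hoist repeated index lookups out of the inner loop: tempArray[i] is
// fetched once per outer pass and tempArray[j] once per inner pass,
// instead of once per property comparison.
function countMatches(tempArray) {
    var n = tempArray.length;
    var matches = 0;
    for (var i = 0; i < n; i++) {
        var tempI = tempArray[i];        // one lookup per outer pass
        for (var j = i + 1; j < n; j++) {
            var tempJ = tempArray[j];    // one lookup per inner pass
            if (tempI.shape === tempJ.shape && tempI.color === tempJ.color) {
                matches++;
            }
        }
    }
    return matches;
}
```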

Thanks,
Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Yes, you are right, but only if I am saving the distances that have already
been calculated to an array so I know that they already exist and can look
them up.  It has yet to be determined if the mechanics of saving and
retrieving to a super huge array will save any time or not.  I will report
by some time tomorrow.

As far as the Pythagorean theorem was concerned, the results were accurate
enough but saved no noticeable time.  In other words, the actual math is not
the real culprit here with the speed in the grand scheme of things.

I am learning so much.  Thanks to everyone for their help in my project!



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13212.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/4/16, 1:54 PM, "bilbosax" <wa...@comcast.net> wrote:

>I'm going to read up about hashing, this sounds very very interesting.
>But I
>am curious, does it also work if you have to do more than just compare
>strings?  What if you have to read a guy's shoe size and make sure it is
>within 3 sizes of a predetermined value? Is this possible using hashing?
>

It is possible either via the hash function or via the bucket search.  It
depends a bit on the range of possible outcomes. For example, if I wanted
to find folks who provided comments by city block, the hash function would
sort all house numbers into buckets (assuming city blocks range 1-100,
101-200, etc.  But that's because I know the boundaries (hundreds by house
number).  If you don't know the bucket boundaries are trying to find
groups on the fly (so that If you start with 100 you find both 99, 101)
then you might design the hash key to be sortable by those ranges.  Then
instead of walking the buckets with a simple "for each" you would first
get all the keys via "for each", sort them, then scan the list of keys and
look for groups.
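For the shoe-size question specifically, a sketch might look like this
(untyped AS3/JS; shoeSize is a made-up property standing in for the real
criterion):

```javascript
// Bucket width matches the search radius (3), so every match for a given
// size sits in that size's bucket or one of the two adjacent buckets.
var BUCKET_WIDTH = 3;

function bucketOf(size) {
    return Math.floor(size / BUCKET_WIDTH);
}

function buildBuckets(people) {
    var buckets = {};
    for (var i = 0; i < people.length; i++) {
        var b = bucketOf(people[i].shoeSize);
        if (!buckets[b]) {
            buckets[b] = [];
        }
        buckets[b].push(people[i]);
    }
    return buckets;
}

// Scan only three buckets, then apply the exact |difference| <= 3 test.
function withinThreeSizes(buckets, size) {
    var out = [];
    var b = bucketOf(size);
    for (var d = -1; d <= 1; d++) {
        var list = buckets[b + d];
        if (!list) continue;
        for (var k = 0; k < list.length; k++) {
            if (Math.abs(list[k].shoeSize - size) <= 3) {
                out.push(list[k]);
            }
        }
    }
    return out;
}
```

The key design point: as long as the bucket width is at least the search
radius, checking one bucket on either side is guaranteed to catch every
candidate.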

Even with doing all of that, 38,000 loops should be way better than 1.4
billion (and of course, you might still be able to break some of this up
into threads).

If you read [1] the related sections are "Finding Duplicate Records" and
"Finding Similar Records".

-Alex

[1] https://en.wikipedia.org/wiki/Hash_function


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I'm going to read up about hashing, this sounds very very interesting.  But I
am curious, does it also work if you have to do more than just compare
strings?  What if you have to read a guy's shoe size and make sure it is
within 3 sizes of a predetermined value? Is this possible using hashing?



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13219.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/4/16, 3:04 PM, "Justin Mclean" <ju...@classsoftware.com> wrote:

>Hi,
>
>> The simplest hash is the concatenation of the string representation of
>>all
>> the properties you are currently comparing, assuming that no property
>>will
>> have the value of empty string "".  I believe you can always test for ""
>> and swap for something else, even a simple space " ".  If there are
>> numeric properties you might want to add a delimiter so that "11" + "1"
>>is
>> different from "1" + "11”.
>
>You may need to be a little careful here as calculation of hashes (and
>string manipulation in general) can be very expensive in terms of garbage
>collection. Your mileage may vary but in a similar situation I had a while
>back string hashes made the process slower and I ended up needing to go
>with numeric hashes. Also, looking up a numeric index in an array or numeric
>properties in objects is a lot faster than looking up string properties
>in objects.

Correct, in my earlier post I mentioned that string hashing may not be
optimal and encoded hashes would be better.

-Alex


Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> The simplest hash is the concatenation of the string representation of all
> the properties you are currently comparing, assuming that no property will
> have the value of empty string "".  I believe you can always test for ""
> and swap for something else, even a simple space " ".  If there are
> numeric properties you might want to add a delimiter so that "11" + "1" is
> different from "1" + "11”.

You may need to be a little careful here as calculation of hashes (and string manipulation in general) can be very expensive in terms of garbage collection. Your mileage may vary, but in a similar situation I had a while back string hashes made the process slower and I ended up needing to go with numeric hashes. Also, looking up a numeric index in an array or numeric properties in objects is a lot faster than looking up string properties in objects.

Thanks,
Justin

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/3/16, 11:41 PM, "bilbosax" <wa...@comcast.net> wrote:

>You just stepped outside of my academic sphere :)  Although, I did a quick
>wiki scan on hashes and it sounds fascinating, but I don't know how well
>it
>would work in this situation.  I think I would have to do a hash 38k times
>if I understand it correctly.  I don't have a certain set of values that I
>am looking for to create a hash.  The record that I am looking at defines
>all of the values that need compared, and those values change from record
>to
>record.  So the search characteristics are constantly changing.  But, it
>could be that I simply do not understand hashes, because I just glanced at
>it.  I will take a deeper look into it soon, as it seems like something I
>should know to a certain degree of competency.
>

Well, generalized hashing is a big topic.  What you are doing here is much
simpler.  Maybe another way to think about it is as the generation of
identifiers or keys for each object where if two identifiers match, then
you know the two items should be compared to each other.

All of the properties you use in your chain of if statements during the
compare become the variables in the hash/identifier function.  The key
things to keep in mind are that this gives you the opportunity to do the
comparison inside the AIR runtime (in C code) instead of in ActionScript
because the property lookup on the Object runs inside AIR, and that this
dramatically changes the number of comparisons, which was the bottleneck
in your app.

Your current code was/is (roughly):

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    if (i != j &&
        item[i].prop1 == item[j].prop1 &&
        item[i].prop2 == item[j].prop2 ...

The "if"s are run n^2 times.  If you have 4 items, 16 times, for 100
items, 10,000 times and for 38,000 items, the ifs are being run
1,444,000,000 times.

Since the comparison of itemI to itemJ is the same as comparing itemJ to
itemI, your loop only really needs to look like:

for (i = 0; i < n - 1; i++)
  for (j = i + 1; j < n; j++)
    if (item[i].prop1 == item[j].prop1 &&
        item[i].prop2 == item[j].prop2 ...

Then, the "ifs" are run much fewer times.  If you have 4 items, only 6
times, for 100 items, 4950 times, and for 38,000 items, only 721,981,000
times, which should cut your total time in half.
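Those counts are easy to sanity-check with a throwaway loop:

```javascript
// Counts how many times the "if" body runs under the halved loop shape.
function pairCount(n) {
    var count = 0;
    for (var i = 0; i < n - 1; i++) {
        for (var j = i + 1; j < n; j++) {
            count++;
        }
    }
    return count;
}
// pairCount(4) is 6 and pairCount(100) is 4950, matching n * (n - 1) / 2;
// plugging in 38,000 gives 721,981,000.
```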


But then, if you use hashing, the "if" may take longer because it won't
have early exits, but if you have 4 items, the test is only run 4 times,
for 100 items only 100 times and for 38,000 items, only 38,000 times.  You
can see that that is way fewer times than 721,981,000 times so you can
usually afford the more expensive hash calculation.

The simplest hash is the concatenation of the string representation of all
the properties you are currently comparing, assuming that no property will
have the value of empty string "".  I believe you can always test for ""
and swap for something else, even a simple space " ".  If there are
numeric properties you might want to add a delimiter so that "11" + "1" is
different from "1" + "11".

Let's say I want to rummage through comments left on our wiki and find
folks who submitted more than one.  Each comment record might have:

class CommentRecord
{
  var date:Date
  var firstName:String
  var lastName:String
  var country:String
  var state:String
  var city:String
  var street:String
  var houseNumber:String
  var phoneNumber:String
  var comment:String
}

If one can assume that
country+state+city+street+houseNumber+lastName+firstName would uniquely
identify someone who left a comment on the wiki, then we can find folks
who've left more than one comment by hash-sorting them into buckets as I
described in my previous post.  The computeHash() could be as simple as:

function computeHash(item:Object):String
{
  return item.country + item.state + item.city +
         item.street + item.houseNumber +
         item.lastName + item.firstName;
}

I would expect this algorithm to scale much better as the number of
comments grows since even at 100,000 records it is only 100,000 tests.
Also note that you could store the hash in the DB so it doesn't need to be
computed each time you start up the app.

HTH,
-Alex


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
You just stepped outside of my academic sphere :)  Although, I did a quick
wiki scan on hashes and it sounds fascinating, but I don't know how well it
would work in this situation.  I think I would have to do a hash 38k times
if I understand it correctly.  I don't have a certain set of values that I
am looking for to create a hash.  The record that I am looking at defines
all of the values that need compared, and those values change from record to
record.  So the search characteristics are constantly changing.  But, it
could be that I simply do not understand hashes, because I just glanced at
it.  I will take a deeper look into it soon, as it seems like something I
should know to a certain degree of competency.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13217.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/3/16, 7:32 PM, "bilbosax" <wa...@comcast.net> wrote:

>True, I can live with the speed that it is currently running.  I was
>willing
>to construct 4 workers in hopes of getting it down to 15 minutes.  To be
>able to have it running in my main application in under three minutes is a
>dream.  Now it is just a fun academic exercise for me.

Well, if we're being academic, it may be that this is more of a sorting
problem than a comparison problem.  While changing the looping as I
suggested in my previous post roughly halves the work of an n^2 algorithm,
hashing might make the algorithm linear.  One thing that might matter is how
many matches you expect to find in the database.  If there are always
going to be relative few matches (the other extreme would be all 38K items
having the same 8 properties), then hashing into buckets might be an
alternative implementation.

Then the algorithm is more like:

var hashTable:Object = {};

for (i = 0; i < n; i++)
{
  var hash:String = computeHash(items[i]);
  
  // if no entry in table, create an array to hold items
  if (hashTable[hash] == null)
    hashTable[hash] = [];
  hashTable[hash].push(items[i]);
}

The hash has to be designed to so that items who match all 8 properties
have the same hash value.  Then in just one loop, the hashTable has
buckets with matching items and you can scan the table and run the math:

for each (hash in hashTable)
{
  var matches:Array = hashTable[hash];
  if (matches.length > 1)
  {
    runMathOnMatches(matches);
  }
}

The hash could be a simple concatenation of the strings of all 8
properties as long as you are sure there won't be inadvertent collisions.
For example, if one property is State and the other is City, you can
compute a hash of State+City.  Such a hash could compute long strings so
hash computation and lookup won't be optimal, so then if you really want
to squeeze a few cycles more out of it you could consider encoding the
properties or running better hash algorithms to generate numeric hashes or
other shorter hashes.
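A runnable version of the two loops above, with State and City standing in
for the real 8 properties (the "|" delimiter just guards against accidental
collisions like "11"+"1" vs "1"+"11"):

```javascript
// One pass to hash every item into a bucket, one pass over the buckets to
// pull out groups with more than one member.
function computeHash(item) {
    return item.state + "|" + item.city;  // illustrative two-property hash
}

function findGroups(items) {
    var hashTable = {};
    for (var i = 0; i < items.length; i++) {
        var hash = computeHash(items[i]);
        if (!hashTable[hash]) {
            hashTable[hash] = [];
        }
        hashTable[hash].push(items[i]);
    }
    var groups = [];
    for (var key in hashTable) {
        if (hashTable[key].length > 1) {
            groups.push(hashTable[key]);
        }
    }
    return groups;
}
```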

Food for thought,
-Alex


Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> For instance, if I create an array like so:
> 
> var tableArray:Array = new Array();
> 
> then how do I go about placing a distance calculation in say, slot
> [1703][25000]?  Flex won't allow me to do this:
> 
> tableArray[1703][25000] = distance;

You need to do something like this:

if (!tableArray.hasOwnProperty(1703)) {
	tableArray[1703] = new Array();
}
tableArray[1703][25000] = distance;

That array of arrays is sparse and will only take up a little bit of space.
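And since the distance from A to B equals the distance from B to A, the same
guarded write can store it under both index orders if you want the reverse
lookup too (sketched untyped so it runs as-is in AS3 or JS; the indices and
value are arbitrary):

```javascript
// Store one computed distance under both index orders, creating the inner
// sparse rows only on demand; unused cells never take up memory.
function storeDistance(tableArray, i, j, distance) {
    if (!tableArray[i]) tableArray[i] = [];
    if (!tableArray[j]) tableArray[j] = [];
    tableArray[i][j] = distance;
    tableArray[j][i] = distance;
}

var tableArray = [];
storeDistance(tableArray, 1703, 25000, 0.87);
// both tableArray[1703][25000] and tableArray[25000][1703] now hold 0.87
```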

Thanks,
Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Also, I don't know if you can push a value into a somewhat random slot in an
Array.  For instance, if I create an array like so:

var tableArray:Array = new Array();


then how do I go about placing a distance calculation in say, slot
[1703][25000]?  Flex won't allow me to do this:

tableArray[1703][25000] = distance;

And I don't want to keep track of 38000 distance calculations every time I
want to push a row of data into the array.  Is there a way to just push a
single distance calculation into a row if it is actually calculated?

If not, then this process is likely to be a great idea in theory, but in
mechanics, unmanageable by the program.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13207.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> Do you think an Array is a better choice than a Vector here?

Yes, arrays can be sparse (i.e. have missing values); vectors in ActionScript cannot.

Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
True, I can live with the speed that it is currently running.  I was willing
to construct 4 workers in hopes of getting it down to 15 minutes.  To be
able to have it running in my main application in under three minutes is a
dream.  Now it is just a fun academic exercise for me.  So, I am definitely
going to try the Pythagorean theorem and see if it is close enough and
faster.  But I am also interested in trying to create a table to look up
previous calculated values to see if I can make it any faster.  I don't know
if Flex is going to allow this. A 38k x 38k table has 1,444,000,000 values
that it can hold, and that is a lot of Gigabytes for Flex.  I tried to
create a vector this big and Flex gave up and just closed.  This is how I
tried to create it:

var vector:Vector.<Vector.<Number>> = new Vector.<Vector.<Number>>(length);

for (var k:int = 0; k < length; k++) {
	vector[k] = new Vector.<Number>(length);
}

Flex didn't like this.  The thing is, I don't need a cell to be created
if there is nothing to put in it.  So if I calculate
distances and only place them in an array if they are appropriate
comparisons, I may only have an array with a few million cells populated.  I
don't know if Flex will only grow an array's memory as it is needed.  Do
you think an Array is a better choice than a Vector here?
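
Regarding the Pythagorean-theorem idea mentioned above: a common form is the equirectangular approximation, which is much cheaper than haversine and accurate over the fraction-of-a-mile radii discussed in this thread (JavaScript sketch; 3961 is the Earth radius in miles used in the haversine code elsewhere in the thread):

```javascript
// Equirectangular approximation: project the two points onto a flat
// plane and apply Pythagoras. Accurate for short distances.
function approxDistanceMiles(lat1, lon1, lat2, lon2) {
  const rad = Math.PI / 180;
  const x = (lon2 - lon1) * rad * Math.cos(((lat1 + lat2) / 2) * rad);
  const y = (lat2 - lat1) * rad;
  return 3961 * Math.sqrt(x * x + y * y);
}
```

It replaces several trig calls per pair with one cosine, one square root, and a few multiplies.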



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13206.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> Well, it is not quite as simple as that. If I were just updating the array,
> then yes, it would be. But I am keeping track of a count of items within a
> distance of a particular record, and averages and medians and other such
> things that have to be calculated specifically with respect to that record.

Which is also easily done; look up how to calculate a running average. Basically, if you have an average of A from N values, then given one more value V the new average is (A*N + V)/(N + 1).
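
In code, the incremental update keeps only a running count and average, so no per-record totals need to be stored (JavaScript sketch):

```javascript
// Incremental (running) average: newAvg = oldAvg + (value - oldAvg) / newCount.
// Algebraically equal to (oldAvg * oldCount + value) / newCount, but avoids
// keeping all the values around just to average them at the end.
function makeRunningAverage() {
  let count = 0;
  let avg = 0;
  return {
    add(v) {
      count++;
      avg += (v - avg) / count;
    },
    average() { return avg; },
    count() { return count; }
  };
}
```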

But it's up to you whether a little work is worth making it run twice as fast; given the time has come down from hours to minutes, it may be that it's now fast enough for what you need to do.

There are still a few simpler changes that are likely to give you speed improvements without that change.

Thanks,
Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Well, it is not quite as simple as that. If I were just updating the array,
then yes, it would be. But I am keeping track of a count of items within a
distance of a particular record, and averages and medians and other such
things that have to be calculated specifically with respect to that record.
So I am calculating a bunch of values for record i, but won't know those
totals for record j until I get to record j and tally them up for that
record. But I will know already that it is comparable to record i when I get
to it, and the distance will already be calculated. So I do have to save the
distance.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13204.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/3/16, 5:58 PM, "Justin Mclean" <ju...@classsoftware.com> wrote:

>Hi,
>
>> If i compare record 20 to record 1030 and find them to be comparable and
>> decide to calculate the distance between them, then why should i
>>recompare
>> record 1030 to 20 when the time comes in the loop?
>
Correct, there's no need to.

I could be wrong, but doesn't this mean the inner loop doesn't have to
start at 0 but at i?  If so, that would cut out a lot of comparisons.

for (i = 0; i < n - 1; i++)
  for (j = i + 1; j < n; j++)
    test(i, j)

If n = 4, this will test:

0,1
0,2
0,3
1,2
1,3
2,3
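
The same triangular loop in runnable form (JavaScript sketch), confirming it visits each unordered pair exactly once, n*(n-1)/2 pairs in total:

```javascript
// Enumerate each unordered pair (i, j) with i < j exactly once.
function pairs(n) {
  const result = [];
  for (let i = 0; i < n - 1; i++) {
    for (let j = i + 1; j < n; j++) {
      result.push([i, j]);
    }
  }
  return result;
}
```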


HTH,
-Alex


Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> If i compare record 20 to record 1030 and find them to be comparable and
> decide to calculate the distance between them, then why should i recompare
> record 1030 to 20 when the time comes in the loop?

Correct, there's no need to.

> So if I stored the distance in a huge 38k x 38k array(or
> vector), and the first thing I check is if a distance has been recorded in
> this array

No need to do that.

All you need to do is alter the loop as I suggested, and then when you update the array for i, also update the array for j.

> Do you think it is worth trying this scenario? 

Well, it's up to you, but it will halve the time it takes to run, so yes.

Thanks,
Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I'm trying to wrap my brain around this idea.  In theory, it should be
faster, but it would be such a huge array for a lookup table, and it would
add another conditional statement, so that once the mechanics are built, it
may be slower.  But my idea is this:

If I compare record 20 to record 1030 and find them to be comparable and
decide to calculate the distance between them, then why should I recompare
record 1030 to 20 when the time comes in the loop?  That comparison has
already been made.  So if I stored the distance in a huge 38k x 38k array (or
vector), and the first thing I check is if a distance has been recorded in
this array, then I already know that they are comparable and do not need to
execute any more conditionals or calculate the distance.  So the logic cuts
the number of times through the conditionals and calculations in half, at
the expense of a very large array and adding another conditional for at
least half of the calculations.

Do you think it is worth trying this scenario? 



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13202.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> 				for (var i:int = 0; i < length; i++) {   /*  Loop Through Each of the
> Records*/
> 					if (tempArray[i].status == "Act") { /*  Test to see if current record
> status is ACTIVE */

String comparisons are slow, try to make this numeric if you can.

> 						for (var j:int = 0; j < length; j++){  /*  Compare to Each of the
> other Records*/

Should be able to start j at i+1 in this loop so you don’t do the same calculations twice. (That will make it run twice as fast.)

> 							if (i != j) {	/*  Make sure you are not comparing a record to itself

If you change the loop above no need for this.

> */
> 								if (tempArray[i].propOne == tempArray[j].propOne) { 

Creating  var itemA:Object = tempArray[i] and var itemB:Object = tempArray[j] rather than looking them up every time may be faster. Use itemA and itemB where you have tempArray below. Also makes code a little more readable.

> 									if (tempArray[i].propTwo == tempArray[j].propTwo) {
> 										if (tempArray[i].propThree == tempArray[j].propThree) { 
> 											if (Math.abs(tempArray[i].propFour - tempArray[j].propFour) <=
> 10) { 
> 												if (tempArray[i].propFive == tempArray[j].propFive) { 
> 													if (Math.abs(tempArray[i].propSix - tempArray[j].propSix) <= 1)
> { 
> 														if (Math.abs(tempArray[i].propSeven - tempArray[j].propSeven)
> <= 1) { 
> 															if (Math.abs(((tempArray[i].propEight -
> tempArray[j].propEight)/tempArray[i].propEight))<0.10) {
> 																lon1 = tempArray[i].longitude*Math.PI/180;
> 																lon2 = tempArray[j].longitude*Math.PI/180;
> 																lat1 = tempArray[i].latitude*Math.PI/180;
> 																lat2 = tempArray[j].latitude*Math.PI/180;
> 																dlon = lon2 - lon1;
> 																dlat = lat2 - lat1;
> 																a = Math.pow(Math.sin(dlat/2), 2) +
> (Math.cos(lat1)*Math.cos(lat2)*Math.pow(Math.sin(dlon/2), 2));
> 																c = 2*Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
> 																d = 3961 * c;
> 																
> 																if (d <= tempDistance ) {
> 																	count = count + 1;
> 																	someProp = someProp + Number(tempArray[j].propEleven);
> 																	tempCompArray.push({" push about 26 properties from one
> array to a comparison array, plus some new values});
> 																	tempArray[i].propTwelve = is true;

istrue? Is that a string or a number or boolean?

> 																}
> 															}
> 														}
> 													}
> 												}
> 											}
> 										}
> 									}
> 								}
> 							}						
> 						}
> 						
> 						if (count != 0) {											/*  Populate data if there is actually
> data to be updated */
> 							average = someProp/Number(count);

Not sure you need a cast to Number here?

> 							tempArray[i].propThirteen = count;
> 							tempArray[i].Prop14 = average;
> 							
> 							if (average == 0.0) {

May be an issue with numbers close to 0 (rounding issues etc.) here; typically something like if (average <= 0.00001) is used.

> 								tempArray[i].propFourteen = 0.0;
> 							} else {
> 								tempArray[i].propFourteen = (Number(tempArray[i].propTen) -
> average)/average *100.0;
> 							}
> 							
> 							tempArray[i].propFifteen = tempArray[i].propFourteen -
> tempArray[i].propFifteen;
> 						}
> 						
> 						count = 0;
> 						average = 0.0;
> 						someNumber = 0.0;
> 					}
> 				}
> 				
> 				PopUpManager.removePopUp(processingPU);
> 				compArrayCollection.source = tempCompArray;
> 				mainArrayCollection.source = tempArray;
> 				
> 		     }

Hope that helps,
Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I have resisted actually posting the code because it has been suggested to me
that the app should be looked at for a patent, and this calculation process
is at the heart of the idea.  So I have reduced the variables down to just
generic names so as not to give away my idea away.  I really liked your idea
of combining all of the conditionals to one line using && operators, but in
the end it was no faster.  Two minutes and forty seconds to complete.  I
believe this is because, in my code, I have placed the conditionals in an
order that is most likely to throw out a record quickly if it is not a good
comparator, whereas your idea forces all of the conditionals to HAVE to be
considered for EVERY record(at least I think that it does).  Here is a
simplified version of the code:

protected function recalculateHandler(event:Event):void
			{
				processingPU.removeEventListener("popUpOpened", recalculateHandler);
				
				var count:int = 0;
				var average:Number = 0.0;
				var someProp:Number = 0.0;
				var lon1:Number;
				var lon2:Number;
				var lat1:Number;
				var lat2:Number;
				var dlon:Number;
				var dlat:Number;
				var a:Number;
				var c:Number;
				var d:Number;
				var tempDistance:Number = (distance.selectedIndex*.25)+.25;
				var length:int = mainArrayCollection.length;
				var tempArray:Array = new Array();
				compArrayCollection = new ArrayCollection();
				tempCompArray = new Array();
				tempArray = speedArrayCollection.source;
				
				for (var i:int = 0; i < length; i++) {   /*  Loop Through Each of the
Records*/
					if (tempArray[i].status == "Act") { /*  Test to see if current record
status is ACTIVE */
						for (var j:int = 0; j < length; j++){  /*  Compare to Each of the
other Records*/
							if (i != j) {	/*  Make sure you are not comparing a record to itself
*/
								if (tempArray[i].propOne == tempArray[j].propOne) { 
									if (tempArray[i].propTwo == tempArray[j].propTwo) {
										if (tempArray[i].propThree == tempArray[j].propThree) { 
											if (Math.abs(tempArray[i].propFour - tempArray[j].propFour) <=
10) { 
												if (tempArray[i].propFive == tempArray[j].propFive) { 
													if (Math.abs(tempArray[i].propSix - tempArray[j].propSix) <= 1)
{ 
														if (Math.abs(tempArray[i].propSeven - tempArray[j].propSeven)
<= 1) { 
															if (Math.abs(((tempArray[i].propEight -
tempArray[j].propEight)/tempArray[i].propEight))<0.10) {
																lon1 = tempArray[i].longitude*Math.PI/180;
																lon2 = tempArray[j].longitude*Math.PI/180;
																lat1 = tempArray[i].latitude*Math.PI/180;
																lat2 = tempArray[j].latitude*Math.PI/180;
																dlon = lon2 - lon1;
																dlat = lat2 - lat1;
																a = Math.pow(Math.sin(dlat/2), 2) +
(Math.cos(lat1)*Math.cos(lat2)*Math.pow(Math.sin(dlon/2), 2));
																c = 2*Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
																d = 3961 * c;
																
																if (d <= tempDistance ) {
																	count = count + 1;
																	someProp = someProp + Number(tempArray[j].propEleven);
																	tempCompArray.push({" push about 26 properties from one
array to a comparison array, plus some new values});
																	tempArray[i].propTwelve = istrue;
																}
															}
														}
													}
												}
											}
										}
									}
								}
							}						
						}
						
						if (count != 0) {											/*  Populate data if there is actually
data to be updated */
							average = someProp/Number(count);
							tempArray[i].propThirteen = count;
							tempArray[i].Prop14 = average;
							
							if (average == 0.0) {
								tempArray[i].propFourteen = 0.0;
							} else {
								tempArray[i].propFourteen = (Number(tempArray[i].propTen) -
average)/average *100.0;
							}
							
							tempArray[i].propFifteen = tempArray[i].propFourteen -
tempArray[i].propFifteen;
						}
						
						count = 0;
						average = 0.0;
						someProp = 0.0;
					}
				}
				
				PopUpManager.removePopUp(processingPU);
				compArrayCollection.source = tempCompArray;
				mainArrayCollection.source = tempArray;
				
		     }



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13192.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Javier Guerrero García <ja...@gmail.com>.
Hi Bilbosax:

Just my 2 cents on some things to consider: according to your code, it's
something like

"If 10 conditions are met, do the math"

Instead of writing it as a series of nested ifs (a real mess for your CPU
branch prediction module), try this:

dothemath=(condition1) && (condition2) && .... && (condition10)
if (dothemath) do_the_math()

Since boolean expressions are evaluated left to right, you are not adding
any additional complexity, your loop is now turned to almost linear (just
one branch, not 10), flash compiler is much happier, your CPU branch
predictor can now do its job (instead of committing suicide), and if you
even manage to previously sort your data so dothemath is always true for
the first n records and false for the rest (or at least part of them,
sorting by condition1 for instance), you could always get full pipeline
coverage, getting a result every clock tick (and a gigahertz clock is a
very fast clock :)
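
The key property of the combined expression is short-circuit evaluation: `&&` stops at the first false term, so ordering the cheapest and most selective tests first still skips most of the work for non-matching records (JavaScript sketch with a counter to demonstrate):

```javascript
// Combined condition with &&: evaluation stops at the first false term,
// so later (more expensive) tests are never run for non-matching records.
let evaluated = 0;
function tracked(result) {
  evaluated++;          // count how many terms were actually evaluated
  return result;
}

const doTheMath = tracked(false) && tracked(true) && tracked(true);
```

Because the first term is false, only one of the three terms is ever evaluated.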

Besides, instead of running it 38k*38k times, could you also precalculate
some part (or all) of your conditions linearly (38k), and then do the
nested loop?

Even better, couldn't you just store partial results in a 38k long
dictionary, and then just do a linear loop referencing the result on that
dictionary?

Even better, couldn't you just make your SQL server handle part of the job
on the query itself, and return a reduced result set based on those
conditions? It has most of the data (without needing to cast strings to
integers) at hand, and it reaaaaally knows how to iterate through 38k
records really fast (it's the only thing it does really well :).

In general, could you just give us more data on the calculations (change
constants, operations, anything you want except variable referencing) so we
can help you better? Me (and many others) do believe it can be further
optimized to seconds, but we do "need to know" :)

Hope it helps! :)

P.S. The other way round: would it hurt if you do the math linearly first
for every record (just 38k), and then just select the ones that meet all 10
conditions (38kx38k IFs)?

P.P.S. Of course, you have turned OFF automatic updates on all your
involved bindables before the loop, and enabled them afterwards, right? :)


On Wed, Aug 3, 2016 at 4:04 AM, bilbosax <wa...@comcast.net> wrote:

> What about changing datatypes, does this eat up a lot of time?  My database
> has some values that are typed as string but are actually numbers.  When I
> download them from the database, I assume that they are entered into the
> arraycollection as strings.  In my loops, I have to do math functions on
> them so I force them to a number type as such:
>
> var a:Number = Number(myArray[1].someProp)*10;
>
> When doing this a large number of times in big loops, will it eat up a
> significant amount of time or is type conversion pretty fast?
>
>
>
> --
> View this message in context:
> http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13158.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
What about changing datatypes, does this eat up a lot of time?  My database
has some values that are typed as string but are actually numbers.  When I
download them from the database, I assume that they are entered into the
arraycollection as strings.  In my loops, I have to do math functions on
them so I force them to a number type as such:

var a:Number = Number(myArray[1].someProp)*10;

When doing this a large number of times in big loops, will it eat up a
significant amount of time or is type conversion pretty fast?
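
One way to sidestep the question: convert each string field to a number once per record, up front, instead of inside the O(n^2) inner loop, roughly 38k casts instead of over a billion (JavaScript sketch; the field names are illustrative):

```javascript
// Normalize string-typed numeric columns in a single linear pass so the
// nested comparison loops work on real numbers.
function normalizeRecords(records, numericFields) {
  for (const rec of records) {
    for (const f of numericFields) {
      rec[f] = Number(rec[f]);
    }
  }
  return records;
}
```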



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13158.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Harbs <ha...@gmail.com>.
I thought that script timeouts only happen in the browser.

I’m assuming this is not a browser Flash app, but an AIR one.

Harbs

On Aug 2, 2016, at 9:58 PM, bilbosax <wa...@comcast.net> wrote:

> To be honest with you, I have no idea why a script timeout isn't happening.
> Until this week I had never heard of one. I thought that maybe timeouts
> happened when you are sitting and waiting for execution to happen and it
> takes too long, like waiting on a web service or a callresponder. My program
> is not ever getting hung up on any operation, it just chugs along through
> the data. Or perhaps it is because the entire time my operation is working,
> the program only advances 1 frame, never stopping, pausing, or waiting.
> Other than that, I don't understand enough about script timeouts to give you
> a good answer as to why I am not receiving one.
> 
> 
> 
> --
> View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13156.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
To be honest with you, I have no idea why a script timeout isn't happening.
Until this week I had never heard of one. I thought that maybe timeouts
happened when you are sitting and waiting for execution to happen and it
takes too long, like waiting on a web service or a callresponder. My program
is not ever getting hung up on any operation, it just chugs along through
the data. Or perhaps it is because the entire time my operation is working,
the program only advances 1 frame, never stopping, pausing, or waiting.
Other than that, I don't understand enough about script timeouts to give you
a good answer as to why I am not receiving one.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13156.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/2/16, 9:07 AM, "bilbosax" <wa...@comcast.net> wrote:

>From what I have learned from this process, I think I am going to change
>my
>logic in the loop a little. I no longer feel that the math functions are
>the
>bottleneck near as much as getting data in and out of the arrays and
>comparing them. So I am going to flip flop the logic and see what happens.
>
>This is probably above my current pay grade, but I will ask about it out
>of
>curiosity. I was reading about FlasCC. Would I see even more performance
>gains if I passed the data to a C++ function, perhaps as an ANE?
>

If you can get the data from AS to C++ efficiently, then I would expect
better performance in C++, but how much still depends on how well the JIT
compiler in Flash can optimize the inner loops.

I would also consider scalability:  If the number of items is growing at
900 per day, eventually you'll have billions of items to convert between
AS and C++ and that might become the bottleneck.  You still haven't
answered how you are able to run this loop without hitting the script
timeout, but you may want to build in some progress indicators for the
user now, and I think that can be a bit tricky when communicating with
ANEs and FlasCC.

-Alex


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
From what I have learned from this process, I think I am going to change my
logic in the loop a little. I no longer feel that the math functions are the
bottleneck near as much as getting data in and out of the arrays and
comparing them. So I am going to flip flop the logic and see what happens.

This is probably above my current pay grade, but I will ask about it out of
curiosity. I was reading about FlasCC. Would I see even more performance
gains if I passed the data to a C++ function, perhaps as an ANE?



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13154.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> So inside of all my Loops where I calculate the trig, I update an
> arraycollection(containing ObjectProxies) with the new values that I
> calculate.  Do you think if I passed this data to a regular array instead,
> and then over to the ArrayCollection when the processing was finished that I
> may be able to whittle away some of that 24 seconds?

I’d guess it’s probably likely, as the number of updates is going to be large vs a single update of the entire array collection. But again it would be worth a try, as it can probably be done in a couple of lines of code.

Given it's running reasonably quickly now (well, comparatively) you may want to try replacing each of the conditions with a named function and see if any of those is taking up the time.

So if you had:

if (a ==b) {
…
}

Change it to:

function compareAB(a:int, b:int):Boolean {
	return a == b;
}

…


if (compareAB(a,b)) {
...
}

This will run a little slower than having the code inlined but will enable you to work out if any of the conditions are taking up time and where you may be able to optimise further. It’s going to depend on what those conditions are - so that’s just a guess (again).

Thanks,
Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
HaHa!! This is becoming quite entertaining and FUN!!  By using a parallel
ArrayCollection of just simple objects and then using its source as the
array that everything is calculated from, we are now down to 2.7 minutes!! 
WooHoo!!  So here is the breakdown now:

Total time = 162 sec
ObjectProxy.setProperty = 24 seconds
garbage collection = .07 sec

As you predicted, ObjectProxy.getProperty and garbage collection are
essentially 0.

So inside of all my Loops where I calculate the trig, I update an
arraycollection(containing ObjectProxies) with the new values that I
calculate.  Do you think if I passed this data to a regular array instead,
and then over to the ArrayCollection when the processing was finished that I
may be able to whittle away some of that 24 seconds?

>Alex
>That makes me want to ask a potentially important question:  Does this 
>data need to be bindable in the first place?

Alex, I certainly think that it does.  I am using an MXML itemrenderer on
the arraycollection, and the itemrenderer was sending out silent errors that
it was not able to bind to the data it was given.  So I needed it to be
bindable.  The data is also displayed in a datagrid, which needs it to be
bindable.  But using Justin's suggestion of just populating a regular array
with non-objectproxy data in it to do the calculations knocked the time down
to nearly 2.7 minutes.  I could not be more pleased by the strides we have
all accomplished, and by all that I have learned about what goes on inside
of components by profiling.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13145.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

AW: Workers and Speed

Posted by Christofer Dutz <ch...@c-ware.de>.
Well, the optimizations I did in order to get my applications to speed up dramatically were, on the one side, using Arrays instead of collections. The biggest thing I noticed was that per default my model classes were annotated with [Bindable], which caused every property to be bindable. By explicitly handling the events and not relying on a bindable model, I cut the overhead by 9/10ths ... you could check this. (Have to admit that I haven't read all of this lengthy thread though ... so if I'm suggesting something that's already been suggested ... sorry for that ;-) )


Chris

________________________________
Von: bilbosax <wa...@comcast.net>
Gesendet: Dienstag, 2. August 2016 07:24:17
An: users@flex.apache.org
Betreff: Re: Workers and Speed

Alright!!!  Now we are getting somewhere! Passing the ArrayCollection to a
standard Array cut the time in Half!  From almost 50 minutes down to 23
minutes.  So here is the breakdown now:

Total time = 1396 sec
ObjectProxy.getProperty --> 804 sec
Garbarge Collection --> 198 sec
ObjectProxy.setProperty --> 25 sec
(times related to the ArrayCollection previously are gone!)

I wish there was a way to get rid of some of that ObjectProxy time.
Regardless of whether it is an Object or an ObjectProxy or a bindable named
class, there is still going to be some time involved in plucking the data
out of the array to work on.  I don't know how severe of a penalty it is
that the data is inside of an ObjectProxy.  But I also don't understand how
to use the bindable named class either.

The way that it works is I am reading ALL of the data from a database
(SELECT * FROM main), and making the results the source of my
mainArrayCollection that displays in my datagrid and is used in all of my
calculations.  I don't know how to go about taking that data and assigning
it to a bindable class.  Taking each object and converting it to an
objectproxy was a really easy process.  If you think that it would help my
speed problems, could you help me to understand how to use this bindable
class that you and Alex have referred to?

Thanks for all of your help!!!  I've gone from over 2 hours down to 23
minutes!






--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13140.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
This is how the ArrayCollection gets populated:


sqlFile = File.applicationStorageDirectory.resolvePath("myDB.db");
				sqlConn = new SQLConnection();
				sqlConn.open(sqlFile);
				stmt.sqlConnection = sqlConn;
				stmt.text = "SELECT * FROM main";
				stmt.execute();
				
				var result:SQLResult = stmt.getResult();
				mainArrayCollection.source = result.data;
				
				sqlConn.close();


Since I don't ever actually handle the objects or assign any datatypes, I
am confused about how to use such a bindable class.
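
The usual pattern is a linear pass that maps each generic result row onto an instance of the typed class, the equivalent of looping over result.data and constructing a [Bindable] AS3 class per row (JavaScript sketch; the class and field names are illustrative, not from the original app):

```javascript
// A typed data class standing in for a [Bindable] AS3 class. The
// constructor copies known fields from a generic row object and casts
// string columns to numbers once, at load time.
class Listing {
  constructor(row) {
    this.status = row.status;
    this.price = Number(row.price);
    this.latitude = Number(row.latitude);
  }
}

// Map raw SQL result rows to typed instances in one linear pass.
function toTyped(rows) {
  return rows.map(row => new Listing(row));
}
```

In Flex, the resulting typed array would become the source of the ArrayCollection in place of result.data.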



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13142.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> ObjectProxy.getProperty --> 804 sec
> Garbarge Collection --> 198 sec

With a little work you should be able to make both of these zero.

> I wish there was a way to get rid of some of that ObjectProxy time. 

There is one way. Keep the original Object-based array and use that for your calculations (at the cost of extra memory). Once calculations are done, replace the entire ObjectProxy array. While there are probably better, more elegant solutions, that should be simple to implement and try.

> I don't know how severe of a penalty it is that the data is inside of an ObjectProxy.

The penalty is about 60% of your time from the numbers above. So if you remove it, it's likely to run twice as fast (at a guess).

Are you also looking up the same property multiple times in those conditionals? If so then storing the result in a local variable may also help.

>  But I also don't understand how to use the bindable named class either.

Rather than use Objects to populate the array collection use instances of a (bindable) named data class. If you can post a few lines of code where you are populating the array collection Objects you should get an answer.

Thanks,
Justin


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
Alright!!!  Now we are getting somewhere! Passing the ArrayCollection to a
standard Array cut the time in Half!  From almost 50 minutes down to 23
minutes.  So here is the breakdown now:

Total time = 1396 sec
ObjectProxy.getProperty --> 804 sec
Garbarge Collection --> 198 sec
ObjectProxy.setProperty --> 25 sec
(times related to the ArrayCollection previously are gone!)

I wish there was a way to get rid of some of that ObjectProxy time. 
Regardless of whether it is an Object or an ObjectProxy or a bindable named
class, there is still going to be some time involved in plucking the data
out of the array to work on.  I don't know how severe of a penalty it is
that the data is inside of an ObjectProxy.  But I also don't understand how
to use the bindable named class either.

The way that it works is I am reading ALL of the data from a database
(SELECT * FROM main), and making the results the source of my
mainArrayCollection that displays in my datagrid and is used in all of my
calculations.  I don't know how to go about taking that data and assigning
it to a bindable class.  Taking each object and converting it to an
objectproxy was a really easy process.  If you think that it would help my
speed problems, could you help me to understand how to use this bindable
class that you and Alex have referred to?

Thanks for all of your help!!!  I've gone from over 2 hours down to 23
minutes!






--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13140.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> for(var s:Object in mainArrayCollection)

From memory, for-in loops are slow; I’d avoid them if possible - probably where all the length calculations are coming from?
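
For example (sketch only):

// for-in iterates the keys and is comparatively slow:
for (var s:Object in mainArrayCollection) { ... }

// A plain indexed loop is usually much faster:
var total:int = mainArrayCollection.length;  // computed once, outside the loop
for (var i:int = 0; i < total; i++) { ... }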

> 				{
> 					proxyArray[s] = new ObjectProxy(mainArrayCollection[s]);
> 				}

If you really need to do this (and have enough memory) keep the mainArrayCollection around and create a proxy one to bind to, but only do the calculations on the original mainArrayCollection. Better still try and avoid the ObjectProxy if you can.

> I truly don't know how to make a bindable class of my own.

Something like this:

package
{
	[Bindable] public class Data
	{
		public var prop:Number;
		public var name:String;
	}
}


> I am in the process of converting my arraycollection to an array

There’s no need to convert, just use mainArrayCollection.source as that will give you the array underlying the array collection.

> As Gary posted earlier, do you think it would be even better if I transferred the data to Vectors?

Perhaps/perhaps not. I’d guess not as it seem to me likely your bottle necks are elsewhere.

Thanks,
Justin

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 8/1/16, 8:42 PM, "bilbosax" <wa...@comcast.net> wrote:

>> Are you using ObjectProxies or a bindable named class?
>
>Yes, I am using ObjectProxies.  I found this little bit of code that
>allowed
>me to convert all of my objects to object proxies so that my itemrenderers
>would see the data as bindable.  I was getting a lot of silent errors, and
>this cleared them up.

Sure, your errors went away, but the cost is performance.  ObjectProxy
removes structure and safety checks, which generally help make things
faster and/or more accurate but can take longer to get the code right.

>
>I truly don't know how to make a bindable class of my own.

For some Data class:

public class Customer {
  public var name:String;
  public var address:String;
}

Making it bindable is as simple as adding [Bindable] metadata:

[Bindable]
public class Customer {
  public var name:String;
  public var address:String;
}

If you get errors when using this pattern instead of ObjectProxy, it is
probably because you are accessing properties on the data class that don't
exist on the data class.  IMO, those are worth debugging because the speed
advantage of using data classes instead of ObjectProxy will be significant.


>
>I am in the process of converting my arraycollection to an array and
>seeing
>how that improves my performance.  As Gary posted earlier, do you think it
>would be even better if I transferred the data to Vectors?

As Justin says, no need to convert, the array is already there in the
source property, or if you have filter and sorting on, you can call
toArray().  Then process the Array outside of the AC and replace the
source on the ArrayCollection.  If you want to use Vector it should be a
bit faster, but it may not be a huge factor, and the cost of transferring
to/from Vector could outweigh the gains.
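
Something like this (a sketch - names are illustrative):

// Pull the raw Array out of the collection, do the heavy work on it,
// then hand it back so the collection dispatches a single reset event
// instead of per-item update events.
var items:Array = mainArrayCollection.toArray(); // or .source if no sort/filter
var n:int = items.length;
for (var i:int = 0; i < n; i++)
{
    // ... calculations on items[i] ...
}
mainArrayCollection.source = items;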

HTH,
-Alex


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
> Are you using ObjectProxies or a bindable named class?

Yes, I am using ObjectProxies.  I found this little bit of code that allowed
me to convert all of my objects to object proxies so that my itemrenderers
would see the data as bindable.  I was getting a lot of silent errors, and
this cleared them up.

for (var s:Object in mainArrayCollection)
{
    proxyArray[s] = new ObjectProxy(mainArrayCollection[s]);
}
mainArrayCollection = new ArrayCollection(proxyArray);

I truly don't know how to make a bindable class of my own.

I am in the process of converting my arraycollection to an array and seeing
how that improves my performance.  As Gary posted earlier, do you think it
would be even better if I transferred the data to Vectors?



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13137.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> The results are both interesting, and a little confusing. The main loop took
> 2560 seconds.  ObjectProxy.getProperty (mx utils) took 787 seconds.

Are you using ObjectProxies in your code? It may also be the way you are accessing the data: myAC[1]["prop"] is AFAIK a lot slower than myAC[1].prop, so if that's the case I suggest you change your code to the latter form. Using a named Object class may also improve performance considerably here.
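
For example:

var item:Object = myAC[1];
var a:Number = item["prop"]; // dynamic String lookup - slower
var b:Number = item.prop;    // direct property access - prefer this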

> ListCollectionView.getProperty (mx.collections) took 713 seconds.

Looping over the source array directly will eliminate this.

>  Garbage Collection took 184 seconds.

That's still sizeable. To reduce it, try expanding the scope of any variables you have var'ed inside the loop to outside it. Be careful of temporary objects created during string manipulation and the like (if you have any).
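
For example (illustrative - process() stands in for whatever your loop does):

// Each pass creates short-lived temporaries the collector must reclaim later:
for (var i:int = 0; i < total; i++)
{
    var label:String = "row " + i;  // new String every iteration
    process(label);
}
// Where possible, move invariant work outside the loop or reuse a buffer.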

> ListCollectionView.getlength took 29 seconds(this is easily fixed).

Move the length calculation outside the loops (I mentioned this before).
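
For example:

// Before: length is a getter call on the collection, re-evaluated every pass.
for (var i:int = 0; i < myAC.length; i++) { ... }

// After: calculate it once.
var total:int = myAC.length;
for (var i:int = 0; i < total; i++) { ... }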

>  And the trig method I wrote only took 14 seconds.

Great - you now know what to optimise first!

At a guess I’d first do this. Outside your loop add (with a better name of course):

var myArray:Array = myAC.source;

and change any references to myAC to myArray, and I think you will see a significant speed improvement.

(This is assuming you have no filtering on the AC and the order in the AC doesn’t matter.)

Then rerun Scout and see what the differences are. Repeat until you got it down to your target time.

Perhaps run on a smaller set of data while making these changes then rerun at the end on the full set to confirm.

> I don't know if this is treating it more as an array or an array collection.

It’s treated like an array collection of object proxies; this can be slow as it needs to look up the properties many, many times, as the numbers show.

> I don't know if it would all process faster if I initially passed the arraycollection data off
> to a regular array to do all of the processing.

From the numbers above I think this would be significantly faster, but it may depend on what else your code is doing. It’s simple enough to do, so I’d give it a try first.

>  I need the object proxies because my itemrenderers won't bind to my arraycollection otherwise.

Are you using ObjectProxies or a bindable named class?

> So the question now is, should I find an alternative to the ObjectProxies
> and try and optimize working with my array collection

That’s where I would spend my time. At a guess you can make the above code a lot faster (4-5x at least, going by the numbers), and that is likely to be better than using Workers alone.

People may also be able to help you a little more if you post a link to the code in question.

Thanks,
Justin

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
So, I finally got Scout to work with my program, which turned out to be a
pain (Scout would time out, saying it was out of memory, during the hour
that it took for the program to complete the calculations).

The results are both interesting, and a little confusing. The main loop took
2560 seconds.  ObjectProxy.getProperty (mx utils) took 787 seconds.
ListCollectionView.getProperty (mx.collections) took 713 seconds.  Garbage
Collection took 184 seconds. ListCollectionView.getlength took 29
seconds(this is easily fixed).  And the trig method I wrote only took 14
seconds.

This clearly shows that the math is not the bottleneck, getting information
out of the arraycollection and objectproxy objects is the slowest process. 
I am addressing the arraycollection as myAC[1].someProperty in all of my
comparisons and calculations as was suggested. I don't know if this is
treating it more as an array or an array collection. I don't know if it
would all process faster if I initially passed the arraycollection data off
to a regular array to do all of the processing.  I need the objectproxies
because my itemrenderers won't bind to my arraycollection otherwise.

So the question now is, should I find an alternative to the ObjectProxies
and try and optimize working with my arraycollection, or should I simply
make another worker to chop down this processing time?  Math is not the
bottleneck, getting the data together to do the math is the slow part.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13134.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> It occurred to me that even if you got it right, if your code doesn't make
> function calls or instantiate objects, there may not be good sample data.

It should still show the self time, garbage collection (if any) and the calls to the trig functions (i.e. Math.cos or Math.sin).

You could also wrap each condition's statements in a function to see which of those are eating up the time. Doing this would make the code run a little slower but could give you some insight into why it’s taking so long.
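
For example (illustrative names - matchesCriteria() and accumulate() would just hold the code currently inline in the loop):

// Each named function shows up as its own entry in Scout's call-stack samples:
for (var i:int = 0; i < total; i++)
{
    if (matchesCriteria(items[i]))
        accumulate(items[i]);
}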

Thanks,
Justin

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 7/31/16, 8:25 PM, "bilbosax" <wa...@comcast.net> wrote:

>Hi Justin.  I have never used the additional compiler arguments dialogue
>before.  I profiled my app in Scout as you suggested, and when browsing
>through the Session Info, it says that Advanced Telemetry is disabled, so
>I
>don't know if I entered the additional compiler arguments correctly.  This
>is what I have in the dialogue:
>
>-locale en_US
>-advanced-telemetry=true
>-debug=false

It occurred to me that even if you got it right, if your code doesn't make
function calls or instantiate objects, there may not be good sample data.
IIRC, the ActionScript telemetry samples the call stack and doesn't look
at lines of code in a function.

So you could instrument the code if that's the case.  But as I just
replied to Javier, it really matters if the work can be divided up.  If
so, and you have considered Javier's list of issues, the next thing may
just be to try Workers, unless you think there are other optimizations
that are worth pursuing.

-Alex


Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I think I figured it out.  I took the SWF from my installed desktop AIR
app's program folder, ran it through the utility, renamed it, and put it
back in the program folder.  When I ran it, I could see that Advanced
Telemetry is on.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13123.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by bilbosax <wa...@comcast.net>.
I was going to try your utility to turn on the advanced telemetry in my
application because it is not working in the additional compiler arguments
dialogue, but I can't figure out how to do it.  I can't drag my deployed app
onto the SWFScoutEnabler because it has to be a SWF.  The only SWF that seems
appropriate is the one in the bin-debug folder, but when I run it from the
desktop, it doesn't work.  Do I need to convert this SWF, rename it, put it
back in my bin-debug folder, and then export a release build?  Very
confused.



--
View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13122.html
Sent from the Apache Flex Users mailing list archive at Nabble.com.

Re: Workers and Speed

Posted by Alex Harui <ah...@adobe.com>.

On 7/31/16, 9:40 PM, "Justin Mclean" <ju...@classsoftware.com> wrote:

>Hi,
>
>> -locale en_US
>> -advanced-telemetry=true
>> -debug=false
>
>May be the new lines are confusing it I usually put them all on one line
>like so:
>-locale en_US -advanced-telemetry=true -debug=false

There are some post processing utils as well [1]

Wrapping the trig in a function call might help Scout or FB see how much
time you are spending in the trig.

-Alex


[1] http://inflagrantedelicto.memoryspiral.com/2012/12/telemetryeasy-advanced-telemetry-utility-for-adobe-scout/



Re: Workers and Speed

Posted by Justin Mclean <ju...@classsoftware.com>.
Hi,

> -locale en_US
> -advanced-telemetry=true
> -debug=false

Maybe the new lines are confusing it; I usually put them all on one line like so:
-locale en_US -advanced-telemetry=true -debug=false

Thanks,
Justin