
Posted to common-dev@hadoop.apache.org by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org> on 2008/10/17 05:02:44 UTC

[jira] Created: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Use qMC sequence to improve the accuracy of PiEstimator
-------------------------------------------------------

                 Key: HADOOP-4437
                 URL: https://issues.apache.org/jira/browse/HADOOP-4437
             Project: Hadoop Core
          Issue Type: Improvement
          Components: examples
            Reporter: Tsz Wo (Nicholas), SZE
            Priority: Minor


Currently, PiEstimator uses java.util.Random to generate random 2d-points for estimating pi. The numbers generated by java.util.Random are uniformly distributed, but the generated 2d-points tend to form clumps and gaps, so the accuracy of the estimated pi is low. The accuracy can be improved by using a quasi-Monte Carlo (qMC) sequence.
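For readers new to this kind of estimator, here is a minimal sketch of the plain Monte Carlo approach that the description refers to. It assumes points are drawn in the unit square and counted when they land inside the quarter circle x^2 + y^2 <= 1. The class and method names are hypothetical, and this is not the code in PiEstimator or the attached patches.

```java
import java.util.Random;

/** Hypothetical sketch of plain Monte Carlo estimation of pi; not the PiEstimator source. */
class MonteCarloPi {
  /**
   * Draws n pseudo-random points in the unit square and returns
   * 4 * (fraction of points inside the quarter circle x^2 + y^2 <= 1).
   */
  static double estimate(long n, long seed) {
    Random rng = new Random(seed);
    long inside = 0;
    for (long i = 0; i < n; i++) {
      double x = rng.nextDouble();
      double y = rng.nextDouble();
      if (x * x + y * y <= 1.0) {
        inside++;
      }
    }
    // The quarter circle has area pi/4 and the unit square has area 1.
    return 4.0 * inside / n;
  }

  public static void main(String[] args) {
    long n = 1_000_000L;
    double pi = estimate(n, 12345L);
    System.out.printf("estimate=%.8f error=%.2e%n", pi, Math.abs(pi - Math.PI));
  }
}
```

Because every point is an independent pseudo-random draw, nothing prevents points from crowding some regions and leaving others sparse, so the error of this estimate shrinks only like 1/sqrt(N).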

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4437:
-------------------------------------------

    Fix Version/s: 0.20.0
         Assignee: Tsz Wo (Nicholas), SZE
     Hadoop Flags: [Reviewed]
           Status: Patch Available  (was: Open)

> Use qMC sequence to improve the accuracy of PiEstimator
> -------------------------------------------------------
>
>                 Key: HADOOP-4437
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4437
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: examples
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>            Priority: Minor
>             Fix For: 0.20.0
>
>         Attachments: 4437_20081019.patch, 4437_20081103.patch
>
>
> Currently, PiEstimator uses java.util.Random to generate random 2d-points for estimating pi. The numbers generated by java.util.Random are uniformly distributed, but the generated 2d-points tend to form clumps and gaps, so the accuracy of the estimated pi is low. The accuracy can be improved by using a quasi-Monte Carlo (qMC) sequence.

[jira] Commented: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644857#action_12644857 ] 

Chris Douglas commented on HADOOP-4437:
---------------------------------------

Since it's in the examples, a few brief notes on the initialization of HaltonSequence and on nextPoint would help orient readers. It doesn't need to be a course in statistics (the existing code isn't, either), but even a sentence or two on why PiEstimator is using it and perhaps a couple of comments identifying the variables would be helpful.

+1 on the patch, though; documentation is just a suggestion.
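The kind of orientation requested above might look like the following sketch. It is a hypothetical reimplementation, not the HaltonSequence class from the patch. It assumes the common choice of prime bases 2 and 3 for the x and y coordinates, and computes each coordinate as the radical inverse of the point's index (the base-b digits of the index mirrored about the radix point). Allowing construction at an arbitrary start index is what would let separate tasks generate separate segments of one sequence.

```java
/**
 * Hypothetical 2-d Halton sequence; not the HaltonSequence class from the patch.
 * Point k is (radicalInverse(k, 2), radicalInverse(k, 3)).
 */
class HaltonSketch {
  private static final int[] BASES = {2, 3};  // one prime base per dimension (assumed)
  private long index;                          // index of the next point to return

  /** Initialization: start the sequence at the given index (index 0 would give the point (0, 0)). */
  HaltonSketch(long startIndex) {
    if (startIndex < 1) {
      throw new IllegalArgumentException("startIndex must be >= 1");
    }
    this.index = startIndex;
  }

  /** Mirrors the base-b digits of k about the radix point, giving a value in [0, 1). */
  static double radicalInverse(long k, int base) {
    double value = 0.0;
    double weight = 1.0 / base;
    while (k > 0) {
      value += weight * (k % base);  // least significant digit gets the largest weight
      k /= base;
      weight /= base;
    }
    return value;
  }

  /** Returns the point for the current index, then advances the index. */
  double[] nextPoint() {
    double[] p = new double[BASES.length];
    for (int d = 0; d < BASES.length; d++) {
      p[d] = radicalInverse(index, BASES[d]);
    }
    index++;
    return p;
  }

  /** Estimates pi from n consecutive points, counting those inside x^2 + y^2 <= 1. */
  static double estimatePi(long startIndex, long n) {
    HaltonSketch seq = new HaltonSketch(startIndex);
    long inside = 0;
    for (long i = 0; i < n; i++) {
      double[] p = seq.nextPoint();
      if (p[0] * p[0] + p[1] * p[1] <= 1.0) {
        inside++;
      }
    }
    return 4.0 * inside / n;
  }

  public static void main(String[] args) {
    System.out.println("estimate=" + estimatePi(1, 1_000_000L));
  }
}
```

Unlike pseudo-random draws, consecutive Halton points deliberately fill the gaps left by earlier points, which is why the same number of samples gives a more accurate estimate.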

[jira] Commented: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645054#action_12645054 ] 

Konstantin Shvachko commented on HADOOP-4437:
---------------------------------------------

This is nice. I understand it is just an example but if we run more maps for a longer period of time can we get more Pi digits?
Now, as an example it should imo have much better documentation.

[jira] Updated: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4437:
-------------------------------------------

    Attachment: 4437_20081103.patch

4437_20081103.patch: added javadoc, no code changes.

[jira] Commented: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645062#action_12645062 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4437:
------------------------------------------------

> This is nice. I understand it is just an example but if we run more maps for a longer period of time can we get more Pi digits?

Yes, the more samples are used, the more digits we get, in both the Monte Carlo method (java.util.Random) and the qMC method (Halton sequence).

However, the discrepancy of the Halton sequence is smaller than that of java.util.Random. The expected error using java.util.Random is O(1/sqrt(N)), while the error bound using the 2-dimensional Halton sequence is O((ln N)^2/N), where N is the number of samples. For estimating Pi with 100,000,000 samples, the accuracy of Halton is ~7 digits but that of java.util.Random is only ~4 digits, as shown previously.
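The digit counts can be checked against the two estimates posted with 4437_20081019.patch (3.14145832 before and 3.14159256 after, each from 100,000,000 samples). The small sketch below, with hypothetical helper names, computes the absolute error of each posted estimate and the Monte Carlo scale 1/sqrt(N).

```java
/** Hypothetical helper comparing the posted estimates with Math.PI. */
class PiErrorCheck {
  /** Absolute error of an estimate of pi. */
  static double absError(double estimate) {
    return Math.abs(estimate - Math.PI);
  }

  /** Order of magnitude of the Monte Carlo error for n samples. */
  static double monteCarloScale(double n) {
    return 1.0 / Math.sqrt(n);
  }

  public static void main(String[] args) {
    double n = 1e8;              // total samples in the posted runs
    double before = 3.14145832;  // java.util.Random
    double after = 3.14159256;   // Halton sequence
    System.out.printf("1/sqrt(N)    = %.1e%n", monteCarloScale(n));
    System.out.printf("before error = %.2e%n", absError(before));
    System.out.printf("after error  = %.2e%n", absError(after));
  }
}
```

The java.util.Random error (about 1.3e-4) is on the order of 1/sqrt(N) = 1e-4, while the Halton error (about 9.4e-8) is roughly three orders of magnitude smaller.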

> Now, as an example it should imo have much better documentation.

This is a good point.  I plan to further improve the PiEstimator.  Let me also improve the documentation in the next issue.

[jira] Updated: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4437:
-------------------------------------------

    Attachment: 4437_20081019.patch

4437_20081019.patch: replaced java.util.Random with a Halton sequence.

Tried a total of 100,000,000 samples:
- Before the patch:
Job Finished in 22.422 seconds
Estimated value of PI is 3.14145832

- After the patch:
Job Finished in 13.375 seconds
Estimated value of PI is 3.14159256000000000000


[jira] Updated: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4437:
-------------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Tested manually on a 100-node cluster. The biggest sample set I have run was:

- 1000 maps and 10000000 samples per map.
Job Finished in 67.337 seconds
Estimated value of PI is 3.14159264520000000000

I just committed this.
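With 1000 maps and 10,000,000 samples per map, the run above uses 10^10 points in total. One way to split a single Halton sequence across maps, assumed here for illustration rather than taken from the patch, is to give map i the index range starting at i * samplesPerMap + 1. The union of all map outputs is then exactly the first maps * samplesPerMap points of one sequence, so the distributed count equals a single sequential count. The sketch below checks that property on a small scale; all names are hypothetical, and bases 2 and 3 are assumed.

```java
/** Hypothetical sketch: splitting one 2-d Halton sequence across several maps. */
class PartitionedHalton {
  /** Mirrors the base-b digits of k about the radix point. */
  static double radicalInverse(long k, int base) {
    double value = 0.0;
    double weight = 1.0 / base;
    while (k > 0) {
      value += weight * (k % base);
      k /= base;
      weight /= base;
    }
    return value;
  }

  /** "Map" work: counts points with indices start .. start+size-1 inside x^2 + y^2 <= 1. */
  static long countInside(long start, long size) {
    long inside = 0;
    for (long k = start; k < start + size; k++) {
      double x = radicalInverse(k, 2);
      double y = radicalInverse(k, 3);
      if (x * x + y * y <= 1.0) {
        inside++;
      }
    }
    return inside;
  }

  /** "Reduce" work: sums the per-map counts and converts the total to an estimate of pi. */
  static double estimate(int maps, long samplesPerMap) {
    long inside = 0;
    for (int i = 0; i < maps; i++) {
      inside += countInside((long) i * samplesPerMap + 1, samplesPerMap);
    }
    return 4.0 * inside / ((long) maps * samplesPerMap);
  }

  public static void main(String[] args) {
    int maps = 10;
    long samplesPerMap = 100_000L;
    long total = maps * samplesPerMap;
    double distributed = estimate(maps, samplesPerMap);
    double single = 4.0 * countInside(1, total) / total;
    System.out.println(distributed + " " + single);  // identical: same points, same count
  }
}
```

Giving each map a disjoint, contiguous index range keeps the low-discrepancy property of the whole sequence, whereas seeding each map with an independent generator would not.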

[jira] Commented: (HADOOP-4437) Use qMC sequence to improve the accuracy of PiEstimator

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645055#action_12645055 ] 

Hadoop QA commented on HADOOP-4437:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12393275/4437_20081103.patch
  against trunk revision 709609.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3527/console

This message is automatically generated.