Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2012/02/07 12:24:03 UTC

[Hadoop Wiki] Trivial Update of "PoweredBy" by LarsFrancke

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "PoweredBy" page has been changed by LarsFrancke:
http://wiki.apache.org/hadoop/PoweredBy?action=diff&rev1=394&rev2=395

Comment:
Fix formatting and remove what looks like Spam

  This page documents an alphabetical list of institutions that are using Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in [[Distributions and Commercial Support]]. Please include details about your cluster hardware and size. Entries without this may be mistaken for spam references and deleted.
  
- To add entries you need write permission to the wiki, which you can get by subscribing to the core-dev@hadoop.apache.org mailing list and asking for the wiki account you have just created to get this permission. If you are using Hadoop in production you ought to consider getting involved in the development process anyway, by filing bugs, testing beta releases, reviewing the code and turning your notes into shared documentation. Your participation in this process will ensure your needs get met.
+ To add entries you need write permission to the wiki, which you can get by subscribing to the common-dev@hadoop.apache.org mailing list and asking for the wiki account you have just created to get this permission. If you are using Hadoop in production you ought to consider getting involved in the development process anyway, by filing bugs, testing beta releases, reviewing the code and turning your notes into shared documentation. Your participation in this process will ensure your needs get met.
  
  {{{
  }}}
@@ -176, +176 @@

   * ''532-node cluster (8 * 532 cores, 5.3PB). ''
   * ''Heavy usage of Java MapReduce, Pig, Hive, HBase ''
   * ''Using it for search optimization and research. ''
+ 
-  * ''[[http://ecircle.com|eCircle]] ''
+  * ''[[http://ecircle.com|eCircle]]''
-   * ''two 60 nodes cluster each >1000 cores, total 5T Ram, 1PB
+   * ''two 60-node clusters, each with >1000 cores; 5TB RAM and 1PB in total''
-   * mostly HBase, some M/R
+   * ''mostly HBase, some M/R''
-   * marketing data handling
+   * ''marketing data handling''
+ 
   * ''[[http://www.enet.gr|Enet]], 'Eleftherotypia' newspaper, Greece ''
    * ''Experimental installation - storage for logs and digital assets ''
   * ''Currently a 5-node cluster ''
@@ -211, +213 @@

  = F =
   * ''[[http://www.facebook.com/|Facebook]] ''
    * ''We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning. ''
+   * ''Currently we have 2 major clusters:''
-   * ''Currently we have 2 major clusters:    * A 1100-machine cluster with 8800 cores and about 12 PB raw storage. ''
+    * ''A 1100-machine cluster with 8800 cores and about 12 PB raw storage. ''
     * ''A 300-machine cluster with 2400 cores and about 3 PB raw storage. ''
     * ''Each (commodity) node has 8 cores and 12 TB of storage. ''
    * ''We are heavy users of both streaming and the Java APIs. We have built a higher-level data warehousing framework called Hive using these features (see http://hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS. ''
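The entry above mentions both Hadoop Streaming and the Java MapReduce API. As a minimal, illustrative sketch only (not Facebook's code; the EventCount name and the tab-separated log layout are assumptions), a log-counting job written against the standard org.apache.hadoop.mapreduce Java API could look like this:

{{{
// Hypothetical sketch: count occurrences of each event type in
// tab-separated log lines, using the standard Hadoop Java API.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCount {
  public static class EventMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Assumption: the event type is the first tab-separated field.
      String[] fields = line.toString().split("\t");
      ctx.write(new Text(fields[0]), ONE);
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable c : counts) sum += c.get();
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "event count");
    job.setJarByClass(EventCount.class);
    job.setMapperClass(EventMapper.class);
    job.setCombinerClass(SumReducer.class);  // same types, safe as combiner
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
}}}

A streaming job expresses the same map/reduce pair as external programs (for example, scripts) wired together through the Hadoop streaming jar instead of compiled Java classes.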
@@ -372, +375 @@

   * ''Used for user profile analysis, statistical analysis, and cookie-level reporting tools. ''
    * ''Some Hive but mainly automated Java MapReduce jobs that process ~150MM new events/day. ''
  
+  * ''[[https://lbg.unc.edu|Lineberger Comprehensive Cancer Center - Bioinformatics Group]]''
-  * ''[[https://lbg.unc.edu|Lineberger Comprehensive Cancer Center - Bioinformatics Group]] This is the cancer center at UNC Chapel Hill. We are using Hadoop/HBase for databasing and analyzing Next Generation Sequencing (NGS) data produced for the [[http://cancergenome.nih.gov/|Cancer Genome Atlas]] (TCGA) project and other groups. This development is based on the [[http://seqware.sf.net|SeqWare]] open source project which includes SeqWare Query Engine, a database and web service built on top of HBase that stores sequence data types. Our prototype cluster includes: ''
+   * ''This is the cancer center at UNC Chapel Hill. We are using Hadoop/HBase for databasing and analyzing Next Generation Sequencing (NGS) data produced for the [[http://cancergenome.nih.gov/|Cancer Genome Atlas]] (TCGA) project and other groups. This development is based on the [[http://seqware.sf.net|SeqWare]] open source project which includes SeqWare Query Engine, a database and web service built on top of HBase that stores sequence data types. Our prototype cluster includes: ''
-   * ''8 dual quad core nodes running CentOS ''
+    * ''8 dual quad core nodes running CentOS ''
-   * ''total of 48TB of HDFS storage ''
+    * ''total of 48TB of HDFS storage ''
-   * ''HBase & Hadoop version 0.20 ''
+    * ''HBase & Hadoop version 0.20 ''
  
   * ''[[http://www.legolas-media.com|Legolas Media]] ''
  
@@ -391, +395 @@

      * ''Pig 0.9 heavily customized ''
      * ''Azkaban for scheduling ''
      * ''Hive, Avro, Kafka, and other bits and pieces... ''
- 
-  * ''We use these things for discovering People You May Know and [[http://www.linkedin.com/careerexplorer/dashboard|other]] [[http://inmaps.linkedinlabs.com/|fun]] [[http://www.linkedin.com/skills/|facts]]. ''
+   * ''We use these things for discovering People You May Know and [[http://www.linkedin.com/careerexplorer/dashboard|other]] [[http://inmaps.linkedinlabs.com/|fun]] [[http://www.linkedin.com/skills/|facts]]. ''
  
   * ''[[http://www.lookery.com|Lookery]] ''
    * ''We use Hadoop to process clickstream and demographic data in order to create web analytic reports. ''
@@ -524, +527 @@

    * ''Also used as a proof of concept cluster for a cloud based ERP system. ''
  
   * ''[[http://www.psgtech.edu/|PSG Tech, Coimbatore, India]] ''
-   * ''[[http://www.kraloyun.gen.tr/yeni-oyunlar/|Yeni Oyunlar]] ''
-   * ''[[http://www.ben10oyun.net/|Ben 10 Oyunları]] ''
-   * ''[[http://www.giysilerigiydirmeoyunlari.com/|Giysi Giydirme]]
   * ''Multiple alignment of protein sequences helps to determine evolutionary linkages and to predict molecular structures. The dynamic nature of the algorithm, coupled with the data and compute parallelism of Hadoop data grids, improves the accuracy and speed of sequence alignment. Parallelism at the sequence and block level reduces the time complexity of MSA problems (a rough sketch follows below). The scalable nature of Hadoop makes it well suited to large-scale alignment problems. ''
   * ''Our cluster size varies from 5 to 10 nodes. Cluster nodes vary from 2950 Quad Core Rack Servers, with 2x6MB cache and 4 x 500 GB SATA hard drives, to E7200 / E7400 processors with 4 GB RAM and 160 GB HDD. ''
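As a rough sketch of the block-level parallelism described above (illustrative only; the BlockPartitionMapper name, the bucketing scheme, and the downstream aligner are assumptions, not PSG Tech's code), a mapper can bucket input sequences so that each reduce task aligns one block independently:

{{{
// Hypothetical sketch: fan protein sequences out to block buckets so that
// each reduce task can run an alignment step over its block in parallel.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BlockPartitionMapper
    extends Mapper<LongWritable, Text, IntWritable, Text> {
  private static final int NUM_BLOCKS = 10;  // assumed block count

  @Override
  protected void map(LongWritable offset, Text sequence, Context ctx)
      throws IOException, InterruptedException {
    // Assumption: one protein sequence per input line. Bucket it by hash so
    // all sequences of a block meet at the same reducer, where an aligner
    // (e.g. a dynamic-programming MSA step) would process them together.
    int block = (sequence.toString().hashCode() & Integer.MAX_VALUE) % NUM_BLOCKS;
    ctx.write(new IntWritable(block), sequence);
  }
}
}}}

Each reducer then sees all sequences of one block and can run its alignment locally; adding nodes and blocks raises the achievable parallelism.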
  
@@ -694, +694 @@

    . ''We currently run one medium-sized Hadoop cluster (1.6PB) to store and serve up physics data for the computing portion of the Compact Muon Solenoid (CMS) experiment. This requires a filesystem which can download data at multiple Gbps and process data at an even higher rate locally. Additionally, several of our students are involved in research projects on Hadoop. ''
  
   * ''[[http://db.cs.utwente.nl|University of Twente, Database Group]] ''
-   . ''We run a 16 node cluster (dual core Xeon E3110 64 bit processors with 6MB cache, 8GB main memory, 1TB disk) as of December 2008. We teach MapReduce and use Hadoop in our computer science master's program, and for information retrieval research. For more information, see: http://mirex.sourceforge.net/
+   . ''We run a 16 node cluster (dual core Xeon E3110 64 bit processors with 6MB cache, 8GB main memory, 1TB disk) as of December 2008. We teach MapReduce and use Hadoop in our computer science master's program, and for information retrieval research. For more information, see: http://mirex.sourceforge.net/''
  
  = V =
   * ''[[http://www.veoh.com|Veoh]] ''
@@ -703, +703 @@

   * ''[[http://www.vibyggerhus.se/|Bygga hus]] ''
   * ''We use a Hadoop cluster for search and indexing in our projects. ''
  
+  * ''[[http://www.visiblemeasures.com|Visible Measures Corporation]]
-  * ''[[http://www.visiblemeasures.com|Visible Measures Corporation]] uses Hadoop as a component in our Scalable Data Pipeline, which ultimately powers !VisibleSuite and other products. We use Hadoop to aggregate, store, and analyze data related to in-stream viewing behavior of Internet video audiences. Our current grid contains more than 128 CPU cores and in excess of 100 terabytes of storage, and we plan to grow that substantially during 2008. ''
+   . uses Hadoop as a component in our Scalable Data Pipeline, which ultimately powers !VisibleSuite and other products. We use Hadoop to aggregate, store, and analyze data related to in-stream viewing behavior of Internet video audiences. Our current grid contains more than 128 CPU cores and in excess of 100 terabytes of storage, and we plan to grow that substantially during 2008. ''
  
   * ''[[http://www.vksolutions.com/|VK Solutions]] ''
   * ''We use a small Hadoop cluster in the scope of our general research activities at [[http://www.vklabs.com|VK Labs]] to get faster data access from web applications. ''