Posted to commits@opennlp.apache.org by bg...@apache.org on 2016/11/16 09:11:20 UTC

[24/51] [partial] opennlp-sandbox git commit: merge from bgalitsky's own git repo

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/1f97041b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/101TediRoslingH_Poverty_EN.txt.txt
----------------------------------------------------------------------
diff --git a/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/101TediRoslingH_Poverty_EN.txt.txt b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/101TediRoslingH_Poverty_EN.txt.txt
new file mode 100644
index 0000000..d40be93
--- /dev/null
+++ b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/101TediRoslingH_Poverty_EN.txt.txt
@@ -0,0 +1,2 @@
+
+\ufeff I told you three things last year . I told you that the statistics of the world have not been made properly available . Because of that , we still have the old mindset of developing and industrialized countries , which is wrong . And that animated graphics can make a difference . Things are changing . And today , on the United Nations Statistic Division Home Page , it says , by first of May , full access to the databases . ( Applause ) And if I could share the image with you on the screen . So three things had happened . U. N. opened their statistic databases , and we have a new version of the software up working as a beta on the net , so you do n't have to download it any longer . And let me repeat what you saw last year . The bubbles are the countries . Here you have the fertility rate -- the number of children per woman -- and there you have the length of life in years . This is 1950 -- those were the industrialized countries , those were developing countries . At that time t
 here was a " we " and " them . " There was a huge difference in the world . But then it changed , and it went on quite well . And this is what happens . You can see how China is the red , big bubble ; the blue there is India . And they go over all this ... I 'm going to try to be a little more serious this year in showing you how things really changed . And it 's Africa which stands out as the problem down here , does n't it ? Large families still , and the HIV epidemic brought down the countries like this . This is more or less what we saw last year , and this is how it will go on into the future . And I will talk on , is this possible ? Because you see now , I presented statistics that do n't exist . Because this is where we are . Will it be possible that this will happen ? I cover my lifetime here , you know ? I expect to live 100 years . And this is where we are today . Now could we look here at instead the economic situation in the world ? And I would like to show that against 
 child survival . We 'll swap the axis : here you have child mortality -- that is , survival -- four kids dying there , 200 dying there . And this is GDP per capita on this axis . And this was 2007. And if I go back in time , I 've added some historical statistics -- here we go , here we go , here we go -- not so much statistics 100 years ago . Some countries still had statistics . We are looking down in the archive , and where we are down into 1820 , there is only Austria and Sweden that can produce numbers . ( Laughter ) But they were down here , they had 1,000 dollars per person per year . And they lost one-fifth of their kids before their first birthday . So this is what happens in the world , if we play the entire world . How they got slowly richer and richer , and they add statistics . Is n't it beautiful when they get statistics ? You see the importance of that ? And here , children do n't live longer . The last century , 1870 , was bad for the kids in Europe , because most of
  this statistics is Europe . It was only by the turn of the century that more than 90 percent of the children survived their first year . This is India coming up , with the first data from India . And this is the United States moving away here , earning more money . And we will soon see China coming up in the very far end corner here . And it moves up with Mao Tse-Tung getting health , not getting so rich . There he died , then Deng Xiaoping brings money , it moves this way over here . And the bubbles keep moving up there , and this is what the world looks like today . ( Applause ) Let us have a look at the United States . We have a function here -- I can tell the world , " Stay where you are . " And I take the United States -- we still want to see the background -- I put them up like this , and now we go backwards . And we can see that the United States goes to the right of the mainstream . They are on the money side all the time . And down in 1915 , the United States was a neighbo
 r of India -- present , contemporary India . And that means United States was richer , but lost more kids than India is doing today , proportionally . And look here -- compare to the Philippines of today . The Philippines of today has almost the same economy as the United States during the First World War . But we have to bring United States forward quite a while to find the same health of the United States as we have in the Philippines . About 1957 here , the health of the United States is the same as the Philippines . And this is the drama of this world which many call globalized , is that Asia , Arabic countries , Latin America , are much more ahead in being healthy , educated , having human resources than they are economically . There 's a discrepancy in what 's happening today in the emerging economies . There now , social benefits , social progress , are going ahead of economical progress . And 1957 -- the United States had the same economy as Chile has today . And how long do
  we have to bring United States to get the same health as Chile has today ? I think we have to go , there -- we have 2001 , or 2002 -- the United States has the same health than Chile . Chile 's catching up ! Within some years Chile may have better child survival than the United States . This is really a change , that you have this lag of more or less 30 , 40 years ' difference on the health . And behind the health is the educational level . And there 's a lot of infrastructure things , and general human resources are there . Now we can take away this -- and I would like to show you the rate of speed , the rate of change , how fast they have gone . And we go back to 1920 , and I want to look at Japan . And I want to look at Sweden and the United States . And I 'm going to stage a race here between this sort of yellowish Ford here and the red Toyota down there , and the brownish Volvo . ( Laughter ) And here we go , here we go . The Toyota has a very bad start down here , you can see
  , and the United States Ford is going off-road there . And the Volvo is doing quite fine . This is the war . The Toyota got off track , and now the Toyota is coming on the healthier side of Sweden -- can you see that ? And they are taking over Sweden , and they are now healthier than Sweden . That 's the part where I sold the Volvo and bought the Toyota . ( Laughter ) And now we can see that the rate of change was enormous in Japan . They really caught up . And this changes gradually . We have to look over generations to understand it . And let me show you my own sort of family history -- we made these graphs here . And this is the same thing , money down there , and health , you know ? And this is my family . This is Sweden , 1830 , when my great-great-grandma was born . Sweden was like Sierra Leone today . And this is when great-grandma was born , 1863. And Sweden was like Mozambique . And this is when my grandma was born , 1891. She took care of me as a child , so I 'm not talki
 ng about statistic now -- now it 's oral history in my family . That 's when I believe statistics , when it 's grandma-verified statistics . ( Laughter ) I think it 's the best way of verifying historical statistics . Sweden was like Ghana . It 's interesting to see the enormous diversity within sub-Saharan Africa . I told you last year , I 'll tell you again , my mother was born in Egypt , and I -- who am I ? I 'm the Mexican in the family . And my daughter , she was born in Chile , and the grand-daughter was born in Singapore , now the healthiest country on this Earth . It bypassed Sweden about two to three years ago , with better child survival . But they 're very small , you know . They 're so close to the hospital we can never beat them out in these forests . ( Laughter ) But homage to Singapore . Singapore are the best ones , now . Now this looks also like a very good story . But it 's not really that easy , that it 's all a good story . Because I have to show you one of the o
 ther facility . We can also make the color here represent the variable -- and what am I choosing here ? Carbon-dioxide emission , metric ton per capita . This is 1962 , and United States was emitting 16 tons per person . And China was emitting 0.6 , and India was emitting 0.32 tons per capita . And what happens when we moved on ? Well , you see the nice story of getting richer and getting healthier -- everyone did it at the cost of emission of carbon dioxide . There is no one who has done it so far . And we do n't have all the updated data any longer , because this is really hot data today . And there we are , 2001. And in the discussion I attended with global leaders , you know , many say now , the problem is the emerging economies , they are getting out too much carbon dioxide . The Minister of the Environment of India said , " Well , you were the one who caused the problem . " The OECD countries -- the high-income countries -- they were the ones who caused the climate change . " 
 But we forgive you , because you did n't know it . But from now on , we count per capita . From now on we count per capita . And everyone is responsible for the per capita emission . " This really shows you , we have not seen good economic and health progress anywhere in the world without destroying the climate . And this is really what has to be changed . I 've been criticized for showing you a too positive image of the world , but I do n't think it 's like this . The world is quite a messy place . This we can call Dollar Street . Everyone lives on this street here . What they earn here -- what number they live on -- is how much they earn per day . This family earns about one dollar per day . We drive up the street here , we find a family here which earns about two to three dollars a day . And we drive away here -- we find the first garden in the street , and they earn 10 to 50 dollars a day . And how do they live ? If we look at the bed here , we can see that they sleep on a rug o
 n the floor . This is what poverty line is -- 80 percent of the family income is just to cover the energy needs , the food for the day . This is two to five dollars , you have a bed . And here it 's a much nicer bedroom , you can see . I lectured on this for Ikea , and they wanted to see the sofa immediately here . ( Laughter ) And this is the sofa , how it will emerge from there . And the interesting thing , when you go around here in the photo panorama , you see the family still sitting on the floor there , although there is a sofa . If you watch in the kitchen , you can see that the great difference for women does not come between one to 10 dollar . It comes beyond here , when you really can get good working conditions in the family . And if you really want to see the difference , you look at the toilet over here . This can change , this can change . These are all pictures and images from Africa , and it can become much better . We can get out of poverty . My own research has not
  been in IT or anything like this . I spent 20 years in interviews with African farmers who were on the verge of famine . And this is the result of the farmers-needs research . The nice thing here is that you ca n't see who are the researchers in this picture . That 's when research functions for societies -- you must really live with the people . When you 're in poverty , everything is about survival . It 's about having food . And these two young farmers , they are girls now -- because the parents are dead from HIV and AIDS -- they discuss with a trained agronomist . This is one of the best agronomists in Malawi , Junatambe Kumbira , and he 's discussing what sort of cassava they will plant -- the best converter of sunshine to food that man has found . And they are very , very eagerly interested to get advice , and that 's to survive in poverty . That 's one context . Getting out of poverty . The women told us one thing . " Get us technology . We hate this mortar , to stand hours 
 and hours . Get us a mill so that we can mill our flour , then we will be able to pay for the rest ourselves . " Technology will bring you out of poverty , but there 's a need for a market to get away from poverty . And this woman is very happy now , bringing her products to the market . But she 's very thankful for the public investment in schooling so she can count , and wo n't be cheated when she reaches the market . She wants her kid to be healthy , so she can go to the market and does n't have to stay home . And she wants the infrastructure -- it is nice with a paved road There 's also good with credit . Micro-credits gave her the bicycle , you know . And information will tell her when to go to market with which product . You can do this . I find my experience from 20 years of Africa is that the seemingly impossible is possible . Africa has not done bad . In 50 years they 've gone from a pre-Medieval situation to a very decent 100-year-ago Europe , with a functioning nation and
  state . I would say that sub-Saharan Africa has done best in the world during the last 50 years . Because we do n't consider where they came from . It 's this stupid concept of developing countries which puts us , Argentina and Mozambique together 50 years ago , and says that Mozambique did worse . We have to know a little more about the world . I have a neighbor who knows 200 types of wine . He knows everything . He knows the name of the grape , the temperature and everything . I only know two types of wine -- red and white . ( Laughter ) But my neighbor only knows two types of countries -- industrialized and developing . And I know 200 , I know about the small data . But you can do that . ( Applause ) But I have to get serious . And how do you get serious ? You make a PowerPoint , you know ? ( Laughter ) Homage to the Office package , no ? What is this , what is this , what am I telling ? I 'm telling you that there are many dimensions of development . Everyone wants your pet thi
 ng . If you are in the corporate sector , you love micro-credit . If you are fighting in a non-governmental organization , you love equity between gender . Or if you are a teacher , you 'll love UNESCO , and so on . On the global level , we have to have more than our own thing . We need everything . All these things are important for development , especially when you just get out of poverty and you should go towards welfare . Now , what we need to think about is , what is a goal for development , and what are the means for development ? Let me first grade what are the most important means . Economic growth to me , as a public-health professor , is the most important thing for development , because it explains 80 percent of survival . Governance . To have a government that functions -- that 's what brought California out of the misery of 1850. It was the government which made law function finally . Education , human resources are important . Health is also important , but not that mu
 ch as a mean . Environment is important . Human rights is also important , but it just gets one cross . Now what about goals ? Where are we going toward ? We are not interested in money . Money is not a goal . It 's the best mean , but I give it zero as a goal . Governance , well it 's fun to vote in a little thing , but it 's not a goal . And going to school , that 's not a goal , it 's a mean . Health I give two points . I mean it 's nice to be healthy -- at my age especially -- you can stand here , you 're healthy . And that 's good , it gets two plusses . Environment is very , very crucial . There 's nothing for the grandkid if you do n't save up . But where are the important goals ? Of course , it 's human rights . Human rights is the goal , but it 's not that strong of a mean for achieving development . And culture . Culture is the most important thing , I would say , because that 's what brings joy to life . That 's the value of living . So the seemingly impossible is possibl
 e . Even African countries can achieve this . And I 've shown you the shot where the seemingly impossible is possible . And remember , please remember my main message , which is this : the seemingly impossible is possible . We can have a good world . I showed you the shots , I proved it in the PowerPoint , and I think I will convince you also by culture . ( Laughter ) ( Applause ) Bring me my sword ! Sword swallowing is from ancient India . It 's a cultural expression that for thousands of years has inspired human beings to think beyond the obvious . ( Laughter ) And I will now prove to you that the seemingly impossible is possible by taking this piece of steel -- solid steel -- this is the army bayonet from the Swedish Army , 1850 , in the last year we had war . And it 's all solid steel -- you can hear here . And I 'm going to take this blade of steel , and push it down through my body of blood and flesh , and prove to you that the seemingly impossible is possible . Can I request 
 a moment of absolute silence ? ( Applause ) 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/1f97041b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/102TediSinclairC_OpenArchitech_EN.txt.txt
----------------------------------------------------------------------
diff --git a/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/102TediSinclairC_OpenArchitech_EN.txt.txt b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/102TediSinclairC_OpenArchitech_EN.txt.txt
new file mode 100644
index 0000000..143b858
--- /dev/null
+++ b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/102TediSinclairC_OpenArchitech_EN.txt.txt
@@ -0,0 +1,2 @@
+
+\ufeff I 'm going to take you on a journey very quickly . To explain the wish , I 'm going to have to take you somewhere which many people have n't been , and that 's around the world . When I was about 24 years old , Kate Store and myself started an organization to get architects and designers involved in humanitarian work . Not only about responding to natural disasters , but involved in systemic issues . We believed that where the resources and expertise are scarce , innovative , sustainable design can really make a difference in people 's lives . So this all began my -- I started my life as an architect , or training as an architect , and I was always interested in socially responsible design , and how you can really make an impact . But when I went to architectural school , it seemed that I was the black sheep in the family . Many architects seemed to think that when you design , you design a jewel , and it 's a jewel that you try and crave for . Whereas I felt that when you desig
 n , you either improve or you create a detriment to the community in which you 're designing in . So you 're not just doing a building for the residents or for the people who are going to use it , but for the community as a whole . And in 1999 , we started by responding to the issue of the housing crisis for returning refugees in Kosovo and I did n't know what I was doing , like I say , mid-20s , and I 'm the , I 'm the Internet generation , so we started a website . We put a call out there , and to my surprise in a couple of months we had hundreds of entries from around the world . That led to a number of prototypes being built and really experimenting with some ideas . Two years later we started doing a project on developing mobile health clinics in sub-Saharan Africa , responding to the HIV/AIDS pandemic . That -- that led to 550 entries from 53 countries . We also have designers from around the world that participate . And we had an exhibit of work that followed that . 2004 was 
 the tipping point for us . We started responding to natural disasters and getting involved in Iran and Bam , also following up on our work in Africa . Working within the United States , most people look at poverty and they see the face of a foreigner , but go live -- I live in Bozeman , Montana -- go up to the north plains on the reservations , or go down to Alabama or Mississippi pre-Katrina , and I could have shown you places that have far worse conditions than many developing countries I 've been to . So we got involved in and worked in inner cities and elsewhere . And then also I will go into some more projects . 2005 Mother Nature kicked our arse . I think we can pretty much assume that 2005 was a horrific year when it comes to natural disasters . And because of the Internet , and because of connections to blogs and so forth , within literally hours of the tsunami , we were already raising funds , getting involved , working with people on the ground . We run from a couple of la
 ptops in the first couple of days , I had 4,000 emails from people needing help . So we began to get involved in projects there , and I 'll talk about some others . And then of course , this year we 've been responding to Katrina , as well as following up on our reconstruction works . This is a brief overview . In 2004 , I really could n't manage the number of people who wanted to help , or the number of requests that I was getting . It was all coming into my laptop and cell phone . So we decided to embrace an open -- basically an open source model of business , that anyone , anywhere in the world , could start a local chapter , and they can get involved in local problems . Because I believe there is no such thing as Utopia . All problems are local . All solutions are local . So , and that means , you know , somebody who is based in , in Mississippi , knows more about Mississippi than I do . So what happened is , we used MeetUp and all these other kind of Internet tools , and we end
 ed up having 40 chapters starting up , thousands of architects in 104 countries . So the , the bullet point -- sorry , I never do a suit , so I knew that I was going to take this off . OK , because I 'm going to do it very quick . So in the past seven years , this is n't just about nonprofit . What it showed me is that there 's a grassroots movement going on of socially responsible designers who really believe that this world has got a lot smaller , and that we have the opportunity -- not the responsibility , but the opportunity -- to really get involved in making change . ( Laughter ) I 'm adding that to my time . So what you do n't know is , we 've got these thousands of designers working around the world , connected basically by a website , and we have a staff of three . By doing something , the fact that nobody told us we could n't do it , we did it . And so there 's something to be said about naivete . So seven years later , we 've developed so that we 've got advocacy , instig
 ation and implementation . We advocate for good design , not only through student workshops and lectures and public forums , op-eds , we have a book on humanitarian work , but also disaster mitigation and dealing with public policy . We can talk about FEMA , but that 's another talk . Instigation , developing ideas with communities and NGOs doing open-source design competitions . Referring , matchmaking with communities and then implementing -- actually going out there and doing the work , because when you invent , it 's never a reality until it 's built . So it 's really important that if we 're designing and trying to create change , we build that change . So here 's a select number of projects . Kosovo . This is Kosovo in '99 . We did an open design competition , like I said . It led to a whole variety of ideas , and this was n't about emergency shelter , but transitional shelter that would last five to 10 years , that would be placed next to the land the resident lived in , and 
 that they would rebuild their own home . This was n't imposing an architecture on a community , this was giving them the tools and , and the space to allow them to rebuild and regrow the way they want to . We have from the sublime to the ridiculous , but they worked . This is an inflatable hemp house . It was built ; it works . This is a shipping container . Built and works . And a whole variety of ideas that not only dealt with architectural building , but also the issues of governance and the idea of creating communities through complex networks . So we 've engaged not just designers , but also , you know , a whole variety of technology-based professionals . Using rubble from destroyed homes to create new homes . Using strawbale construction , creating heat walls . And then something remarkable happened in '99 . We went to Africa , originally to look at the housing issue . Within three days , we realized the problem was not housing ; it was the growing pandemic of HIV/AIDS . And i
 t was n't doctors telling us this ; it was actual villagers that we were staying with . And so we came up with the bright idea that instead of getting people to walk 10 , 15 kilometers to see doctors , you get the doctors to the people . And we started engaging the the medical community . And I thought , you know , we thought we were real bright , you know , sparks -- we 've come up with this great idea , mobile health clinics that can -- widely distributed throughout sub-Saharan Africa . And the community , the medical community there said , " We 've said this for the last decade . We know this . We just do n't know how to show this . " So in a way , we had taken a pre-existing need and shown solutions . And so again , we had a whole variety of ideas that came in . This one I personally love , because the idea that architecture is not just about solutions , but about raising awareness . This is a kenaf clinic . You get seed and you grow it in a plot of land , and then once -- and i
 t grows 14 feet in a month . And on the fourth week , the doctors come and they mow out an area , put a tensile structure on the top and when the doctors have finished treating and seeing patients and villagers , you cut down the clinic and you eat it . It 's an Eat Your Own Clinic . ( Laughter ) So it 's dealing with the fact that if you have AIDS , you also need to have nutrition rates , and the idea that the idea of nutrition is as important as getting anti-retrovirals out there . So you know , this is a serious solution . This one I love . The idea is it 's not just a clinic -- it 's a community center . This looked at setting up trade routes and economic engines within the community , so it can be a self-sustaining project . Every one of these projects is sustainable . That 's not because I 'm a tree-hugging green person . It 's because when you live on four dollars a day , you 're living on survival and you have to be sustainable . You have to know where your energy is coming 
 from . You have to know where your resource is coming from . And you have to keep the maintenance down . So this is about getting an economic engine , and then at night it turns into a movie theater . So it 's not an AIDS clinic . It 's a community center . So you can see ideas . And these ideas developed into prototypes , and they were eventually built . And currently as of this year , there are clinics rolling out in Nigeria and Kenya . From that we also developed Siyathemba , which was a project -- the community came to us and said , the problem is that the girls do n't have education . And we 're working in an area where young women between the ages of 16 and 24 have a 50 percent HIV/AIDS rate . And that 's not because they 're promiscuous , it 's because there 's no knowledge . And so we decided to look at the idea of sports and create a youth sports center that doubled as an HIV/AIDS outreach center , and the coaches of the girls ' team were also trained doctors . So that ther
 e would be a very slow way of developing kind of confidence in health care . And we picked nine finalists , and then those nine finalists were distributed throughout the entire region , and then the community picked their design . They said , this is our design , because it 's not only about engaging a community , it 's about empowering a community and about getting them to be a part of the rebuilding process . So the winning design is here , and then of course , we actually go and work with the community and the clients . So this is the designer . He 's out there working with the first ever women 's soccer team in Kwa-Zulu Natal , Siyathemba , and they can tell it better . Video : Well , my name is Sisi , because I work at the African center . I 'm a consultant and I 'm also the national football player for South Africa , Bafana Bafana , and I also play in the Vodacom League for the team called Tembisa , which has now changed to Siyathemba . This is our home ground . Cameron Sincla
 ir : So I 'm going to show that later because I 'm running out of time . I can see Chris looking at me slyly . This was a connection , just a meeting with somebody who wanted to develop Africa 's first telemedicine center , in Tanzania . And we met , literally , a couple of months ago . We 've already developed a design , and the team is over there , working in partnership . This was a matchmaking , thanks to a couple of TEDsters : [ unclear ] Cheryl Heller and Andrew Zolli , who connected me with this amazing African woman . And we start construction in June , and it will be opened by TEDGlobal . So when you come to TEDGlobal , you can check it out . But what we 're known probably most for is dealing with disasters and development , and we 've been involved in a lot of issues , such as the tsunami and also things like Hurricane Katrina . This is a 370 dollar shelter that can be easily assembled . This is a community design . A community-designed community center . And what that mea
 ns is we actually live and work with the community , and they 're part of the design process . The kids actually get involved in mapping out where the the community center should be , and then eventually , the community is actually , through skills training , end up building the building with us . Here is another school . This is what the U. N. gave these guys for six months -- 12 plastic tarps . This was in August . This was the replacement , and it 's supposed to last for two years . When the rain comes down , you ca n't hear a thing , and in the summer it 's about 140 degrees inside . So we said , if the rain 's coming down , let 's get fresh water . So every one of our schools have rain water collection systems , very low cost . A class , three classrooms and rainwater collection is five thousand dollars . This was raised by hot chocolate sales in Atlanta . It 's built by the parents of the kids . The kids are out there on-site , building the buildings . And it opened a couple o
 f weeks ago , and there 's 600 kids that are now using the schools . ( Applause ) So , disaster hits home . We 've see the bad stories on CNN and Fox and all that , but we do n't see the good stories . Here is a community that got together and they said no to wait , to waiting . They formed a partnership , a diverse partnership of players to actually map out East Biloxi , to figure out who is getting involved . We 've had 1,500 volunteers rebuilding , rehabbing homes . Figuring out what FEMA regulations are , not waiting for them to dictate to us how you should rebuild . Working with residents , getting out -- them out of their homes , so they do n't get ill . This is what they 're cleaning up on their own . Designing housing . This house is going to go in , in a couple of weeks . This is a rehabbed home , done in four days. This is a utility room for a woman who is on a walker . She 's 70 years old . This is what FEMA gave her . 600 bucks , happened two days ago . We put together v
 ery quickly a washroom . It 's built , it 's running and she just started a business today , where she 's washing other people 's clothes . This is Shandra and the Calhouns . They 're photographers who have documented the Lower Ninth for the last 40 years . That was their home , and these are the photographs they took . And we 're helping , working with them to create a new building . Projects we 've done . Projects we 've been a part of , support . Why do n't aid agencies do this ? This is the U. N. tent . This is the new U. N. tent , just introduced this year . Quick to assemble . It 's got a flap , that 's the invention . It took 20 years to design this and get it implemented in the field . I was 12 years old . There 's a problem here . Luckily , we 're not alone . There are hundreds and hundreds and hundreds and hundreds and hundreds of architects and designers and inventors around the world that are getting involved in humanitarian work . More hemp houses -- it 's a theme in Japa
 n apparently . I 'm not sure what they 're smoking . This is a grip clip designed by somebody who said , all you need is some way to attach membrane structures to physical support beams . This guy , designed for NASA -- is now doing housing . I 'm going to whip through this quickly , because I know I 've got only a couple of minutes . So this is all done in the last two years . I showed you something that took 20 years to do . And this is just a selection of things that got happened in -- that were built in the last couple of years . From Brazil to India , Mexico , Alabama , China , Israel , Palestine , Vietnam . The average age of a designer who gets involved in this project is 32 -- that 's how old I am . So it 's a young -- I just have to stop here , because Arup is in the room and this is the best-designed toilet in the world . If you 're ever , ever in India , go use this toilet . ( Laughter ) Chris Luebkeman will tell you why . I 'm sure that 's how he wanted to spend the part
 y , but -- but the future is not going to be the sky-scraping cities of New York , but this . And when you look at this , you see crisis . What I see is many , many inventors . One billion people live in abject poverty . We hear about them all the time . Four billion live in growing but fragile economies . One in seven live in unplanned settlements . If we do nothing about the housing crisis that 's about to happen , in 20 years , one in three people will live in an unplanned settlement or a refugee camp . Look left , look right : one of you will be there . How do we improve the living standards of five billion people ? With 10 million solutions . So I wish to develop a community that actively embraces innovative and sustainable design to improve the living conditions for everyone . Chris Anderson : Wait a second . Wait a second . That 's your wish ? CS : That 's my wish . CA : That 's his wish ! ( Applause ) We started Architecture for Humanity with 700 dollars and a website . So C
 hris somehow decided to give me 100,000 dollars . So why not this many people ? Open-source architecture is the way to go . You have a diverse community of participants -- and we 're not just talking about inventors and designers , but we 're talking about the funding model . My role is not as a designer ; it 's a conduit between the design world and the humanitarian world . And what we need is something that replicates me globally , because I have n't slept in seven years . ( Laughter ) Secondly , what will this thing be ? Designers want to respond to issues of humanitarian crisis , but they do n't want some company in the West taking their idea and basically profiting from it . So Creative Commons has developed the developing nations license . And what that means that a designer can -- the Siyathemba project I showed was the first ever building to have a Creative Commons license on it . As soon as that is built , anyone in Africa or any developing nation can take the construction 
 documents and replicate it for free . ( Applause ) So why not allow designers the opportunity to do this , but still protecting their rights , here ? We want to have a community where you can upload ideas , and those ideas can be tested in earthquake , in flood , in all sorts of austere environments . The reason that 's important is I do n't want to wait for the next Katrina to find out if my house works . That 's too late . We need to do it now . So doing that globally . And I want this whole thing to work multi-lingually . When you look at the face of an architect , most people think a gray-haired white guy . I do n't see that . I see the face of the world . So I want everyone from all over the planet , to be able to be a part of this design and development . The idea of needs-based competitions -- X-Prize for the other 98 percent , if you want to call it that . We also want to look at ways of matchmaking and putting funding partners together . And the idea of integrating manufact
 urers -- fab labs in every country . When I hear about the 100 dollar laptop and it 's going to educate every child , educate every designer in the world . Put one in every favela , every slum settlement , because you know what , innovation will happen . And I need to know that . It 's called the leap-back . We talk about leapfrog technologies . I write with Worldchanging , and the one thing we 've been talking about is , I learn more on the ground than I 've ever learned here . So let 's take those ideas , adapt them and we can use them . These ideas are supposed to have adaptable , they 're allowed to be -- they should have the potential for evolution , they should be developed by every nation on the world and useful for every nation on the world . What will it take ? There should be a sheet . I do n't have time to read this , because I 'm going to be yanked off . CA : Just leave it up there for a sec . CS : Well , what will it take ? You guys are smart . So it 's going to take a 
 lot of computing power , because I want this to -- I want the idea that any laptop anywhere in the world can plug into the system and be able to not only participate in developing these designs , but utilize the designs . Also , a process of reviewing the designs . I want every Arup engineer in the world to check and make sure that we 're doing stuff that 's standing , because those guys are the best in the world . Plug . And so you know , I want these -- and I just should note , I have two laptops and one of them there , is there and that has 3000 designs on it . If I drop that laptop , what happens ? So it 's important to have these proven ideas put up there , easy to use , easy to get ahold of . My mom once said , there 's nothing worse than being all mouth and no trousers . ( Laughing ) I 'm fed up of talking about making change . You only make it by doing it . We 've changed FEMA guidelines . We 've changed public policy . We 've changed international response -- based on build
 ing things . So for me , it 's important that we create a real conduit for innovation , and that it 's free innovation . Think of free culture -- this is free innovation . Somebody said this a couple of years back . I will give points for those who know it , I think the man was maybe 25 years too early , so let 's do it . Thank you . ( Applause ) 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/1f97041b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/92TediAndersonEt_NuclearEnergy_EN.txt.txt
----------------------------------------------------------------------
diff --git a/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/92TediAndersonEt_NuclearEnergy_EN.txt.txt b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/92TediAndersonEt_NuclearEnergy_EN.txt.txt
new file mode 100644
index 0000000..2f9e523
--- /dev/null
+++ b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/92TediAndersonEt_NuclearEnergy_EN.txt.txt
@@ -0,0 +1,2 @@
+
+\ufeff Chris Anderson : We 're having a debate . The debate is over the proposition " What the world needs now is nuclear energy " -- true or false ? And before we have the debate , I 'd like to actually take a show of hands -- on balance , right now , are you for or against this ? So those who are " yes , " raise your hand . " For . " Okay , hands down . Those who are " against , " raise your hands . Okay , I 'm reading that at about 75-25 in favor at the start . Which means we 're going to take a vote at the end and see how that shifts , if at all . So here 's the format : They 're going to have six minutes each , and then after one little , quick exchange between them , I want two people on each side of this debate in the audience to have 30 seconds to make one short , crisp , pungent , powerful point . So , in favor of the proposition , possibly shockingly , is one of , truly , the founders of the environmental movement , a long-standing TEDster , the founder of the Whole Earth Cat
 alog , someone we all know and love , Stewart Brand . Stewart Brand : Whoa . ( Applause ) The saying is that with climate , those who know the most are the most worried . With nuclear , those who know the most are the least worried . A classic example is James Hansen , a NASA climatologist pushing for 350 parts per million carbon dioxide in the atmosphere . He came out with a wonderful book recently called " Storms of My Grandchildren . " And Hansen is hard over for nuclear power , as are most climatologists who are engaging this issue seriously . This is the design situation : a planet that is facing climate change and is now half urban . Look at the client base for this . Five out of six of us live in the developing world . We are moving to cities . We are moving up in the world . And we are educating our kids , having fewer kids , basically good news all around . But we move to cities , toward the bright lights , and one of the things that is there that we want , besides jobs , i
 s electricity . And if it is n't easily gotten , we 'll go ahead and steal it . This is one of the most desired things by poor people all over the world , in the cities and in the countryside . Electricity for cities , at its best , is what 's called baseload electricity . That 's where it is on all the time . And so far there are only three major sources of that -- coal and gas , hydro-electric , which in most places is maxed-out -- and nuclear . I would love to have something in the fourth place here , but in terms of constant , clean , scalable energy , [ solar ] and wind and the other renewables are n't there yet because they 're inconstant . Nuclear is and has been for 40 years . Now , from an environmental standpoint , the main thing you want to look at is what happens to the waste from nuclear and from coal , the two major sources of electricity . If all of your electricity in your lifetime came from nuclear , the waste from that lifetime of electricity would go in a Coke can
  -- a pretty heavy Coke can , about two pounds . But one day of coal adds up to one hell of a lot of carbon dioxide in a normal one-gigawatt coal-fired plant . Then what happens to the waste ? The nuclear waste typically goes into a dry cask storage out back of the parking lot at the reactor site because most places do n't have underground storage yet . It 's just as well , because it can stay where it is . While the carbon dioxide , vast quantities of it , gigatons , goes into the atmosphere where we ca n't get it back , yet , and where it is causing the problems that we 're most concerned about . So when you add up the greenhouse gases in the lifetime of these various energy sources , nuclear is down there with wind and hydro , below solar and way below , obviously , all the fossil fuels . Wind is wonderful ; I love wind . I love being around these big wind generators . But one of the things we 're discovering is that wind , like solar , is an actually relatively dilute source of 
 energy . And so it takes a very large footprint on the land , a very large footprint in terms of materials , five to 10 times what you 'd use for nuclear , and typically to get one gigawatt of electricity is on the order of 250 sq . mi . of wind farm . In places like Denmark and Germany , they 've maxed out on wind already . They 've run out of good sites . The power lines are getting overloaded . And you peak out . Likewise , with solar , especially here in California , we 're discovering that the 80 solar farm schemes that are going forward want to basically bulldoze 1,000 sq . mi . of southern California desert . Well , as an environmentalist , we would rather that did n't happen . It 's okay on frapped-out agricultural land . Solar 's wonderful on rooftops . But out in the landscape , one gigawatt is on the order of 50 sq . mi . of bulldozed desert . When you add all these things up -- Saul Griffith did the numbers and figured out what it would take to get 13 clean terawatts of 
 energy from wind , solar and biofuels , and that area would be roughly the size of the United States , an area he refers to as " Renewistan . " A guy who 's added all this up very well is David Mackay , a physicist in England , and in his wonderful book , " Sustainable Energy , " among other things , he says , " I 'm not trying to be pro-nuclear . I 'm just pro-arithmetic . " ( Laughter ) In terms of weapons , the best disarmament tool so far is nuclear energy . We have been taking down the Russian warheads , turning it into electricity . 10 percent of American electricity comes from decommissioned warheads . We have n't even started the American stockpile . I think of most interest to a TED audience would be the new generation of reactors that are very small , down around 10 to 125 megawatts . This is one from Toshiba . Here 's one that the Russians are already building that floats on a barge . And that would be very interesting in the developing world . Typically , these things are p
 ut in the ground . They 're referred to as nuclear batteries . They 're incredibly safe , weapons proliferation-proof and all the rest of it . Here is a commercial version from New Mexico called the Hyperion , and another one from Oregon called NuScale . Babcock & Wilcox that make nuclear reactors ... here 's an integral fast reactor . Thorium reactor that Nathan Myhrvold 's involved in . The governments of the world are going to have to decide that coal needs to be made expensive , and these will go ahead . And here 's the future . ( Applause ) CA : Okay . Okay . ( Applause ) So arguing against , a man who 's been at the nitty , gritty heart of the energy debate and the climate change debate for years . In 2000 , he discovered that soot was probably the second leading cause of global warming , after CO2 . His team have been making detailed calculations of the relative impacts of different energy sources . His first time at TED , possibly a disadvantage -- we shall see -- from Stanf
 ord , Professor Mark Jacobson . Good luck . Mark Jacobson : Thank you . ( Applause ) So my premise here is that nuclear energy puts out more carbon dioxide , puts out more air pollutants , enhances mortality more and takes longer to put up than real renewable energy systems , namely wind , solar , geothermal power , hydro-tidal wave power . And it also enhances nuclear weapons proliferation . So let 's just start by looking at the CO2 emissions from the life cycle . CO2e emissions are equivalent emissions of all the greenhouse gases and particles that cause warming , and converted to CO2 . And if you look , wind and concentrated solar have the lowest CO2 emissions , if you look at the graph . Nuclear -- there are two bars here . One is a low estimate , and one is a high estimate . The low estimate is the nuclear energy industry estimate of nuclear . The high is the average of 103 scientific , peer-reviewed studies . And this is just the CO2 from the life cycle . If we look at the de
 lays , it takes between 10 and 19 years to put up a nuclear power plant from planning to operation . This includes about three and a half to six years for a site permit , and another two and a half to four years for a construction permit and issue , and then four to nine years for actual construction . And in China , right now , they 're putting up five gigawatts of nuclear . And the average , just for the construction time of these , is 7.1 years on top of any planning times . While you 're waiting around for your nuclear , you have to run the regular electric power grid , which is mostly coal in the United States and around the world . And the chart here shows the difference between the emissions from the regular grid , resulting if you use nuclear , or anything else , versus wind , CSP or photovoltaics . Wind takes about two to five years on average , same as concentrated solar and photovoltaics . So the difference is the opportunity cost of using nuclear versus wind , or somethi
 ng else . So if you add these two together , alone , you can see a separation that nuclear puts out at least nine to 17 times more CO2 equivalent emissions than wind energy . And this does n't even account for the footprint on the ground . If you look at the air pollution health effects , this is the number of deaths per year in 2020 just from vehicle exhaust . Let 's say we converted all the vehicles in the United States to battery electric vehicles , hydrogen fuel cell vehicles or flex fuel vehicles run on E85 . Well , right now in the United States , 50 to 100,000 people die per year from air pollution , and vehicles are about 25,000 of those . In 2020 , the number will go down to 15,000 due to improvements . And so , on the right , you see gasoline emissions , the death rates of 2020 . If you go to corn or cellulosic ethanol , you 'd actually increase the death rate slightly . If you go to nuclear , you do get a big reduction , but it 's not as much as with wind and/or concentrat
 ed solar . Now if you consider the fact that nuclear weapons proliferation is associated with nuclear energy proliferation , because we know for example , India and Pakistan developed nuclear weapons secretly by enriching uranium in nuclear energy facilities . North Korea did that to some extent . Iran is doing that right now . And Venezuela would be doing it if they started with their nuclear energy facilities . If you do a large scale expansion of nuclear energy across the world , and as a result there was just one nuclear bomb created that was used to destroy a city such as Mumbai or some other big city , megacity , the additional death rates due to this averaged over 30 years and scaled to the population of the U. S. would be this . So , do we need this ? The next thing is : What about the footprint ? Stewart mentioned the footprint . Actually , the footprint on the ground for wind is by far the smallest of any energy source in the world . That , because the footprint , as you c
 an see , is just the pole touching the ground . And you can power the entire U. S. vehicle fleet with 73,000 to 145,000 five-megawatt wind turbines . That would take between one and three sq . km . of footprint on the ground , entirely . The spacing is something else . That 's the footprint that 's always being confused . People confuse footprint with spacing . As you can see from these pictures , the spacing between can be used for multiple purposes including agricultural land , range land or open space . Over the ocean , it 's not even land . Now if we look at nuclear -- ( Laughter ) With nuclear , what do we have ? We have facilities around there . You also have a buffer zone that 's 17 sq . km . And you have the uranium mining that you have to deal with . Now if we go to the area , lots is worse than nuclear or wind . For example , cellulosic ethanol , to power the entire U. S. vehicle fleet , this is how much land you would need . That 's cellulosic , second generation biofuels
  from prairie grass . Here 's corn ethanol . It 's smaller . This is based on ranges from data , but if you look at nuclear , it would be the size of Rhode Island to power the U. S. vehicle fleet . For wind , there 's a larger area , but much smaller footprint . And of course , with wind , you could put it all over the east coast , offshore theoretically , or you can split it up . And now , if you go back to looking at geothermal , it 's even smaller than both , and solar is slightly larger than the nuclear spacing , but it 's still pretty small . And this is to power the entire U. S. vehicle fleet . To power the entire world with 50 percent wind , you would need about one percent of world land . Matching the reliability , base load is actually irrelevant . We want to match the hour-by-hour power supply . You can do that by combining renewables . This is from real data in California , looking at wind data and solar data . And it considers just using existing hydro to match the hour-
 by-hour power demand . Here are the world wind resources . There 's 5 to 10 times more wind available worldwide than we need for all the world . So then the finally ranking . And one last slide I just want to show : this is the choice . You can either have wind or nuclear . If you use wind , you guarantee ice will last . Nuclear , the time lag alone will allow the Arctic to melt and other places to melt more . And we can guarantee a clean , blue sky or an uncertain future with nuclear power . ( Applause ) CA : All right . So while they 're having their comebacks on each other -- and yours is slightly short because you slightly overran -- I need two people from either side . So if you 're for this , if you 're for nuclear power , put up two hands . If you 're against , put up one . And I want two of each for the mics . Now then , you guys have -- you have a minute comeback on him to pick up a point he said , challenge it , whatever . SB : I think a point of difference we 're having ,
  Mark , has to do with weapons and energy . These diagrams that show that nuclear is somehow putting out a lot of greenhouse gases -- a lot of those studies will include , " Well of course war will be inevitable and therefore we 'll have cities burning and stuff like that , " which is kind of finessing it a little bit , I think . The reality is that there 's , what , 21 nations that have nuclear power ? Of those , seven have nuclear weapons . In every case , they got the weapons before they got the nuclear power . There are two nations , North Korea and Israel , that have nuclear weapons and do n't have nuclear power at all . The places that we would most like to have really clean energy occur are China , India , Europe , North America , all of which have sorted out their situation in relation to nuclear weapons . So that leaves a couple of places like Iran , maybe Venezuela , that you would like to have very close surveillance of anything that goes on with fissile stuff . Pushing a
 head with nuclear power will mean we really know where all of the fissile material is , and we can move toward zero weapons left , once we know all that . CA : Mark , 30 seconds , either on that or on anything Stewart said . MJ : Well we know India and Pakistan had nuclear energy first , and then they developed nuclear weapons secretly in the factories . So the other thing is , we do n't need nuclear energy . There 's plenty of solar and wind . You can make it reliable , as I showed with that diagram . That 's from real data . And this is an ongoing research . This is not rocket science . Solving the world 's problems can be done , if you really put your mind to it and use clean , renewable energy . There 's absolutely no need for nuclear power . ( Applause ) CA : We need someone for . Rod Beckstrom : Thank you Chris . I 'm Rod Beckstrom , CEO of ICANN . I 've been involved in global warming policy since 1994 , when I joined the board of Environmental Defense Fund that was one o
 f the crafters of the Kyoto Protocol . And I want to support Stewart Brand 's position . I 've come around in the last 10 years . I used to be against nuclear power . I 'm now supporting Stewart 's position , softly , from a risk-management standpoint , agreeing that the risks of overheating the planet outweigh the risk of nuclear incident , which certainly is possible and is a very real problem . However , I think there may be a win-win solution here where both parties can win this debate , and that is , we face a situation where it 's carbon caps on this planet or die . And in the United States Senate , we need bipartisan support -- only one or two votes are needed -- to move global warming through the Senate , and this room can help . So if we get that through , then Mark will solve these problems . Thanks Chris . CA : Thank you Rod Beckstrom . Against . David Fanton : Hi , I 'm David Fanton . I just want to say a couple quick things . The first is : be aware of the propaganda . 
 The propaganda from the industry has been very , very strong . And we have not had the other side of the argument fully aired so that people can draw their own conclusions . Be very aware of the propaganda . Secondly , think about this . If we build all these nuclear power plants , all that waste is going to be on hundreds , if not thousands , of trucks and trains , moving through this country every day . Tell me they 're not going to have accidents . Tell me that those accidents are n't going to put material into the environment that is poisonous for hundreds of thousands of years . And then tell me that each and every one of those trucks and trains is n't a potential terrorist target . CA : Thank you . For . Anyone else for ? Go . Alex : Hi , I 'm Alex . I just wanted to say , I 'm , first of all , renewable energy 's biggest fan . I 've got solar PV on my roof . I 've got a hydro conversion at a watermill that I own . And I 'm , you know , very much " pro " that kind of stuff . How
 ever , there 's a basic arithmetic problem here . The capability of the sun shining , the wind blowing and the rain falling , simply is n't enough to add up . So if we want to keep the lights on , we actually need a solution which is going to keep generating all of the time . I campaigned against nuclear weapons in the 80s , and I continue to do so now . But we 've got an opportunity to recycle them into something more useful that enables us to get energy all of the time . And , ultimately , the arithmetic problem is n't going to go away . We 're not going to get enough energy from renewables alone . We need a solution that generates all of the time . If we 're going to keep the lights on , nuclear is that solution . CA : Thank you . Anyone else against ? Man : The last person who was in favor made the premise that we do n't have enough alternative renewable resources . And our " against " proponent up here made it clear that we actually do . And so the fallacy that we need this res
 ource and we can actually make it in a time frame that is meaningful is not possible . I will also add one other thing . Ray Kurzweil and all the other talks -- we know that the stick is going up exponentially . So you ca n't look at state-of-the-art technologies in renewables and say , " That 's all we have . " Because five years from now , it will blow you away what we 'll actually have as alternatives to this horrible , disastrous nuclear power . CA : Point well made . Thank you . ( Applause ) So each of you has really just a couple sentences -- 30 seconds each to sum up . Your final pitch , Stewart . SB : I loved your " It all balances out " chart that you had there . It was a sunny day and a windy night . And just now in England they had a cold spell . All of the wind in the entire country shut down for a week . None of those things were stirring . And as usual , they had to buy nuclear power from France . Two gigawatts comes through the Chunnel . This keeps happening . I used 
 to worry about the 10,000 year factor . And the fact is , we 're going to use the nuclear waste we have for fuel in the fourth generation of reactors that are coming along . And especially the small reactors need to go forward . I heard from Nathan Myhrvold -- and I think here 's the action point -- it 'll take an act of Congress to make the Nuclear Regulatory Commission start moving quickly on these small reactors , which we need very much , here and in the world . ( Applause ) MJ : So we 've analyzed the hour-by-hour power demand and supply , looking at solar , wind , using data for California . And you can match that demand , hour-by-hour , for the whole year almost . Now , with regard to the resources , we 've developed the first wind map of the world , from data alone , at 80 meters . We know what the resources are . You can cover 15 percent . 15 percent of the entire U. S. has wind at fast-enough speeds to be cost-competitive . And there 's much more solar than there is wind .
  There 's plenty of resource . You can make it reliable . CA : Okay . So , thank you , Mark . ( Applause ) So if you were in Palm Springs ... ( Laughter ) ( Applause ) Shameless . Shameless . Shameless . ( Applause ) So , people of the TED community , I put it to you that what the world needs now is nuclear energy . All those in favor , raise your hands . ( Shouts ) And all those against . Ooooh . Now that is -- my take on that ... Just put up ... Hands up , people who changed their minds during the debate , who voted differently . Those of you who changed your mind in favor of " for " put your hands up . Okay . So here 's the read on it . Both people won supporters , but on my count , the mood of the TED community shifted from about 75-25 to about 65-35 in favor , in favor . You both won . I congratulate both of you . Thank you for that . ( Applause ) 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/1f97041b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/93TediBelcherA_Batteries_EN.txt.txt
----------------------------------------------------------------------
diff --git a/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/93TediBelcherA_Batteries_EN.txt.txt b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/93TediBelcherA_Batteries_EN.txt.txt
new file mode 100644
index 0000000..acc3dbe
--- /dev/null
+++ b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/93TediBelcherA_Batteries_EN.txt.txt
@@ -0,0 +1,2 @@
+
+\ufeff I thought I would talk a little bit about how nature makes materials . I brought along with me an abalone shell . This abalone shell is a biocomposite material that 's 98 percent by mass calcium carbonate and two percent by mass protein . Yet , it 's 3,000 times tougher than its geological counterpart . And a lot of people might use structures like abalone shells , like chalk . I 've been fascinated by how nature makes materials , and there 's a lot of sequence to how they do such an exquisite job . Part of it is that these materials are macroscopic in structure , but they 're formed at the nanoscale . They 're formed at the nanoscale , and they use proteins that are coded by the genetic level that allow them to build these really exquisite structures . So something I think is very fascinating is what if you could give life to non-living structures , like batteries and like solar cells ? What if they had some of the same capabilities that an abalone shell did , in terms of being
  able to build really exquisite structures at room temperature and room pressure , using non-toxic chemicals and adding no toxic materials back into the environment ? So that 's the vision that I 've been thinking about . And so what if you could grow a battery in a petri dish ? Or , what if you could give genetic information to a battery so that it could actually become better as a function of time , and do so in an environmentally friendly way ? And so , going back to this abalone shell , besides being nano-structured , one thing that 's fascinating , is when a male and a female abalone get together , they pass on the genetic information that says , " This is how to build an exquisite material . Here 's how to do it at room temperature and pressure , using non-toxic materials . " Same with diatoms , which are shown right here , which are glasseous structures . Every time the diatoms replicate , they give the genetic information that says , " Here 's how to build glass in the ocean
  that 's perfectly nano-structured . And you can do it the same , over and over again . " So what if you could do the same thing with a solar cell or a battery ? I like to say my favorite biomaterial is my four year-old . But anyone who 's ever had , or knows , small children knows they 're incredibly complex organisms . And so if you wanted to convince them to do something they do n't want to do , it 's very difficult . So when we think about future technologies , we actually think of using bacteria and virus , simple organisms . Can you convince them to work with a new tool box , so that they can build a structure that will be important to me ? Also , we think about future technologies . We start with the beginning of Earth . Basically , it took a billion years to have life on Earth . And very rapidly , they became multi-cellular , they could replicate , they could use photosynthesis as a way of getting their energy source . But it was n't until about 500 million years ago -- during the Cambrian geologic time period -- that organisms in the ocean started making hard materials . Before that they were all soft , fluffy structures . And it was during this time that there was increased calcium and iron and silicon in the environment . And organisms learned how to make hard materials . And so that 's what I would like be able to do -- convince biology to work with the rest of the periodic table . Now if you look at biology , there 's many structures like DNA and antibodies and proteins and ribosomes that you 've heard about that are already nano-structured . So nature already gives us really exquisite structures on the nanoscale . What if we could harness them and convince them to not be an antibody that does something like HIV ? But what if we could convince them to build a solar cell for us ? So here are some examples : these are some natural shells . There are natural biological materials . The abalone shell here -- and if you fracture it , you can look at the
  fact that it 's nano-structured . There 's diatoms made out of SIO2 , and they 're magnetotactic bacteria that make small , single-domain magnets used for navigation . What all these have in common is these materials are structured at the nanoscale , and they have a DNA sequence that codes for a protein sequence , that gives them the blueprint to be able to build these really wonderful structures . Now , going back to the abalone shell , the abalone makes this shell by having these proteins . These proteins are very negatively charged . And they can pull calcium out of the environment , put down a layer of calcium and then carbonate , calcium and carbonate . It has the chemical sequences of amino acids which says , " This is how to build the structure . Here 's the DNA sequence , here 's the protein sequence in order to do it . " And so an interesting idea is , what if you could take any material that you wanted , or any element on the periodic table , and find its corresponding DNA sequence , then code it for a corresponding protein sequence to build a structure , but not build an abalone shell -- build something that , through nature , it has never had the opportunity to work with yet . And so here 's the periodic table . And I absolutely love the periodic table . Every year for the incoming freshman class at MIT , I have a periodic table made that says , " Welcome to MIT . Now you 're in your element . " And you flip it over , and it 's the amino acids with the PH at which they have different charges . And so I give this out to thousands of people . And I know it says MIT , and this is Caltech , but I have a couple extra if people want it . And I was really fortunate to have President Obama visit my lab this year on his visit to MIT , and I really wanted to give him a periodic table . So I stayed up at night , and I talked to my husband , " How do I give President Obama a periodic table ? What if he says , 'Oh , I already have one , ' or , 'I 've already memorized it ' ? " And so he came to visit my lab and looked around -- it was a great visit . And then afterward , I said , " Sir , I want to give you the periodic table in case you 're ever in a bind and need to calculate molecular weight . " And I thought molecular weight sounded much less nerdy than molar mass . And so he looked at it , and he said , " Thank you . I 'll look at it periodically . " ( Laughter ) ( Applause ) And later in a lecture that he gave on clean energy , he pulled it out and said , " And people at MIT , they give out periodic tables . " So basically what I did n't tell you is that about 500 million years ago , organisms starter making materials , but it took them about 50 million years to get good at it . It took them about 50 million years to learn how to perfect how to make that abalone shell . And that 's a hard sell to a graduate student . " I have this great project -- 50 million years . " And so we had to develop a way of trying to do this more rapidly 
 . And so we use a virus that 's a non-toxic virus called M13 bacteriophage that 's job is to infect bacteria . Well it has a simple DNA structure that you can go in and cut and paste additional DNA sequences into it . And by doing that , it allows the virus to express random protein sequences . And this is pretty easy biotechnology . And you could basically do this a billion times . And so you can go in and have a billion different viruses that are all genetically identical , but they differ from each other based on their tips , on one sequence that codes for one protein . Now if you take all billion viruses , and you can put them in one drop of liquid , you can force them to interact with anything you want on the periodic table . And through a process of selection evolution , you can pull one of a billion that does something that you 'd like it to do , like grow a battery or grow a solar cell . So basically , viruses ca n't replicate themselves , they need a host . Once you find that one out of a billion , you infect it into a bacteria , and you make millions and billions of copies of that particular sequence . And so the other thing that 's beautiful about biology is that biology gives you really exquisite structures with nice link scales . And these viruses are long and skinny , and we can get them to express the ability to grow something like semiconductors or materials for batteries . Now this is a high-powered battery that we grew in my lab . We engineered a virus to pick up carbon nanotubes . So one part of the virus grabs a carbon nanotube . The other part of the virus has a sequence that can grow an electrode material for a battery . And then it wires itself to the current collector . And so through a process of selection evolution , we went from having a virus that made a crummy battery to a virus that made a good battery to a virus that made a record-breaking , high-powered battery that 's all made at room temperature , basically at the bench top . 
 And that battery went to the White House for a press conference . I brought it here . You can see it in this case -- that 's lighting this LED . Now if we could scale this , you could actually use it to run your Prius , which is my dream -- to be able to drive a virus-powered car . But it 's basically -- you can pull one out of a billion . You can make lots of amplifications to it . Basically , you make an amplification in the lab . And then you get it to self-assemble into a structure like a battery . We 're able to do this also with catalysis . This is the example of photocatalytic splitting of water . And what we 've been able to do is engineer a virus to basically take dye absorbing molecules and line them up on the surface of the virus so it acts as an antenna , and you get an energy transfer across the virus . And then we give it a second gene to grow an inorganic material that can be used to split water into oxygen and hydrogen , that can be used for clean fuels . And I brought an example with me of that today . My students promised me it would work . These are virus-assembled nanowires . When you shine light on them , you can see them bubbling . In this case , you 're seeing oxygen bubbles come out . And basically by controlling the genes , you can control multiple materials to improve your device performance . The last example are solar cells . You can also do this with solar cells . We 've been able to engineer viruses to pick up carbon nanotubes and then grow titanium dioxide around them -- and use as a way of getting electrons through the device . And what we 've found is that , through genetic engineering , we can actually increase the efficiencies of these solar cells to record numbers for these types of dye-sensitized systems . And I brought one of those as well that you can play around with outside afterward . So this is a virus-based solar cell . Through evolution and selection , we took it from an eight percent efficiency solar cell to an 11 
 percent efficiency solar cell . So I hope that I 've convinced you that there 's a lot of great , interesting things to be learned about how nature makes materials -- and taking it to the next step to see if you can force , or whether you can take advantage of how nature makes materials , to make things that nature has n't yet dreamed of making . Thank you . 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/1f97041b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/94TediGellMannM_Lang_EN.txt.txt
----------------------------------------------------------------------
diff --git a/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/94TediGellMannM_Lang_EN.txt.txt b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/94TediGellMannM_Lang_EN.txt.txt
new file mode 100644
index 0000000..cc7b196
--- /dev/null
+++ b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/94TediGellMannM_Lang_EN.txt.txt
@@ -0,0 +1,2 @@
+
+\ufeff Well , I 'm involved in other things besides physics . In fact , mostly now in other things . One thing is distant relationships among human languages . And the professional , historical linguists in the US and in Western Europe mostly try to stay away from any long-distance relationships ; big groupings , groupings that go back a long time , longer than the familiar families . They do n't like that ; they think it 's crank . I do n't think it 's crank . And there are some brilliant linguists , mostly Russians , who are working on that at Santa Fe Institute and in Moscow , and I would love to see where that leads . Does it really lead to a single ancestor some 20 , 25,000 years ago ? And what if we go back beyond that single ancestor , when there was presumably a competition among many languages ? How far back does that go ? How far back does modern language go ? How many tens of thousands of years does it go back ? Chris Anderson : Do you have a hunch or a hope for what the answer to that is ? Murray Gell-Mann : Well , I would guess that modern language must be older than the cave paintings and cave engravings and cave sculptures and dance steps in the soft clay in the caves in Western Europe in the Aurignacian Period some 35,000 years ago , or earlier . I ca n't believe they did all those things and did n't also have a modern language . So I would guess that the actual origin goes back at least that far and maybe further . But that does n't mean that all , or many , or most of today 's attested languages could n't descend perhaps from one that 's much younger than that , like say 20,000 years , or something of that kind . It 's what we call a bottleneck . CA : Well , Philip Anderson may have been right . You may just know more about everything than anyone . So it 's been an honor . Thank you Murray Gell-Mann . ( Applause ) 
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/opennlp-sandbox/blob/1f97041b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/95TediJakubowskiM_OpenTech_EN.txt.txt
----------------------------------------------------------------------
diff --git a/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/95TediJakubowskiM_OpenTech_EN.txt.txt b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/95TediJakubowskiM_OpenTech_EN.txt.txt
new file mode 100644
index 0000000..a776cfa
--- /dev/null
+++ b/opennlp-similarity/src/test/resources/style_recognizer/txt/Tedi/95TediJakubowskiM_OpenTech_EN.txt.txt
@@ -0,0 +1,2 @@
+
+\ufeff Hi , my name is Marcin -- farmer , technologist . I was born in Poland , now in the U. S. I started a group called Open Source Ecology . We 've identified the 50 most important machines that we think it takes for modern life to exist -- things from tractors , bread ovens , circuit makers . Then we set out to create an open source , DIY , do it yourself version that anyone can build and maintain at a fraction of the cost . We call this the Global Village Construction Set . So let me tell you a story . So I finished my 20s with a Ph. D. in fusion energy , and I discovered I was useless . I had no practical skills . The world presented me with options , and I took them . I guess you can call it the consumer lifestyle . So I started a farm in Missouri and learned about the economics of farming . I bought a tractor -- then it broke . I paid to get it repaired -- then it broke again . Then pretty soon I was broke too . I realized that the truly appropriate , low-cost tools that I needed to start a sustainable farm and settlement just did n't exist yet . I needed tools that were robust , modular , highly efficient and optimized , low-cost , made from local and recycled materials that would last a lifetime , not designed for obsolescence . I found that I would have to build them myself . So I did just that . And I tested them . And I found that industrial productivity can be achieved on a small scale . So then I published the 3D designs , schematics , instructional videos and budgets on a wiki . Then contributors from all over the world began showing up , prototyping new machines during dedicated project visits . So far , we have prototyped eight of the 50 machines . And now the project is beginning to grow on its own . We know that open source has succeeded with tools for managing knowledge and creativity . And the same is starting to happen with hardware too . We 're focusing on hardware because it is hardware that can change people 's lives in such tangible material ways . If we can lower the barriers to farming , building , manufacturing , then we can unleash just massive amounts of human potential . That 's not only in the developing world . Our tools are being made for the American farmer , builder , entrepreneur , maker . We 've seen lots of excitement from these people , who can now start a construction business , parts manufacturing , organic CSA or just selling power back to the grid . Our goal is a repository of published designs so clear , so complete , that a single burned DVD is effectively a civilization starter kit . I 've planted a hundred trees in a day . I 've pressed 5,000 bricks in one day from the dirt beneath my feet and built a tractor in six days. From what I 've seen , this is only the beginning . If this idea is truly sound , then the implications are significant . A greater distribution of the means of production , environmentally sound supply chains , and a newly-relevant DIY maker culture can hope to transcend artificial scarcity . We 're exploring the limits of what we all can do to make a better world with open hardware technology . Thank you . ( Applause ) 
\ No newline at end of file