Posted to users@opennlp.apache.org by Ivan VAGANOV <iv...@mail.ru> on 2014/07/28 22:06:41 UTC

Building a corpus fast - looking for advice from experts

Dear Community Experts!

To start text mining, we first need a corpus.

Has anyone come across an open-source solution that can do the
following?

1. A researcher enters a few keywords into the program, for example
"iPhone", "Apple products", and "MacBook", and restricts the results to a
time period of one week.

2. The program searches Google for these keywords.

3. It creates a list of the first 200 result URLs for these queries.

4. It downloads the web pages behind these results and saves them as .txt
files, stripping out junk such as advertisements.

The researcher can then work with the results in OpenNLP or another
text-mining program.
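For anyone who ends up building step 4 by hand, here is a minimal sketch in Python using only the standard library. It assumes the URL list from steps 2 and 3 already exists (the Google search itself is not covered). The function names `html_to_text` and `download_as_text`, the `corpus` output directory, the `001.txt`-style file naming, and the set of tags treated as "trash" are all hypothetical choices made for illustration. Dropping whole `<nav>`, `<aside>`, or `<footer>` elements is only a crude stand-in for real advertisement and boilerplate removal.

```python
import re
import urllib.request
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical list of tags whose contents are treated as "trash".
# Real ad/boilerplate removal needs smarter heuristics than this.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header",
             "footer", "aside", "form", "iframe"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Returns the page's visible text, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    return re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces/tabs


def download_as_text(urls, out_dir="corpus"):
    """Fetches each URL and writes its cleaned text to out_dir/NNN.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls, start=1):
        try:
            req = urllib.request.Request(
                url, headers={"User-Agent": "corpus-builder/0.1"})  # hypothetical UA
            with urllib.request.urlopen(req, timeout=15) as resp:
                charset = resp.headers.get_content_charset() or "utf-8"
                html = resp.read().decode(charset, errors="replace")
        except (OSError, ValueError, LookupError) as exc:
            # Network errors, bad URLs, or unknown charsets: skip the page.
            print(f"skipped {url}: {exc}")
            continue
        (out / f"{i:03d}.txt").write_text(html_to_text(html), encoding="utf-8")
```

Usage would be something like `download_as_text(urls)`, where `urls` is the list of result URLs; each page becomes one plain-text file, which is a convenient one-document-per-file layout to feed into OpenNLP or another tool.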

Thank you for any advice, if you have a spare minute!

All the best in what you do, 

Ivan
