Posted to dev@opennlp.apache.org by Tri Nguyen <yt...@gmail.com> on 2011/06/15 05:03:44 UTC

Hotel Name model

Hi,

Could somebody guide me on how to build a Hotel Name model?

Thanks,
Tri.

Re: Hotel Name model

Posted by Tri Nguyen <yt...@gmail.com>.

Hi Vaijanath,

Thanks so much for your help; it makes things much clearer to me. I will
study the material at your link.
Yes, I have been using Yahoo Search BOSS, and we can extract the text with
Apache Tika. But I think it will take a lot of time to tag the hotel names
manually, as it requires at least 15000 sentences.

Thanks so much,
Nguyen Van Tri.



RE: Hotel Name model

Posted by "Rao, Vaijanath" <va...@teamaol.com>.

Hi Tri,

The link to download DBPedia data is http://wiki.dbpedia.org/Downloads36 . There seem to be some issues with the DBPedia servers, but I think they will be sorted out and we should then be able to download the data (this might take a day or two to get corrected).

Once I can download it, I should be able to help you more with using DBPedia data to create a training set.

Using the Google search engine might be a good option, but I have never tried it myself, as it would involve HTML parsing and other work. However, in the past I have used Yahoo's WebSearch API ( http://developer.yahoo.com/search/web/V1/webSearch.html ), which returns a short description for the query term and does not require writing an HTML parser.

Let me know if any of the above helps you.

--Thanks and Regards
Vaijanath N. Rao


Re: Hotel Name model

Posted by Tri Nguyen <yt...@gmail.com>.

Hi Vaijanath,

So the title is the name of a hotel, and you look for the sentences
containing that name to use as training data lines, am I correct? Can we
get the URLs of the articles in DBPedia? I am sorry to ask so many
questions; I don't know much about DBPedia.
Since we cannot download data from DBPedia yet, could we choose hotel names
and query Google to collect the top pages as data sets? But I think
this approach would not have high precision.

Thanks for your explanation,
Nguyen Van Tri.


RE: Hotel Name model

Posted by "Rao, Vaijanath" <va...@teamaol.com>.

Hi Tri,

The DBPedia link shows that DBPedia identifies hotels. If we parse the DBPedia data and keep only those elements which have Hotel as their class (or parent class), we can then mark that data for training. Each article in DBPedia has a title and a description, so in the worst case we can look for the title in the description and mark that entity name for training.

For some reason DBPedia is not allowing me to download the data. But once I can download it, I will be able to code the wrapper from DBPedia to OpenNLP in a couple of days' time.

--Thanks and Regards
Vaijanath N. Rao
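
A minimal sketch of the title-in-description idea above, not the wrapper mentioned in the message. It assumes each DBPedia entry has already been reduced to a (title, description) pair and that the description is a single sentence; sentence splitting is left out. The output follows the OpenNLP name finder training format (one sentence per line, whitespace-separated tokens, names wrapped in <START:type> ... <END>). The function name annotate, the type label "hotel", and the crude regex tokenizer are assumptions made for illustration.

```python
import re

# Crude tokenizer (an assumption): words, and each punctuation mark on its own.
TOKEN = re.compile(r"\w+|[^\w\s]")


def annotate(title, description, entity_type="hotel"):
    """Wrap each occurrence of `title` in `description` with OpenNLP name tags.

    Returns the tagged line, or None when the title does not occur.
    """
    tokens = TOKEN.findall(description)
    name = TOKEN.findall(title)
    out, i, found = [], 0, False
    while i < len(tokens):
        # Compare a window of tokens against the tokenized title.
        if name and tokens[i:i + len(name)] == name:
            out += [f"<START:{entity_type}>", *name, "<END>"]
            i += len(name)
            found = True
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out) if found else None


if __name__ == "__main__":
    # "Hotel Example" is a made-up title, not real DBPedia data.
    print(annotate("Hotel Example", "The Hotel Example opened in 1990."))
```

Entries whose description never mentions the title return None and can simply be skipped; exact token matching will also miss abbreviated or reworded names, so the result still needs a manual check.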


Re: Hotel Name model

Posted by Tri Nguyen <yt...@gmail.com>.

Hi Vaijanath,

Thanks so much for your reply. At first I thought I could build a Hotel model
like the Job Title model described in chapter 6 of the book
Introduction to Linguistic Annotation and Text Analytics. But it is difficult
for me to choose the right corpus to build the training data. Because Hotel is
a subclass of the Organization class (
http://cs.nyu.edu/cs/faculty/grishman/NEtask20.book_8.html#HEADING26 ), I
think I could take the corpus of the Organization model and remove the
non-hotel data to get training data for the Hotel model. But I don't know
which corpus was used to build the Organization model. Could you tell me
what it is?

Could you please explain your link in more detail? Do you mean that we can
collect hotel names and build training data from them? There is a large list
of hotel names at http://rtw.ml.cmu.edu/rtw/kbbrowser/pred:hotel . Would it
help us build training data?

Thanks so much for your patience in reading this long question,

Nguyen Van Tri.
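
A rough sketch of the "filter the Organization corpus" idea above, under several assumptions: the organization corpus is already in the OpenNLP name finder training format with the type label "organization", and a set of known hotel names (hotel_names, hypothetical, for example taken from a list like the one linked above) is available. Spans found in that set are retagged as hotel; all other organization spans keep their words but lose their tags.

```python
import re

# Matches one tagged organization span in an OpenNLP training line.
SPAN = re.compile(r"<START:organization> (.+?) <END>")


def to_hotel_line(line, hotel_names):
    """Retag organization spans that are known hotels; untag the rest."""
    def retag(match):
        name = match.group(1)
        if name in hotel_names:
            return f"<START:hotel> {name} <END>"
        return name  # not a known hotel: keep the words, drop the tags

    return SPAN.sub(retag, line)


if __name__ == "__main__":
    # Made-up sentence and names, for illustration only.
    line = ("Guests of <START:organization> Hotel Example <END> flew with "
            "<START:organization> Example Air <END> .")
    print(to_hotel_line(line, {"Hotel Example"}))
```

Sentences that end up with no hotel span can be kept as negative examples or dropped; which works better is something to test rather than assume.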


RE: Hotel Name model

Posted by "Rao, Vaijanath" <va...@teamaol.com>.

Hi Tri,

You can try a model similar to the Organization model, for which you would need some training data for hotels. You can start by looking at DBPedia data as initial sample data.

http://mappings.dbpedia.org/index.php/OntologyClass:Hotel (this is the Hotel ontology class). If there is wider interest, I can work on contributing DBPedia data as a training set for a particular type.


--Thanks and Regards
Vaijanath N. Rao
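
As a starting point for pulling hotel entries out of DBPedia data, here is a minimal sketch. It assumes the downloaded dump includes an N-Triples file of rdf:type statements and that the Hotel class is identified by http://dbpedia.org/ontology/Hotel; both should be checked against the actual download. The function name hotel_resources is made up for this example, and only direct Hotel typing is handled, not parent or child classes.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
HOTEL = "http://dbpedia.org/ontology/Hotel"  # assumed class URI

# One N-Triples statement whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def hotel_resources(lines, hotel_class=HOTEL):
    """Yield the resource URI of every subject typed directly as `hotel_class`."""
    for line in lines:
        m = TRIPLE.match(line)
        if m and m.group(2) == RDF_TYPE and m.group(3) == hotel_class:
            yield m.group(1)


if __name__ == "__main__":
    # Made-up triples in the assumed layout, not real DBPedia data.
    sample = [
        "<http://dbpedia.org/resource/Example_Hotel> <%s> <%s> ." % (RDF_TYPE, HOTEL),
        "<http://dbpedia.org/resource/Example_Town> <%s> "
        "<http://dbpedia.org/ontology/Place> ." % RDF_TYPE,
    ]
    print(list(hotel_resources(sample)))
```

In practice the lines would come from iterating over the opened dump file, and the resulting URIs can then be used to look up titles and descriptions in the other files of the dump.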
