Posted to user@mahout.apache.org by Miles Osborne <mi...@inf.ed.ac.uk> on 2010/02/04 12:36:50 UTC

Re: Large-scale Language Models

My trusty Google alert spotted this!

But yes, I have code which builds large LMs using Hadoop.  That is,
taking raw text and building ngrams and counts for later hosting.  In
parallel with this there is a current effort to host this using
HyperTable.

What I don't have is Hadoop code to smooth the ngrams.  But, if you
need to use Hadoop to build your LMs then the chances are you don't
need to do any fancy smoothing either.
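The counting step described above maps naturally onto a single MapReduce pass: the map phase emits every ngram in a line with a count of 1, and the reduce phase sums counts per ngram. The sketch below is a hypothetical local simulation of that job (not Miles's actual code), with the map and reduce phases written as plain Python functions so the shape of the job is visible:

```python
import itertools
from collections import Counter

def map_ngrams(line, n=3):
    """Map phase: emit (ngram, 1) for every ngram of order 1..n in a line."""
    tokens = line.split()
    for order in range(1, n + 1):
        for i in range(len(tokens) - order + 1):
            yield (" ".join(tokens[i:i + order]), 1)

def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each ngram key."""
    counts = Counter()
    for ngram, c in pairs:
        counts[ngram] += c
    return counts

# Simulate the job locally on a tiny corpus.
corpus = ["the cat sat", "the cat ran"]
pairs = itertools.chain.from_iterable(map_ngrams(line) for line in corpus)
counts = reduce_counts(pairs)
# e.g. counts["the cat"] == 2, counts["cat sat"] == 1
```

On a real cluster the same two functions would become the mapper and reducer of a Hadoop job (e.g. via Hadoop Streaming), with the framework doing the shuffle/sort between them.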

Miles

> Miles Osborne and Chris Dyer have worked on this separately.
>
> Hopefully Miles is listening.

On Wed, Feb 3, 2010 at 10:07 AM, Mandar Rahurkar <ra...@...> wrote:

> Hi, All,
> I was wondering if there has been an initiative to implement large
> scale language models using hadoop. If not and if there is sufficient
> interest, I would be interested in adding that functionality.
>
> regards,
> Mandar
>

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Re: Large-scale Language Models

Posted by Ted Dunning <te...@gmail.com>.
Yeah... rumor is that version 20 had a huge performance improvement over
19.  Lots of applications became much more feasible at that point.

On Wed, Feb 10, 2010 at 3:52 AM, Isabel Drost <is...@apache.org> wrote:

> On Thu Mandar Rahurkar <ra...@gmail.com> wrote:
> > 4. On a related note, does anyone here have experience with
> > Hypertable or similar open-source distributed storage systems for
> > production use?
>
> The Hadoop equivalent to Hypertable would be HBase. There are several
> people using that for production systems, e.g. Lars George, who can be
> found over on the HBase lists.
>
> Isabel
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Large-scale Language Models

Posted by Isabel Drost <is...@apache.org>.
On Thu Mandar Rahurkar <ra...@gmail.com> wrote:
> 4. On a related note, does anyone here have experience with Hypertable
> or similar open-source distributed storage systems for production
> use?

The Hadoop equivalent to Hypertable would be HBase. There are several
people using that for production systems, e.g. Lars George, who can be
found over on the HBase lists.

Isabel
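In either store (HBase or Hypertable) the common pattern for hosting ngram counts is a simple wide-column layout: the ngram itself becomes the row key and the count the cell value, so lookups are single-key gets. The sketch below is illustrative only (a plain dict stands in for the table client; the keying scheme is an assumption, not the layout used in Miles's effort):

```python
def row_key(tokens):
    """Build a row key for an ngram; joining on a unit separator
    avoids ambiguity if tokens could themselves contain spaces."""
    return "\x1f".join(tokens)

class FakeTable:
    """Stand-in for an HBase/Hypertable client: put/get by row key."""
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key, default=0):
        return self._rows.get(key, default)

table = FakeTable()
table.put(row_key(["the", "cat"]), 2)
count = table.get(row_key(["the", "cat"]))    # stored count
missing = table.get(row_key(["cat", "the"]))  # unseen ngram -> 0
```

With a real client the put/get calls would go over the wire, but the data model (one row per ngram, count as the value) stays the same, which is what makes these stores a natural fit for serving counts built offline with Hadoop.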

Re: Large-scale Language Models

Posted by Ted Dunning <te...@gmail.com>.
Lots of us use HDFS as part of Hadoop.  I have been using it in
production for years, as have Yahoo and Facebook.

I don't personally know of anybody except Miles using Hypertable.

On Thu, Feb 4, 2010 at 11:28 AM, Mandar Rahurkar <ra...@gmail.com> wrote:

> 4. On a related note, does anyone here have experience with Hypertable
> or similar open-source distributed storage systems for production
> use?
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Large-scale Language Models

Posted by Mandar Rahurkar <ra...@gmail.com>.
Thanks Miles,
1. I agree that I might not have to use any fancy smoothing, but even
at Google scale simple smoothing seems to aid performance (at least
for machine translation):
http://acl.ldc.upenn.edu/D/D07/D07-1090.pdf
Has that been your experience as well?

2. Is your code open source?

3. I was also looking to understand whether there are any efforts to
store these large ngram sets optimally for real-time access. Could you
point me to the effort on hosting LMs using Hypertable?

4. On a related note, does anyone here have experience with Hypertable
or similar open-source distributed storage systems for production
use?
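(For reference, the paper linked in point 1, Brants et al. 2007, describes "Stupid Backoff": use the raw relative frequency of the ngram when its count is nonzero, otherwise back off to the shorter context scaled by a constant factor, 0.4 in the paper. A minimal sketch, assuming counts in a plain dict keyed by space-joined ngrams:)

```python
def stupid_backoff(counts, tokens, total, alpha=0.4):
    """Stupid Backoff score for the last token given its context.
    counts: space-joined ngram -> count; total: corpus token count.
    Returns a relative score, not a normalized probability."""
    if len(tokens) == 1:
        # Base case: unigram relative frequency.
        return counts.get(tokens[0], 0) / total
    full = " ".join(tokens)
    context = " ".join(tokens[:-1])
    if counts.get(full, 0) > 0:
        # Seen ngram: raw relative frequency against its context.
        return counts[full] / counts[context]
    # Unseen ngram: back off to the shorter context, scaled by alpha.
    return alpha * stupid_backoff(counts, tokens[1:], total, alpha)

counts = {"the": 2, "cat": 2, "sat": 1, "the cat": 2, "cat sat": 1}
score = stupid_backoff(counts, ["the", "cat"], total=6)       # 2/2 = 1.0
backed_off = stupid_backoff(counts, ["sat", "cat"], total=6)  # 0.4 * 2/6
```

Because there is no normalization, the scheme needs no held-out estimation pass, which is exactly what makes it attractive at MapReduce scale.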

Mandar

On Thu, Feb 4, 2010 at 3:36 AM, Miles Osborne <mi...@inf.ed.ac.uk> wrote:
> My trusty Google alert spotted this!
>
> But yes, I have code which builds large LMs using Hadoop.  That is,
> taking raw text and building ngrams and counts for later hosting.  In
> parallel with this there is a current effort to host this using
> HyperTable.
>
> What I don't have is Hadoop code to smooth the ngrams.  But, if you
> need to use Hadoop to build your LMs then the chances are you don't
> need to do any fancy smoothing either.
>
> Miles
>
>> Miles Osborne and Chris Dyer have worked on this separately.
>>
>> Hopefully Miles is listening.
>
> On Wed, Feb 3, 2010 at 10:07 AM, Mandar Rahurkar <ra...@...> wrote:
>
>> Hi, All,
>> I was wondering if there has been an initiative to implement large
>> scale language models using hadoop. If not and if there is sufficient
>> interest, I would be interested in adding that functionality.
>>
>> regards,
>> Mandar
>>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>