
Posted to dev@mahout.apache.org by Dan Filimon <da...@gmail.com> on 2013/04/04 20:43:25 UTC

Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]

Any news on this front? Did we get approved/assigned a slot/anything?


On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <da...@gmail.com> wrote:

> Ok, updated!
>
>
> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <an...@gmail.com> wrote:
>
>> Dan,
>>
>> I think what you've written is fine (I wanted to edit to remove the
>> '?' around random forests but couldn't).
>>
>> ok?
>>
>>
>>
>> On 29 March 2013 11:14, Dan Filimon <da...@gmail.com> wrote:
>> > I added Andy's first suggestion and Ted's suggestion as ideas.
>> >
>> > Andy, could you flesh out your second suggestion into a project and
>> make an
>> > issue please?
>> >
>> >
>> > On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <te...@gmail.com>
>> wrote:
>> >
>> >> It should be possible to view a Lucene index as a matrix.  This would
>> >> require that we standardize on a way to convert documents to rows.
>>  There
>> >> are many choices, the discussion of which should be deferred to the
>> actual
>> >> work on the project, but there are a few obvious constraints:
>> >>
>> >> a) it should be possible to get the same result as dumping the term
>> vectors
>> >> for each document each to a line and converting that result using
>> standard
>> >> Mahout methods.
>> >>
>> >> b) numeric fields ought to work somehow.
>> >>
>> >> c) if there are multiple text fields, that ought to work sensibly as
>> well.
>> >>  Two options include dumping multiple matrices or converting the fields
>> >> into a single row of a single matrix.
>> >>
>> >> d) it should be possible to refer back from a row of the matrix to
>> find the
>> >> correct document.  This might be because we remember the Lucene doc
>> number
>> >> or because a field is named as holding a unique id.
>> >>
>> >> e) named vectors and matrices should be used if plausible.
>> >>
>> >> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <
>> dangeorge.filimon@gmail.com
>> >> >wrote:
>> >>
>> >> > ...
>> >> > Ted, could you explain a bit more what you mean by "simplify the
>> >> connection
>> >> > to Lucene for clustering and classification"? It's too vague for an
>> idea
>> >> > proposal.
>> >> >
>> >>
>>
>>
>>
>> --
>> Dr Andy Twigg
>> Junior Research Fellow, St Johns College, Oxford
>> Room 351, Department of Computer Science
>> http://www.cs.ox.ac.uk/people/andy.twigg/
>> andy.twigg@cs.ox.ac.uk | +447799647538
>>
>
>
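
Ted's constraints (a) through (e), quoted above, can be illustrated with a
small sketch. This is only a toy under stated assumptions: it uses plain
Python dictionaries in place of a real Lucene index and does not call any
Lucene or Mahout API. The names DOCS, build_matrix, text_fields,
numeric_fields and id_field are hypothetical, as is the choice of prefixing
each term with its field name to fold several text fields into one row.

```python
from collections import Counter

# Hypothetical stand-in for an index: every document has a unique id field,
# two text fields, and one numeric field. None of this is Lucene API.
DOCS = [
    {"id": "doc-a", "title": "apache mahout",
     "body": "mahout clusters text", "year": 2013},
    {"id": "doc-b", "title": "lucene index",
     "body": "lucene stores text text", "year": 2012},
]


def build_matrix(docs, text_fields, numeric_fields, id_field="id"):
    """Return (row_labels, column_labels, rows); rows are sparse dicts."""
    columns = {}     # column label -> column index
    row_labels = []  # row index -> document id, to refer back (d)
    rows = []        # row index -> {column index: value}

    def col(label):
        # Assign the next free column index the first time a label is seen.
        return columns.setdefault(label, len(columns))

    for doc in docs:
        row = {}
        for field in text_fields:
            # Prefix each term with its field name so that several text
            # fields share one row without colliding (one option in c).
            for term, freq in Counter(doc[field].split()).items():
                row[col(f"{field}:{term}")] = float(freq)
        for field in numeric_fields:
            # Numeric fields become ordinary columns (b).
            row[col(field)] = float(doc[field])
        row_labels.append(doc[id_field])
        rows.append(row)

    # Named rows and columns, in the spirit of (e).
    column_labels = sorted(columns, key=columns.get)
    return row_labels, column_labels, rows


row_labels, column_labels, rows = build_matrix(DOCS, ["title", "body"], ["year"])
```

Here row_labels[i] gives the id of the document behind row i, and
column_labels[j] names column j, so a result computed on the matrix can be
traced back to a document and a field. Whether a real implementation should
use one combined matrix or one matrix per field is left open in the thread.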

Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]

Posted by Dan Filimon <da...@gmail.com>.

On Tue, Apr 9, 2013 at 5:15 PM, Shannon Quinn <sq...@gatech.edu> wrote:

> I volunteer.


Brave man! Thank you!

Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]

Posted by Shannon Quinn <sq...@gatech.edu>.

I volunteer.

On 4/9/13 10:12 AM, Dan Filimon wrote:
> I can confirm Apache got in! :)
> The slot assignment is not yet clear however.
>
> And, because mailing people to death is what I do, volunteers for mentoring?
>
>
> On Thu, Apr 4, 2013 at 9:49 PM, Shannon Quinn <sq...@gatech.edu> wrote:
>
>> According to the GSoC calendar, accepted organizations aren't posted until
>> April 8 (Monday), at which point (assuming Apache is accepted...I can't
>> imagine it wouldn't be) slots will be doled out internally. This will
>> probably take at least a day or two, so probably by middle of next week
>> we'll know how many slots Mahout has.
>>
>> Speaking of which: how do the various subprojects negotiate for slots? Is
>> there a central spreadsheet, or an IRC meeting to attend? Or did I miss the
>> email detailing this?
>>
>>
>> On 4/4/13 2:43 PM, Dan Filimon wrote:
>>
>>> Any news on this front? Did we get approved/assigned a slot/anything?
>>>
>>>
>>> On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <dangeorge.filimon@gmail.com>
>>> wrote:
>>>
>>>> Ok, updated!
>>>>
>>>> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <an...@gmail.com> wrote:
>>>>
>>>>> Dan,
>>>>>
>>>>> I think what you've written is fine (I wanted to edit to remove the
>>>>> '?' around random forests but couldn't).
>>>>>
>>>>> ok?
>>>>>
>>>>> On 29 March 2013 11:14, Dan Filimon <da...@gmail.com> wrote:
>>>>>
>>>>>> I added Andy's first suggestion and Ted's suggestion as ideas.
>>>>>>
>>>>>> Andy, could you flesh out your second suggestion into a project and
>>>>>> make an issue please?
>>>>>>
>>>>>> On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <te...@gmail.com> wrote:
>>>>>>
>>>>>>> It should be possible to view a Lucene index as a matrix.  This would
>>>>>>> require that we standardize on a way to convert documents to rows.
>>>>>>> There are many choices, the discussion of which should be deferred to
>>>>>>> the actual work on the project, but there are a few obvious constraints:
>>>>>>>
>>>>>>> a) it should be possible to get the same result as dumping the term
>>>>>>> vectors for each document each to a line and converting that result
>>>>>>> using standard Mahout methods.
>>>>>>>
>>>>>>> b) numeric fields ought to work somehow.
>>>>>>>
>>>>>>> c) if there are multiple text fields, that ought to work sensibly as
>>>>>>> well.  Two options include dumping multiple matrices or converting the
>>>>>>> fields into a single row of a single matrix.
>>>>>>>
>>>>>>> d) it should be possible to refer back from a row of the matrix to
>>>>>>> find the correct document.  This might be because we remember the
>>>>>>> Lucene doc number or because a field is named as holding a unique id.
>>>>>>>
>>>>>>> e) named vectors and matrices should be used if plausible.
>>>>>>>
>>>>>>> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon
>>>>>>> <dangeorge.filimon@gmail.com> wrote:
>>>>>>>
>>>>>>>> ...
>>>>>>>> Ted, could you explain a bit more what you mean by "simplify the
>>>>>>>> connection to Lucene for clustering and classification"? It's too
>>>>>>>> vague for an idea proposal.
>>>>>
>>>>> --
>>>>> Dr Andy Twigg
>>>>> Junior Research Fellow, St Johns College, Oxford
>>>>> Room 351, Department of Computer Science
>>>>> http://www.cs.ox.ac.uk/people/andy.twigg/
>>>>> andy.twigg@cs.ox.ac.uk | +447799647538
>>>>>
>>>>>

Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]

Posted by Dan Filimon <da...@gmail.com>.

I can confirm Apache got in! :)
The slot assignment is not yet clear however.

And, because mailing people to death is what I do, volunteers for mentoring?


On Thu, Apr 4, 2013 at 9:49 PM, Shannon Quinn <sq...@gatech.edu> wrote:

> According to the GSoC calendar, accepted organizations aren't posted until
> April 8 (Monday), at which point (assuming Apache is accepted...I can't
> imagine it wouldn't be) slots will be doled out internally. This will
> probably take at least a day or two, so probably by middle of next week
> we'll know how many slots Mahout has.
>
> Speaking of which: how do the various subprojects negotiate for slots? Is
> there a central spreadsheet, or an IRC meeting to attend? Or did I miss the
> email detailing this?
>
>
> On 4/4/13 2:43 PM, Dan Filimon wrote:
>
>> Any news on this front? Did we get approved/assigned a slot/anything?
>>
>>
>> On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <dangeorge.filimon@gmail.com>
>> wrote:
>>
>>> Ok, updated!
>>>
>>> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <an...@gmail.com> wrote:
>>>
>>>> Dan,
>>>>
>>>> I think what you've written is fine (I wanted to edit to remove the
>>>> '?' around random forests but couldn't).
>>>>
>>>> ok?
>>>>
>>>> On 29 March 2013 11:14, Dan Filimon <da...@gmail.com> wrote:
>>>>
>>>>> I added Andy's first suggestion and Ted's suggestion as ideas.
>>>>>
>>>>> Andy, could you flesh out your second suggestion into a project and
>>>>> make an issue please?
>>>>>
>>>>> On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <te...@gmail.com> wrote:
>>>>>
>>>>>> It should be possible to view a Lucene index as a matrix.  This would
>>>>>> require that we standardize on a way to convert documents to rows.
>>>>>> There are many choices, the discussion of which should be deferred to
>>>>>> the actual work on the project, but there are a few obvious constraints:
>>>>>>
>>>>>> a) it should be possible to get the same result as dumping the term
>>>>>> vectors for each document each to a line and converting that result
>>>>>> using standard Mahout methods.
>>>>>>
>>>>>> b) numeric fields ought to work somehow.
>>>>>>
>>>>>> c) if there are multiple text fields, that ought to work sensibly as
>>>>>> well.  Two options include dumping multiple matrices or converting the
>>>>>> fields into a single row of a single matrix.
>>>>>>
>>>>>> d) it should be possible to refer back from a row of the matrix to
>>>>>> find the correct document.  This might be because we remember the
>>>>>> Lucene doc number or because a field is named as holding a unique id.
>>>>>>
>>>>>> e) named vectors and matrices should be used if plausible.
>>>>>>
>>>>>> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon
>>>>>> <dangeorge.filimon@gmail.com> wrote:
>>>>>>
>>>>>>> ...
>>>>>>> Ted, could you explain a bit more what you mean by "simplify the
>>>>>>> connection to Lucene for clustering and classification"? It's too
>>>>>>> vague for an idea proposal.
>>>>
>>>> --
>>>> Dr Andy Twigg
>>>> Junior Research Fellow, St Johns College, Oxford
>>>> Room 351, Department of Computer Science
>>>> http://www.cs.ox.ac.uk/people/andy.twigg/
>>>> andy.twigg@cs.ox.ac.uk | +447799647538
>>>>
>>>>
>>>
>

Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]

Posted by Shannon Quinn <sq...@gatech.edu>.

According to the GSoC calendar, accepted organizations aren't posted 
until April 8 (Monday), at which point (assuming Apache is accepted...I 
can't imagine it wouldn't be) slots will be doled out internally. This 
will probably take at least a day or two, so probably by middle of next 
week we'll know how many slots Mahout has.

Speaking of which: how do the various subprojects negotiate for slots? 
Is there a central spreadsheet, or an IRC meeting to attend? Or did I 
miss the email detailing this?

On 4/4/13 2:43 PM, Dan Filimon wrote:
> Any news on this front? Did we get approved/assigned a slot/anything?
>
>
> On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <da...@gmail.com> wrote:
>
>> Ok, updated!
>>
>>
>> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <an...@gmail.com> wrote:
>>
>>> Dan,
>>>
>>> I think what you've written is fine (I wanted to edit to remove the
>>> '?' around random forests but couldn't).
>>>
>>> ok?
>>>
>>>
>>>
>>> On 29 March 2013 11:14, Dan Filimon <da...@gmail.com> wrote:
>>>> I added Andy's first suggestion and Ted's suggestion as ideas.
>>>>
>>>> Andy, could you flesh out your second suggestion into a project and
>>> make an
>>>> issue please?
>>>>
>>>>
>>>> On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <te...@gmail.com>
>>> wrote:
>>>>> It should be possible to view a Lucene index as a matrix.  This would
>>>>> require that we standardize on a way to convert documents to rows.
>>>   There
>>>>> are many choices, the discussion of which should be deferred to the
>>> actual
>>>>> work on the project, but there are a few obvious constraints:
>>>>>
>>>>> a) it should be possible to get the same result as dumping the term
>>> vectors
>>>>> for each document each to a line and converting that result using
>>> standard
>>>>> Mahout methods.
>>>>>
>>>>> b) numeric fields ought to work somehow.
>>>>>
>>>>> c) if there are multiple text fields, that ought to work sensibly as
>>> well.
>>>>>   Two options include dumping multiple matrices or converting the fields
>>>>> into a single row of a single matrix.
>>>>>
>>>>> d) it should be possible to refer back from a row of the matrix to
>>> find the
>>>>> correct document.  This might be because we remember the Lucene doc
>>> number
>>>>> or because a field is named as holding a unique id.
>>>>>
>>>>> e) named vectors and matrices should be used if plausible.
>>>>>
>>>>> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <
>>> dangeorge.filimon@gmail.com
>>>>>> wrote:
>>>>>> ...
>>>>>> Ted, could you explain a bit more what you mean by "simplify the
>>>>> connection
>>>>>> to Lucene for clustering and classification"? It's too vague for an
>>> idea
>>>>>> proposal.
>>>>>>
>>>
>>>
>>> --
>>> Dr Andy Twigg
>>> Junior Research Fellow, St Johns College, Oxford
>>> Room 351, Department of Computer Science
>>> http://www.cs.ox.ac.uk/people/andy.twigg/
>>> andy.twigg@cs.ox.ac.uk | +447799647538
>>>
>>