Posted to general@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/07/07 13:37:24 UTC

ApacheCon Speakers

We need to decide on speakers, which means we need to decide on how to  
decide on speakers.  I'll try to put up a proposal sometime in the  
next few days, but others are welcome.

Re: ApacheCon Speakers

Posted by Grant Ingersoll <gs...@apache.org>.
On Jul 7, 2009, at 9:55 AM, Jukka Zitting wrote:

> Hi,
>
> On Tue, Jul 7, 2009 at 3:09 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> I am a little bit confused: I thought the call for papers/talks is  
>> over?
>
> Yes, the CFP has ended and we (currently just me and Grant) have all
> the submissions.



FWIW, all of them are already listed on the Lucene Meetup page
(http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009),
so I think we can just state them.

I will send a follow-up with some scheduling ideas.  We need to get  
this done.

CFPs:

===================================
TRACK: Lucene (two days)
===================================

347: Building Intelligent Search Applications with the Lucene Stack
Presentation, 60 minutes by Grant Ingersoll

Apache Lucene has evolved in recent years beyond a core search library
into a top level project containing a whole suite of tools for working
with content.  Starting with Solr, which builds on the core Lucene
search library, we can add in tools like Tika, Mahout, Droids and other
open source libraries to build intelligent search applications.
This talk will focus on how to leverage the various components
of the Lucene Stack to build out intelligent search applications
that better enable users to find what they are looking for in
today's sea of content.


366: Apache Solr: Out of the Box
Presentation, 60 minutes by Chris Hostetter

Apache Solr is an HTTP based enterprise search server built on top of
the Lucene Java search library.  In this session we will see how quick
and easy it can be to install and configure Solr to provide full-text
searching of structured data without needing to write any custom code.
We will demonstrate various built-in features such as:  loading data
from CSV files, tolerant parsing of user input, faceted searching,
highlighting matched text in results, and retrieving search results
in a variety of formats (XML, JSON, etc.). We will also look at
using Solr's Administrative interface to understand how different
text analysis configuration options affect our results, and why
various results score the way they do against different searches.
No previous Solr experience is expected.
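To give a flavor of the faceting feature mentioned above, here is a
deliberately tiny sketch of the underlying idea (the documents and field
names are invented, and this is not Solr's implementation -- Solr computes
this server-side over its Lucene index):

```python
from collections import Counter

# Toy document set; "category" is a made-up example facet field.
docs = [
    {"id": 1, "text": "solr search server", "category": "search"},
    {"id": 2, "text": "lucene search library", "category": "search"},
    {"id": 3, "text": "tika content extraction", "category": "content"},
]

def search_with_facets(query, docs, facet_field):
    """Return matching docs plus value counts for one facet field."""
    hits = [d for d in docs if query in d["text"]]
    facets = Counter(d[facet_field] for d in hits)
    return hits, facets

hits, facets = search_with_facets("search", docs, "category")
print(len(hits), dict(facets))  # prints: 2 {'search': 2}
```

Real Solr exposes the same kind of per-value counts via the facet.field
request parameter, computed against the index rather than a Python list.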


367: Apache Solr: Beyond the Box
Presentation, 60 minutes by Chris Hostetter

Apache Solr is an HTTP based enterprise search server built
on top of the Lucene Java search library.  In this session we
will look at Solr's internal Java APIs and discuss how to write
various types of plugins for customizing its behavior,
as well as some real world examples of "when" and "why"
it makes sense to do so.


415: Implementing an Information Retrieval Framework for an
Organizational Repository
Presentation, 60 minutes by Sithu D Sudarsan

Successful Information Retrieval (IR) frameworks for large
repositories have been reported in recent times.  Invariably, all of
them have used machine readable repositories, where plain text
availability is the norm.  However, organizations with legacy
archives need to develop a framework which first converts
the non-electronic archive to an electronic one and then
extracts machine-readable text with an acceptable error rate.
The Food and Drug Administration (FDA) has electronic images
of the documents collected as part of their charter to approve
and monitor products related to health care.  These documents
date back multiple decades and have formats which range from
microfiche through early optical character recognition to recent
electronic formats.  We believe that a large knowledge base
hidden in them could be mined.  To mine this knowledge base,
we are developing a semantic mining framework using open
source tools such as Lucene, PDFBox, Solr, POI, and Java.
Challenges include determining the quality of text being extracted
and the ability to handle documents containing formatted text in part.
The text itself may contain specific vocabularies from medical,
legal, engineering and scientific domains and terminology that
evolves over time.  Careful thought needs to be given to selecting
analyzers for indexing and retrieval and implementing a framework
for heuristics useful to domain experts as well as novices.
An initial prototype is currently being evaluated with a sample size
of over 100,000 documents and 70GB of data for different extractors,
analyzers and search heuristics, with multiple indices for each
document stored in a distributed fashion.
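On the analyzer-selection point above, a toy analysis chain shows the
shape of the problem (the synonym table is an invented example, not part
of the described framework):

```python
import re

# Invented domain synonym table: map variant terms to one canonical form,
# so that indexing and querying agree on a shared vocabulary.
SYNONYMS = {"myocardial": "cardiac", "renal": "kidney"}

def analyze(text):
    """A toy analysis chain: tokenize on letters, lowercase,
    then normalize domain-specific synonyms."""
    tokens = re.findall(r"[a-zA-Z]+", text)
    return [SYNONYMS.get(t.lower(), t.lower()) for t in tokens]

print(analyze("Myocardial infarction, renal failure"))
# prints: ['cardiac', 'infarction', 'kidney', 'failure']
```

The key requirement, as in Lucene, is that the same chain runs at index
time and at query time; otherwise domain terms silently fail to match.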


424: Apache Mahout - Going from raw data to information
Presentation, 60 minutes by Isabel Drost

It has become very easy to create, publish, and collect data
in digital form.  The volume of structured and unstructured
data is increasing at a tremendous pace.  This has led to a whole
new set of applications that can be built if one solves
the problem of turning raw data into valuable information.
Possible applications include, but are not limited to,
discovering new trends from a stream of weblog entries, or
automatic learning approaches for supplementing market research
processes for new products.

Machine learning provides tools for building such applications.
A large community of researchers has been working on the topic
of learning from data.  Although a lot of information on algorithms
and solutions to common problems is publicly available,
scaling these solutions into the range of terabytes and petabytes
is an open issue.  To scale algorithms to such dimensions it
is indispensable to distribute data as well as computation.
The mission of the Mahout project is to build a suite of scalable
machine learning algorithms that can cope with today's volumes of data.
The project is built on top of Hadoop.

This talk provides a beginner-friendly introduction to the topic
of machine learning.  It presents a broad set of applications
that benefit from machine learning.  The presentation gives a
high-level overview of the project itself: the types of tasks that can
be solved with each algorithm and the pitfalls one needs to look
out for when using it.
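As a concrete taste of "learning from data", here is a deliberately tiny,
single-machine sketch of k-means clustering, one of the algorithm families
Mahout scales out on Hadoop (the points and starting centers below are
arbitrary examples):

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D points: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centers = [sum(ps) / len(ps) if ps else c
                   for c, ps in clusters.items()]
    return sorted(centers)

# Two obvious groups, around 1 and around 10.
print(kmeans_1d([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], centers=[0.0, 5.0]))
```

Mahout's contribution is not the algorithm itself but running this kind of
iterative computation over Hadoop once the points no longer fit on one
machine.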


426: MIME Magic with Apache Tika
Presentation, 60 minutes by Jukka Zitting

Apache Tika is a Lucene subproject whose purpose is to make it
easier to extract metadata and structured text content from
all kinds of files.  Tika leverages libraries like Apache POI
and PDFBox to provide a powerful yet simple interface for parsing
dozens of document formats.  This makes Tika an ideal companion
for Apache Lucene or any other search engine that needs to be able
to index metadata and content from many different types of files.
This presentation introduces Apache Tika and shows how it's
being used in projects like Apache Solr and Apache Jackrabbit.
You will learn how to integrate Tika with your application
and how to configure and extend Tika to best suit your needs.
The presentation also summarizes the key characteristics
of the more widely used file formats and metadata standards,
and shows how Tika can help deal with that complexity.
The audience is expected to have a basic understanding of Java
programming and MIME media types.
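The "MIME magic" in the title refers to detecting a file's type from its
leading bytes rather than trusting its name.  The sketch below mimics that
idea with a few well-known magic numbers; it is an illustration of the
principle, not Tika's API:

```python
# Well-known magic numbers: PDF files start with "%PDF", ZIP-based
# formats (.docx, .odt, .jar, ...) with "PK\x03\x04", PNG with "\x89PNG".
MAGIC = {
    b"%PDF": "application/pdf",
    b"PK\x03\x04": "application/zip",
    b"\x89PNG": "image/png",
}

def detect(data, default="application/octet-stream"):
    """Guess a MIME type from the first bytes of the content."""
    for magic, mime in MAGIC.items():
        if data.startswith(magic):
            return mime
    return default

print(detect(b"%PDF-1.4 rest of file"))  # prints: application/pdf
```

Tika combines such byte-level sniffing with file-name and content-type
hints, then hands the stream to a format-specific parser such as POI or
PDFBox.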


493: Solr Flair: User Interfaces, powered by Apache Solr
Presentation, 60 minutes by Erik Hatcher

Come see Solr in a new light, with snazzy innovative user interfaces.
We'll talk about Solr's flexible capabilities for driving custom
user interfaces and how projects like SolrJS and "Solritas"
bring Solr to the front-end. We'll experience user interfaces
in a variety of front-end technologies, including PHP, Ruby
on Rails, Java, Velocity, jQuery, and SIMILE Timeline.
We'll have Ajax, clouds, maps, timelines, and set visualizations, oh my!


512: Advanced Indexing Techniques with Apache Lucene
Presentation, 60 minutes by Michael Busch

As in 2007 and 2008, this presentation will cover the
latest indexing and search innovations in Lucene and how to use them.
The payloads feature that was added in 2007 enabled many new
interesting use cases.  The Lucene developers continued working
on Flexible Indexing, and so far a new flexible TokenStream API,
a configurable indexing chain and pluggable indexing consumers
have been developed.  We are also working on column-stride fields,
a feature which will perform better than payloads for many use cases.
This talk will give an overview of the latest progress and demonstrate
the new features with interesting use cases.
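For readers new to payloads: a payload is a small byte array stored at
each term position in the index, which scoring code can read back at query
time.  The toy index below stores a per-position weight to show the shape
of the idea (the weights and documents are invented; this is not Lucene's
implementation):

```python
from collections import defaultdict

# postings: term -> list of (doc_id, position, payload)
# Here the payload is a made-up per-position boost weight.
index = defaultdict(list)

def add_doc(doc_id, weighted_tokens):
    """weighted_tokens: list of (term, weight) in document order."""
    for pos, (term, weight) in enumerate(weighted_tokens):
        index[term].append((doc_id, pos, weight))

# Give title words a higher payload than body words.
add_doc(1, [("lucene", 2.0), ("indexing", 2.0), ("fast", 1.0)])
add_doc(2, [("solr", 2.0), ("lucene", 1.0)])

def score(term):
    """Sum payloads per document, so a title match outranks a body match."""
    scores = defaultdict(float)
    for doc_id, _pos, weight in index[term]:
        scores[doc_id] += weight
    return dict(scores)

print(score("lucene"))  # prints: {1: 2.0, 2: 1.0}
```

Column-stride fields, mentioned above, attack the same need (per-document
values available at scoring time) with a denser storage layout.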

Re: ApacheCon Speakers

Posted by Grant Ingersoll <gs...@apache.org>.
FWIW, I think the people who actually took the time to do the CFP  
deserve some extra credit, whatever that means.  Besides, given the  
number of submissions, I think we can accommodate all the CFPs and  
many of the others that have been proposed on the Wiki.



On Jul 7, 2009, at 9:55 AM, Jukka Zitting wrote:

> Hi,
>
> On Tue, Jul 7, 2009 at 3:09 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> I am a little bit confused: I thought the call for papers/talks is  
>> over?
>
> Yes, the CFP has ended and we (currently just me and Grant) have all
> the submissions.
>
> However, we do have quite a bit of freedom in planning the Lucene
> track, so we can look beyond the CFP submissions in selecting the
> talks. For example if there is no CFP submission on a certain topic
> then we could still ask someone to come up with such a presentation.
>
> There are also a number of presentation proposals that were made on
> the wiki page after the normal CFP had already ended. We can also look
> at those proposals when selecting the talks, though I would give
> preference to the CFP submissions as they tend to be more complete
> (summaries, speaker bio, etc.).
>
>> Nevertheless I will hopefully be at the conference (as speaker or  
>> not) in
>> October and meet all of you again!
>
> Excellent, looking forward to meeting you again!
>
> BR,
>
> Jukka Zitting



Re: ApacheCon Speakers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Jul 7, 2009 at 3:09 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> I am a little bit confused: I thought the call for papers/talks is over?

Yes, the CFP has ended and we (currently just me and Grant) have all
the submissions.

However, we do have quite a bit of freedom in planning the Lucene
track, so we can look beyond the CFP submissions in selecting the
talks. For example if there is no CFP submission on a certain topic
then we could still ask someone to come up with such a presentation.

There are also a number of presentation proposals that were made on
the wiki page after the normal CFP had already ended. We can also look
at those proposals when selecting the talks, though I would give
preference to the CFP submissions as they tend to be more complete
(summaries, speaker bio, etc.).

> Nevertheless I will hopefully be at the conference (as speaker or not) in
> October and meet all of you again!

Excellent, looking forward to meeting you again!

BR,

Jukka Zitting

RE: ApacheCon Speakers

Posted by Uwe Schindler <uw...@thetaphi.de>.
I am a little bit confused: I thought the call for papers/talks is over? Or
is this an internal discussion for the Lucene track, what talks could be
provided by whom? As this is my first time in the planning phase, I need
some clarification. :)

Nevertheless I will hopefully be at the conference (as speaker or not) in
October and meet all of you again!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org]
> Sent: Tuesday, July 07, 2009 1:37 PM
> To: general@lucene.apache.org
> Subject: ApacheCon Speakers
> 
> We need to decide on speakers, which means we need to decide on how to
> decide on speakers.  I'll try to put up a proposal sometime in the
> next few days, but others are welcome.


Re: ApacheCon Speakers

Posted by Chris Hostetter <ho...@fucit.org>.
: I would propose that we reserve the first presentation slot for an
: overview of Lucene and short (2-5 minutes, one slide) introductions of
: all the subprojects.

+1 ... we're up to 10 subprojects, so the first 50 minute block is perfect 
-- particularly since it's followed by a break to give people a chance to 
look at the whole con schedule and decide what talks they want to go to 
based on the summary of the projects.

we should make sure each of those 2-5 minute summaries draw attention to 
all of the sessions that will be touching that subproject (a slide listing 
them while the speaker talks about what the project is would probably be 
enough)

: Beyond that I'd focus Lucene Java talks on one day and Solr talks on
: the other, and sprinkle the other topics in between there.

Hmmm... that's a tough call.  There's a lot of power in having solid 
"tracks" for people to go to, but personally I think it's better to mix 
things up over multiple days so you can benefit from word of mouth -- it 
sucks to meet up with some people for dinner, and hear about this cool 
session they went to that day where they heard about this awesome Project 
Y, and then find out that every session covering Y is over because they 
were all on that one day.

Perhaps a better strategy would be to have the "no experience required" 
type talks (intro to X, tutorials for novice users, case studies, etc...) 
on Thursday, and then on Friday do the more advanced talks where people 
are already expected to be fairly familiar with the projects (advanced X, 
performance tuning, new features in X, making X work with Y, etc...)

(there typically tend to be more "intro" talks than "advanced" talks, but 
it's no big deal to also have an intro talk on Friday, you just make it a 
topic that doesn't have a corresponding advanced talk)

: If scheduling gets tight, we could even combine the two slots at 16-18
: into a mini "Fast Feather Talk" session where we could do a series of
: short 15 minute presentations of various things that wouldn't
: necessarily fit in the normal schedule.

fast feather or lightning talks can be cool -- but typically only when 
they are unscheduled and just involve people talking (with NO laptop 
setup ... *maybe* with a whiteboard) for no more than 5 minutes or so.

there is a lot to be said however for combining some shorter talks -- four 
30 minute talks, or two 45 minutes and one 30 minute -- provided the 
speakers all coordinate in advance (merge their slides into a single 
"deck", use only one computer so there's less setup overhead, do one joint 
round of introductions, have it listed on the schedule as one block so 
there's no 10 minute intermission, etc...)


-Hoss


Re: ApacheCon Speakers

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Tue, Jul 7, 2009 at 1:37 PM, Grant Ingersoll<gs...@apache.org> wrote:
> We need to decide on speakers, which means we need to decide on how to
> decide on speakers.  I'll try to put up a proposal sometime in the next few
> days, but others are welcome.

Here are some ideas that have been proposed or used elsewhere:

    * Discuss (in private or public) until a consensus is reached
    * Set up a small committee (1-3 persons) to do the selection
    * Use some voting system to select the most popular talks

Before we decide on the actual speakers it would be useful to come up
with a rough idea of how we want to allocate the two days we have to
various topics and whether we want to fill everything with traditional
talks or perhaps sprinkle workshops or other kinds of content in
between.

I would propose that we reserve the first presentation slot for an
overview of Lucene and short (2-5 minutes, one slide) introductions of
all the subprojects.

Beyond that I'd focus Lucene Java talks on one day and Solr talks on
the other, and sprinkle the other topics in between there.

If scheduling gets tight, we could even combine the two slots at 16-18
into a mini "Fast Feather Talk" session where we could do a series of
short 15 minute presentations of various things that wouldn't
necessarily fit in the normal schedule.

PS. I added the track schedule template to
http://wiki.apache.org/lucene-java/LuceneAtApacheConUs2009.

BR,

Jukka Zitting