Posted to solr-user@lucene.apache.org by Furkan KAMACI <fu...@gmail.com> on 2013/04/02 14:45:31 UTC

Flow Chart of Solr

Is there any documentation like a flow chart of Solr? For example:
documents come into Solr (perhaps indicating which classes receive the
documents), go through the parsing process (stemming, etc.), and then the
inverted (reverse) indexes are built, and so on?
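A minimal sketch of the index-time flow asked about above, in plain Java with no Solr or Lucene dependencies. All names here (`ToyIndexer`, `analyze`, `stem`, `add`, `lookup`) are hypothetical. In real Solr the analysis chain is a tokenizer plus token filters configured per field type in schema.xml, and Lucene does the actual token filtering and index writing. The suffix-stripping `stem` is a crude stand-in, not the Porter stemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy model of index-time analysis plus an inverted index. */
public class ToyIndexer {

    // Inverted index: analyzed term -> ids of the documents containing it.
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Toy analysis chain: tokenize, lowercase, then crudely stem. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // "Tokenizer": split on runs of characters that are not letters or digits.
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) continue; // a leading delimiter yields an empty token
            // "Lowercase filter", then "stem filter".
            terms.add(stem(token.toLowerCase(Locale.ROOT)));
        }
        return terms;
    }

    /** Strips an "ing" or "s" suffix; a crude stand-in, NOT the Porter stemmer. */
    static String stem(String term) {
        for (String suffix : new String[] {"ing", "s"}) {
            if (term.length() > suffix.length() + 2 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    /** Index time: analyze the text and record a posting for each term. */
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Query time: analyze the word the same way, then read its postings. */
    SortedSet<Integer> lookup(String word) {
        List<String> terms = analyze(word);
        if (terms.isEmpty()) return new TreeSet<>();
        SortedSet<Integer> hits = postings.get(terms.get(0));
        return hits == null ? new TreeSet<Integer>() : new TreeSet<Integer>(hits);
    }

    public static void main(String[] args) {
        ToyIndexer index = new ToyIndexer();
        index.add(1, "Documents come into Solr");
        index.add(2, "Solr indexes documents");
        System.out.println(index.postings);
        System.out.println(index.lookup("Documents")); // [1, 2]
    }
}
```

Because `lookup` runs the query word through the same `analyze` chain, "Documents" finds both documents indexed under the term "document". Solr likewise analyzes query text, and a field type may declare separate index-time and query-time analyzers.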

Re: Flow Chart of Solr

Posted by Lance Norskog <go...@gmail.com>.
Seconded. Single-stepping really is the best way to follow the logic 
chains and see how the data mutates.

On 04/05/2013 06:36 AM, Erick Erickson wrote:
> Then there's my lazy method. Fire up the IDE and find a test case that
> looks close to something you want to understand further. Step through
> it all in the debugger. I admit there'll be some fumbling at the start
> to _find_ the test case, but they're pretty well named. In IntelliJ,
> all you have to do is right-click on the test case and the context
> menu says "debug blahbalbhabl".... You can chart the class
> relationships you actually wind up in as you go. This seems tedious,
> but it saves me getting lost in the class hierarchy.
>
> Also, there are some convenient tools in the IDE that will show you
> class hierarchies as you need.
>
> Or attach your debugger to a running Solr, which is actually very
> easy. In IntelliJ (and Eclipse has something very similar), create a
> "remote" project. That'll specify some parameters you start up with,
> e.g.:
> java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900 -jar start.jar
>
> Now start up the remote debugging session you just created in the IDE
> and you are attached to a live solr instance and able to step through
> any code you want.
>
> Either way, you can make the IDE work for you!
>
> FWIW,
> Erick
>
> On Wed, Apr 3, 2013 at 12:03 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>> We're using the 4.x branch code as the basis for our writing. So,
>> effectively it will be for at least 4.3 when the book comes out in the
>> summer.
>>
>> Early access will be in about a month or so. O'Reilly will be showing a
>> galley proof for 200 pages of the book next week at Big Data TechCon
>> in Boston.
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Jack Park
>> Sent: Wednesday, April 03, 2013 12:56 PM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Flow Chart of Solr
>>
>> Jack,
>>
>> Is that new book up to date with the 4.x series?
>>
>> Thanks
>> The other Jack
>>
>> On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky <ja...@basetechnology.com>
>> wrote:
>>> And another one on the way:
>>>
>>> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
>>>
>>> Hopefully that helps a lot as well. Plenty of diagrams. Lots of examples.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Jack Park
>>> Sent: Wednesday, April 03, 2013 11:25 AM
>>>
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Flow Chart of Solr
>>>
>>> There are three books on Solr, two with that in the title, and one,
>>> Taming Text, each of which has been very valuable in understanding
>>> Solr.
>>>
>>> Jack
>>>
>>> On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky <ja...@basetechnology.com>
>>> wrote:
>>>>
>>>> Sure, yes. But... it comes down to what level of detail you want and need
>>>> for a specific task. In other words, there are probably a dozen or more
>>>> levels of detail. The reality is that if you are going to work at the Solr
>>>> code level, that is very, very different than being a "user" of Solr, and
>>>> at that point your first step is to become familiar with the code itself.
>>>>
>>>> When you talk about "parsing" and "stemming", you are really talking about
>>>> the user level, not the Solr code level. Maybe what you really need is a
>>>> cheat sheet that maps a user-visible feature to the main Solr code
>>>> component that implements that feature.
>>>>
>>>> There are a number of different forms of "parsing" in Solr - parsing of
>>>> what? Queries? Requests? Solr documents? Function queries?
>>>>
>>>> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does that.
>>>> Lucene does all of the "token filtering". Are you asking for details on how
>>>> Lucene works? Maybe you meant to ask how "term analysis" works, which is
>>>> split between Solr and Lucene. Or maybe you simply wanted to know when and
>>>> where term analysis is done. Tell us your specific problem or specific
>>>> question and we can probably quickly give you an answer.
>>>>
>>>> In truth, NOBODY uses "flow charts" anymore. Sure, there are some
>>>> user-level diagrams, but not down to the code level.
>>>>
>>>> If you could focus on specific questions, we could give you specific
>>>> answers.
>>>>
>>>> "Main steps"? That depends on what level you are working at. Tell us what
>>>> problem you are trying to solve and we can point you to the relevant areas.
>>>>
>>>> In truth, if you become generally familiar with Solr at the user level
>>>> (study the wikis), you will already know what the "main steps" are.
>>>>
>>>> So, it is not "main steps of Solr", but main steps of some specific
>>>> "request" of Solr, at a specified level of detail, and for a specified
>>>> area of Solr if greater detail is needed. Be more specific, and then we can
>>>> be more specific.
>>>>
>>>> For now, the general advice for people who need or want to go far beyond
>>>> the user level is to "get familiar with the code" - just LOOK at it (a lot
>>>> of the package and class names are OBVIOUS, really) and follow the class
>>>> hierarchy and code flow using the standard features of any modern Java IDE.
>>>> If you are wondering where to start for some specific user-level feature,
>>>> please ask specifically about that feature. But... make a diligent effort
>>>> to discover and learn on your own before asking open-ended questions.
>>>>
>>>> Sure, there are lots of things in Lucene and Solr that are rather complex,
>>>> seemingly convoluted, and not obvious, but people are more than willing to
>>>> help you out if you simply ask a specific question. I mean, not everybody
>>>> needs to know the fine detail of query parsing, analysis, building a
>>>> Lucene-level stemmer, etc. If we tried to put all of that in a diagram,
>>>> most people would be more confused than enlightened.
>>>>
>>>> At which step are scores calculated? That's more of a Lucene question. Or
>>>> are you really asking what code in Solr invokes the Lucene search methods
>>>> that calculate basic scores?
>>>>
>>>> In short, you need to be more specific. Don't force us to guess what
>>>> problem you are trying to solve.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> -----Original Message----- From: Furkan KAMACI
>>>> Sent: Wednesday, April 03, 2013 6:52 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Flow Chart of Solr
>>>>
>>>>
>>>> So, all in all, is there anybody who can write down just the main steps
>>>> of Solr (including parsing, stemming, etc.)?
>>>>
>>>>
>>>> 2013/4/2 Furkan KAMACI <fu...@gmail.com>
>>>>
>>>>> Take me as an example. I started researching Solr only a few weeks
>>>>> ago, and I have learned Solr and its related projects. My next step is
>>>>> writing down the main steps of Solr. We have separated the learning
>>>>> curve of Solr into two main categories. The first is for people who use
>>>>> it as out-of-the-box components; the second is the developer side.
>>>>>
>>>>> The developer side itself branches two ways.
>>>>>
>>>>> The first is the general steps: a document comes into Solr (e.g. data
>>>>> crawled by Nutch), which analysis processes are applied (stemming,
>>>>> etc.), and what happens after parsing, step by step. When a search
>>>>> query arrives, what happens step by step, at which step scores are
>>>>> calculated, and so on.
>>>>> The second is more code-specific: which handlers receive the data to be
>>>>> indexed (there is no need to explain every handler at this step), which
>>>>> analyzer and tokenizer classes are involved and how data flows between
>>>>> them, and what response handlers are and how they work.
>>>>>
>>>>> Explaining the cloud side is separate work.
>>>>>
>>>>> Some of these explanations already exist on the wiki, but some of them
>>>>> are buried very deep and it is not easy to find their parent topic.
>>>>> Starting the wiki from a top page and branching all other topics from
>>>>> it as far as possible could be better.
>>>>>
>>>>> If we could show the big picture, and alongside it the smaller pictures
>>>>> within it, that would be great. If you know the main parts, it is easy
>>>>> to go deep into the code: you don't need to explain every handler; if
>>>>> you show developers the way, they can debug and find what they need.
>>>>>
>>>>> Taking myself as an example again: I have to write down the steps of
>>>>> Solr in some detail, and even though I have read many pages of the wiki
>>>>> and a book about it, I find that it is not easy to write down even the
>>>>> big picture of the developer side.
>>>>>
>>>>>
>>>>> 2013/4/2 Alexandre Rafalovitch <ar...@gmail.com>
>>>>>
>>>>>> Yago,
>>>>>>
>>>>>> My point - perhaps lost in too much text - was that Solr is presented -
>>>>>> and can function - as a black box, which makes it different from more
>>>>>> traditional open-source projects. So, stage 2 happens exactly when
>>>>>> non-programmers have to cross the boundary from the black box into a
>>>>>> code-first approach, and the hand-off is not particularly smooth. Or
>>>>>> even when, say, a PHP or .NET programmer tries to go beyond the basic
>>>>>> operations of their client library and has to understand the
>>>>>> server-side aspects of Solr.
>>>>>>
>>>>>> Regards,
>>>>>>     Alex.
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro <ya...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Alexandre,
>>>>>>>
>>>>>>> You describe the normal path when a beginner tries to use source code
>>>>>>> they don't understand: black box, reading code, hacking, OK, now I know
>>>>>>> 10% of the project, with luck :p.
>>>>>>>
>>>>>>
>>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>> - Time is the quality of nature that keeps events from happening all at
>>>>>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>>>>>> book)
>>>>>>
>>>>>
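On the question of where scores come from, here is a rough sketch loosely modeled on the classic TF-IDF formula behind Lucene 4.x's default similarity: tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), with idf counted twice per matching term. It deliberately omits field norms, coord, query norm, and boosts, so its numbers will not match Lucene's. `ToyScorer` and its methods are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical toy scorer; a simplification, not Lucene's scoring code. */
public class ToyScorer {

    // One term-frequency map per document; the list index is the doc id.
    private final List<Map<String, Integer>> docs = new ArrayList<>();

    /** Adds a document, counting how often each lowercased word occurs. */
    void add(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        docs.add(tf);
    }

    /** Number of documents that contain the term at least once. */
    int docFreq(String term) {
        int count = 0;
        for (Map<String, Integer> tf : docs) {
            if (tf.containsKey(term)) count++;
        }
        return count;
    }

    /** Sums tf * idf^2 over the query terms that occur in the document. */
    double score(int docId, String... queryTerms) {
        double total = 0.0;
        for (String raw : queryTerms) {
            String term = raw.toLowerCase(Locale.ROOT);
            int freq = docs.get(docId).getOrDefault(term, 0);
            if (freq == 0) continue; // absent terms add nothing
            double tf = Math.sqrt(freq);
            double idf = 1.0 + Math.log((double) docs.size() / (docFreq(term) + 1));
            total += tf * idf * idf;
        }
        return total;
    }

    /** Returns doc ids ordered by descending score (ties keep insertion order). */
    List<Integer> rank(String... queryTerms) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) ids.add(i);
        ids.sort((a, b) -> Double.compare(score(b, queryTerms), score(a, queryTerms)));
        return ids;
    }

    public static void main(String[] args) {
        ToyScorer scorer = new ToyScorer();
        scorer.add("solr solr lucene");
        scorer.add("lucene index");
        scorer.add("solr search");
        System.out.println(scorer.rank("solr")); // [0, 2, 1]
    }
}
```

Document 0 wins because "solr" occurs twice there, and the sqrt damps, rather than doubles, the effect of the repeat.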


Re: Flow Chart of Solr

Posted by Furkan KAMACI <fu...@gmail.com>.
I have read the books and wikis on Solr and Lucene, and I had to debug the
code to find which parts come from where. I will tidy up my notes and share
the big-picture flow and the detailed one. After that I will ask for your
opinions. Thanks.
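As one simplified way to picture the top level of that big-picture flow: each request path is routed to a registered handler (in Solr 4.x, solrconfig.xml registers request handlers by name, such as /select and /update). The dispatcher below is a hypothetical toy, not Solr's dispatch code; its string results only stand in for the real analysis, indexing, and search work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical toy dispatcher: request path -> registered handler. */
public class ToyDispatcher {

    // A handler takes request parameters and returns a response body.
    private final Map<String, Function<Map<String, String>, String>> handlers =
            new HashMap<>();

    /** Registers a handler under a path such as "/select" or "/update". */
    void register(String path, Function<Map<String, String>, String> handler) {
        handlers.put(path, handler);
    }

    /** Looks up the handler for the path and lets it produce the response. */
    String handle(String path, Map<String, String> params) {
        Function<Map<String, String>, String> handler = handlers.get(path);
        if (handler == null) {
            return "404: no handler registered for " + path;
        }
        return handler.apply(params);
    }

    public static void main(String[] args) {
        ToyDispatcher dispatcher = new ToyDispatcher();
        // Update path: roughly where Solr would analyze and index documents.
        dispatcher.register("/update", p -> "indexed doc " + p.get("id"));
        // Search path: roughly where Solr would parse the query and score hits.
        dispatcher.register("/select", p -> "results for q=" + p.get("q"));

        Map<String, String> params = new HashMap<>();
        params.put("q", "solr");
        System.out.println(dispatcher.handle("/select", params)); // results for q=solr
        System.out.println(dispatcher.handle("/admin", params));
    }
}
```

Stepping through the real handler for one path in a debugger, as suggested in this thread, is a way to fill in each of these placeholder strings with the actual classes involved.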



Re: Flow Chart of Solr

Posted by Erick Erickson <er...@gmail.com>.
Then there's my lazy method. Fire up the IDE and find a test case that
looks close to something you want to understand further. Step through
it all in the debugger. I admit there'll be some fumbling at the start
to _find_ the test case, but they're pretty well named. In IntelliJ,
all you have to do is right-click on the test case and the context
menu says "debug blahbalbhabl".... You can chart the class
relationships you actually wind up in as you go. This seems tedious,
but it saves me getting lost in the class hierarchy.

Also, there are some convenient tools in the IDE that will show you
class hierarchies as you need.

Or attach your debugger to a running Solr, which is actually very
easy. In IntelliJ (and Eclipse has something very similar), create a
"remote" project. That'll specify some parameters you start up with,
e.g.:
java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5900 -jar start.jar

Now start up the remote debugging session you just created in the IDE
and you are attached to a live solr instance and able to step through
any code you want.

Either way, you can make the IDE work for you!

FWIW,
Erick


Re: Flow Chart of Solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
We're using the 4.x branch code as the basis for our writing. So, 
effectively it will be for at least 4.3 when the book comes out in the 
summer.

Early access will be in about a month or so. O'Reilly will be showing a 
galley proof for 200 pages of the book next week at Big Data TechCon 
in Boston.

-- Jack Krupansky

-----Original Message----- 
From: Jack Park
Sent: Wednesday, April 03, 2013 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr

Jack,

Is that new book up to the 4.+ series?

Thanks
The other Jack



Re: Flow Chart of Solr

Posted by Jack Park <ja...@topicquests.org>.
Jack,

Is that new book up to the 4.+ series?

Thanks
The other Jack

On Wed, Apr 3, 2013 at 9:19 AM, Jack Krupansky <ja...@basetechnology.com> wrote:
> And another one on the way:
> http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957
>
> Hopefully that helps a lot as well. Plenty of diagrams. Lots of examples.
>
> -- Jack Krupansky

Re: Flow Chart of Solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
And another one on the way:
http://www.amazon.com/Lucene-Solr-Definitive-comprehensive-realtime/dp/1449359957

Hopefully that helps a lot as well. Plenty of diagrams. Lots of examples.

-- Jack Krupansky

-----Original Message----- 
From: Jack Park
Sent: Wednesday, April 03, 2013 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr

There are three books on Solr, two with that in the title, and one,
Taming Text, each of which has been very valuable in understanding
Solr.

Jack



Re: Flow Chart of Solr

Posted by Jack Park <ja...@topicquests.org>.
There are three books on Solr, two with that in the title, and one,
Taming Text, each of which has been very valuable in understanding
Solr.

Jack

On Wed, Apr 3, 2013 at 5:25 AM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Sure, yes. But... it comes down to what level of detail you want and need
> for a specific task. In other words, there are probably a dozen or more
> levels of detail. The reality is that if you are going to work at the Solr
> code level, that is very, very different than being a "user" of Solr, and at
> that point your first step is to become familiar with the code itself.
>
> When you talk about "parsing" and "stemming", you are really talking about
> the user-level, not the Solr code level. Maybe what you really need is a
> cheat sheet that maps a user-visible feature to the main Solr code component
> that implements that user feature.
>
> There are a number of different forms of "parsing" in Solr - parsing of
> what? Queries? Requests? Solr documents? Function queries?
>
> Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does that.
> Lucene does all of the "token filtering". Are you asking for details on how
> Lucene works? Maybe you meant to ask how "term analysis" works, which is
> split between Solr and Lucene. Or maybe you simply wanted to know when and
> where term analysis is done. Tell us your specific problem or specific
> question and we can probably quickly give you an answer.
>
> In truth, NOBODY uses "flow charts" anymore. Sure, there are some user-level
> diagrams, but not down to the code level.
>
> If you could focus on specific questions, we could give you specific
> answers.
>
> "Main steps"? That depends on what level you are working at. Tell us what
> problem you are trying to solve and we can point you to the relevant areas.
>
> In truth, if you become generally familiar with Solr at the user level
> (study the wikis), you will already know what the "main steps" are.
>
> So, it is not "main steps of Solr", but main steps of some specific
> "request" of Solr, and for a specified level of detail, and for a specified
> area of Solr if greater detail is needed. Be more specific, and then we can
> be more specific.
>
> For now, the general advice for people who need or want to go far beyond the
> user level is to "get familiar with the code" - just LOOK at it - a lot of
> the package and class names are OBVIOUS, really, and follow the class
> hierarchy and code flow using the standard features of any modern Java IDE.
> If you are wondering where to start for some specific user-level feature,
> please ask specifically about that feature. But... make a diligent effort to
> discover and learn on your own before asking open-ended questions.
>
> Sure, there are lots of things in Lucene and Solr that are rather complex
> and seemingly convoluted, and not obvious, but people are more than willing
> to help you out if you simply ask a specific question. I mean, not everybody
> needs to know the fine detail of query parsing, analysis, building a
> Lucene-level stemmer, etc. If we tried to put all of that in a diagram, most
> people would be more confused than enlightened.
>
> At which step are scores calculated? That's more of a Lucene question. Or,
> are you really asking what code in Solr invokes Lucene search methods that
> calculate basic scores?
>
> In short, you need to be more specific. Don't force us to guess what problem
> you are trying to solve.
>
> -- Jack Krupansky

Re: Flow Chart of Solr

Posted by Jack Krupansky <ja...@basetechnology.com>.
Sure, yes. But... it comes down to what level of detail you want and need 
for a specific task. In other words, there are probably a dozen or more 
levels of detail. The reality is that if you are going to work at the Solr 
code level, that is very, very different than being a "user" of Solr, and at 
that point your first step is to become familiar with the code itself.

When you talk about "parsing" and "stemming", you are really talking about 
the user-level, not the Solr code level. Maybe what you really need is a 
cheat sheet that maps a user-visible feature to the main Solr code component 
that implements that user feature.

There are a number of different forms of "parsing" in Solr - parsing of 
what? Queries? Requests? Solr documents? Function queries?

Stemming? Well, in truth, Solr doesn't even do stemming - Lucene does that. 
Lucene does all of the "token filtering". Are you asking for details on how 
Lucene works? Maybe you meant to ask how "term analysis" works, which is 
split between Solr and Lucene. Or maybe you simply wanted to know when and 
where term analysis is done. Tell us your specific problem or specific 
question and we can probably quickly give you an answer.
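
To make the "tokenizer plus token filters" idea concrete, here is a toy model of an analysis chain in plain Java. It is a hypothetical sketch, not Lucene's API: in Lucene the same roles are played by an Analyzer built from a Tokenizer and TokenFilters (for example a lowercase filter, a stop filter and a Porter stemming filter), which Solr assembles from the <analyzer> section of a field type in schema.xml. The class name ToyAnalyzer, the stop-word list and the crude suffix-stripping "stemmer" are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Toy model of an analysis chain: one tokenizer followed by token filters.
 * Hypothetical class for illustration only; it is NOT Lucene's Analyzer API.
 */
public class ToyAnalyzer {

    // Assumed, minimal stop-word list (real stop lists are configurable).
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // "Tokenizer": split text on runs of characters that are not letters or digits.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter 1: lowercase each token.
    public static List<String> lowercase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter 2: drop stop words.
    public static List<String> removeStopWords(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Filter 3: crude suffix stripping, standing in for a real stemmer such as Porter.
    public static String stem(String t) {
        if (t.length() > 5 && t.endsWith("ing")) {
            return t.substring(0, t.length() - 3);
        }
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    // The full chain: tokenizer -> lowercase -> stop words -> stemming.
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : removeStopWords(lowercase(tokenize(text)))) {
            out.add(stem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Indexing of Documents")); // [index, document]
    }
}
```

If a field type uses the same chain for documents and queries, "Indexing" in a document and "index" in a query reduce to the same term in this toy, which is the practical point of running analysis on both sides.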

In truth, NOBODY uses "flow charts" anymore. Sure, there are some user-level 
diagrams, but not down to the code level.

If you could focus on specific questions, we could give you specific 
answers.

"Main steps"? That depends on what level you are working at. Tell us what 
problem you are trying to solve and we can point you to the relevant areas.

In truth, if you become generally familiar with Solr at the user level 
(study the wikis), you will already know what the "main steps" are.

So, it is not "main steps of Solr", but main steps of some specific 
"request" of Solr, and for a specified level of detail, and for a specified 
area of Solr if greater detail is needed. Be more specific, and then we can 
be more specific.

For now, the general advice for people who need or want to go far beyond the 
user level is to "get familiar with the code" - just LOOK at it - a lot of 
the package and class names are OBVIOUS, really, and follow the class 
hierarchy and code flow using the standard features of any modern Java IDE. 
If you are wondering where to start for some specific user-level feature, 
please ask specifically about that feature. But... make a diligent effort to 
discover and learn on your own before asking open-ended questions.

Sure, there are lots of things in Lucene and Solr that are rather complex 
and seemingly convoluted, and not obvious, but people are more than willing 
to help you out if you simply ask a specific question. I mean, not everybody 
needs to know the fine detail of query parsing, analysis, building a 
Lucene-level stemmer, etc. If we tried to put all of that in a diagram, most 
people would be more confused than enlightened.

At which step are scores calculated? That's more of a Lucene question. Or, 
are you really asking what code in Solr invokes Lucene search methods that 
calculate basic scores?
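
To show where "scores are calculated" in the simplest terms, here is a single-term TF-IDF score. It is loosely modeled on the tf and idf factors of Lucene 4.x's DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))), but it deliberately leaves out length norms, boosts, queryNorm and coord, so treat it as an illustration rather than Lucene's actual formula. ToyTfIdf is a hypothetical name.

```java
import java.util.List;

// Rough TF-IDF illustration for a single-term query. Loosely modeled on
// the tf/idf factors of Lucene 4.x DefaultSimilarity; norms, boosts,
// queryNorm and coord are deliberately left out.
final class ToyTfIdf {

    // tf = sqrt(number of occurrences of the term in the document)
    static double tf(List<String> docTokens, String term) {
        long freq = docTokens.stream().filter(term::equals).count();
        return Math.sqrt(freq);
    }

    // idf = 1 + ln(numDocs / (docFreq + 1)); rarer terms get larger values
    static double idf(List<List<String>> corpus, String term) {
        long docFreq = corpus.stream().filter(d -> d.contains(term)).count();
        return 1.0 + Math.log(corpus.size() / (double) (docFreq + 1));
    }

    // idf is applied twice, roughly as in classic Lucene scoring, where it
    // appears both in the query-side weight and in the per-document weight.
    static double score(List<List<String>> corpus, List<String> doc, String term) {
        double idf = idf(corpus, term);
        return tf(doc, term) * idf * idf;
    }
}
```

With a three-document corpus where "solr" appears in two documents and "lucene" in one, idf("lucene") comes out higher than idf("solr"), so a match on the rarer term counts for more; a document that does not contain the term scores 0.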

In short, you need to be more specific. Don't force us to guess what problem 
you are trying to solve.

-- Jack Krupansky

-----Original Message----- 
From: Furkan KAMACI
Sent: Wednesday, April 03, 2013 6:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Flow Chart of Solr

So, all in all, is there anybody who can write down just the main steps of
Solr (including parsing, stemming, etc.)?


Re: Flow Chart of Solr

Posted by Furkan KAMACI <fu...@gmail.com>.
So, all in all, is there anybody who can write down just the main steps of
Solr (including parsing, stemming, etc.)?


2013/4/2 Furkan KAMACI <fu...@gmail.com>

> Let me take myself as an example. I started researching Solr just a few
> weeks ago. I have learned about Solr and its related projects, and my next
> step is writing down the main steps of Solr. We have separated the learning
> curve of Solr into two main categories.
> The first is for people who use it as out-of-the-box components. The second
> is the developer side.
>
> Actually, the developer side branches two ways.
>
> The first is its general steps, i.e. a document comes into Solr (e.g.
> crawled data from Nutch), which analysis processes are applied to it
> (stemming, hamming, etc.), and what happens after parsing, step by step.
> When a search query happens, what happens step by step, at which step
> scores are calculated, and so on.
> The second is more code-specific, i.e. which handlers take care of the data
> that is going to be indexed (no need to explain every handler at this
> step), which analyzer and tokenizer classes there are and what the flow
> between them is, and how response handlers work and what they are.
>
> Explaining the cloud side is another piece of work.
>
> Some explanations are currently present on the wiki (but some of them are
> buried very deep in the wiki and it is not easy to find their parent topic;
> maybe starting the wiki from a top page and branching all other topics from
> it as much as possible would be better).
>
> If we could show the big picture, and beside it the smaller pictures within
> it, it would be great (if you know the main parts it will be easy to go
> deep into the code, i.e. you don't need to explain every handler; if you
> show the way to the developer, he/she can debug and find what they need).
>
> Taking myself as an example: I have to write down the steps of Solr in a
> bit of detail, and even though I have read many pages on the wiki and a
> book about it, I see that it is not easy to write down even the big picture
> of the developer side.

Re: Flow Chart of Solr

Posted by Furkan KAMACI <fu...@gmail.com>.
Let me take myself as an example. I started researching Solr just a few
weeks ago. I have learned about Solr and its related projects, and my next
step is writing down the main steps of Solr. We have separated the learning
curve of Solr into two main categories.
The first is for people who use it as out-of-the-box components. The second
is the developer side.

Actually, the developer side branches two ways.

The first is its general steps, i.e. a document comes into Solr (e.g.
crawled data from Nutch), which analysis processes are applied to it
(stemming, hamming, etc.), and what happens after parsing, step by step.
When a search query happens, what happens step by step, at which step
scores are calculated, and so on.
The second is more code-specific, i.e. which handlers take care of the data
that is going to be indexed (no need to explain every handler at this
step), which analyzer and tokenizer classes there are and what the flow
between them is, and how response handlers work and what they are.
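
The general steps listed above (a document comes in, is analyzed, its terms go into an inverted index, and a query is analyzed the same way and looked up) can be sketched with an in-memory structure. This is a toy under stated assumptions: a single text field, lowercase-and-split analysis, AND semantics and no scoring. ToyInvertedIndex is hypothetical and says nothing about how Solr's update handlers or Lucene's on-disk index format actually work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Minimal sketch of index-time and query-time flow:
// analyze -> add to inverted index (term -> doc ids) -> look up terms.
// Real Lucene segments store far more (positions, frequencies, norms, ...).
final class ToyInvertedIndex {

    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // The same simple analysis runs at index time and at query time,
    // so that stored terms and query terms line up.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Index time: record the document id under every analyzed term.
    void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Query time: ids of documents containing every query term (AND).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Set.of() : result;
    }
}
```

After indexing "Solr is built on Lucene" as document 1 and "Nutch crawls pages for Solr" as document 2, searching "SOLR lucene" returns [1]: both the stored terms and the query terms went through the same lowercasing, which is why the uppercase query still matches.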

Explaining the cloud side is another piece of work.

Some explanations are currently present on the wiki (but some of them are
buried very deep in the wiki and it is not easy to find their parent topic;
maybe starting the wiki from a top page and branching all other topics from
it as much as possible would be better).

If we could show the big picture, and beside it the smaller pictures within
it, it would be great (if you know the main parts it will be easy to go
deep into the code, i.e. you don't need to explain every handler; if you
show the way to the developer, he/she can debug and find what they need).

Taking myself as an example: I have to write down the steps of Solr in a
bit of detail, and even though I have read many pages on the wiki and a
book about it, I see that it is not easy to write down even the big picture
of the developer side.


2013/4/2 Alexandre Rafalovitch <ar...@gmail.com>

> Yago,
>
> My point - perhaps lost in too much text - was that Solr is presented - and
> can function - as a black-box, which makes it different from more
> traditional open-source projects. So, stage 2 happens exactly when
> non-programmers have to cross the boundary from the black-box into a
> code-first approach, and the hand-off is not particularly smooth. Or even
> when, say, a PHP or .NET programmer tries to get beyond the basic
> operations of their client library and has to understand the server-side
> aspects of Solr.
>
> Regards,
>    Alex.

Re: Flow Chart of Solr

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Yago,

My point - perhaps lost in too much text - was that Solr is presented - and
can function - as a black-box, which makes it different from more
traditional open-source projects. So, stage 2 happens exactly when
non-programmers have to cross the boundary from the black-box into a
code-first approach, and the hand-off is not particularly smooth. Or even
when, say, a PHP or .NET programmer tries to get beyond the basic
operations of their client library and has to understand the server-side
aspects of Solr.

Regards,
   Alex.

On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro <ya...@gmail.com> wrote:

> Alexandre,
>
> You describe the normal path when a beginner tries to use source code
> that they don't understand: black-box, reading code, hacking, OK now I
> know 10% of the project, with luck :p.
>


Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)

Re: Flow Chart of Solr

Posted by Yago Riveiro <ya...@gmail.com>.
Alexandre,   

You describe the normal path when a beginner tries to use source code that they don't understand: black-box, reading code, hacking, OK now I know 10% of the project, with luck :p.

First of all, the Solr community is fantastic and always helps when I need it. IMHO the developer documentation is scattered across a lot of sources: blogs, the wiki, the LucidWorks wiki (I know that this wiki was donated to Apache and is in the process of being presented to the world as part of the project).

The learning curve for doing fun things with Solr at the source level is steep. I see a lot of webinars teaching how to deploy and use Solr, but not how to develop a ResponseWriter or a SearchComponent.
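
For the SearchComponent part specifically, it may help to see the control flow in isolation. As far as I understand the Solr 4.x code, a SearchHandler calls prepare(ResponseBuilder) on each of its configured SearchComponents and then process(ResponseBuilder) on each, for a non-distributed request; check the SearchHandler source for your version. The sketch below reproduces only that two-pass shape without Solr on the classpath. RequestContext, Component, EchoQueryComponent and GreetingComponent are hypothetical stand-ins, not Solr classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of a handler driving a list of
// components: every prepare() runs first, then every process().
final class ToySearchPipeline {

    // Shared per-request state (the role ResponseBuilder plays in Solr).
    static final class RequestContext {
        final Map<String, String> params;
        final Map<String, Object> response = new LinkedHashMap<>();
        final List<String> trace = new ArrayList<>();

        RequestContext(Map<String, String> params) {
            this.params = params;
        }
    }

    interface Component {
        void prepare(RequestContext ctx);
        void process(RequestContext ctx);
    }

    // Stand-in for a query step: echoes the query it would run.
    static final class EchoQueryComponent implements Component {
        public void prepare(RequestContext ctx) {
            ctx.trace.add("query.prepare");
        }

        public void process(RequestContext ctx) {
            ctx.trace.add("query.process");
            ctx.response.put("q", ctx.params.getOrDefault("q", "*:*"));
        }
    }

    // A custom component that only acts when its parameter is switched on.
    static final class GreetingComponent implements Component {
        public void prepare(RequestContext ctx) {
            ctx.trace.add("greeting.prepare");
        }

        public void process(RequestContext ctx) {
            ctx.trace.add("greeting.process");
            if ("true".equals(ctx.params.get("greeting"))) {
                ctx.response.put("greeting", "hello");
            }
        }
    }

    static RequestContext handle(List<Component> components, Map<String, String> params) {
        RequestContext ctx = new RequestContext(params);
        for (Component c : components) {
            c.prepare(ctx); // pass 1: every component inspects the request
        }
        for (Component c : components) {
            c.process(ctx); // pass 2: every component does its work
        }
        return ctx;
    }
}
```

Running it with q=solr and greeting=true records the order [query.prepare, greeting.prepare, query.process, greeting.process]: every component gets to look at the request before any of them does the real work, which makes the first pass a natural place for a custom component to read its parameters.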

Unfortunately I don't have the knowledge to contribute properly yet; in the future, we'll see.

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 5:24 PM, Alexandre Rafalovitch wrote:

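> I think a bit more effort in documenting that second stage would bring more
> people to the community. I am trying to do my share through Wiki updates,
> questions here, Jira issues, my upcoming book and some other little things.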


Re: Flow Chart of Solr

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
I think there is a gap in the support of one's path of learning Solr. I'll
try to describe it based on my own experience. Hopefully, it is helpful.

At first, there is a "Solr is a blackbox" stage, where the person may not
know Java and is just using out-of-the-box components. The wiki is
reasonably helpful there and there are other resources (blogs, etc). At
this point, Lucene is a black box within the black box and is something
that can be safely ignored.

At the second stage, one hits the period where he/she understands what is
going on in their basic scenario and is trying to get into more advanced
cases. This could be putting together a complex analyzer chain, trying to
use Update Request Processors, optimizing slow/OOM imports or doing
complex queries. Suddenly, they are pointed directly at Javadocs and have
to find their way around Java-based instructions. A Java programmer can
bridge that gap and get over the curve, but I suspect others get lost very
quickly and get stuck even when they don't need to be good programmers. An
example in my mind would be something like RegexReplaceProcessor. One has
to climb up and down the inheritance chain of the Javadoc to figure out
what can be done and what the parameters are. And the parameter syntax is
Java regular expressions rather than something used in copyField, so they
need to jump over and figure that out. So, it is fairly hard to envisage
those pieces and how they can combine together. Similarly, some of the
stuff is described in Jira requests, but also in a way that requires a
programmer's mind-set to parse it out. I think a lot of people drop out at
this stage and fall back to a 'black-box' view of Solr. Most of the
questions I see on Stack Overflow are conceptual troubles at this stage.
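
To make the RegexReplaceProcessor point concrete: an update processor chain hands each incoming document through a list of processors, and a regex-replace step applies a Java regular expression to a field. The sketch models a document as a Map from field name to value. ToyUpdateChain and regexReplace are hypothetical names, and this is not the API of Solr's RegexReplaceProcessorFactory. The patterns here follow java.util.regex rules, including $1-style group references in the replacement string, which is the kind of syntax knowledge the paragraph above says one has to pick up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical model of an update processor chain: each processor receives
// the document (field name -> value), may modify it, and passes it on.
final class ToyUpdateChain {

    // A processor applying a Java regex to one field, with
    // java.util.regex.Matcher.replaceAll semantics ($1, $2, ... allowed).
    static UnaryOperator<Map<String, String>> regexReplace(String field, String regex, String replacement) {
        Pattern pattern = Pattern.compile(regex); // compiled once, reused per document
        return doc -> {
            String value = doc.get(field);
            if (value != null) {
                doc.put(field, pattern.matcher(value).replaceAll(replacement));
            }
            return doc;
        };
    }

    // Run the processors in order on a mutable copy of the input document.
    static Map<String, String> run(List<UnaryOperator<Map<String, String>>> chain, Map<String, String> doc) {
        Map<String, String> current = new LinkedHashMap<>(doc);
        for (UnaryOperator<Map<String, String>> processor : chain) {
            current = processor.apply(current);
        }
        return current;
    }
}
```

For example, collapsing runs of whitespace in a title with the regex \s+ and a single space, then swapping the two halves of a SKU such as ABC-42 with ^(\w+)-(\d+)$ and the replacement $2-$1, gives "Flow chart of Solr" and "42-ABC". Note that in Java source the backslashes are doubled inside string literals.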

And then those who get to the third stage jump to the advanced level,
where one can just read the source code to figure out what is going on. I
found www.grepcode.com to be useful (though it is quite slow now and is a
bit behind for Solr). Somewhere around here, one also starts to realize the
fuzzy relation between the Lucene and Solr code, and it becomes somewhat
clearer what Solr's benefits actually are (as opposed to bare Lucene's).
This also generates its own frustration and confusion, of course, because
suddenly one starts to wish for Lucene features that Solr does not use
(e.g. split/sync analyzer chains, some alternative facet implementation
features, etc).

And finally (at the end of the beginning...), you become a contributor
and become very familiar with subversion/ant/etc. Though, I suspect,
contributors become more specialized and actually understand less about
other parts of the system (e.g. does anyone still fully understand DIH?).

I am not blaming anyone with this story for the lack of support. I think
Solr is - in many ways - better documented than many other open source
projects. And the new manual being contributed to replace the Wiki will
(soon?) make this even better. And, of course, this mailing list
is indescribably awesome. I am just trying to provide a fresh view of what
I went through and where I see people getting stuck.

I think a bit more effort in documenting that second stage would bring more
people to the community. I am trying to do my share through Wiki updates,
questions here, Jira issues, my upcoming book and some other little things.
I see others do the same. Perhaps, the diagram is something that we should
explicitly try to do. Though, I think it would be more fun to do it as a
Scrollorama Inception Explained style (
http://www.inception-explained.com/). :-)

Regards,
   Alex.


On Tue, Apr 2, 2013 at 11:22 AM, Furkan KAMACI <fu...@gmail.com>wrote:

> You are right to mention developer docs and user docs. Users differ in this
> respect. Some of them use Solr for indexing and monitoring via the admin
> interface, and that is quite enough for them; however, some people want to
> modify it, so it would be nice if there were some documentation for the
> developer side too.

Re: Flow Chart of Solr

Posted by Furkan KAMACI <fu...@gmail.com>.
You are right to mention developer docs and user docs. Users differ in this
respect. Some of them use Solr for indexing and monitoring via the admin
interface, and that is quite enough for them; however, some people want to
modify it, so it would be nice if there were some documentation for the
developer side too.


2013/4/2 Yago Riveiro <ya...@gmail.com>

> For beginners it is complicated to understand the complexity of
> Solr/Lucene. I'm trying to develop a custom search component and it's too
> hard to keep in mind the flow, inheritance and interaction between
> classes. I think that there is a gap between software docs and user docs,
> or maybe I don't search enough T_T. The Javadoc is not always clear.
>
> The fact that I'm a beginner in the Solr world doesn't help.
>
> Either way, this thread was very helpful, I found some very good resources
> here :)
>
> Regards
>
> --
> Yago Riveiro
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

Re: Flow Chart of Solr

Posted by Yago Riveiro <ya...@gmail.com>.
For beginners it is complicated to understand the complexity of Solr/Lucene. I'm trying to develop a custom search component and it's too hard to keep in mind the flow, inheritance and interaction between classes. I think that there is a gap between software docs and user docs, or maybe I don't search enough T_T. The Javadoc is not always clear.

The fact that I'm a beginner in the Solr world doesn't help.

Either way, this thread was very helpful, I found some very good resources here :)

Regards

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote:

> Actually, maybe one of the most important core things is the Analysis part
> in the last diagram, but there is nothing about it (i.e. stemming,
> lemmatizing, etc.) in any of them.



Re: Flow Chart of Solr

Posted by Furkan KAMACI <fu...@gmail.com>.
Actually, maybe one of the most important core things is the Analysis part
in the last diagram, but there is nothing about it (i.e. stemming,
lemmatizing, etc.) in any of them.
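
Since stemming and lemmatizing came up as the missing piece, here is the difference in miniature. A stemmer applies mechanical suffix rules and may produce non-words; a lemmatizer maps a form to its dictionary headword and can handle irregular forms. The rules and the three-entry dictionary below are made up for illustration (StemVsLemma is a hypothetical name); in Lucene the real counterparts are algorithmic stemmers (e.g. Porter or Snowball) and dictionary-based analysis such as the Hunspell stemmer.

```java
import java.util.Map;

// Toy contrast between stemming and lemmatization (hypothetical helpers).
final class StemVsLemma {

    // Stemming: mechanical suffix rules; fast, but can yield non-words
    // and cannot handle irregular forms.
    static String stem(String word) {
        if (word.endsWith("ies") && word.length() > 4) {
            return word.substring(0, word.length() - 3) + "i";
        }
        if (word.endsWith("es") && word.length() > 3) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("s") && word.length() > 2) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Lemmatization: look the word up in a (here tiny, made-up) dictionary
    // mapping inflected forms to their headword; unknown words pass through.
    private static final Map<String, String> LEMMAS = Map.of(
            "queries", "query",
            "mice", "mouse",
            "went", "go");

    static String lemmatize(String word) {
        return LEMMAS.getOrDefault(word, word);
    }
}
```

stem("queries") gives the non-word "queri" while lemmatize("queries") gives "query"; for "mice" the suffix rules do nothing, and only the dictionary lookup gets to "mouse".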


2013/4/2 Andre Bois-Crettez <an...@kelkoo.com>

> I like this one; it is a bit more detailed:
>
> http://www.cominvent.com/2011/04/04/solr-architecture-diagram/
>

Re: Flow Chart of Solr

Posted by Andre Bois-Crettez <an...@kelkoo.com>.
On 04/02/2013 04:20 PM, Koji Sekiguchi wrote:
> There is an interesting ticket:
>
> Architecture Diagrams needed for Lucene, Solr and Nutch
> https://issues.apache.org/jira/browse/LUCENE-2412
>
> koji

I like this one; it is a bit more detailed:

http://www.cominvent.com/2011/04/04/solr-architecture-diagram/

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée (simplified joint-stock company)
Share capital: €4,168,964.30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and any attachments are confidential and intended exclusively for their addressees. If you are not the intended recipient of this message, please destroy it and notify the sender.

Re: Flow Chart of Solr

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(13/04/02 21:45), Furkan KAMACI wrote:
> Is there any documentation, something like a flow chart of Solr? I.e.
> documents come into Solr (maybe indicating which classes get the documents)
> and go through the parsing process (i.e. stemming processes etc.), and then
> reverse indexes are built, and so on?
>

There is an interesting ticket:

Architecture Diagrams needed for Lucene, Solr and Nutch
https://issues.apache.org/jira/browse/LUCENE-2412

koji
-- 
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html