Posted to solr-dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/12/09 14:30:22 UTC

Namespaces in response (SOLR-1586)

In SOLR-1586, the proposed patch introduces the concept that a Solr response can declare a namespace for part of the response (in this case, it is using the tags defined by georss.org to specify a point, etc.).  I'm not sure what to make of this.  My gut reaction says no, but I'm not a namespace expert and I also don't feel strongly about it.

Discussion points:
1. If there are standard namespaces, then people can use them to do fun XML things
2. If we allow them, we get all of the other benefits of namespaces...
3. The indexing side doesn't support them, so it seems odd to put in something like <field name="point">55.3 27.9</field> and get back <georss:point name="point"> 55.3 27.9</georss:point>.  At the same time, it seems equally weird to get back <str name="point">...</str> when there is in fact more semantic information available about this particular field that would otherwise require more work by an application to make sense of.
4. If we let in other namespaces, we open ourselves up to longer responses, and likely slower ones.  It is also likely the case that there isn't just one standard.
5. If people wanted them, they could just do XSLT, but that is an extra step too.

An alternative is that we could refactor things a bit and allow the FieldType to specify the tag name instead of it being hardcoded in the writers.  This way people writing FieldTypes could define them.  For instance, we could have FieldType.getTagName() that could be overridden and clients could have tools for introspecting this.
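
To make that alternative concrete, here is a minimal sketch in Python rather than Solr's Java. The names `FieldType`, `get_tag_name`, `PointFieldType`, and `write_field` are hypothetical and chosen only for illustration; the one idea taken from the proposal is that the field type, not the writer, decides the tag name.

```python
from xml.sax.saxutils import escape, quoteattr


class FieldType:
    """Hypothetical base type: the writer asks it which tag to emit."""

    def get_tag_name(self):
        # Default tag, mirroring <str name="...">...</str>.
        return "str"


class PointFieldType(FieldType):
    """Hypothetical point type that overrides the tag name."""

    def get_tag_name(self):
        return "point"


def write_field(field_type, name, value):
    # The writer no longer hardcodes the tag; it asks the field type.
    tag = field_type.get_tag_name()
    return "<%s name=%s>%s</%s>" % (tag, quoteattr(name), escape(value), tag)
```

A client-side tool could call the same `get_tag_name()` hook to discover which tag to expect for a given field.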

I'm not sure what effect any of this would have on downstream clients, either.

Thoughts?

-Grant

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Yonik,

>> Using standards enables standard tool development.
> 
> We do use standards... lots of them :-)  Let's be a bit more specific
> though - I was asking about using a namespace for the point type by
> *default*, and in isolation (i.e. the rest of solr xml isn't
> namespaced), and if/how that made things easier?

Let's ask a different question -- how does it make things harder?

> At first blush it
> doesn't really seem to since any tool would need to deal with the Solr
> XML response in general.

I've got use cases where folks writing APIs in Javascript/Ajax are querying
SOLR (as a REST-ful web service) and elements of the response are being
dropped into a web page via DHTML. Having the ability to drop tags that
include namespaces helps out those folks because they want to have:

(a) expected representations using standards they like (GeoRSS is on the
list).

(b) understanding of the elements they are dropping in (i.e., there is one
use case where separately, after dropping in the georss:point tag, the tag
definition (e.g., via the namespace at:
http://www.w3.org/2003/01/geo/wgs84_pos#) is looked up and displayed).
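
As a rough illustration of use case (b), the sketch below recovers the namespace URI from a parsed element so that a client could go and look up the tag definition. The URI is the one mentioned above; the `geo` prefix, the `Point` element, and the helper `split_qualified_tag` are placeholders for illustration, not part of the patch.

```python
import xml.etree.ElementTree as ET


def split_qualified_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element is not in any namespace


element = ET.fromstring(
    '<geo:Point xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"/>'
)
# namespace_uri is what a client would dereference to find the definition.
namespace_uri, local_name = split_qualified_tag(element.tag)
```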

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Namespaces in response (SOLR-1586)

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Dec 9, 2009 at 2:02 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
>> What GIS tool could deal with a Solr XML response format w/o any other
>> knowledge of everything else in the response?
>> Are there some real use cases that using a namespace vs not for point
>> make easier (an honest question... I don't know much about GIS stuff).
>
> Using standards enables standard tool development.

We do use standards... lots of them :-)  Let's be a bit more specific
though - I was asking about using a namespace for the point type by
*default*, and in isolation (i.e. the rest of solr xml isn't
namespaced), and if/how that made things easier?  At first blush it
doesn't really seem to since any tool would need to deal with the Solr
XML response in general.

-Yonik
http://www.lucidimagination.com

Re: Namespaces in response (SOLR-1586)

Posted by "Ramirez, Paul M (388J)" <pa...@jpl.nasa.gov>.

Hey All,

I think Eric is right on here, and that is what I thought the intent of the patch was: facilitating integration of Solr into environments where there is not "one true XML output". In addition, there shouldn't be "one true JSON output" for cases where your existing code already has a way it expects the JSON. Why not allow someone to write a JSON output that feeds directly into that tool without having to change that tool? Its flexibility is what makes Solr so cool, and to limit that would be a shame. None of this really has to limit the internal representation or what the Solr community builds to support its format, but don't unnecessarily relegate that functionality to XSLT.

--Paul


On 12/9/09 11:22 AM, "Eric Pugh" <ep...@opensourceconnections.com> wrote:



Is this the opportunity of having more than one XML output type?  I
mean, XML is meant to be a transport medium for data, and maybe moving
from a "one true XML output" for Solr to being able to support
multiple outputs dependent on the consumer would be useful.  I can see
it making it easier to plug Solr into environments that expect data in
certain formats, without doing an extra XSL transformation?

Eric

Re: Namespaces in response (SOLR-1586)

Posted by Eric Pugh <ep...@opensourceconnections.com>.

XML is definitely one of those emotional issues in the tech world!
Those who grok it don't understand why those who don't love it won't
use it everywhere.  And those who dislike it often can't see the benefits
of XML because of their bad experiences.

I know I just spent a week mucking around with an application where I  
couldn't start it up because of XML validation errors.  Errors  
generated not because the XML was wrong, but because the validation  
process was borked up.  It led me down a rat hole of frustration  
chasing Schemas and DTDs and validating parsers...  I think that  
frustration is part of what has pushed people to embrace JSON, YAML,
and other approaches for encoding data.

The biggest thing I love about Solr is "it just works...".  It's  
simple.  It's powerful.  You don't have to commit months to  
understanding it.  And yet if you want to do advanced things then Solr  
is fairly forgiving of that, and gives you the hooks/plugins to do it.

Is this the opportunity of having more than one XML output type?  I
mean, XML is meant to be a transport medium for data, and maybe moving  
from a "one true XML output" for Solr to being able to support  
multiple outputs dependent on the consumer would be useful.  I can see  
it making it easier to plug Solr into environments that expect data in  
certain formats, without doing an extra XSL transformation?
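
One way to picture "multiple outputs dependent on the consumer" is a small registry of writers keyed by a request parameter. This is only a sketch: the `wt` key mirrors Solr's writer-type request parameter, while `WRITERS`, `render`, `write_xml`, and `write_json` are hypothetical names and not Solr code.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def write_xml(doc):
    # Every field becomes <str name="...">...</str> in this toy writer.
    fields = "".join(
        "<str name=%s>%s</str>" % (quoteattr(k), escape(str(v)))
        for k, v in doc.items()
    )
    return "<doc>%s</doc>" % fields


def write_json(doc):
    return json.dumps(doc, sort_keys=True)


# Consumers pick the format they expect; new formats are just new entries.
WRITERS = {"xml": write_xml, "json": write_json}


def render(doc, wt="xml"):
    try:
        writer = WRITERS[wt]
    except KeyError:
        raise ValueError("no writer registered for wt=%r" % wt)
    return writer(doc)
```

A consumer that needs a different shape would register its own function under a new key instead of post-processing the default output with XSLT.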

Eric

On Dec 9, 2009, at 2:11 PM, Mattmann, Chris A (388J) wrote:

> Hi Yonik:
>
>>
>> If you're forced to declare the namespace / put the URI, I'm just
>> afraid of what clients / XML parsers out there may start trying to
>> validate by default.
>
> And even if they did, it's valid XML so what's the problem?
>
>> And I'm still trying to figure out what we gain.
>
> * plugging into other standard GIS tools
> (here's a list of georss ones:
>
> http://www.google.com/#hl=en&source=hp&fkt=1998&fsdt=4214&q=georss+readers&aq=f&aqi=g1&oq=&fp=b36c7832dbb01be6
>  )
>
> * understanding that a <point is not a <solr:point (which in your  
> examples
> you're using a ',' to separate them while e.g., georss suggests a '  
> ') but a
> georss:point. From this you can:
>  - look up the field definition
>  - generate default values
>  - understand the unit restrictions
>
> There is a wealth of work in XML schema so I'm not sure I have to  
> justify
> its use.
>
>> If one does want validation, it seems like we should have an
>> (optional) schema for the XML response as a whole?
>
> I'm happy to provide this, for validation, but let's start small,  
> then grow
> big. SOLR-1586 does _not_ break anything.
>
> Cheers,
> Chris
>
>
>

-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Yonik:

> 
> If you're forced to declare the namespace / put the URI, I'm just
> afraid of what clients / XML parsers out there may start trying to
> validate by default.

And even if they did, it's valid XML so what's the problem?

> And I'm still trying to figure out what we gain.

* plugging into other standard GIS tools
 (here's a list of georss ones:
    
http://www.google.com/#hl=en&source=hp&fkt=1998&fsdt=4214&q=georss+readers&aq=f&aqi=g1&oq=&fp=b36c7832dbb01be6
  )

* understanding that a <point> is not a <solr:point> (in your examples
you're using a ',' to separate the values while, e.g., georss suggests a ' ')
but a georss:point. From this you can:
  - look up the field definition
  - generate default values
  - understand the unit restrictions

There is a wealth of work in XML schema so I'm not sure I have to justify
its use. 

>  If one does want validation, it seems like we should have an
> (optional) schema for the XML response as a whole?

I'm happy to provide this, for validation, but let's start small, then grow
big. SOLR-1586 does _not_ break anything.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Dec 9, 2009 at 12:40 PM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> <foo>
>  <zoo:bar xmlns:zoo="http://example.com/zoo">hi</zoo:bar>
> </foo>

If you're forced to declare the namespace / put the URI, I'm just
afraid of what clients / XML parsers out there may start trying to
validate by default.  And I'm still trying to figure out what we gain.
 If one does want validation, it seems like we should have an
(optional) schema for the XML response as a whole?

-Yonik
http://www.lucidimagination.com

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Yonik,

> Should have tried this before... I just created a small XML file:
> 
> <foo>
>   <bar>hi</bar>
> </foo>
> 
> I pointed both firefox and IE at this file and it displays as XML fine.
> I then changed the file to this:
> 
> <foo>
>   <zoo:bar>hi</zoo:bar>
> </foo>

Sure, of course it does. It's because that's not valid XML syntax. You have
to declare the namespace for zoo. You can do it at the top of the XML file
in the root XML tag. Or, you can do it inline (like I've done in SOLR).

Try this:

<foo>
 <zoo:bar xmlns:zoo="http://example.com/zoo">hi</zoo:bar>
</foo>
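
The difference between the two files can be checked with any namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in for the browsers mentioned in this thread; the helper name `parses` is made up for illustration.

```python
import xml.etree.ElementTree as ET

undeclared = "<foo><zoo:bar>hi</zoo:bar></foo>"
declared = '<foo><zoo:bar xmlns:zoo="http://example.com/zoo">hi</zoo:bar></foo>'


def parses(xml_text):
    """Return True if ElementTree accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An unbound prefix such as 'zoo' is reported as a parse error.
        return False


root = ET.fromstring(declared)
# ElementTree expands the prefix to the {namespace-uri}localname form.
child_tag = root[0].tag
```

Here `parses(undeclared)` is False and `parses(declared)` is True, and no validation or schema lookup is involved in either case.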

Cheers,
Chris


> 
> That made both of them barf.
> That alone makes me lean pretty strongly against using a namespace for this.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> 
> On Wed, Dec 9, 2009 at 12:28 PM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Wed, Dec 9, 2009 at 11:44 AM, Mattmann, Chris A (388J)
>> <ch...@jpl.nasa.gov> wrote:
>>> How does it introduce any new requirements? Namespaces are easily ignored by
>>> any XML client as they are if they weren't present. In other words, unless
>>> the XML client has setValidating=true, then this isn't an issue.
>> 
>> I've run across cases where I added a schema declaration to an XML
>> file and then things started failing.  I think some parsers may
>> default to validating if it sees that it can?
>> 
>> Namespaces are to avoid name clashes.  Solr XML is well defined and
>> not arbitrary... adding <point> if we wish to do so won't introduce
>> any clashes.
>> 
>>> The only difference between what you call simple above and what I've
>>> proposed (and correct me if I'm wrong but others have too) is that your
>>> <point tag would include a namespace prefix and an xmlns attribute. What's
>>> the difference?
>>> 
>>>> It is worth using standards when they buy you enough.... I'm not sure
>>>> this is one of those times.
>>>> I'm sure there are standards for numeric types like <int> too... but
>>>> using namespaces for that seems like overkill.
>>> 
>>> There's a difference between a primitive type like int, and one like point.
>>> Also, it all comes down to your use case. If the only thing you're ever
>>> going to do with SOLR is have a SOLR client talk to it (Java, Ruby, whatever
>>> PL you want) then namespaces/etc. might be overkill. But why open up the
>>> response format then and advertise SOLR as something that provides REST-ful
>>> services for search?
>> 
>> REST-ful doesn't say anything about customizing the response format.
>> 
>>> If that's the case, then users consuming those
>>> responses need the flexibility to customize them for their use case
>>> (validation, plugging into external GIS tools, etc.). So, I don't agree with
>>> this.
>> 
>> What GIS tool could deal with a Solr XML response format w/o any other
>> knowledge of everything else in the response?
>> Are there some real use cases that using a namespace vs not for point
>> make easier (an honest question... I don't know much about GIS stuff).
>> 
>>> All I've done is use what already exists. There doesn't need to be any
>>> patches. XmlWriter#writePrim allowed you to do this before, see:
>> 
>> Yeah, you can use that to output <long>false</long> too... but it will
>> cause certain clients to barf.
>> 
>> -Yonik
>> http://www.lucidimagination.com
>> 
> 



Re: Namespaces in response (SOLR-1586)

Posted by Yonik Seeley <yo...@lucidimagination.com>.

Should have tried this before... I just created a small XML file:

<foo>
  <bar>hi</bar>
</foo>

I pointed both firefox and IE at this file and it displays as XML fine.
I then changed the file to this:

<foo>
  <zoo:bar>hi</zoo:bar>
</foo>

That made both of them barf.
That alone makes me lean pretty strongly against using a namespace for this.

-Yonik
http://www.lucidimagination.com



On Wed, Dec 9, 2009 at 12:28 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Wed, Dec 9, 2009 at 11:44 AM, Mattmann, Chris A (388J)
> <ch...@jpl.nasa.gov> wrote:
>> How does it introduce any new requirements? Namespaces are easily ignored by
>> any XML client as they are if they weren't present. In other words, unless
>> the XML client has setValidating=true, then this isn't an issue.
>
> I've run across cases where I added a schema declaration to an XML
> file and then things started failing.  I think some parsers may
> default to validating if it sees that it can?
>
> Namespaces are to avoid name clashes.  Solr XML is well defined and
> not arbitrary... adding <point> if we wish to do so won't introduce
> any clashes.
>
>> The only difference between what you call simple above and what I've
>> proposed (and correct me if I'm wrong but others have too) is that your
>> <point tag would include a namespace prefix and an xmlns attribute. What's
>> the difference?
>>
>>> It is worth using standards when they buy you enough.... I'm not sure
>>> this is one of those times.
>>> I'm sure there are standards for numeric types like <int> too... but
>>> using namespaces for that seems like overkill.
>>
>> There's a difference between a primitive type like int, and one like point.
>> Also, it all comes down to your use case. If the only thing you're ever
>> going to do with SOLR is have a SOLR client talk to it (Java, Ruby, whatever
>> PL you want) then namespaces/etc. might be overkill. But why open up the
>> response format then and advertise SOLR as something that provides REST-ful
>> services for search?
>
> REST-ful doesn't say anything about customizing the response format.
>
>> If that's the case, then users consuming those
>> responses need the flexibility to customize them for their use case
>> (validation, plugging into external GIS tools, etc.). So, I don't agree with
>> this.
>
> What GIS tool could deal with a Solr XML response format w/o any other
> knowledge of everything else in the response?
> Are there some real use cases that using a namespace vs not for point
> make easier (an honest question... I don't know much about GIS stuff).
>
>> All I've done is use what already exists. There doesn't need to be any
>> patches. XmlWriter#writePrim allowed you to do this before, see:
>
> Yeah, you can use that to output <long>false</long> too... but it will
> cause certain clients to barf.
>
> -Yonik
> http://www.lucidimagination.com
>

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Yonik,

> 
> I've run across cases where I added a schema declaration to an XML
> file and then things started failing.  I think some parsers may
> default to validating if it sees that it can?

I've seen this too. But it won't affect the interaction we're talking about.
Like I said, SOLR-1586 outputs valid XML, so this isn't an issue.

> 
> Namespaces are to avoid name clashes.  Solr XML is well defined and
> not arbitrary... adding <point> if we wish to do so won't introduce
> any clashes.
> 

Actually there are quite a few use cases for namespacing beyond name
clashes. Namespaces enable validation, understanding and definition for
elements (understanding units, ranges, etc.). For instance, you and I both
use the term "mass", but in my domain, mass refers to the planetary science
definition of mass, while in your domain you mean the earth science one.
"mass" does not always mean the same thing (variation in units,
representation, etc.).

See here:

http://www.w3.org/TR/2006/REC-xml-names11-20060816/
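
A small sketch of the "mass" example: two elements share a local name but live in different namespaces, so a client can tell them apart. Both namespace URIs, the units, and the values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs below are made up for this example.
doc = """
<record xmlns:planetary="http://example.com/planetary"
        xmlns:earth="http://example.com/earth">
  <planetary:mass units="kg">5.97e24</planetary:mass>
  <earth:mass units="g">12.5</earth:mass>
</record>
""".strip()

ns = {
    "planetary": "http://example.com/planetary",
    "earth": "http://example.com/earth",
}
root = ET.fromstring(doc)
# Same local name, different namespace: each lookup finds only its own element.
planetary_mass = root.find("planetary:mass", ns)
earth_mass = root.find("earth:mass", ns)
```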

>> The only difference between what you call simple above and what I've
>> proposed (and correct me if I'm wrong but others have too) is that your
>> <point tag would include a namespace prefix and an xmlns attribute. What's
>> the difference?
>> 
>>> It is worth using standards when they buy you enough.... I'm not sure
>>> this is one of those times.
>>> I'm sure there are standards for numeric types like <int> too... but
>>> using namespaces for that seems like overkill.
>> 
>> There's a difference between a primitive type like int, and one like point.
>> Also, it all comes down to your use case. If the only thing you're ever
>> going to do with SOLR is have a SOLR client talk to it (Java, Ruby, whatever
>> PL you want) then namespaces/etc. might be overkill. But why open up the
>> response format then and advertise SOLR as something that provides REST-ful
>> services for search?
> 
> REST-ful doesn't say anything about customizing the response format.

So are you saying that the intention is not to allow customization of the
response format? Also, how many releases of SOLR have you shipped that have
the capability to do this, and now you're suddenly going to change it? I'm
sorry, I disagree.

> 
>> If that's the case, then users consuming those
>> responses need the flexibility to customize them for their use case
>> (validation, plugging into external GIS tools, etc.). So, I don't agree with
>> this.
> 
> What GIS tool could deal with a Solr XML response format w/o any other
> knowledge of everything else in the response?
> Are there some real use cases that using a namespace vs not for point
> make easier (an honest question... I don't know much about GIS stuff).

Using standards enables standard tool development. The alternative is that
everyone develops their own custom tools for SOLR (or is tied to using
_only_ whatever is provided by SOLR), and I don't think that's the intent. I
also don't think that's a very friendly, open strategy for users. What I'm
proposing does _not_ break backwards compatibility, anywhere. If you've got
an example, then speak up.

> 
>> All I've done is use what already exists. There doesn't need to be any
>> patches. XmlWriter#writePrim allowed you to do this before, see:
> 
> Yeah, you can use that to output <long>false</long> too... but it will
> cause certain clients to barf.

That's a ResponseWriter issue. That's not a client issue. Clients don't
arbitrarily connect to servers for which they don't speak the protocol
language.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Wed, Dec 9, 2009 at 11:44 AM, Mattmann, Chris A (388J)
<ch...@jpl.nasa.gov> wrote:
> How does it introduce any new requirements? Namespaces are easily ignored by
> any XML client as they are if they weren't present. In other words, unless
> the XML client has setValidating=true, then this isn't an issue.

I've run across cases where I added a schema declaration to an XML
file and then things started failing.  I think some parsers may
default to validating if it sees that it can?

Namespaces are to avoid name clashes.  Solr XML is well defined and
not arbitrary... adding <point> if we wish to do so won't introduce
any clashes.

> The only difference between what you call simple above and what I've
> proposed (and correct me if I'm wrong but others have too) is that your
> <point tag would include a namespace prefix and an xmlns attribute. What's
> the difference?
>
>> It is worth using standards when they buy you enough.... I'm not sure
>> this is one of those times.
>> I'm sure there are standards for numeric types like <int> too... but
>> using namespaces for that seems like overkill.
>
> There's a difference between a primitive type like int, and one like point.
> Also, it all comes down to your use case. If the only thing you're ever
> going to do with SOLR is have a SOLR client talk to it (Java, Ruby, whatever
> PL you want) then namespaces/etc. might be overkill. But why open up the
> response format then and advertise SOLR as something that provides REST-ful
> services for search?

REST-ful doesn't say anything about customizing the response format.

> If that's the case, then users consuming those
> responses need the flexibility to customize them for their use case
> (validation, plugging into external GIS tools, etc.). So, I don't agree with
> this.

What GIS tool could deal with a Solr XML response format w/o any other
knowledge of everything else in the response?
Are there some real use cases that using a namespace vs not for point
make easier (an honest question... I don't know much about GIS stuff).

> All I've done is use what already exists. There doesn't need to be any
> patches. XmlWriter#writePrim allowed you to do this before, see:

Yeah, you can use that to output <long>false</long> too... but it will
cause certain clients to barf.

-Yonik
http://www.lucidimagination.com

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Hoss,

> : I think it's rather powerful. You insulate the following variations into 1
> : single place to change them (FieldType):
> :
> : * output representation
> : * indexing
> : * validation
> :
> : To remove this from FieldType would be to strew the same functionality
> : across multiple classes, which doesn't make sense IMHO.
> 
> it's a damned-if-you-do/damned-if-you-don't situation though ... you look
> at it as "insulating" the response writers because all of the logic about
> serializing data is in the FieldType, but i look at it as "polluting" the
> FieldType with knowledge about the output formats -- there's a reason we
> didn't add "writeBinary" to the FieldType when the BinaryResponseWriter
> was added ... the toObject abstraction lets the FieldType do whatever it
> wants internally, and provide its "best face" to the world when asked.
> the ResponseWriters can then apply heuristics to decide the most
> compatible type they know of to use when representing it: "is it something
> complex i have a codec for? no; oh well, then is it something that
> implements Collection? no; oh well, then is it something that is an
> instanceof Number? no; oh well, as a last resort we can stringify"

Sure, it's just that it's half-way on both sides right now like you said.
There's probably a middle ground. I like the "insulation" but I also
understand the "clutter" (i.e., what you're saying).

> 
> : In the long run, this might be nice, and +1 on getting there in the long
> : run. In the short, a compromise is to allow namespacing on fields in the
> : existing XmlWriter, which is allowed anyways, whether by oversight or not.
> 
> I'm sure if we look hard enough at the existing internal APIs, we can find
> a way to generate completely broken XML that no DOM, SAX or pull parser
> could possibly deal with cleanly -- but that doesn't mean we should do
> that just because it would allow us to start outputting a bunch of metadata
> that we think is useful.  breaking the (implicit) XML Schema is just as
> bad as breaking the XML itself.

Agreed. Let's document that (implicit) schema so loud people like me don't
keep bugging you guys when it's so obvious to you. I'm just trying to help.
I'll take an action.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Chris Hostetter <ho...@fucit.org>.

: > themselves ... because of the back-ass-wards way we have FieldTypes write
: > their values directly to an XMLWriter or a TextWriter the idea of using an
: > object that stringifies itself as needed doesn't really apply very well
: 
: I think it's rather powerful. You insulate the following variations into 1
: single place to change them (FieldType):
: 
: * output representation
: * indexing
: * validation
: 
: To remove this from FieldType would be to strew the same functionality
: across multiple classes, which doesn't make sense IMHO.

it's a damned-if-you-do/damned-if-you-don't situation though ... you look
at it as "insulating" the response writers because all of the logic about
serializing data is in the FieldType, but i look at it as "polluting" the
FieldType with knowledge about the output formats -- there's a reason we
didn't add "writeBinary" to the FieldType when the BinaryResponseWriter
was added ... the toObject abstraction lets the FieldType do whatever it
wants internally, and provide its "best face" to the world when asked.
the ResponseWriters can then apply heuristics to decide the most
compatible type they know of to use when representing it: "is it something
complex i have a codec for? no; oh well, then is it something that
implements Collection? no; oh well, then is it something that is an
instanceof Number? no; oh well, as a last resort we can stringify"
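
The fallback order described above can be sketched as a single function. This is a Python illustration of the heuristic only, not the actual ResponseWriter code; `pick_representation` and the `codecs` mapping are hypothetical names.

```python
from collections.abc import Collection
from numbers import Number


def pick_representation(value, codecs=None):
    """Return a (kind, payload) pair using the fallback order quoted above."""
    codecs = codecs or {}
    codec = codecs.get(type(value))
    if codec is not None:
        # 1. something complex the writer has a codec for
        return ("codec", codec(value))
    # Strings are collections in Python, so exclude them explicitly.
    if isinstance(value, Collection) and not isinstance(value, (str, bytes)):
        # 2. a collection
        return ("collection", list(value))
    if isinstance(value, Number):
        # 3. a number
        return ("number", value)
    # 4. last resort: stringify
    return ("string", str(value))
```

The point of this shape is that the value object stays ignorant of output formats; each writer decides how far down the chain it needs to fall.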

: In the long run, this might be nice, and +1 on getting there in the long
: run. In the short, a compromise is to allow namespacing on fields in the
: existing XmlWriter, which is allowed anyways, whether by oversight or not.

I'm sure if we look hard enough at the existing internal APIs, we can find
a way to generate completely broken XML that no DOM, SAX or pull parser
could possibly deal with cleanly -- but that doesn't mean we should do
that just because it would allow us to start outputting a bunch of metadata
that we think is useful.  breaking the (implicit) XML Schema is just as
bad as breaking the XML itself.



-Hoss

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Hoss,

> : ...unless things have changed since the last time i looked, all of the
> : "out of the box" response writers call "toString()" on any object they
> : don't understand.  So the best way to move forward in a flexible manner
> : seems like it would be to add a new "GeoPoint" object to Solr, which
> : toStrings to a simple "-34.56,67.89" for use by existing response writers
> : as a string, but some newer smarter response writer could output it in
> : some more sophisticated manner.
> 
> The caveat to that, now that i've skimmed SOLR-1586, is that it currently
> only applies to objects "added" to the SolrQueryResponse (or one of the
> containers in it) data structure that the ResponseWriters "walk"
> themselves ... because of the back-ass-wards way we have FieldTypes write
> their values directly to an XMLWriter or a TextWriter the idea of using an
> object that stringifies itself as needed doesn't really apply very well

I think it's rather powerful. You insulate the following variations into 1
single place to change them (FieldType):

* output representation
* indexing
* validation

To remove this from FieldType would be to strew the same functionality
across multiple classes, which doesn't make sense IMHO.

> ... and it won't unless we switch all of the ResponseWriters to follow the
> BinaryResponseWriter model of using FieldType.toObject(...) to get the
> field value as an "object" that can be sent over the wire -- then the
> existing XmlResponseWriter, and the Text ResponseWriters, can call
> toString() on Objects they don't understand, and some
> newer/hipper/cooler response writers that understand georss can do fancier
> things with it.

In the long run, this might be nice, and +1 on getting there in the long
run. In the short run, a compromise is to allow namespacing on fields in the
existing XmlWriter, which is allowed anyway, whether by oversight or not.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Phone: +1 (818) 354-8810
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Namespaces in response (SOLR-1586)

Posted by Chris Hostetter <ho...@fucit.org>.

: ...unless things have changed since the last time i looked, all of the 
: "out of the box" response writers call "toString()" on any object they 
: don't understand.  So the best way to move forward in a flexible manner 
: seems like it would be to add a new "GeoPoint" object to Solr, which 
: toStrings to a simple "-34.56,67.89" for use by existing response writers 
: as a string, but some newer smarter response writer could output it in 
: some more sophisticated manner.

The caveat to that, now that i've skimmed SOLR-1586, is that it currently 
only applies to objects "added" to the SolrQueryResponse (or one of the 
containers in it) data structure that the ResponseWriters "walk" 
themselves ... because of the back-ass-wards way we have FieldTypes write 
their values directly to an XMLWriter or a TextWriter the idea of using an 
object that stringifies itself as needed doesn't really apply very well 
... and it won't unless we switch all of the ResponseWriters to follow the 
BinaryResponseWriter model of using FieldType.toObject(...) to get the 
field value as an "object" that can be sent over the wire -- then the 
existing XmlResponseWriter, and the Text ResponseWriters, can call 
toString() on Objects they don't understand, and some 
newer/hipper/cooler response writers that understand georss can do fancier 
things with it.
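
A minimal sketch of that idea in Java, for illustration only. GeoPoint, 
PlainWriter and GeoAwareWriter are hypothetical names, not existing Solr 
classes, and XML escaping is left out to keep it short:

```java
// Hypothetical value object; not part of Solr.
final class GeoPoint {
    final double lat;
    final double lon;

    GeoPoint(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    // Writers that know nothing about GeoPoint fall back to this form.
    @Override
    public String toString() {
        return lat + "," + lon;
    }
}

// Hypothetical stand-in for an existing writer: everything it does not
// understand is written as a plain string via toString().
final class PlainWriter {
    static String write(String name, Object value) {
        return "<str name=\"" + name + "\">" + value + "</str>";
    }
}

// Hypothetical stand-in for a newer writer that understands georss.
final class GeoAwareWriter {
    static String write(String name, Object value) {
        if (value instanceof GeoPoint) {
            GeoPoint p = (GeoPoint) value;
            // georss separates the two coordinates with a space
            return "<georss:point name=\"" + name + "\">"
                    + p.lat + " " + p.lon + "</georss:point>";
        }
        // Anything else is handled exactly as the plain writer would
        return PlainWriter.write(name, value);
    }
}
```

The same GeoPoint goes through both writers; only the writer decides 
whether a richer tag appears, so the existing output stays unchanged.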



-Hoss

Re: Namespaces in response (SOLR-1586)

Posted by Chris Hostetter <ho...@fucit.org>.

: > eh ... agree to disagree i guess. it seems just as valid to say that
: > "UpdateCommand" -- what type of data does it update? ... or that
: > "RequestHandler" is ambiguous because it can only handle "Solr" requests,
: > so it should be titled "SolrRequestHandler".
: 
: True! I guess it's just aesthetics. I can go either way, but I dunno. (and
: yes, just to be a pest, What type of data does that UpdateCommand update?)

Isn't it obvious from the context? ... Solr Data  :)

(i think that's the first, and last, time i've used an emoticon on a 
lucene mailing list )

: You give a little, you get a little back. Maybe a compromise is to call it
: NamedListResponseWriter, b/c that's really what it writes, no? Naming can be

By that logic every ResponseWriter is a NamedListResponseWriter, and a 
StringResponseWriter and a MapResponseWriter ... at a certain point you 
have to just trust that people will read the docs, you can't encode every 
bit of knowledge about the code base into the names.


-Hoss

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Hoss:

On 12/15/09 6:39 PM, "Chris Hostetter" <ho...@fucit.org> wrote:

> 
> 
> : > a SolrQueryResponse, no one has ever accused any of those response writers
> : > of not being flexible enough to generate a *different* type of response in
> : > those formats.
> :
> : You may be right, but actually quite a few issues have referenced even non
> : XMLWriters of similar issues. See:
> 
> I honestly don't understand what you're getting at here, this list of
> issues is all over the map and almost none of them relate to the
> extensibility of any request handlers...

They may be all over the map, but in general they address your statement
about "non-XML response writers" being flexible enough to generate a
different type of response (although admittedly, none are as clear as the
XMLWriter examples, I'll give you that). The examples I gave were just based
on a quick search of JIRA.

> : Maybe, maybe not. I'm not sure the effect is to make it crystal clear as
> : much as it is to make it "clearer". XMLWriter is totally ambiguous -- what
> : type of "XML" does it generate? I would argue "SOLR response XML", hence the
> : SolrXmlResponseWriter.
> 
> eh ... agree to disagree i guess. it seems just as valid to say that
> "UpdateCommand" -- what type of data does it update? ... or that
> "RequestHandler" is ambiguous because it can only handle "Solr" requests,
> so it should be titled "SolrRequestHandler".

True! I guess it's just aesthetics. I can go either way, but I dunno. (and
yes, just to be a pest, What type of data does that UpdateCommand update?)

> 
> we have enough ambiguity and confusion with some of our config file
> options and names that non-java users see ... the ones that only plugin
> writers see i'm less concerned with ... better to beef up the javadocs
> than deal with a bunch of deprecation headaches just to add "Solr" to the
> front of a class name.

You give a little, you get a little back. Maybe a compromise is to call it
NamedListResponseWriter, b/c that's really what it writes, no? Naming can be
a pain -- I'll try and think of a good one when I'm preparing the patch for
SOLR-1649.

Thanks for the discussion. Helps to clarify things!

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Chris Hostetter <ho...@fucit.org>.

: > a SolrQueryResponse, no one has ever accused any of those response writers
: > of not being flexible enough to generate a *different* type of response in
: > those formats.
: 
: You may be right, but actually quite a few issues have referenced even non
: XMLWriters of similar issues. See:

I honestly don't understand what you're getting at here, this list of 
issues is all over the map and almost none of them relate to the 
extensibility of any request handlers...

: http://issues.apache.org/jira/browse/SOLR-1616
  ... this was from someone who didn't notice json.nl=arrarr and 
  felt like the default way of representing a NamedList in JSON was odd.  
  they didn't disagree with the JSON structure, they just don't like the 
  default.
: http://issues.apache.org/jira/browse/SOLR-358
  ...this was an improvement issue to track adding the ruby response 
  writer ... which didn't exist before this.
: http://issues.apache.org/jira/browse/SOLR-1555
  ...this is a bug in how the term component adds the terms to the 
  response ... it's completely orthogonal to the response output 
  structure.
: http://issues.apache.org/jira/browse/SOLR-431
  ...this is from one of my coworkers who had some really old, really 
  hideously hackish plugins from before Solr was open sourced that was 
  trying to find a way to work around a bug fixed in the xml escaping -- 
  i could maybe see this as a "response writers need to be more flexible" 
  type issue, except they knew from the start they were abusing 
  a bug.
: http://issues.apache.org/jira/browse/SOLR-912
  ...this is an issue Kay opened to revamp NamedList to be more typesafe 
  ... also has absolutely nothing to do with how flexible the output 
  representation is.

: Maybe, maybe not. I'm not sure the effect is to make it crystal clear as
: much as it is to make it "clearer". XMLWriter is totally ambiguous -- what
: type of "XML" does it generate? I would argue "SOLR response XML", hence the
: SolrXmlResponseWriter.

eh ... agree to disagree i guess. it seems just as valid to say that 
"UpdateCommand" -- what type of data does it update? ... or that 
"RequestHandler" is ambiguous because it can only handle "Solr" requests, 
so it should be titled "SolrRequestHandler".

we have enough ambiguity and confusion with some of our config file 
options and names that non-java users see ... the ones that only plugin 
writers see i'm less concerned with ... better to beef up the javadocs 
than deal with a bunch of deprecation headaches just to add "Solr" to the 
front of a class name.


-Hoss

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Hoss,

On 12/14/09 3:18 PM, "Chris Hostetter" <ho...@fucit.org> wrote:

> 
> : Well, I actually would disagree. What's the point of #toInternal and
> : #toExternal then, other than to convert from the external representation to
> : an internal Lucene index representation, and then to do the opposite coming
> : out of the index?
> 
> that is what they are for -- but they deal purely in string
> representations of the data itself -- they don't (and shouldn't) know/care
> whether the data is then being encapsulated in JSON, thrift, Avro, Solr XML,
> RSS, KML, etc....
> 
> The "String" limitation of toExternal is one of the reasons toObject was
> added (and the reason the BinaryResponseWriter uses toObject()).

Conceptually I think that the best approach would be to do something similar
to the functionality of #toObject, but to not call it that. #toInternal and
#toExternal are actually good names, their interface is just off (they
shouldn't return Strings).

> 
> : class final which it once was). We should rename that to
> : SolrXmlResponseWriter, but it's not really generic XML (as the name
> : suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since
> 
> Eh... i don't know that the name suggests that it can generate generic
> XML, it generates a (particular) one to one mapping from the
> SolrQueryResponse to XML .. just like the JSONResponseWriter generates a
> one to one mapping from the SolrQueryResponse to JSON, and ditto for the
> ruby/php/python writers ... there are an infinite number of possible
> XML/JSON/Ruby/PHP/Python/etc. structures that *could* be generated from
> a SolrQueryResponse, no one has ever accused any of those response writers
> of not being flexible enough to generate a *different* type of response in
> those formats.

You may be right, but actually quite a few issues have referenced even non
XMLWriters of similar issues. See:

http://issues.apache.org/jira/browse/SOLR-1616
http://issues.apache.org/jira/browse/SOLR-358
http://issues.apache.org/jira/browse/SOLR-1555
http://issues.apache.org/jira/browse/SOLR-431
http://issues.apache.org/jira/browse/SOLR-912

> 
> And practically speaking: slapping "Solr" in front of a response writer
> classname isn't going to make it crystal clear that it produces a "solr
> specific" type of "____".  It's going to make people think it's the
> "Solr" implementation of "____".  "Solr" is the prefix of enough classnames
> that eyeballs are just going to gloss over it.

Maybe, maybe not. I'm not sure the effect is to make it crystal clear as
much as it is to make it "clearer". XMLWriter is totally ambiguous -- what
type of "XML" does it generate? I would argue "SOLR response XML", hence the
SolrXmlResponseWriter.

> 
> : suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since
> : it's undocumented, I'd be happy to throw it together for its XML format.
> 
> we actually went round and round on documenting it back in the early days
> .. frequently it was deemed "self documenting" enough for end users so not
> much effort was ever put into it.  there was a Jira issue to create an
> XSD, but even once we had one, no one really had any idea what to *do*
> with it...
> 
> https://issues.apache.org/jira/browse/SOLR-17

I commented on SOLR-17 on what could be done with it, and I linked it to the
new issue I threw up: SOLR-1646. Both can be closed at the same time, or
even better, I can close SOLR-1646 and then work diligently on trying to get
SOLR-17 committed. Even for documentation purposes it's well worth while.

> 
> 
> : Would that also be welcomed? Then, we should develop an easy extension point
> : mechanism for people who want to develop their own XML response writers and
> : write their own clients (or leverage existing clients that understand that
> : XML).
> 
> +1
> 
> I think the crux of this would be an XML based response writer similar to the
> BinaryResponseWriter that can use a "codec" type system for outputting
> known types of objects, using FieldType.toObject() to get field values.
> Then we just have to provide "default" codecs for all the types of objects
> we produce "out of the box", but people can customize with their own
> codecs if they want a different representation.

+1!

Thanks, Hoss.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Chris Hostetter <ho...@fucit.org>.

: I'm conflicted here. In simple semantics, sure it's just an array of
: float/double numbers. And if a string must be used, a comma is probably OK, so
: long as it maps to some existing known approach to represent points. I've
: asked several times if there are examples. I can point to one that uses
: spaces to separate the coordinates in the point (georss). What others use
: comma? 

I have no opinion about the details ... space separated string, comma 
separated string, list of ints ... they are all the same to me.

As a layman, my limited knowledge of geo coordinates has a vague notion 
that comma is the separator used when discussing latitude and longitude, 
but i have no real knowledge of anything GIS related.  (i think i remember 
that KML uses comma, but KML also has some weird idea that longitude comes 
first because that's what the guys writing graphics rendering engines 
apparently like: x-axis first)

: Well, I actually would disagree. What's the point of #toInternal and
: #toExternal then, other than to convert from the external representation to
: an internal Lucene index representation, and then to do the opposite coming
: out of the index? 

that is what they are for -- but they deal purely in string 
representations of the data itself -- they don't (and shouldn't) know/care 
whether the data is then being encapsulated in JSON, thrift, Avro, Solr XML, 
RSS, KML, etc....

The "String" limitation of toExternal is one of the reasons toObject was 
added (and the reason the BinaryResponseWriter uses toObject()).
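
To make the distinction concrete, here is a rough Java sketch. 
SketchFieldType and SketchUuidField are hypothetical, simplified stand-ins 
(the real FieldType methods have different signatures); the only point is 
that one method is limited to a String while the other can hand back a 
typed value:

```java
import java.util.UUID;

// Hypothetical, simplified stand-in for a field type; NOT the Solr API.
abstract class SketchFieldType {
    // String-only view of a stored value: all a text-based writer can get.
    abstract String toExternal(String stored);

    // Typed view of the same value; by default just the string form.
    Object toObject(String stored) {
        return toExternal(stored);
    }
}

// Hypothetical UUID field: the string form stays simple, while the object
// form carries the real type for writers that can make use of it.
final class SketchUuidField extends SketchFieldType {
    @Override
    String toExternal(String stored) {
        return stored;
    }

    @Override
    Object toObject(String stored) {
        return UUID.fromString(stored);
    }
}
```

A writer that only calls toExternal sees a string in both cases; a writer 
that calls toObject can tell a UUID from ordinary text.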

: class final which it once was). We should rename that to
: SolrXmlResponseWriter, but it's not really generic XML (as the name
: suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since

Eh... i don't know that the name suggests that it can generate generic 
XML, it generates a (particular) one to one mapping from the 
SolrQueryResponse to XML .. just like the JSONResponseWriter generates a 
one to one mapping from the SolrQueryResponse to JSON, and ditto for the 
ruby/php/python writers ... there are an infinite number of possible 
XML/JSON/Ruby/PHP/Python/etc. structures that *could* be generated from 
a SolrQueryResponse, no one has ever accused any of those response writers 
of not being flexible enough to generate a *different* type of response in 
those formats.

And practically speaking: slapping "Solr" in front of a response writer 
classname isn't going to make it crystal clear that it produces a "solr 
specific" type of "____".  It's going to make people think it's the 
"Solr" implementation of "____".  "Solr" is the prefix of enough classnames 
that eyeballs are just going to gloss over it.

: suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since
: it's undocumented, I'd be happy to throw it together for its XML format.

we actually went round and round on documenting it back in the early days 
.. frequently it was deemed "self documenting" enough for end users so not 
much effort was ever put into it.  there was a Jira issue to create an 
XSD, but even once we had one, no one really had any idea what to *do* 
with it...

https://issues.apache.org/jira/browse/SOLR-17


: Would that also be welcomed? Then, we should develop an easy extension point
: mechanism for people who want to develop their own XML response writers and
: write their own clients (or leverage existing clients that understand that
: XML).

+1

I think the crux of this would be an XML based response writer similar to the 
BinaryResponseWriter that can use a "codec" type system for outputting 
known types of objects, using FieldType.toObject() to get field values.  
Then we just have to provide "default" codecs for all the types of objects 
we produce "out of the box", but people can customize with their own 
codecs if they want a different representation.
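
Roughly what such a "codec" system could look like, as a Java sketch under 
my own assumptions. XmlCodec and CodecXmlWriter are hypothetical names, 
nothing like them exists in Solr today, and escaping is omitted:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec interface: turns one kind of object into an XML snippet.
interface XmlCodec {
    String encode(String name, Object value);
}

// Hypothetical writer that looks up a codec by the value's class.
final class CodecXmlWriter {
    private final Map<Class<?>, XmlCodec> codecs = new HashMap<>();

    CodecXmlWriter() {
        // "default" codecs for a few types produced out of the box
        register(Integer.class, (n, v) -> tag("int", n, v));
        register(Float.class, (n, v) -> tag("float", n, v));
        register(String.class, (n, v) -> tag("str", n, v));
    }

    // People who want a different representation register their own codec.
    void register(Class<?> type, XmlCodec codec) {
        codecs.put(type, codec);
    }

    String write(String name, Object value) {
        XmlCodec codec = codecs.get(value.getClass());
        if (codec == null) {
            // Unknown type: fall back to toString() inside a <str> tag
            return tag("str", name, value);
        }
        return codec.encode(name, value);
    }

    static String tag(String element, String name, Object value) {
        return "<" + element + " name=\"" + name + "\">" + value
                + "</" + element + ">";
    }
}
```

With only the defaults registered, an unknown type still comes out as a 
plain string; registering a codec for that type changes its representation 
without touching the writer itself.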


-Hoss

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Hoss,

> : > : I think the initial geosearch feature can start off with
> : > : <str>10,20</str> for a point.
> : >
> : > +1.
> :
> : Fundamentally, how is a string a point?
> 
> Fundamentally a string is not a point, and a point is not a string -- but
> if you want to express the concept of a point in a manner that only uses very
> simple primitive types, then a string containing comma separated numbers
> is a pretty decent way to do it.  If you'd prefer, a pair of numbers would
> work just as well...
> 
>    <arr><float>10</float><float>20</float></arr>

I'm conflicted here. In simple semantics, sure it's just an array of
float/double numbers. And if a string must be used, a comma is probably OK, so
long as it maps to some existing known approach to represent points. I've
asked several times if there are examples. I can point to one that uses
spaces to separate the coordinates in the point (georss). What others use
comma? 

> 
> : > The current XML format Solr uses was designed to be extremely simple, very
> : > JSON-esque, and easily parsable by *anyone* in any language, without
> : > needing special knowledge of types.
> :
> : Whoah. I'm totally confused now. Why have FieldTypes then? Why not just use
> : Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
> : It's also for representation?
> 
> No, actually the use case for FieldTypes is entirely about the internal
> logic of how Solr should deal with those fields, and how various
> operations should work on them.  FieldTypes can dictate the internal
> representation within the confines of a Lucene index, but they should not
> circumvent the contracts of the response writers in dictating what
> is/isn't a legal response.

Well, I actually would disagree. What's the point of #toInternal and
#toExternal then, other than to convert from the external representation to
an internal Lucene index representation, and then to do the opposite coming
out of the index? 

> 
> : allowed for a while I think), why prevent it? Allowing namespaces does _not_
> : break anything.
>         ...
> : > introducing a new "point" concept, whether as <point> or as
> : > <georss:point/>, is going to break things for people.
> :
> : Show me an example, I fundamentally disagree with this.
> 
> Ok. Let's start with SolrJ then: take a look at the KnownType enum (line
> 151) in XMLResponseParser...
> 
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup

Got it. OK, sure, well thanks for actually being able to identify somewhere
where it would be and for taking the time to provide a link. So what you are
saying is that this breaks the SolrJ and python clients and people who
develop clients to parse and read the (undocumented) SOLR response schema.

> 
> ...or let's do a random google code search for "solr xml lst" -- check out
> ResponseContentHandler in solrpy...
> 
> http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841
> 
> ...I can't write python code to save my life, but I have a pretty good idea
> what that code will do if it sees an unexpected tag.

Gotcha.


> : And why is that? Isn't the point of SOLR to expand to use cases brought up
> : by users of the system? As long as those use cases can be principally
> : supported, without breaking backwards compatibility (or in that case,  if
> : they do, with large blinking red text that says it), then you're shutting
> : people out for 0 benefit? It's aesthetics we're talking about here.
> 
> I don't know if i'd say that's the point of Solr, but yes we should
> absolutely try to grow the capabilities of the system as new use cases
> come along.

Well that's what I was trying to do, but all I was hearing was a lot of
hollering without any help to understand why. Thanks for being the one to
finally provide that information.

> 
> I am 100% in agreement that the existing "simple" XMLResponseWriter is
> not for everyone -- Historically we've tried to maintain a sense of equality
> between all of the Response writers, so that they all contained the same
> data just with different markup -- but there are clearly cases where it
> would be nice to have a response writer that is allowed to "know more"
> about the real structure of the data and represent it in a manner that
> more closely represents its purpose.

I'd like to refactor the whole thing to be a bit less brittle, and also to
close off people that shouldn't be dealing with SOLR's XML in/out (by taking
away your favorite writePrim method and its public modifier and making the
class final which it once was). We should rename that to
SolrXmlResponseWriter, but it's not really generic XML (as the name
suggests), it's SOLR's custom (undocumented) XML schema, right? Also, since
it's undocumented, I'd be happy to throw it together for its XML format.
Would that also be welcomed? Then, we should develop an easy extension point
mechanism for people who want to develop their own XML response writers and
write their own clients (or leverage existing clients that understand that
XML).

> There is a clear push for Solr to natively be able to generate responses that
> incorporate more "industry standard" XML schemas, and i would love to see
> us start adding functionality to do that, but bastardizing the existing
> XMLResponseWriter format is not the way to do it.

I see the light now.

> 
> Bottom Line: I am a big fat -1 on any patch to Solr that adds new xml tags
> to the output generated by the XMLResponseWriter.  Feel free to call me
> stubborn, call me obstinate, call me pedantic -- but there is no way in
> hell i'm going to support a patch that does that.

I won't call you any of those things. Thanks for the help. Let me know what
you think.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Chris Hostetter <ho...@fucit.org>.

: > : I think the initial geosearch feature can start off with
: > : <str>10,20</str> for a point.
: >
: > +1.
:
: Fundamentally, how is a string a point?

Fundamentally a string is not a point, and a point is not a string -- but
if you want to express the concept of a point in a manner that only uses very
simple primitive types, then a string containing comma separated numbers
is a pretty decent way to do it. If you'd prefer, a pair of numbers would
work just as well...

   <arr><float>10</float><float>20</float></arr>

: > The current XML format Solr uses was designed to be extremely simple, very
: > JSON-esque, and easily parsable by *anyone* in any language, without
: > needing special knowledge of types.
:
: Whoah. I'm totally confused now. Why have FieldTypes then? Why not just use
: Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
: It's also for representation?

No, actually the use case for FieldTypes is entirely about the internal
logic of how Solr should deal with those fields, and how various
operations should work on them. FieldTypes can dictate the internal
representation within the confines of a Lucene index, but they should not
circumvent the contracts of the response writers in dictating what
is/isn't a legal response.

XMLWriter.writePrim may be public, which means there is a loophole that
plugin writers can exploit to add new tag names to the Solr XML response that
violate the contract (and no we don't have a formal XSD or DTD for our
XML response format, but we still have a very well advertised contract) --
but that doesn't mean that code which ships with Solr should exploit those
loopholes to violate that contract. People should expect that if they use
Solr as is without any custom code that the XMLResponseWriter won't all of
a sudden start including new, non-primitive-ish, XML tags/attributes
that weren't there before.

That's the entire point of the format as it was designed: break down
whatever complex data might be involved in a response into easily
digestible maps/lists of maps/lists of very primitive types that can
easily be used in any programming language.

: allowed for a while I think), why prevent it? Allowing namespaces does _not_
: break anything.
...
: > introducing a new "point" concept, whether as <point> or as
: > <georss:point/>, is going to break things for people.
:
: Show me an example, I fundamentally disagree with this.

Ok. Let's start with SolrJ then: take a look at the KnownType enum (line
151) in XMLResponseParser...

http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup

...or let's do a random google code search for "solr xml lst" -- check out
ResponseContentHandler in solrpy...

http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841

...I can't write python code to save my life, but I have a pretty good idea
what that code will do if it sees an unexpected tag.

This is how a *LOT* of Solr client libraries are implemented ... it's not
an issue of broken XML parsers freaking out about namespaces, it's an
issue of having a long standing, heavily advertised "schema" for the XML
response that promises to only ever use a handful of types. Adding any
new tags to this format (regardless of how easy it may be because of that
stupid fucking "public" modifier on XMLWriter.writePrim) will absolutely
break things for people.
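
To illustrate the failure mode, a small Java sketch of the kind of 
closed-set lookup such client code tends to rely on. KnownTag is a 
hypothetical enum written for this example, using the tag names advertised 
for the format; it is not copied from SolrJ or solrpy:

```java
import java.util.Locale;

// Hypothetical client-side lookup over a fixed set of tag names.
enum KnownTag {
    STR, INT, LONG, FLOAT, DOUBLE, DATE, ARR, LST, DOC;

    // Maps an element name to a known tag. Enum.valueOf throws
    // IllegalArgumentException for any name outside the fixed set.
    static KnownTag of(String tagName) {
        return valueOf(tagName.toUpperCase(Locale.ROOT));
    }
}
```

KnownTag.of("str") resolves fine, while KnownTag.of("georss:point") throws 
even though the XML itself is well formed, which is the breakage being 
described: the parser is not at fault, the client's assumption about the 
set of tags is.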

: And why is that? Isn't the point of SOLR to expand to use cases brought up
: by users of the system? As long as those use cases can be principally
: supported, without breaking backwards compatibility (or in that case, if
: they do, with large blinking red text that says it), then you're shutting
: people out for 0 benefit? It's aesthetics we're talking about here.

I don't know if i'd say that's the point of Solr, but yes we should
absolutely try to grow the capabilities of the system as new use cases
come along.

I am 100% in agreement that the existing "simple" XMLResponseWriter is
not for everyone -- Historically we've tried to maintain a sense of equality
between all of the Response writers, so that they all contained the same
data just with different markup -- but there are clearly cases where it
would be nice to have a response writer that is allowed to "know more"
about the real structure of the data and represent it in a manner that
more closely represents its purpose. This was the entire point behind
adding FieldType.toObject, and UUIDField w/the BinaryResponseWriter is a
good example of the model we should follow in the future.

There is a clear push for Solr to natively be able to generate responses that
incorporate more "industry standard" XML schemas, and i would love to see
us start adding functionality to do that, but bastardizing the existing
XMLResponseWriter format is not the way to do it.

Bottom Line: I am a big fat -1 on any patch to Solr that adds new xml tags
to the output generated by the XMLResponseWriter. Feel free to call me
stubborn, call me obstinate, call me pedantic -- but there is no way in
hell i'm going to support a patch that does that.

-Hoss

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Hoss,

> : I think the initial geosearch feature can start off with
> : <str>10,20</str> for a point.
> 
> +1.

Fundamentally, how is a string a point?

> 
> The current XML format Solr uses was designed to be extremely simple, very
> JSON-esque, and easily parsable by *anyone* in any language, without
> needing special knowledge of types.

Whoah. I'm totally confused now. Why have FieldTypes then? Why not just use
Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
It's also for representation?

> It has been heavily advertised as
> only containing a very small handful of tags, representing primitive types
> (int, long, float, date, double, str) and basic collections (arr, lst,
> doc) ... even if it never had a formal schema/DTD.

Which is leading to this confusion. Your argument is kind of weird too --
just because you never had or advertised a feature like this (which SOLR
allowed for a while I think), why prevent it? Allowing namespaces does _not_
break anything. 

> adding new tags to that
> -- name spaced or otherwise -- is a very VERY bad idea for clients who
> have come to expect that they can use very simple parsing code to access
> all the data.

I disagree. I've got a number of projects here that could potentially use
this across multiple domains (planetary science, cancer research, earth
science, space science, etc.) and they all need this capability. Also what's
"simple" have to do with anything? Even "simple" parsers will parse what
SOLR-1586 outputs.

> 
> introducing a new "point" concept, whether as <point> or as
> <georss:point/>, is going to break things for people.

Show me an example, I fundamentally disagree with this.

> 
> As discussed with Mattmann in another thread -- some public methods in
> XMLWriter have inadvertently made it possible for plugin writers to add
> their own XML tags -- but that doesn't mean we should do it in the core
> Solr distribution.

And why is that? Isn't the point of SOLR to expand to use cases brought up
by users of the system? As long as those use cases can be principally
supported, without breaking backwards compatibility (or in that case,  if
they do, with large blinking red text that says it), then you're shutting
people out for 0 benefit? It's aesthetics we're talking about here.

> If you write your own custom XMLWriter you aren't
> allowed to be surprised when it contains new tags, but our "out of the box"
> users shouldn't have to deal with such surprises.

What surprise -- their code won't break?

> 
> As also discussed in that same thread: it makes a lot of sense
> in the long run to start having Response Writers that can generate more
> "rich" XML based responses and if there are already well defined standards
> for some of these concepts (like georss) then by all means we should
> support them -- but the existing XmlResponseWriter should NOT start
> generating new tags.

I agree with this, but rather than waiting for that to come 2-3 months down
the road, why not buy into the need for this now, with what exists?

> 
> The contract for SolrQueryResponse has always said:
> 
>>>>>> A SolrQueryResponse may contain the following types of Objects
>>>>>> generated by the SolrRequestHandler that processed the request.
>>>>>> ... 
>>>>>> Other data types may be added to the SolrQueryResponse, but there is
>>>>>> no guarantee that QueryResponseWriters will be able to deal with
>>>>>> unexpected types.
> 
> ...unless things have changed since the last time I looked, all of the
> "out of the box" response writers call "toString()" on any object they
> don't understand.

Actually most of them call some variation of #toExternal, regardless, which
returns a String. Also, #toInternal returns the same type, a String.

> So the best way to move forward in a flexible manner
> seems like it would be to add a new "GeoPoint" object to Solr, which
> toStrings to a simple "-34.56,67.89" for use by existing response writers
> as a string, but some newer smarter response writer could output it in
> some more sophisticated manner.

I'm not convinced of that.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department University of
Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Re: Namespaces in response (SOLR-1586)

Posted by Chris Hostetter <ho...@fucit.org>.

: I think the initial geosearch feature can start off with
: <str>10,20</str> for a point.

+1.

The current XML format Solr uses was designed to be extremely simple, very
JSON-esque, and easily parsable by *anyone* in any language, without
needing special knowledge of types.  It has been heavily advertised as
only containing a very small handful of tags, representing primitive types
(int, long, float, date, double, str) and basic collections (arr, lst,
doc) ... even if it never had a formal schema/DTD.  Adding new tags to that
-- name spaced or otherwise -- is a very VERY bad idea for clients who 
have come to expect that they can use very simple parsing code to access 
all the data.

Introducing a new "point" concept, whether as <point> or as
<georss:point/>, is going to break things for people.

As discussed with Mattmann in another thread -- some public methods in
XMLWriter have inadvertently made it possible for plugin writers to add
their own XML tags -- but that doesn't mean we should do it in the core
Solr distribution.  If you write your own custom XMLWriter you aren't
allowed to be surprised when it contains new tags, but our "out of the box"
users shouldn't have to deal with such surprises.

As also discussed in that same thread: it makes a lot of sense
in the long run to start having Response Writers that can generate more 
"rich" XML based responses and if there are already well defined standards 
for some of these concepts (like georss) then by all means we should 
support them -- but the existing XmlResponseWriter should NOT start 
generating new tags.

The contract for SolrQueryResponse has always said: 

>>>>> A SolrQueryResponse may contain the following types of Objects 
>>>>> generated by the SolrRequestHandler that processed the request.  
>>>>> ...  
>>>>> Other data types may be added to the SolrQueryResponse, but there is 
>>>>> no guarantee that QueryResponseWriters will be able to deal with 
>>>>> unexpected types.

...unless things have changed since the last time I looked, all of the
"out of the box" response writers call "toString()" on any object they 
don't understand.  So the best way to move forward in a flexible manner 
seems like it would be to add a new "GeoPoint" object to Solr, which 
toStrings to a simple "-34.56,67.89" for use by existing response writers 
as a string, but some newer smarter response writer could output it in 
some more sophisticated manner.
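
A minimal sketch of the idea above, under stated assumptions: the GeoPoint class, the two writer methods, and the georss:point output shape are all hypothetical stand-ins for illustration, not existing Solr code. It shows one object serving both kinds of writer: an older writer that falls back to toString(), and a newer one that recognises the type.

```java
final class GeoPointSketch {

    /** Hypothetical value object; no such class is confirmed to exist in Solr. */
    public static final class GeoPoint {
        public final double lat;
        public final double lon;

        public GeoPoint(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }

        /** Plain "lat,lon" form, so writers that fall back to toString() still work. */
        @Override
        public String toString() {
            return lat + "," + lon;
        }
    }

    /** Stand-in for an existing writer: any unknown object becomes a <str>. */
    public static String writeLegacy(String name, Object value) {
        // XML escaping is omitted to keep the sketch short.
        return "<str name=\"" + name + "\">" + value + "</str>";
    }

    /** Stand-in for a newer writer that recognises GeoPoint and emits a richer tag. */
    public static String writeRich(String name, Object value) {
        if (value instanceof GeoPoint) {
            GeoPoint p = (GeoPoint) value;
            // The namespace declaration would live on an enclosing element (assumption).
            return "<georss:point name=\"" + name + "\">" + p.lat + " " + p.lon + "</georss:point>";
        }
        return writeLegacy(name, value);
    }

    public static void main(String[] args) {
        GeoPoint p = new GeoPoint(-34.56, 67.89);
        System.out.println(writeLegacy("point", p));
        System.out.println(writeRich("point", p));
    }
}
```

With this shape, existing clients keep seeing a plain string, and only a writer that opts in produces the new tag.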


-Hoss

Re: Namespaces in response (SOLR-1586)

Posted by Walter Underwood <wu...@wunderwood.org>.

On Dec 9, 2009, at 11:11 AM, Mattmann, Chris A (388J) wrote:

>> 
>> Any parser that does that is so broken that you should stop using it
>> immediately. --wunder
> 
> Walter, totally agree here.

To elaborate my position:

1. Validation is a user option. The XML spec makes that very clear. We've had 10 years to get that right, and anyone who auto-validates is not paying attention. Validation is very useful when you are creating XML, rarely useful when reading it.

2. XML namespaces are string prefixes that use the URL syntax. They do not follow URI rules for anything but syntax and there is no guarantee that they can be resolved. In fact, an XML parser can't do anything standard with the result if they do resolve. Again, we've had 10 years to figure that out.

Yes, this can be confusing, but if a parser author can't figure it out, don't use their parser because they are already getting the simple stuff wrong.
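
Both points can be checked with the JDK's own DOM parser. The sketch below is illustrative only: the element, prefix, and namespace URI are made up, and the URI deliberately points at a host that cannot resolve. A namespace-aware, non-validating parse still succeeds, and the namespace name comes back as an ordinary string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NamespaceIsJustAName {

    // Hypothetical document; ".invalid" is a reserved TLD, so this URI can never resolve.
    public static final String XML =
        "<p:point xmlns:p=\"http://example.invalid/not-resolvable\">10 20</p:point>";

    /** Returns {namespaceURI, localName, text} of the root element. */
    public static String[] describeRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // map prefixes to namespace names
            factory.setValidating(false);     // validation stays off unless the caller opts in
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new String[] {
                root.getNamespaceURI(), root.getLocalName(), root.getTextContent()
            };
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        for (String part : describeRoot(XML)) {
            System.out.println(part);
        }
    }
}
```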

wunder

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

> Any parser that does that is so broken that you should stop using it
> immediately. --wunder

Walter, totally agree here.

Cheers,
Chris



Re: Namespaces in response (SOLR-1586)

Posted by Walter Underwood <wu...@wunderwood.org>.

Any parser that does that is so broken that you should stop using it immediately. --wunder

On Dec 9, 2009, at 8:33 AM, Yonik Seeley wrote:

> My gut feeling is that we should not be introducing namespaces by default.
> It introduces a new requirement of XML parsers in clients, and some
> parsers would start validating by default, and going out to the web to
> retrieve the referenced namespace/schema, etc.

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Yonik,

Thanks. Replies below:

> My gut feeling is that we should not be introducing namespaces by default.
> It introduces a new requirement of XML parsers in clients, and some
> parsers would start validating by default, and going out to the web to
> retrieve the referenced namespace/schema, etc.

How does it introduce any new requirements? Any XML client can ignore
namespaces as easily as if they weren't present. In other words, unless
the XML client has setValidating=true, this isn't an issue. Even if it
does, as long as the namespaces are valid (which the SOLR source code
should ensure), nothing new is introduced.

So, basically the gist of it is, there are 0 new requirements introduced by
this. There is only added functionality, if you want to use it.
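
To make that concrete, here is a minimal sketch of a "simple" client, assuming a response fragment shaped like the one discussed in this thread (the exact XML the patch emits and the namespace URI shown are assumptions). The parser is neither validating nor namespace aware, and it reads the namespaced field the same way it reads a <str>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceIgnoringClient {

    // Hypothetical response fragment; the real SOLR-1586 output may differ.
    public static final String RESPONSE =
        "<doc>"
      + "<str name=\"id\">1</str>"
      + "<georss:point xmlns:georss=\"http://www.georss.org/georss\" name=\"point\">"
      + "55.3 27.9</georss:point>"
      + "</doc>";

    /** Returns the text of the first child element whose name attribute matches, or null. */
    public static String fieldValue(String xml, String fieldName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);      // the JDK default, set explicitly for clarity
            factory.setNamespaceAware(false);  // prefixes are treated as part of the tag name
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n instanceof Element
                        && fieldName.equals(((Element) n).getAttribute("name"))) {
                    return n.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldValue(RESPONSE, "id"));
        System.out.println(fieldValue(RESPONSE, "point"));
    }
}
```

A client that switches on the tag name (str, int, and so on) would still need a rule for the unfamiliar tag, which is the concern raised elsewhere in the thread.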


> 
> I think the initial geosearch feature can start off with
> <str>10,20</str> for a point.
> If we wish to introduce a point type in the XML and binary response
> writers at a later point in time, it seems like it might require a
> version bump of the output format anyway, and we could go to something
> simple like <point>10,20</point>.

The only difference between what you call simple above and what I've
proposed (and correct me if I'm wrong but others have too) is that your
<point tag would include a namespace prefix and an xmlns attribute. What's
the difference?

> 
> It is worth using standards when they buy you enough.... I'm not sure
> this is one of those times.
> I'm sure there are standards for numeric types like <int> too... but
> using namespaces for that seems like overkill.

There's a difference between a primitive type like int, and one like point.
Also, it all comes down to your use case. If the only thing you're ever
going to do with SOLR is have a SOLR client talk to it (Java, Ruby, whatever
PL you want) then namespaces/etc. might be overkill. But why open up the
response format then and advertise SOLR as something that provides REST-ful
services for search? If that's the case, then users consuming those
responses need the flexibility to customize them for their use case
(validation, plugging into external GIS tools, etc.). So, I don't agree with
this. 

> 
> But if someone wants to supply patches that can optionally enable
> sticking in schema, namespaces, etc, w/o significant impact to the
> default, that's OK too.  Or perhaps a custom response writer that uses
> namespaces for every single type for those who want that.

All I've done is use what already exists. There don't need to be any
patches. XMLWriter#writePrim allowed you to do this before, see:

http://www.lucidimagination.com/search/document/be6fb7ce53c2922d/jira_created_solr_1592_refactor_xmlwriter_starttag_to_allow_arbitrary_attributes_to_be_writ

_and_ XMLWriter#writeCdata does now as well.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Yonik Seeley <yo...@lucidimagination.com>.

My gut feeling is that we should not be introducing namespaces by default.
It introduces a new requirement of XML parsers in clients, and some
parsers would start validating by default, and going out to the web to
retrieve the referenced namespace/schema, etc.

I think the initial geosearch feature can start off with
<str>10,20</str> for a point.
If we wish to introduce a point type in the XML and binary response
writers at a later point in time, it seems like it might require a
version bump of the output format anyway, and we could go to something
simple like <point>10,20</point>.

It is worth using standards when they buy you enough.... I'm not sure
this is one of those times.
I'm sure there are standards for numeric types like <int> too... but
using namespaces for that seems like overkill.

But if someone wants to supply patches that can optionally enable
sticking in schema, namespaces, etc, w/o significant impact to the
default, that's OK too.  Or perhaps a custom response writer that uses
namespaces for every single type for those who want that.

-Yonik
http://www.lucidimagination.com

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Grant,

My replies inline as well:

>>> 
>>> Discussion points:
>>> 1. If there are standard namespaces, then people can use them to do fun XML
>>> things
>> 
>> +1. This includes things like validation,
> 
> Yeah, but the rest of Solr's response doesn't have it, so...
> 

You mean the rest of SOLR's default response and the components that add to
it. I can, arbitrarily, as a user of SOLR, introduce as many inline xmlns
attributes (and thus declare arbitrary number of namespaces) as I want,
there is nothing that precludes me from doing so was my point.

>>> 3. The indexing side doesn't support them, so it seems odd to put in
>>> something
>>> like <field name="point">55.3 27.9</field> and get back <georss:point
>>> name="point"> 55.3 27.9</georss:point>.  At the same time, it seems equally
>>> weird to get back <str name="point">...</str> when there is in fact more
>>> semantic information available about this particular field that would
>>> otherwise require more work by an application to make sense of.
>> 
>> You got it. I'm not sure why it seems weird -- the translation from
>> docs/fields to external representation (via response writers or field type
>> representation) is one of the benefits of SOLR IMHO.
> 
> It's weird b/c no XML type was specified upfront, but a type was given out on
> the back end.  It's not a show stopper or anything, just an interesting point,
> I think.

I actually disagree with this. FieldTypes, if we agree on a data type
representation, e.g., georss point format, or line format, etc., define
their XML representation. So, if we have a FieldType of type georss:point,
then a type _is_ given up front, it's just defined in the standard that
defines the field element.

Imagine if you wanted to standardize on something like dublin core, for
titles, formats, etc. SOLR expects a fairly simple XML structure (Documents,
with Fields, with attributes), but the advantage of SOLR over traditional
Lucene is that via FieldTypes, you can understand what the true type of the
field you are indexing is. In other words, we can say in a schema file that
e.g., this incoming title is DublinCore, so its field type is
solr.DublinCoreAuthor, which inside of the FieldType definition, tells us
how to go from the given representation to the index representation
(#toInternal) and subsequently tells us how to go from the index
representation to the external representation (#toExternal).

I'm not advocating changing SOLR's input doc format for indexing -- I'm
arguing that what you guys have done is actually a great idea. Having
FieldTypes and SolrInputDocuments as separate allows each to evolve
independently of one another, but at the same time, be brought back together
for the purpose of e.g., validation, (see the lat/lon validation I did in
the attached patch), response writing (for plugging into external tools),
and for representation in the Lucene index outside of plain ol' Strings.
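
As a rough illustration of that round trip, here is a minimal sketch. It is not the code from the attached patch and does not use Solr's actual FieldType signatures; the class name, the String-in/String-out methods, and the "lat,lon" index form are assumptions made for the example.

```java
final class LatLonFieldTypeSketch {

    /** Validates "lat lon" or "lat,lon" input and returns the normalised "lat,lon" form. */
    public static String toInternal(String external) {
        String[] parts = external.trim().split("[,\\s]+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected two coordinates: " + external);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        double lat = Double.parseDouble(parts[0]);
        double lon = Double.parseDouble(parts[1]);
        // Written as a positive range check so that NaN is rejected too.
        boolean inRange = lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180;
        if (!inRange) {
            throw new IllegalArgumentException("coordinates out of range: " + external);
        }
        return lat + "," + lon;
    }

    /** Converts the indexed "lat,lon" form to the space-separated form used in the response. */
    public static String toExternal(String internal) {
        return internal.replace(',', ' ');
    }

    public static void main(String[] args) {
        String indexed = toInternal("55.3 27.9");
        System.out.println(indexed);
        System.out.println(toExternal(indexed));
    }
}
```

Here the field type owns validation on the way in and representation on the way out, so the input document format itself does not need to change.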

> 
>> 
>>> 4. If we let in other namespaces, we then are opening ourselves to longer
>>> responses, etc.  It is also likely the case that there isn't just one
>>> standard.  This likely could mean slower responses, etc.
>> 
>> How does adding in some characters (e.g., an "ns" tag and an associated URL)
>> add anything other than noise? We're talking the difference between O(n)
>> versus O(n+20) here. Also it's perfectly legit IMHO to say, well if you
>> introduce 10,000 namespaces, well, that's on you, and be prepared for
>> slower client/server interactions.
> 
> You'd be surprised how slow XML parsing often is, especially for larger
> responses, XML processing can be quite expensive and most of the information
> is verbose at best.   I've seen this on a number of occasions and it is why we
> switched to a binary response format in SolrJ and why I think all clients
> should speak the binary protocol.

Sure, XML parsing can be slow, but from your point above, you guys have
standardized on using a binary request/response format in things like SolrJ,
so what does the XML have to do with this anyway, and why is performance a
concern then? In the case where people want XML, in their particular format,
it's up to them to parse (and in most cases, if they are outputting a
format, there's likely already readers/etc. that exist for that format,
where things like optimizations can be delegated to).

On the other hand, let's consider XSLT, which is a big performance hit as
well, in many cases, more of a hit than simply outputting XML with the
namespaces inline. Also, let's qualify this. I'm not saying we should make
SOLR's default response (and all its Components that add to the response) be
forced to use namespaces. However, it should definitely not be precluded.

> 
> 
>> 
>>> 5. If people wanted them, they could just do XSLT, but that is an extra step
>>> too.
>> 
>> Yep, that's an extra step, and it's not explicit, like the patch I attached
>> is. I tried to take advantage of one of SOLR's extension points in the
>> architecture to explicitly tie a representation of a Field to its external
>> and internal representation (aka, the point of a FieldType, no?)
>>> 
>>> An alternative is that we could refactor things a bit and allow the
>>> FieldType
>>> to specify the tag name instead of it being hardcoded in the writers.  This
>>> way people writing FieldTypes could define them.  For instance, we could
>>> have
>>> FieldType.getTagName() that could be overridden and clients could have tools
>>> for introspecting this.
>> 
>> This is basically what I did right? I did an inline namespace using a
>> variant of #writePrim in XMLWriter (#writeCdata) and had the
>> FieldType#toExternal method set the tag name, which is allowed by the API.
> 
> As Hoss points out on the thread, I think the longer term goal seems to be to
> be more agnostic of the FieldType, so this would argue against my proposal.

My opinion is that if you've got all of this flow and logic going through
FieldType (which makes a lot of sense IMHO, see my comments on that same
thread), which is similar e.g., to what we see in databases, etc.., it
actually makes a lot of sense. So, I would be +1 for your proposal, but as I
mentioned your proposal is already possible (as shown in this patch). There
is just no explicit API like you suggested to do so, with the method
signatures that you proposed.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by Grant Ingersoll <gs...@apache.org>.

Inline...

On Dec 9, 2009, at 9:33 AM, Mattmann, Chris A (388J) wrote:

> Hi Grant, and others,
> 
> My 2 cents (and of course I'm biased, having prepared the patch):
> 
>> In SOLR-1586, the proposed patch introduces the concept that a Solr response
>> can declare a namespace for part of the response (in this case, it is using
>> the tags defined by georss.org to specify a point, etc.).
> 
> The patch doesn't introduce this concept -- it makes use of it.
> XMLWriter#writePrim took care of that for me, see Hostetter's comment:
> 
> http://www.lucidimagination.com/search/document/be6fb7ce53c2922d/jira_created_solr_1592_refactor_xmlwriter_starttag_to_allow_arbitrary_attributes_to_be_writ
> 
> 
> Since that method is public, anyone could have done this in the past, they
> just chose not to. Moreover, they chose not to in the committed source for
> SOLR, but others who took SOLR, prepared their own XML response writers,
> etc., may have done this same thing as well.
> 
>> 
>> Discussion points:
>> 1. If there are standard namespaces, then people can use them to do fun XML
>> things
> 
> +1. This includes things like validation,

Yeah, but the rest of Solr's response doesn't have it, so...

> strong typing (see SOLR-912 for
> others who also believe that the NamedList BagOfObjects structure, while
> robust, introduces type confusion when unraveling the response), and
> plugging in to other tools. Imagine a GIS tool that required a
> "georss:point" to be returned back somehow. You could argue XSLT could do
> this, but as you note below, it's an extra step. It also _implicitly_ ties
> the representation and typing of a FieldType to something that isn't really
> tied to a field type at all (an XSLT file?)

Agreed.

> 
>> 2. If we allow them, we get all of the other benefits of namespaces...
> 
> For sure -- see above for some examples.
> 
>> 3. The indexing side doesn't support them, so it seems odd to put in something
>> like <field name="point">55.3 27.9</field> and get back <georss:point
>> name="point"> 55.3 27.9</georss:point>.  At the same time, it seems equally
>> weird to get back <str name="point">...</str> when there is in fact more
>> semantic information available about this particular field that would
>> otherwise require more work by an application to make sense of.
> 
> You got it. I'm not sure why it seems weird -- the translation from
> docs/fields to external representation (via response writers or field type
> representation) is one of the benefits of SOLR IMHO.

It's weird b/c no XML type was specified upfront, but a type was given out on the back end.  It's not a show stopper or anything, just an interesting point, I think.

> 
>> 4. If we let in other namespaces, we then are opening ourselves to longer
>> responses, etc.  It is also likely the case that there isn't just one
>> standard.  This likely could mean slower responses, etc.
> 
> How does adding in some characters (e.g., an "ns" tag and an associated URL)
> add anything other than noise? We're talking the difference between O(n)
> versus O(n+20) here. Also it's perfectly legit IMHO to say, well if you
> introduce 10,000 namespaces, well, that's on you, and be prepared for
> slower client/server interactions.

You'd be surprised how slow XML parsing often is. Especially for larger responses, XML processing can be quite expensive, and most of the information is verbose at best.  I've seen this on a number of occasions and it is why we switched to a binary response format in SolrJ and why I think all clients should speak the binary protocol.


> 
>> 5. If people wanted them, they could just do XSLT, but that is an extra step
>> too.
> 
> Yep, that's an extra step, and it's not explicit, like the patch I attached
> is. I tried to take advantage of one of SOLR's extension points in the
> architecture to explicitly tie a representation of a Field to its external
> and internal representation (aka, the point of a FieldType, no?)
>> 
>> An alternative is that we could refactor things a bit and allow the FieldType
>> to specify the tag name instead of it being hardcoded in the writers.  This
>> way people writing FieldTypes could define them.  For instance, we could have
>> FieldType.getTagName() that could be overridden and clients could have tools
>> for introspecting this.
> 
> This is basically what I did right? I did an inline namespace using a
> variant of #writePrim in XMLWriter (#writeCdata) and had the
> FieldType#toExternal method set the tag name, which is allowed by the API.

As Hoss points out on the thread, I think the longer term goal seems to be to be more agnostic of the FieldType, so this would argue against my proposal.

-Grant

Re: Namespaces in response (SOLR-1586)

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.

Hi Grant, and others,

My 2 cents (and of course I'm biased, having prepared the patch):

> In SOLR-1586, the proposed patch introduces the concept that a Solr response
> can declare a namespace for part of the response (in this case, it is using
> the tags defined by georss.org to specify a point, etc.).

The patch doesn't introduce this concept -- it makes use of it.
XMLWriter#writePrim took care of that for me, see Hostetter's comment:

http://www.lucidimagination.com/search/document/be6fb7ce53c2922d/jira_created_solr_1592_refactor_xmlwriter_starttag_to_allow_arbitrary_attributes_to_be_writ


Since that method is public, anyone could have done this in the past, they
just chose not to. Moreover, they chose not to in the committed source for
SOLR, but others who took SOLR, prepared their own XML response writers,
etc., may have done this same thing as well.

> 
> Discussion points:
> 1. If there are standard namespaces, then people can use them to do fun XML
> things

+1. This includes things like validation, strong typing (see SOLR-912 for
others who also believe that the NamedList BagOfObjects structure, while
robust, introduces type confusion when unraveling the response), and
plugging in to other tools. Imagine a GIS tool that required a
"georss:point" to be returned back somehow. You could argue XSLT could do
this, but as you note below, it's an extra step. It also _implicitly_ ties
the representation and typing of a FieldType to something that isn't really
tied to a field type at all (an XSLT file?)

> 2. If we allow them, we get all of the other benefits of namespaces...

For sure -- see above for some examples.

> 3. The indexing side doesn't support them, so it seems odd to put in something
> like <field name="point">55.3 27.9</field> and get back <georss:point
> name="point"> 55.3 27.9</georss:point>.  At the same time, it seems equally
> weird to get back <str name="point">...</str> when there is in fact more
> semantic information available about this particular field that would
> otherwise require more work by an application to make sense of.

You got it. I'm not sure why it seems weird -- the translation from
docs/fields to external representation (via response writers or field type
representation) is one of the benefits of SOLR IMHO.

> 4. If we let in other namespaces, we then are opening ourselves to longer
> responses, etc.  It is also likely the case that there isn't just one
> standard.  This likely could mean slower responses, etc.

How does adding in some characters (e.g., an "ns" tag and an associated URL)
add anything other than noise? We're talking the difference between O(n)
versus O(n+20) here. Also it's perfectly legit IMHO to say, well if you
introduce 10,000 namespaces, well, that's on you, and be prepared for
slower client/server interactions.

> 5. If people wanted them, they could just do XSLT, but that is an extra step
> too.

Yep, that's an extra step, and it's not explicit, like the patch I attached
is. I tried to take advantage of one of SOLR's extension points in the
architecture to explicitly tie a representation of a Field to its external
and internal representation (aka, the point of a FieldType, no?)
> 
> An alternative is that we could refactor things a bit and allow the FieldType
> to specify the tag name instead of it being hardcoded in the writers.  This
> way people writing FieldTypes could define them.  For instance, we could have
> FieldType.getTagName() that could be overridden and clients could have tools
> for introspecting this.

This is basically what I did right? I did an inline namespace using a
variant of #writePrim in XMLWriter (#writeCdata) and had the
FieldType#toExternal method set the tag name, which is allowed by the API.

Cheers,
Chris


Re: Namespaces in response (SOLR-1586)

Posted by "Ramirez, Paul M (388J)" <pa...@jpl.nasa.gov>.

Hey All,


 1.  Namespaces are fun especially when you have some target format you are trying to work towards. Many target formats use namespaces extensively so having the ability to map to them on the back end (response) would be great. This does not mean that Solr would have to utilize namespaces at all and supporting them internally is a different issue. I think that was the spirit of the original patch.
 2.  From what I'm gathering this is a discussion of whether Solr supports them internally. Hopefully, there is a differentiation between internal/external namespace usage with Solr.
 3.  Why must the response dictate what is done internally within Solr?
 4.  Internally it would seem that these are just string mappings and how much impact would there really be to writing out the response?
 5.  If the shift is just to have them use XSLT my guess would be that would cause a slower response than direct mappings. This is solely my opinion as I have not done any tests but NamedList -> XML -> XSLT would seem logically slower than NamedList-> (mapped) XML
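
For reference, the XSLT route in point 5 could look roughly like the sketch below, which uses the JDK's built-in javax.xml.transform API. The input fragment, the stylesheet, and the namespace URI are assumptions for illustration, and no timing claim is made. It only shows that the mapping becomes a second pass over XML that has already been written.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltPostProcessSketch {

    // Hypothetical plain response fragment, as an existing writer might produce it.
    public static final String PLAIN =
        "<doc><str name=\"id\">1</str><str name=\"point\">55.3 27.9</str></doc>";

    // Identity transform plus one rule that renames the "point" field's tag.
    public static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:georss=\"http://www.georss.org/georss\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"str[@name='point']\">"
      + "<georss:point><xsl:apply-templates select=\"@*|node()\"/></georss:point>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet over an already-serialised response and returns the result. */
    public static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(PLAIN));
    }
}
```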

Thanks,
Paul Ramirez


On 12/9/09 5:30 AM, "Grant Ingersoll" <gs...@apache.org> wrote:

In SOLR-1586, the proposed patch introduces the concept that a Solr response can declare a namespace for part of the response (in this case, it is using the tags defined by georss.org to specify a point, etc.).  I'm not sure what to make of this.  My gut reaction says no, but I'm not a namespace expert and I also don't feel strongly about it.

Discussion points:
1. If there are standard namespaces, then people can use them to do fun XML things
2. If we allow them, we get all of the other benefits of namespaces...
3. The indexing side doesn't support them, so it seems odd to put in something like <field name="point">55.3 27.9</field> and get back <georss:point name="point"> 55.3 27.9</georss:point>.  At the same time, it seems equally weird to get back <str name="point">...</str> when there is in fact more semantic information available about this particular field that would otherwise require more work by an application to make sense of.
4. If we let in other namespaces, we then are opening ourselves to longer responses, etc.  It is also likely the case that there isn't just one standard.  This likely could mean slower responses, etc.
5. If people wanted them, they could just do XSLT, but that is an extra step too.

An alternative is that we could refactor things a bit and allow the FieldType to specify the tag name instead of it being hardcoded in the writers.  This way people writing FieldTypes could define them.  For instance, we could have FieldType.getTagName() that could be overridden and clients could have tools for introspecting this.
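
A minimal sketch of what that alternative might look like, under stated assumptions: SketchFieldType, GeoRssPointType, and writeField are hypothetical names invented for this example, and only getTagName() comes from the proposal itself. None of this is Solr's actual FieldType or writer code.

```java
final class TagNameSketch {

    /** Hypothetical stand-in for a field type; not Solr's actual FieldType class. */
    public static class SketchFieldType {
        /** Tag the writer should use; defaults to the generic string tag. */
        public String getTagName() {
            return "str";
        }

        /** Converts the indexed form to the form shown in the response. */
        public String toExternal(String indexed) {
            return indexed;
        }
    }

    /** A field type that overrides the tag name, as the proposal would allow. */
    public static class GeoRssPointType extends SketchFieldType {
        @Override
        public String getTagName() {
            return "georss:point";
        }

        @Override
        public String toExternal(String indexed) {
            return indexed.replace(',', ' ');
        }
    }

    /** The writer asks the field type for its tag instead of hardcoding one. */
    public static String writeField(SketchFieldType type, String name, String indexed) {
        // XML escaping and the xmlns declaration are omitted to keep the sketch short.
        String tag = type.getTagName();
        return "<" + tag + " name=\"" + name + "\">" + type.toExternal(indexed) + "</" + tag + ">";
    }

    public static void main(String[] args) {
        System.out.println(writeField(new SketchFieldType(), "title", "hello"));
        System.out.println(writeField(new GeoRssPointType(), "point", "55.3,27.9"));
    }
}
```

A client-side tool could call the same getTagName() hook to learn which tags to expect, which is the introspection the proposal mentions.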

I'm not sure what effect any of this would have on downstream clients, either.

Thoughts?

-Grant