Posted to dev@lucene.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2016/05/02 14:08:35 UTC

RE: [jira] [Commented] (SOLR-8017) solr.PointType can't deal with coordination in format like (0.9504547, 1.0, 1.0890503)

>> so that means that using tika metadata indexing with schemaless mode is, well, useless?
Yes. 

> I know of nobody using "schemaless" for production for the simple reason that it makes the best guess it can based on the _first_ time it sees a particular field. There's absolutely no way to guarantee that that doc is representative of all docs.
> And if you want to really get weird, some programs allow custom attributes.

Agreed. It makes no sense to go schemaless with Tika's metadata.
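
To make the first-document problem concrete, here is a toy sketch in Python. It is not Solr's actual schemaless code; guess_type, fits, FirstSeenSchema and the field name example_field are all made up for illustration. The only assumption is the one described above: a field's type is fixed by the first value seen for it.

```python
# Toy model of "guess the type from the first value seen".
# Illustration only: this is NOT how Solr implements schemaless mode.
PARSERS = {"long": int, "double": float, "string": str}


def guess_type(value):
    """Guess a type name for a raw string, trying the narrowest type first."""
    for name in ("long", "double"):
        try:
            PARSERS[name](value)
            return name
        except ValueError:
            continue
    return "string"


def fits(value, type_name):
    """Return True if the raw string can be parsed as the given type."""
    try:
        PARSERS[type_name](value)
        return True
    except ValueError:
        return False


class FirstSeenSchema:
    """Locks each field to the type guessed from its first value."""

    def __init__(self):
        self.fields = {}

    def add(self, doc):
        """Record a doc; return the fields whose values do not fit the locked type."""
        rejected = []
        for field, value in doc.items():
            expected = self.fields.setdefault(field, guess_type(value))
            if not fits(value, expected):
                rejected.append(field)
        return rejected


schema = FirstSeenSchema()
# The first doc happens to carry a single number, so the field is locked as "double".
first = schema.add({"example_field": "1.0"})
# A later doc carries a coordinate triple for the same field and no longer fits.
second = schema.add({"example_field": "(0.9504547, 1.0, 1.0890503)"})
```

In this sketch the second document is rejected only because of which document arrived first; indexing them in the opposite order would have locked the field as a string and accepted both.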

> In the Tika case you've also got the problem that there's no universal metadata definition. What's "author" in one type of doc might be "editor" in another. Or "most_recent_edit" might be "last_edited" and even if these are dates the format won't necessarily be the same.

We do try to normalize across file formats to Dublin Core when possible (dc:creator, dc:created). We also try to normalize date formats for those metadata items that we know are dates (dc:created, etc.). If you find issues with normalization or can recommend areas for improvement, please let us know!
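
As a rough sketch of what that kind of normalization means, the Python below maps format-specific keys onto Dublin Core names and rewrites known date fields into one format. It is not Tika's code or its real mapping table: KEY_MAP, DATE_FORMATS and the source key names are hypothetical, and the sketch assumes every input date is already in UTC.

```python
from datetime import datetime

# Hypothetical format-specific keys mapped to Dublin Core names.
# These entries are invented for illustration; they are not Tika's mappings.
KEY_MAP = {
    "author": "dc:creator",
    "editor": "dc:creator",
    "created": "dc:created",
    "creation_date": "dc:created",
}

# Fields known to hold dates, and the (assumed) input formats to try in order.
DATE_FIELDS = {"dc:created"}
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y")


def normalize_date(raw):
    """Rewrite a recognized date string as ISO 8601, assuming it is in UTC."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
    return raw  # leave unrecognized values untouched


def normalize(metadata):
    """Map keys to Dublin Core names and normalize values of known date fields."""
    out = {}
    for key, value in metadata.items():
        name = KEY_MAP.get(key.lower(), key)
        out[name] = normalize_date(value) if name in DATE_FIELDS else value
    return out


doc_a = normalize({"Author": "A. Writer", "creation_date": "05/02/2016"})
doc_b = normalize({"editor": "B. Editor", "created": "2016-05-02 14:08:35"})
```

Both documents end up with the same two field names and the same date format, which is what a fixed schema needs. Keys that are not in the map pass through unchanged, so custom attributes would still need explicit handling.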


