Posted to solr-user@lucene.apache.org by Walter Closenfleight <wa...@gmail.com> on 2011/01/10 16:50:02 UTC

Storing metadata from post parameters and XML

I'm very unclear on how to associate the metadata I need with a Solr index entry.
Based on what I've read thus far, you can extract data from text files and
store that in a Solr document.

I have hundreds of thousands of documents in a database/svn-type system.
When I index a file, it will most likely be local to the filesystem, and I
know the location it will take on in the database. So, when I index, I want
to provide that path so that someone who finds the file in a search can
locate it.

123.xml may look like:

<mydoc>
<title>my title</title>
<para>Every foobar has its day</para>
<figure href="/abc/xxx.gif"><caption>My caption</caption></figure>
</mydoc>

and the proprietary location I want it to be associated with is:

/abc/def/ghi/123.xml

So, when a user searches for "foobar", it should return some information
about 123.xml, but most importantly the location should be available.

I have yet to find (in the schema.xml or otherwise) where you define a
field to store that path, and how you would pass that value along when
indexing the document.

Instead, from the examples I can find, including the book, you store fields
from your data into the index. In the book's examples (a music database),
searching for "Cherub Rock" returns a list of tracks with their duration, track
name, album name, and artist. In other words, the full text data you
retrieve is the only information the search index has to offer.

Just for example, using the exampledocs post.jar, I'm envisioning something
like this:

java -jar post.jar 123.xml -dblocation "/abc/def/ghi/123.xml" -othermeta1 "xxx" -othermeta2 "zzz"

Then the Solr doc would look like:
<doc>
<field name="id">123</field>
<field name="dblocation">/abc/def/ghi/123.xml</field>
<field name="othermeta1">xxx</field>
<field name="othermeta2">zzz</field>
<field name="title">my title</field>
<field name="graphic">/abc/xxx.gif</field>
<field name="text">Every foobar has its day My caption</field>
</doc>
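A minimal sketch of a pre-processing step that produces a document like this without relying on post.jar options: it reads the proprietary XML, adds the metadata supplied by the caller, and emits a Solr <add><doc> update message that could then be posted to Solr (for example with post.jar). The function name and the element-to-field mapping (title to title, figure/@href to graphic, para and caption text to text) are assumptions taken from the example above; dblocation and the other metadata fields would also need to exist in schema.xml (declared explicitly or matched by a dynamicField).

```python
import xml.etree.ElementTree as ET


def build_solr_doc(source_xml: str, doc_id: str, metadata: dict[str, str]) -> str:
    """Convert a <mydoc> file plus caller-supplied metadata into a Solr
    <add><doc> update message.

    The element-to-field mapping below is a hypothetical one that mirrors
    the example in this post; Solr does not apply it on its own.
    """
    src = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name: str, value: str) -> None:
        ET.SubElement(doc, "field", name=name).text = value

    field("id", doc_id)
    # Metadata such as dblocation is just another field on the same doc.
    for name, value in metadata.items():
        field(name, value)

    title = src.findtext("title")
    if title:
        field("title", title)
    for figure in src.iter("figure"):
        if figure.get("href"):
            field("graphic", figure.get("href"))

    # Paragraph and caption text, in document order, form the search text.
    parts = [el.text.strip() for el in src.iter()
             if el.tag in ("para", "caption") and el.text]
    field("text", " ".join(parts))
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    source = """<mydoc>
<title>my title</title>
<para>Every foobar has its day</para>
<figure href="/abc/xxx.gif"><caption>My caption</caption></figure>
</mydoc>"""
    print(build_solr_doc(source, "123", {
        "dblocation": "/abc/def/ghi/123.xml",
        "othermeta1": "xxx",
        "othermeta2": "zzz",
    }))
```

Run on the sample (with the <figure> element closed so it parses), this prints a one-line update message carrying the same fields as the example document above.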

This way, when a user searches for foobar, they get item 123 back, review
the search result and if they decide that's the data they want, they can use
the dblocation field to retrieve the data for editing purposes (and then
re-index it following their edits).

I'm guessing I just haven't found the right terms to search for yet, as I'm
very new to this. Thanks for any direction you can provide. Also, if Solr
appears to be the wrong tool for what I need, let me know as well!

Thank you,
Walter

Re: Storing metadata from post parameters and XML

Posted by Erick Erickson <er...@gmail.com>.
I'm not quite sure whether your question is answered or not, so ignore me if
it is...

But I'm having trouble envisioning this part:

"they can use
the dblocation field to retrieve the data for editing purposes (and then
re-index it following their edits)."

I'd never, ever, ever let a user edit the XML and re-post it. You're just
asking for messed-up data (I mean, nobody is really good enough to hand-edit
XML, and for sure random users aren't).

Somewhere, I suspect, you'll have a program that the user interacts with that
handles this kind of thing: parsing the XML, presenting it in a format the
user can't mess up, saving it away, and re-indexing. It's pretty easy to use
something like SolrJ to handle the interactions with the Solr part...

A common way is right at your step "and if they decide that's the data they
want", which is often clicking a link in a browser. At that point, you launch
your own very special program with enough meta-data to find the file to edit
and provide a front-end to let them edit it under controlled circumstances.
You can use SolrJ to re-index after the user is done.
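A minimal sketch of the controlled-edit idea. SolrJ is a Java client; this stand-in uses only the Python standard library. The program changes one element through an XML parser, so the saved file stays well-formed, and builds the HTTP request that would re-index the document. The update URL is an assumption based on the stock example server, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed update URL of the stock example server; commit=true asks Solr
# to commit so the re-indexed document becomes searchable.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def apply_edit(source_xml: str, element: str, new_text: str) -> str:
    """Change one element's text through the parser instead of by hand.

    Raises ET.ParseError if the stored file is not well-formed and
    KeyError if the requested element does not exist.
    """
    root = ET.fromstring(source_xml)
    target = root.find(element)
    if target is None:
        raise KeyError(element)
    target.text = new_text  # the serializer escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


def reindex_request(update_message: str) -> urllib.request.Request:
    """Build, but do not send, the POST that re-indexes an edited file.

    update_message is a Solr <add><doc>...</doc></add> message built from
    the edited file the same way it was built at index time.
    """
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=update_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Sending would be a single urllib.request.urlopen(reindex_request(message)) call; the sketch stops short of that so it runs without a Solr server.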

Of course, I may well be waaaaayyyyy off base relative to your app...

Best
Erick

On Mon, Jan 10, 2011 at 11:22 AM, Walter Closenfleight <
walter.p.closenfleight@gmail.com> wrote:

> Stefan,
>
> You're right. I was attempting to post some quick pseudo-code, but that
> <doc/> is pretty misleading; they should have been <str> elements, like
> <str name="dblocation">/abc/def/ghi/123.xml</str>, or something to that
> effect.
>
> Thanks,
> Walter

Re: Storing metadata from post parameters and XML

Posted by Walter Closenfleight <wa...@gmail.com>.
Stefan,

You're right. I was attempting to post some quick pseudo-code, but that
<doc/> is pretty misleading; they should have been <str> elements, like
<str name="dblocation">/abc/def/ghi/123.xml</str>, or something to that
effect.
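In Solr's XML search results, a stored single-valued string field comes back as a <str name="..."> element inside each <doc> (the update message sent to Solr uses <field name="..."> instead). A minimal sketch of pulling dblocation out of such a response; the sample response is hand-written and trimmed, not captured output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def dblocations(response_xml: str) -> dict[str, str]:
    """Map id -> dblocation for every hit in a Solr XML search response."""
    found = {}
    for doc in ET.fromstring(response_xml).iter("doc"):
        # Single-valued stored string fields are direct <str name="..."> children.
        values = {s.get("name"): s.text for s in doc.findall("str")}
        if "id" in values and "dblocation" in values:
            found[values["id"]] = values["dblocation"]
    return found


# Hand-written, trimmed sample in the shape of an XML search response.
SAMPLE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="dblocation">/abc/def/ghi/123.xml</str>
      <str name="id">123</str>
    </doc>
  </result>
</response>"""

if __name__ == "__main__":
    print(dblocations(SAMPLE))  # {'123': '/abc/def/ghi/123.xml'}
```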

Thanks,
Walter


On Mon, Jan 10, 2011 at 10:08 AM, Stefan Matheis <
matheis.stefan@googlemail.com> wrote:

> Hey Walter,
>
> What's wrong with just putting your db-location in a 'string' field and
> using it like any other value?
> There is no special field-type for something like a
> path/directory/location-information, afaik.
>
> Regards
> Stefan

Re: Storing metadata from post parameters and XML

Posted by Stefan Matheis <ma...@googlemail.com>.
Hey Walter,

What's wrong with just putting your db-location in a 'string' field and
using it like any other value?
There is no special field-type for something like a
path/directory/location-information, afaik.
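Using the path like any other value means the client simply asks for it back with each hit. A minimal sketch, assuming dblocation is declared in schema.xml as an indexed, stored field of type string and assuming the stock example server's select URL; the helper names are hypothetical. Because a string field is not tokenized, the whole path is a single term, so an exact-match query on it also works.

```python
from urllib.parse import urlencode

# Assumed select URL of the stock example server; adjust for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def search_url(user_query: str) -> str:
    """Build a search URL that returns dblocation with every hit."""
    params = {
        "q": user_query,
        # Stored fields to return; dblocation comes back like any other.
        "fl": "id,title,dblocation",
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def exact_location_query(path: str) -> str:
    """Query string matching one whole path in the string-typed field.

    The path is quoted so characters such as ':' are not treated as
    query syntax; backslashes and double quotes are escaped first.
    """
    escaped = path.replace("\\", "\\\\").replace('"', '\\"')
    return f'dblocation:"{escaped}"'


if __name__ == "__main__":
    print(search_url("foobar"))
    print(exact_location_query("/abc/def/ghi/123.xml"))
```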

Regards
Stefan
