Posted to solr-user@lucene.apache.org by "Jana, Kumar Raja" <kj...@ptc.com> on 2008/12/06 10:05:28 UTC
Adding External Metadata to pdf document
Hi,
I need to add some external metadata along with the documents I send to
ExtractingRequestHandler. Can someone please tell me how to achieve
this?
E.g., say I need to index the file abc.pdf. I want to add some
additional information to the metadata, such as Category = Alphabets,
Catalog_ID = 1213123, Owner = Mr. X, Date_of_Purchase = someday, etc.
Thanks,
Kumar
RE: Adding External Metadata to pdf document
Posted by "Jana, Kumar Raja" <kj...@ptc.com>.
Hi Grant,
Yeah, I noticed the commit yesterday. Great! Now I no longer need to check
for updates on the patch.
Now that it has been integrated, I suppose this is a good time to
develop an API for sending documents to Solr, something similar to
building a SolrInputDocument with doc.add(field)-style methods.
Please let me know if someone has already started on this. I'll be more
than happy to help.
-Kumar
-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Monday, December 08, 2008 10:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding External Metadata to pdf document
On Dec 8, 2008, at 2:26 AM, Jana, Kumar Raja wrote:
>
> Hi Grant,
>
> Thanks for the help. It has solved my problem.
>
Cool. In case you didn't see, Solr Cell is now committed.
> Is there any example SolrJ code to send a document to Solr Cell using
> the right ContentHandlers? I've tried to understand the Test class and
> write something along similar lines, but I am totally lost! :(
I haven't tried it, but you should be able to create a ContentStream and
add it to the SolrRequest object.
-Grant
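
For readers following along, here is a minimal sketch of the kind of request discussed above. It is not SolrJ and does not use ContentStream or SolrRequest; it swaps in the JDK's own java.net.http classes to show the shape of a POST that carries the file body plus literal parameters. The class name ExtractPost, the base URL, the /update/extract handler path, and the raw application/pdf body are all assumptions for illustration; whether the handler accepts a raw body this way depends on the Solr version and how the handler is configured.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of Solr or SolrJ.
class ExtractPost {
    // Builds (but does not send) a POST request carrying the document bytes.
    // baseUrl and the "/update/extract" path are assumptions; adjust them
    // to match the handler mapping in your solrconfig.xml.
    static HttpRequest buildRequest(String baseUrl, String query, byte[] pdfBytes) {
        URI uri = URI.create(baseUrl + "/update/extract?" + query);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/pdf")
                .POST(HttpRequest.BodyPublishers.ofByteArray(pdfBytes))
                .build();
    }
}
```

To actually send it, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), with the bytes read from abc.pdf via Files.readAllBytes.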
RE: Adding External Metadata to pdf document
Posted by "Jana, Kumar Raja" <kj...@ptc.com>.
Hi Grant,
Thanks for the help. It has solved my problem.
Is there any example SolrJ code to send a document to Solr Cell using
the right ContentHandlers? I've tried to understand the Test class and
write something along similar lines, but I am totally lost! :(
Thanks,
Kumar
-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org]
Sent: Saturday, December 06, 2008 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding External Metadata to pdf document
Hi Kumar,
Wow, a brave soul trying out Solr Cell (aka the
ExtractingRequestHandler) already! Cool!
To add external metadata, you can pass in literal parameters. In your
example, you could do something like:
&ext.literal.Category=Alphabets&ext.literal.Catalog_ID=1213123
This will literally add the value "Alphabets" to the Category field, and
likewise 1213123 to the Catalog_ID field.
See http://wiki.apache.org/solr/ExtractingRequestHandler#head-88b9f55989c9878638e88be5d335b5126550f87c
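
The literal parameters described above can be assembled programmatically. The following is a rough sketch using only the JDK; the class name LiteralParams and the method buildQuery are hypothetical, and the only thing taken from the message is the ext.literal.<field>=<value> parameter form. Values are URL-encoded so that something like "Mr. X" survives the trip in a query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
class LiteralParams {
    // Turns field/value pairs into "ext.literal.<field>=<value>" pairs
    // joined by '&', URL-encoding both field names and values.
    static String buildQuery(Map<String, String> literals) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : literals.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append("ext.literal.")
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("Category", "Alphabets");
        literals.put("Catalog_ID", "1213123");
        // prints ext.literal.Category=Alphabets&ext.literal.Catalog_ID=1213123
        System.out.println(buildQuery(literals));
    }
}
```

The resulting string would be appended to the handler URL after a "?" (or after "&" if other parameters are already present).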
On Dec 6, 2008, at 4:05 AM, Jana, Kumar Raja wrote:
> Hi,
>
> I need to add some external metadata along with the documents I send
> to ExtractingRequestHandler. Can someone please tell me how to
> achieve this?
>
> E.g., say I need to index the file abc.pdf. I want to add some
> additional information to the metadata, such as Category = Alphabets,
> Catalog_ID = 1213123, Owner = Mr. X, Date_of_Purchase = someday, etc.
>
> Thanks,
> Kumar
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ