Posted to solr-user@lucene.apache.org by "Jana, Kumar Raja" <kj...@ptc.com> on 2008/12/06 10:05:28 UTC

Adding External Metadata to pdf document

Hi,
 
I need to add some external metadata along with the documents I send to
ExtractingRequestHandler. Can someone please tell me how I can achieve
this?
 
E.g., say I need to index the file abc.pdf. I want to add some
additional information to the metadata, such as Category = Alphabets,
Catalog_ID = 1213123, Owner = Mr. X, Date_of_Purchase = someday, etc.
 
Thanks,
Kumar

RE: Adding External Metadata to pdf document

Posted by "Jana, Kumar Raja" <kj...@ptc.com>.
Hi Grant,

Yeah, I noticed the commit yesterday. Great!!! Now I no longer need to
check for updates on the patch.

Now that it has been integrated, I suppose this would be a good time to
develop an API for sending documents to Solr, something similar to
sending a SolrInputDocument with doc.add(field) kind of methods.
Please let me know if someone has already started this. I'll be more
than happy to help.

-Kumar

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Monday, December 08, 2008 10:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding External Metadata to pdf document


On Dec 8, 2008, at 2:26 AM, Jana, Kumar Raja wrote:

>
> Hi Grant,
>
> Thanks for the help. It has solved my problem.
>

Cool.  In case you didn't see, Solr Cell is now committed.

> Is there any example Solrj code to send a document to Solr Cell using
> the right ContentHandlers? I've tried to understand the Test class and
> code it on similar lines but am totally lost!!! :(

I haven't tried it, but you should be able to create a ContentStream and
add it to the SolrRequest object.

-Grant

Re: Adding External Metadata to pdf document

Posted by Grant Ingersoll <gs...@apache.org>.
On Dec 8, 2008, at 2:26 AM, Jana, Kumar Raja wrote:

>
> Hi Grant,
>
> Thanks for the help. It has solved my problem.
>

Cool.  In case you didn't see, Solr Cell is now committed.

> Is there any example Solrj code to send a document to Solr Cell using
> the right ContentHandlers? I've tried to understand the Test class and
> code it on similar lines but am totally lost!!! :(

I haven't tried it, but you should be able to create a ContentStream  
and add it to the SolrRequest object.

-Grant

RE: Adding External Metadata to pdf document

Posted by "Jana, Kumar Raja" <kj...@ptc.com>.
Hi Grant,

Thanks for the help. It has solved my problem.

Is there any example Solrj code to send a document to Solr Cell using
the right ContentHandlers? I've tried to understand the Test class and
code it on similar lines but am totally lost!!! :(


Thanks,
Kumar



-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Saturday, December 06, 2008 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Adding External Metadata to pdf document

Hi Kumar,

Wow, a brave soul trying out Solr Cell (aka the  
ExtractingRequestHandler) already!   Cool!

To add external metadata, you can pass in literal parameters. In your
example, you could do something like:
&ext.literal.Category=Alphabets&ext.literal.Catalog_ID=1213123

This will literally add the value "Alphabets" to the Category field, and
likewise 1213123 to the Catalog_ID field.

See http://wiki.apache.org/solr/ExtractingRequestHandler#head-88b9f55989c9878638e88be5d335b5126550f87c

On Dec 6, 2008, at 4:05 AM, Jana, Kumar Raja wrote:

> Hi,
>
> I need to add some external metadata along with the documents I send
> to ExtractingRequestHandler. Can someone please tell me how I can
> achieve this?
>
> E.g., say I need to index the file abc.pdf. I want to add some
> additional information to the metadata, such as Category = Alphabets,
> Catalog_ID = 1213123, Owner = Mr. X, Date_of_Purchase = someday, etc.
>
> Thanks,
> Kumar

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Adding External Metadata to pdf document

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Kumar,

Wow, a brave soul trying out Solr Cell (aka the  
ExtractingRequestHandler) already!   Cool!

To add external metadata, you can pass in literal parameters. In your
example, you could do something like:
&ext.literal.Category=Alphabets&ext.literal.Catalog_ID=1213123

This will literally add the value "Alphabets" to the Category field,  
and likewise 1213123 to the Catalog_ID field.

See http://wiki.apache.org/solr/ExtractingRequestHandler#head-88b9f55989c9878638e88be5d335b5126550f87c
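
As a rough illustration of the literal parameters described above, here is
a minimal sketch in plain Java (standard library only, no Solrj) that turns
a map of external metadata into the query-string fragment to append to the
extract request. The class name LiteralParams and the method buildQuery are
hypothetical and exist only for this example; the ext.literal. prefix and
the field names are the ones used in this thread. Values are URL-encoded so
that something like Owner = Mr. X is passed safely.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Solr or Solrj.
public class LiteralParams {

    // Parameter prefix as quoted in this thread.
    static final String PREFIX = "ext.literal.";

    // Builds "ext.literal.<field>=<value>" pairs joined with '&',
    // preserving the iteration order of the supplied map.
    static String buildQuery(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(PREFIX + e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
        }
        return sb.toString();
    }

    // URL-encodes a single name or value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<String, String>();
        metadata.put("Category", "Alphabets");
        metadata.put("Catalog_ID", "1213123");
        metadata.put("Owner", "Mr. X");
        // Prints:
        // ext.literal.Category=Alphabets&ext.literal.Catalog_ID=1213123&ext.literal.Owner=Mr.+X
        System.out.println(buildQuery(metadata));
    }
}
```

The resulting string would be appended to whatever URL the
ExtractingRequestHandler is mapped to in your solrconfig.xml, and the field
names need to exist in your schema for the values to be indexed.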

On Dec 6, 2008, at 4:05 AM, Jana, Kumar Raja wrote:

> Hi,
>
> I need to add some external metadata along with the documents I send
> to ExtractingRequestHandler. Can someone please tell me how I can
> achieve this?
>
> E.g., say I need to index the file abc.pdf. I want to add some
> additional information to the metadata, such as Category = Alphabets,
> Catalog_ID = 1213123, Owner = Mr. X, Date_of_Purchase = someday, etc.
>
> Thanks,
> Kumar

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ