Posted to solr-user@lucene.apache.org by Ja...@continental-corporation.com on 2017/12/01 08:55:23 UTC

Passing Metadata from an RTF-file via TIKA to SOLR ...

Hi there!
I am quite new to Lucene/Solr/Tika, etc., so I would appreciate your help 
with the following matter.


I have an RTF document that I want to index in Solr using Tika.
RTF indexing works in general, but since I changed the Solr schema, 
the indexer complains about missing mandatory fields such as "module-id".
I generate the RTF file myself and added the metadata fields to its 
"userprops" section (see below), so Tika should be able to read them and 
provide them.

The problem is: I don't know HOW or WHERE Tika provides this metadata, so 
I don't know how to access it. As a result, I don't know how I can map it 
to the respective Solr-fields, like "module-id", that are mandatory in my 
Solr-schema.

Can someone give me a hint, please? 
I am running out of ideas here ... :-/


<RTF-file>

{\rtf1\fbidis\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fnil\fcharset0 Arial;}}
{\colortbl ;\red0\green0\blue0;}
        {\userprops
                {\propname module-id}\proptype30{\staticval 000ba8a6}
        }
}


With kind regards

Jan Schluchtmann
Systems Engineering Cluster Instruments
VW Group
Continental Automotive GmbH
Division Interior
ID S3 RM
VDO-Strasse 1, 64832 Babenhausen, Germany

Phone: +49 6073 12-4346
Fax: +49 6073 12-79-4346

Re: Metadata passed with CURL (via literal) is not recognized by SOLR ...?

Posted by Ja...@continental-corporation.com.
OK, I found the solution myself.

The cause was the "lowernames = true" setting of the Tika request handler, 
which rewrote the field name "module-id" to "module_id", so the required 
field "module-id" was never filled. 
I added a matching copyField to my schema, and it seems to work now.
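
To make the renaming concrete, here is a minimal Python sketch of what 
"lowernames = true" appears to do to a field name. It is an approximation 
based on the documented behaviour (lowercase, with underscores where 
needed), not Solr's actual code; the function name lowernames is only 
illustrative.

```python
def lowernames(field_name: str) -> str:
    """Approximate lowernames=true: keep letters and digits (lowercased),
    replace every other character with an underscore."""
    return "".join(ch.lower() if ch.isalnum() else "_" for ch in field_name)


# "id" is unchanged, which is why it was still found in the schema,
# while "module-id" no longer matches the required field of that name.
for name in ("id", "module-id", "Content-Type"):
    print(name, "->", lowernames(name))
```

Any field name containing a hyphen is affected in the same way, so the 
other "literal." parameters with hyphens would be renamed too.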


Maybe this information is useful for someone. Of course, it is 
mentioned in the manual, but finding it is hard if you don't know 
what you are looking for. ;)


Regards
Jan






From:    Jan.Christopher.Schluchtmann-EXT@continental-corporation.com
To:      solr-user@lucene.apache.org
Date:    2017/12/05 11:02
Subject: Metadata passed with CURL (via literal) is not recognized by SOLR ...?



Hi!
I am trying to index RTF files by uploading them to the Solr server with 
curl.
I pass the required metadata as "literal.<key>=<value>" request 
parameters.


The "id" and the "module-id" are mandatory in my schema.
The "id" is recognized correctly, as one can see in the Solr response 
"doc=48a0xxx" ... but the "module-id" seems to be ignored.

Why is that?


Thanks in advance!!!



Here is the curl command I run in Windows 10 PowerShell:

SOLR-REQUEST:

curl.exe "http://localhost:8983/solr/ContiReqManCore/update/extract/?commit=true&literal.id=48a04d8e5da651c5-000ba8a6-1&literal.project-id=000d8181&literal.project-name=FPK_Medium_19S1&literal.project-path=%2FFPK_Medium_19S1&literal.module-id=000ba8a6&literal.module-name=PVVTS_Functional_FPK_Medium_19S1&literal.module-path=%2FFPK_Medium_19S1%2F02_Quality%2F10_Verification-Validation%2FPVVTS_Functional_FPK_Medium_19S1&literal.module-prefix=PVVTS_Funct_&literal.object-id=1" -F "object-ole=@D:\(...)\PVVTS_Funct_263.rtf"


SOLR-RESPONSE:

{
  "responseHeader":{
    "status":400,
    "QTime":7},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"[doc=48a04d8e5da651c5-000ba8a6-1] missing required field: module-id",
    "code":400}
}
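
For anyone who prefers to assemble such a request URL in a script 
instead of typing it by hand, the following Python sketch builds the 
same kind of query string from a dictionary of literal fields. The 
helper name build_extract_url is hypothetical, and only three of the 
fields from the request above are shown; the percent-encoding is done 
by the standard library.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
BASE_URL = "http://localhost:8983/solr/ContiReqManCore/update/extract/"


def build_extract_url(literals: dict, commit: bool = True) -> str:
    """Return the extract URL with each entry sent as literal.<key>=<value>."""
    params = {"commit": "true" if commit else "false"}
    for key, value in literals.items():
        params["literal." + key] = value
    # urlencode percent-encodes reserved characters such as "/" in paths.
    return BASE_URL + "?" + urlencode(params)


url = build_extract_url({
    "id": "48a04d8e5da651c5-000ba8a6-1",
    "project-path": "/FPK_Medium_19S1",
    "module-id": "000ba8a6",
})
print(url)
```

This only reproduces the request; it does not change how Solr treats 
the field names on the server side.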


