Posted to solr-user@lucene.apache.org by alessio crisantemi <al...@gmail.com> on 2012/02/09 23:45:05 UTC

indexing with DIH (and with problems)

Hi all,
I would like to index in Solr the PDF files in my directory c:\myfile\

So I added the file data-config.xml to my solr/conf directory, as
follows:


<dataConfig>
<dataSource type="BinFileDataSource" />
<document>
<entity name="f" dataSource="null" rootEntity="false"
processor="FileListEntityProcessor"
baseDir="c:\myfile\" fileName="*.pdf"
recursive="true">
<entity name="tika-test" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text">
<field column="author" name="author" meta="true"/>
<field column="title" name="title" meta="true"/>
 <field column="content_type" name="content_type" meta="true"/>
</entity>
</entity>
</document>
</dataConfig>

Before that, I added this part to solrconfig.xml:


<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">c:\solr\conf\data-config.xml</str>
    </lst>
  </requestHandler>


but this is the result:

....
<str name="command">delta-import</str>
<str name="status">idle</str>
<str name="importResponse" />
<http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=delta-import#>
<lst name="statusMessages">
  <str name="Time Elapsed">0:0:2.512</str>
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">0</str>
  <str name="Total Documents Processed">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-02-09 23:37:07</str>
  <str name="">Indexing failed. Rolled back all changes.</str>
  <str name="Rolledback">2012-02-09 23:37:07</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

suggestions?
thanks
alessio

Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
With rootEntity="false" the result is the same.
Help!
2012/2/10 Chantal Ackermann <ch...@btelligent.de>

> [...]
> Why do you set rootEntity="false" on the root entity?
> This looks odd to me - but I can be wrong, of course.
> [...]

Re: indexing with DIH (and with problems)

Posted by Chantal Ackermann <ch...@btelligent.de>.

On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote:
> [...]
> <dataConfig>
> <dataSource type="BinFileDataSource" />
> <document>
> <entity name="f" dataSource="null" rootEntity="false"

Why do you set rootEntity="false" on the root entity?
This looks odd to me - but I can be wrong, of course.

If DIH shows this:
"""
<str name="Total Requests made to DataSource">0</str>
"""

DIH hasn't even retrieved any data from your data source. Check that the
call you have configured really returns any documents.


Chantal
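
A note on the check suggested above: one possible reason for zero rows is the fileName attribute. FileListEntityProcessor treats fileName as a regular expression rather than a shell glob, so "*.pdf" is not a valid pattern. The following is a minimal sketch, not a tested configuration, for verifying that the file listing alone returns rows. The forward-slash form of baseDir and the mapping of fileAbsolutePath to an "id" field are assumptions about the local setup and schema.

```xml
<dataConfig>
  <document>
    <!-- fileName is a regular expression: ".*\.pdf" instead of the glob "*.pdf" -->
    <entity name="f"
            processor="FileListEntityProcessor"
            dataSource="null"
            baseDir="c:/myfile"
            fileName=".*\.pdf"
            recursive="true">
      <!-- Assumption: "id" is the unique key field defined in schema.xml -->
      <field column="fileAbsolutePath" name="id"/>
    </entity>
  </document>
</dataConfig>
```

If a full-import with a config like this reports a non-zero "Total Rows Fetched", the listing itself works and the Tika entity can then be nested inside it.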






Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
Here is the stack trace:
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:9946435225838 Processing Document # 1
at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessor(DocBuilder.java:576)
.....
.....
Caused by: org.apache.solr.common.SolrException: Error loading class 'TikaEntityProcessor'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
.....
.....
Caused by: java.lang.ClassNotFoundException: TikaEntityProcessor
at java.net.URLClassLoader$1.run(Unknown Source)
.....

why?
Tu

2012/2/10 Gora Mohanty <go...@mimirtech.com>

> [...]

Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
The latest version:

With this data-config:

<dataConfig>
 <dataSource type="BinFileDataSource" />
 <document>
  <entity
    name="tika-test"
    processor="FileListEntityProcessor"
    baseDir="D:\gioconews_archivio\marzo2011"
    fileName=".*pdf"
    recursive="true"
    rootEntity="false"
    dataSource="null"/>
  <entity processor="FileListEntityProcessor"
url="D:\gioconews_archivio\marzo2011" format="text" >
   <field column="author"  name="author" meta="true"/>
   <field column="title" name="title" meta="true"/>
     <field column="description" name="description" />
     <field column="comments" name="comments" />
     <field column="content_type" name="content_type" />
     <field column="last_modified" name="last_modified" />
  </entity>
 </document>
</dataConfig>

I get this result:

 <str name="command">full-import</str>
 <str name="status">idle</str>
 <str name="importResponse" />
 <http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import#>
 <lst name="statusMessages">
   <str name="Time Elapsed">0:0:2.44</str>
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">43</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-02-12 19:06:00</str>
   <str name="">Indexing failed. Rolled back all changes.</str>
   <str name="Rolledback">2012-02-12 19:06:00</str>
 </lst>

suggestions?
thank you
a.
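
For comparison, a sketch of how these two entities are usually nested. In the config above the first entity is closed with "/>", so the second entity is a sibling rather than a child, and it names FileListEntityProcessor instead of TikaEntityProcessor. The version below keeps the file-listing entity open and wraps a TikaEntityProcessor entity that reads each file through a binary data source. The data source name "bin" and the target fields "author", "title" and "text" are assumptions; they would have to match fields defined in schema.xml.

```xml
<dataConfig>
  <!-- Assumption: "bin" is an arbitrary name; TikaEntityProcessor needs a binary (stream) data source -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <!-- Outer entity: lists the files only; rootEntity="false" so that documents come from the inner entity -->
    <entity name="f"
            processor="FileListEntityProcessor"
            dataSource="null"
            baseDir="D:\gioconews_archivio\marzo2011"
            fileName=".*pdf"
            recursive="true"
            rootEntity="false">
      <!-- Inner entity: parses each listed file with Tika -->
      <entity name="tika-test"
              processor="TikaEntityProcessor"
              dataSource="bin"
              url="${f.fileAbsolutePath}"
              format="text">
        <!-- meta="true" marks columns taken from the Tika metadata -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!-- "text" is the extracted body of the document -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

This is only a starting point; whether it indexes the 43 files depends on the schema and on the Tika jars being on the classpath.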
2012/2/12 alessio crisantemi <al...@gmail.com>

> [...]

Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
Sorry for the confusion:

I forgot part of the code:
 <entity name="tika-test" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text">

Without this part, the result is the same as in the previous mail.

If I add this line, the result is:

<http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import#>
<lst name="statusMessages">
  <str name="Time Elapsed">0:0:1.79</str>
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">1</str>
  <str name="Total Documents Processed">0</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2012-02-12 18:20:49</str>
  <str name="">Indexing failed. Rolled back all changes.</str>
  <str name="Rolledback">2012-02-12 18:20:49</str>
 </lst>

help!
ty
alessio


2012/2/12 alessio crisantemi <al...@gmail.com>

> [...]

Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
Hi,
Now my DIH runs, but maybe only partly.

I am indexing a directory containing 43 PDF files.
Here is the reply to my full-import command:

 <str name="command">full-import</str>
 <str name="status">idle</str>
 <str name="importResponse" />
 <http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=full-import#>
 <lst name="statusMessages">
   <str name="Total Requests made to DataSource">0</str>
   <str name="Total Rows Fetched">43</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2012-02-12 17:39:10</str>
   <str name="Total Documents Processed">0</str>
   <str name="Time taken">0:0:0.78</str>
 </lst>
 <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
 </response>


It's as if my handler sees my directory as a 'list of titles', I suppose,
and not as a series of documents.

Is that true? And above all: WHY!?!?
Please help me!
Thank you
alessio



PS: here is my data-config.xml file; maybe the problem is here..

<dataConfig>
 <dataSource name="dsFiles"
  type="FileDataSource"
  encoding="UTF-8"/>
 <document>
  <entity
    name="f"
    processor="FileListEntityProcessor"
    baseDir="D:\gioconews_archivio\marzo2011"
    fileName=".*pdf"
    recursive="true"
    rootEntity="false"
    dataSource="null">
<entity name="tika-test" processor="TikaEntityProcessor"
url="${f.fileAbsolutePath}" format="text">
   <field column="author"  name="author" />
   <field column="title" name="title" />
         <field column="subject" name="subject" />
     <field column="description" name="description" />
     <field column="comments" name="comments" />
     <field column="category" name="categoru" />
     <field column="content_type" name="content_type" />
     <field column="last_modified" name="last_modified" />
  </entity>
  </entity>

 </document>
</dataConfig>

2012/2/12 alessio crisantemi <al...@gmail.com>

> [...]

Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
Dear Shawn,
thanks for your reply.
But the contrib directory of my Solr 3.5 doesn't contain these .jar files
(apache-solr-dataimporthandler-3.5-SNAPSHOT.jar and
apache-solr-dataimporthandler-extras-3.5-SNAPSHOT.jar).

I have only apache-solr-dataimporthandler-3.5.jar and
apache-solr-dataimporthandler-extras-3.5.jar, that is, WITHOUT 'SNAPSHOT'.
Why? Where can I download these jar files?
a.

2012/2/12 Shawn Heisey <so...@elyograg.org>

> [...]

Re: indexing with DIH (and with problems)

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/11/2012 4:33 AM, alessio crisantemi wrote:
> Dear all,
> I updated my Solr to version 3.5, but now I have this problem:
>
> SEVERE: Full Import failed
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.NoSuchMethodError:

The data import handler has always been a contrib module, but it used to 
be actually included in the .war file.  That has been changed, now it's 
in separate jar files.

When you downloaded or compiled 3.5.0, the dist directory should have 
contained dataimporthandler and dataimporthandler-extras jar files.  
Mine, which I have compiled myself from the 3.5 svn branch, are named 
the following:

apache-solr-dataimporthandler-3.5-SNAPSHOT.jar
apache-solr-dataimporthandler-extras-3.5-SNAPSHOT.jar

At minimum, put the first jar file in a lib folder referenced in your 
solrconfig.xml file.  I couldn't tell you whether you'll need the 
-extras file as well, you'll have to experiment.

Thanks,
Shawn
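
As a sketch of what "a lib folder referenced in your solrconfig.xml" can look like: solrconfig.xml accepts <lib> directives that add jars to the core's classpath. The relative paths below assume the directory layout of the stock Solr 3.5 example and would need adjusting for another installation. The second directive rests on the further assumption that the Tika jars shipped under contrib/extraction/lib are needed for TikaEntityProcessor.

```xml
<!-- Goes inside the <config> element of solrconfig.xml.
     Paths are relative to the core's instance directory (assumed: example/solr). -->

<!-- Matches both the dataimporthandler jar and the dataimporthandler extras jar -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />

<!-- Assumption: Tika and its dependencies live here in the binary distribution -->
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
```

Because the regex matches on the file name prefix, it should pick up the released 3.5 jars as well as SNAPSHOT builds.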


Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
Dear all,
I updated my Solr to version 3.5, but now I have this problem:

SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
 at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:72)
 at org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:59)
 at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
 ... 5 more
feb 11, 2012 12:27:26 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
feb 11, 2012 12:27:26 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
feb 11, 2012 12:27:27 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
feb 11, 2012 12:27:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={clean=false&commit=true&command=full-import&qt=/dataimport} status=0 QTime=0
feb 11, 2012 12:27:27 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
WARNING: Unable to read: dataimport.properties
feb 11, 2012 12:27:28 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
 at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.lang.NoSuchMethodError: org.apache.solr.core.SolrResourceLoader.getClassLoader()Ljava/lang/ClassLoader;
 at org.apache.solr.handler.dataimport.TikaEntityProcessor.firstInit(TikaEntityProcessor.java:72)
 at org.apache.solr.handler.dataimport.EntityProcessorBase.init(EntityProcessorBase.java:59)
 at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
 ... 5 more
feb 11, 2012 12:27:28 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
feb 11, 2012 12:27:28 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
I don't know..
suggestions?
best
a.

2012/2/10 Gora Mohanty <go...@mimirtech.com>

> [...]

Re: indexing with DIH (and with problems)

Posted by Gora Mohanty <go...@mimirtech.com>.
On 10 February 2012 04:15, alessio crisantemi
<al...@gmail.com> wrote:
> hi all,
> I would index on solr my pdf files wich includeds on my directory c:\myfile\
>
> so, I add on my solr/conf directory the file data-config.xml like the
> following:
[...]

> but this is the result:
[...]

Your Solr URL for dataimport looks a little odd: You seem to be
doing a delta-import. Normally, one would start with a full import:
http://solr-host:port/solr/dataimport?command=full-import

Have you looked in the Solr logs for the cause of the exception?
Please share that with us.

Regards,
Gora

Re: indexing with DIH (and with problems)

Posted by alessio crisantemi <al...@gmail.com>.
I have problems with the full-import query:
no results.

I will search the log files and then write again.
Thanks
a.
2012/2/9 alessio crisantemi <al...@gmail.com>

> [...]