Posted to dev@lucene.apache.org by Jack Krupansky <ja...@basetechnology.com> on 2013/07/05 21:42:05 UTC

Re: [CONF] Apache Solr Reference Guide > Uploading Data with Index Handlers

The ASRG commit emails seem to be sending the whole confluence wiki page 
rather than just a diff like the old Solr wiki. Is that a tunable preference 
for confluence?

Thanks.

-- Jack Krupansky

-----Original Message----- 
From: Grant Ingersoll (Confluence)
Sent: Friday, July 05, 2013 2:54 PM
To: commits@lucene.apache.org
Subject: [CONF] Apache Solr Reference Guide > Uploading Data with Index 
Handlers

Space: Apache Solr Reference Guide 
(https://cwiki.apache.org/confluence/display/solr)
Page: Uploading Data with Index Handlers 
(https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers)


Edited by Grant Ingersoll:
---------------------------------------------------------------------
{section}
{column:width=75%}
Index Handlers are Update Handlers designed to add, delete, and update 
documents in the index. Solr includes several of these to allow indexing 
documents in XML, CSV, and JSON.

The example URLs given here reflect the handler configuration in the 
supplied {{solrconfig.xml}}. If the name associated with the handler is 
changed then the URLs will need to be modified. It is quite possible to 
access the same handler using more than one name, which can be useful if you 
wish to specify different sets of default options.

New {{UpdateProcessors}} now default to the {{uniqueKey}} field if it is of 
the appropriate type for the configured fields. These processors 
automatically add fields with new UUIDs and timestamps to 
{{SolrInputDocuments}}. They work similarly to the <field default="..."/> 
option in {{schema.xml}}, but are applied in the {{UpdateProcessorChain}}, 
so they may be used before other {{UpdateProcessors}} or to generate a 
{{uniqueKey}} field value when using the {{DistributedUpdateProcessor}} 
(i.e., SolrCloud). The relevant factories are 
{{TimestampUpdateProcessorFactory}}, {{UUIDUpdateProcessorFactory}}, and 
{{DefaultValueUpdateProcessorFactory}}.
{column}

{column:width=25%}
{panel}
Index Handlers covered in this section:
{toc:minLevel=2|maxLevel=2}
{panel}
{column}
{section}

h2. Combined UpdateRequestHandlers

For the separate XML, CSV, JSON, and javabin update request handlers 
explained below, Solr provides a single {{RequestHandler}} and chooses the 
appropriate {{ContentStreamLoader}} based on the {{Content-Type}} header, 
or on the {{qt}} (query type) parameter matching the name of a registered 
handler. The "standard" request handler is the default and will be used if 
{{qt}} is not specified in the request.

{code:lang=xml|borderStyle=solid|borderColor=#666666}
  <requestHandler name="standard" />
  <requestHandler name="custom" />
{code}

h3. Configuring Shard Handlers for Distributed Searches

Inside the RequestHandler, you can configure and specify the shard handler 
used for distributed search. You can also plug in custom shard handlers as 
well.

To configure the standard handler, set up the configuration as in this 
example:

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<requestHandler name="standard" default="true">
  <!-- other params go here -->
  <shardHandlerFactory>
    <int name="socketTimeout">1000</int>
    <int name="connTimeout">5000</int>
  </shardHandlerFactory>
</requestHandler>
{code}

The parameters that can be specified are as follows:

|| Parameter || Default || Explanation ||
| socketTimeout | default: 0 (use OS default) | The amount of time in ms 
that a socket is allowed to wait |
| connTimeout | default: 0 (use OS default) | The amount of time in ms 
allowed for binding or connecting a socket |
| maxConnectionsPerHost | default: 20 | The maximum number of connections 
made to each individual shard in a distributed search |
| corePoolSize | default: 0 | The retained lowest limit on the number of 
threads used in coordinating distributed search |
| maximumPoolSize | default: Integer.MAX_VALUE | The maximum number of 
threads used for coordinating distributed search |
| maxThreadIdleTime | default: 5 seconds | The amount of time to wait 
before idle threads are scaled back in response to a reduction in load |
| sizeOfQueue | default: \-1 | If specified, the thread pool will use a 
backing queue instead of a direct handoff buffer. High-throughput systems 
will want to configure this to be a direct handoff (with \-1). Systems 
that desire better latency will want to configure a reasonable size of 
queue to handle variations in requests. |
| fairnessPolicy | default: false | Chooses the JVM-specific fair-policy 
queuing. If enabled, distributed searches will be handled in a 
first-in, first-out fashion at a cost to throughput. If disabled, 
throughput will be favored over latency. |

{topofpage}
h2. XMLUpdateRequestHandler for XML-formatted Data

h3. Configuration

The default configuration file has the update request handler configured by 
default.

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
{code}

h3. Adding Documents

Documents are added to the index by sending an XML message to the update 
handler.

The XML schema recognized by the update handler is very straightforward:

* The {{<add>}} element introduces one or more documents to be added.
* The {{<doc>}} element introduces the fields making up a document.
* The {{<field>}} element presents the content for a specific field.

For example:

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<add>
  <doc>
   <field name="authors">Patrick Eagar</field>
   <field name="subject">Sports</field>
   <field name="dd">796.35</field>
   <field name="numpages">128</field>
   <field name="desc"></field>
   <field name="price">12.40</field>
   <field name="title" boost="2.0">Summer of the all-rounder: Test and championship cricket in England 1982</field>
   <field name="isbn">0002166313</field>
   <field name="yearpub">1982</field>
   <field name="publisher">Collins</field>
  </doc>
  <doc boost="2.5">
  ...
  </doc>
</add>
{code}
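Add messages like the one above can also be generated programmatically. As a quick illustration (a Python sketch, not part of Solr itself), the standard library's {{ElementTree}} is enough to build a well-formed add message:

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Build an <add> update message from a list of {field: value} dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

message = build_add_message([{"authors": "Patrick Eagar",
                              "subject": "Sports",
                              "isbn": "0002166313"}])
```

Using a real XML library, rather than string concatenation, guarantees that reserved characters in field values are escaped correctly.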

If the document schema defines a unique key, then an {{/update}} operation 
silently replaces a document in the index with the same unique key, unless 
the {{<add>}} element sets the {{allowDups}} attribute to {{true}}. If no 
unique key has been defined, indexing performance is somewhat faster, as no 
search has to be made for an existing document.
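The replace-on-duplicate-key behavior can be pictured with a toy in-memory model (purely illustrative Python, not Solr code): the index acts like a map keyed on the unique key, so a second add with the same key replaces the first.

```python
class ToyIndex:
    """Toy model of unique-key overwrite semantics (illustration only)."""
    def __init__(self):
        self.docs = {}  # uniqueKey -> document

    def add(self, doc):
        # An add with an existing key silently replaces the stored document.
        self.docs[doc["id"]] = doc

index = ToyIndex()
index.add({"id": "0002166313", "title": "Summer of the all-rounder"})
index.add({"id": "0002166313", "title": "Summer of the all-rounder, 2nd ed."})
# Only one document remains, holding the most recent values.
```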

Each element has certain optional attributes which may be specified.

|| Command || Command Description || Optional Parameter || Parameter 
Description ||
| <add> | Introduces one or more documents to be added to the index. | 
commitWithin=_number_ | Add the document within the specified number of 
milliseconds |
| <doc> | Introduces the definition of a specific document. | boost=_float_ 
| Default is 1.0. Sets a boost value for the document. To learn more about 
boosting, see [Searching]. |
| <field> | Defines a field within a document. | boost=_float_ | Default is 
1.0. Sets a boost value for the field. |

{note}
Other optional parameters for {{<add>}}, including {{allowDups}}, 
{{overwritePending}}, and {{overwriteCommitted}}, are now deprecated. 
However, you can specify {{overwrite=false}} for XML updates to avoid 
overwriting.
{note}

h3. Commit and Optimize Operations

The {{<commit>}} operation writes all documents loaded since the last commit 
to one or more segment files on the disk. Before a commit has been issued, 
newly indexed content is not visible to searches. The commit operation opens 
a new searcher, and triggers any event listeners that have been configured.

Commits may be issued explicitly with a {{<commit/>}} message, and can also 
be triggered from {{<autocommit>}} parameters in {{solrconfig.xml}}.

The {{<optimize>}} operation requests Solr to merge internal data structures 
in order to improve search performance.  For a large index, optimization 
will take some time to complete, but by merging many small segment files 
into a larger one, search performance will improve. If you are using Solr's 
replication mechanism to distribute searches across many systems, be aware 
that after an optimize, a complete index will need to be transferred. In 
contrast, post-commit transfers are usually much smaller.

The {{<commit>}} and {{<optimize>}} elements accept these optional 
attributes:

|| Optional Attribute || Description ||
| maxSegments | Default is 1. Optimizes the index to include no more than 
this number of segments. |
| waitFlush | Default is true. Blocks until index changes are flushed to 
disk. |
| waitSearcher | Default is true. Blocks until a new searcher is opened and 
registered as the main query searcher, making the changes visible. |
| expungeDeletes | Default is false. Merges segments and removes deleted 
documents. |

Here are examples of {{<commit>}} and {{<optimize>}} using optional attributes:

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<commit waitFlush="false" waitSearcher="false"/>
<commit waitFlush="false" waitSearcher="false" expungeDeletes="true"/>
<optimize waitFlush="false" waitSearcher="false"/>
{code}

h3. Delete Operations

Documents can be deleted from the index in two ways. "Delete by ID" deletes 
the document with the specified ID, and can be used only if a UniqueID field 
has been defined in the schema. "Delete by Query" deletes all documents 
matching a specified query, although {{commitWithin}} is ignored for a 
Delete by Query. A single delete message can contain multiple delete 
operations.

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<delete>
  <id>0002166313</id>
  <id>0031745983</id>
  <query>subject:sport</query>
  <query>publisher:penguin</query>
</delete>
{code}
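A delete message mixing both forms can be assembled the same way as an add message; here is a Python sketch (illustrative only) mirroring the XML above:

```python
import xml.etree.ElementTree as ET

def build_delete_message(ids=(), queries=()):
    """Build a <delete> message combining delete-by-ID and delete-by-query."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = doc_id
    for q in queries:
        ET.SubElement(delete, "query").text = q
    return ET.tostring(delete, encoding="unicode")

message = build_delete_message(ids=["0002166313", "0031745983"],
                               queries=["subject:sport", "publisher:penguin"])
```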

h3. Rollback Operations

The rollback command rolls back all add and deletes made to the index since 
the last commit. It neither calls any event listeners nor creates a new 
searcher. Its syntax is simple: {{<rollback/>}}.

h3. Using {{curl}} to Perform Updates with the Update Request Handler

You can use the {{curl}} utility to perform any of the above commands, using 
its {{\--data-binary}} option to append the XML message to the {{curl}} 
command, generating an HTTP POST request. For example:

{code:borderStyle=solid|borderColor=#666666}
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '
<add>
<doc>
  <field name="authors">Patrick Eagar</field>
  <field name="subject">Sports</field>
  <field name="dd">796.35</field>
  <field name="isbn">0002166313</field>
  <field name="yearpub">1982</field>
  <field name="publisher">Collins</field>
</doc>
</add>'
{code}

For posting XML messages contained in a file, you can use the alternative 
form:

{code:borderStyle=solid|borderColor=#666666}
curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary @myfile.xml
{code}

Short requests can also be sent using an HTTP GET command, URL-encoding the 
request, as in the following. Note the escaping of "<" and ">":

{code:borderStyle=solid|borderColor=#666666}
curl http://localhost:8983/solr/update?stream.body=%3Ccommit/%3E
{code}
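The URL-encoded body can be produced with any standard URL-escaping routine rather than by hand; for example, in Python:

```python
from urllib.parse import quote

# "<" becomes %3C and ">" becomes %3E; "/" is left unescaped by default.
body = quote("<commit/>")
url = "http://localhost:8983/solr/update?stream.body=" + body
```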

Responses from Solr take the form shown here:

{code:borderStyle=solid|borderColor=#666666}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">127</int>
</lst>
</response>
{code}

The status field will be non-zero in case of failure. The servlet container 
will generate an appropriate HTML-formatted message in the case of an error 
at the HTTP layer.
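A client can pick the status code out of such a response with a few lines of XML parsing; for example, in Python:

```python
import xml.etree.ElementTree as ET

response = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">127</int>
</lst>
</response>"""

root = ET.fromstring(response)
header = root.find("lst[@name='responseHeader']")
status = int(header.find("int[@name='status']").text)  # non-zero on failure
qtime = int(header.find("int[@name='QTime']").text)    # elapsed time in ms
```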

h3. A Simple Cross-Platform Posting Tool

For demo purposes, the file {{$SOLR/example/exampledocs/post.jar}} includes 
a cross-platform Java tool for POST-ing XML documents. Open a window and 
run:

{code:borderStyle=solid|borderColor=#666666}
java -jar post.jar <list of files with messages>
{code}

By default, this will contact the server at {{localhost:8983}}. The "-help" 
option outputs the following information on its usage:

{code:borderStyle=solid|borderColor=#666666}
SimplePostTool: version 1.2
{code}

This is a simple command line tool for POSTing raw XML to a Solr port.  XML 
data can be read from files specified as command line args; as raw 
commandline arg strings; or via STDIN.

Examples:

{code:borderStyle=solid|borderColor=#666666}
java -Ddata=files -jar post.jar *.xml
java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
java -Ddata=stdin -jar post.jar < hd.xml
{code}

Other options controlled by System Properties include the Solr URL to POST 
to, and whether a commit should be executed. These are the defaults for all 
System Properties.

{code:borderStyle=solid|borderColor=#666666}
-Ddata=files
-Durl=http://localhost:8983/solr/update
-Dcommit=yes
{code}

For more information about the XML Update Request Handler, see 
[https://wiki.apache.org/solr/UpdateXmlMessages].

{topofpage}
h2. XSLTRequestHandler to Transform XML Content

h3. Configuration

The default configuration file has the update request handler configured by 
default, although the "lazy load" flag is set.

The XSLTRequestHandler allows you to index any XML data with the [XML 
{{<tr>}} command|http://xmlstar.sourceforge.net/doc/UG/ch04s02.html]. You 
must have an XSLT stylesheet in the solr/conf/xslt directory that can 
transform the incoming data to the expected {{<add><doc/></add>}} format.

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<requestHandler name="/update/xslt" startup="lazy" 
class="solr.XsltUpdateRequestHandler"/>
{code}

Here is an example XSLT stylesheet:

{code:borderStyle=solid|borderColor=#666666}
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
version="1.0">
  <xsl:template match="/">
    <add>
      <xsl:apply-templates select="/random/document"/>
    </add>
  </xsl:template>

  <xsl:template match="document">

    <doc boost="5.5">
      <xsl:apply-templates select="*"/>
    </doc>
  </xsl:template>

  <xsl:template match="node">
    <field name="{@name}">
      <xsl:if test="@enhance!=''">
        <xsl:attribute name="boost"><xsl:value-of 
select="@enhance"/></xsl:attribute>
      </xsl:if>
      <xsl:value-of select="@value"/>
    </field>
  </xsl:template>

</xsl:stylesheet>
{code}

Attaching the stylesheet "updateXml.xsl" transforms a search result to 
Solr's {{UpdateXml}} syntax. One example use is to copy a Solr 1.3 index 
(which does not have the CSV response writer) into a format which can be 
indexed into another Solr instance (provided that all fields are stored):

{code}
http://localhost:8983/solr/select?q=*:*&wt=xslt&tr=updateXml.xsl&rows=1000
{code}

You can also use the stylesheet in {{XsltUpdateRequestHandler}} to transform 
an index when updating:

{code}
curl "http://localhost:8983/solr/update/xslt?commit=true&tr=updateXml.xsl"
{code}

{topofpage}

h2. CSVRequestHandler for CSV Content

h3. Configuration

The default configuration file has the update request handler configured by 
default, although the "lazy load" flag is set.

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<requestHandler name="/update/csv" class="solr.CSVRequestHandler" 
startup="lazy" />
{code}

h3. Parameters

The CSV handler allows the specification of many parameters in the URL in 
the form: {{f.}}_fieldname_{{.}}_parameter_{{=}}_value_.

The table below describes the parameters for the update handler.

|| Parameter || Usage || Global (g) or Per Field (f) || Example ||
| separator | Character used as field separator; default is "," | g,(f: see 
split) | separator=% |
| trim | If true, remove leading and trailing whitespace from values. 
Default=false. | g,f | f.isbn.trim=true \\
trim=false |
| header | Set to true if first line of input contains field names. These 
will be used if the *field_name* parameter is absent. | g | |
| field_name | Comma separated list of field names to use when adding 
documents. | g | field_name=isbn,price,title |
| literal.<field_name> | A literal value to be added as the specified field 
in each document. | g | literal.color=red,blue,black |
| skip | Comma separated list of field names to skip. | g | 
skip=uninteresting,shoesize |
| skipLines | Number of lines to discard in the input stream before the CSV 
data starts, including the header, if present. Default=0. | g | skipLines=5 
|
| encapsulator | The character optionally used to surround values to 
preserve characters such as the CSV separator or whitespace. This standard 
CSV format handles the encapsulator itself appearing in an encapsulated 
value by doubling the encapsulator. | g,(f: see split) | encapsulator=" |
| escape | The character used for escaping CSV separators or other reserved 
characters. If an escape is specified, the encapsulator is not used unless 
also explicitly specified since most formats use either encapsulation or 
escaping, not both | g | escape=\ \\ |
| keepEmpty | Keep and index zero length (empty) fields. Default=false. | 
g,f | f.price.keepEmpty=true |
| map | Map one value to another. Format is value:replacement (which can be 
empty.) | g,f | map=left:right \\
f.subject.map=history:bunk |
| split | If true, split a field into multiple values by a separate parser. 
| f | |
| overwrite | If true (the default), check for and overwrite duplicate 
documents, based on the uniqueKey field declared in the Solr schema. If you 
know the documents you are indexing do not contain any duplicates then you 
may see a considerable speed up setting this to false. | g | |
| commit | Issues a commit after the data has been ingested. | g | |
| commitWithin | Add the document within the specified number of 
milliseconds. | g | commitWithin=10000 |
| rowid | Map the rowid (line number) to a field specified by the value of 
the parameter, for instance if your CSV doesn't have a unique key and you 
want to use the row id as such. | g | rowid=id|
| rowidOffset | Add the given offset (as an int) to the rowid before adding 
it to the document.  Default is 0 | g | rowidOffset=10|
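The encapsulator-doubling convention described above is the standard CSV quoting rule, so any conforming CSV library produces input the handler can read. As an illustration, Python's {{csv}} module parses a doubled encapsulator back to a single character:

```python
import csv
import io

# The title field contains both the separator (,) and the encapsulator ("),
# so the embedded quote is doubled inside the encapsulated value.
data = 'isbn,title\n0002166313,"He said ""hi"", twice"\n'
rows = list(csv.DictReader(io.StringIO(data), quotechar='"', doublequote=True))
```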

For more information on the CSV Update Request Handler, see 
[https://wiki.apache.org/solr/UpdateCSV].

{topofpage}

h2. Using the JSONRequestHandler for JSON Content

JSON formatted update requests may be sent to Solr using the 
{{/solr/update/json}} URL. All of the normal methods for uploading content 
are supported.

h3. Configuration

The default configuration file has the update request handler configured by 
default, although the "lazy load" flag is set.

{code:lang=xml|borderStyle=solid|borderColor=#666666}
<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" 
startup="lazy" />
{code}

h3. Examples

There is a sample JSON file at {{example/exampledocs/books.json}} that you 
can use to add documents to the Solr example server.

{code:borderStyle=solid|borderColor=#666666}
cd example/exampledocs
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @books.json -H 'Content-type:application/json'
{code}

Adding {{commit=true}} to the URL makes the documents immediately 
searchable.

You should now be able to query for the newly added documents:

{{[http://localhost:8983/solr/select?q=title:monsters&wt=json&indent=true]}} 
returns:

{code:borderStyle=solid|borderColor=#666666}
{
  "responseHeader":{
    "status":0,
    "QTime":2,
    "params":{
      "indent":"true",
      "wt":"json",
      "q":"title:monsters"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"978-1423103349",
        "author":"Rick Riordan",
        "series_t":"Percy Jackson and the Olympians",
        "sequence_i":2,
        "genre_s":"fantasy",
        "inStock":true,
        "price":6.49,
        "pages_i":304,
        "title":[
          "The Sea of Monsters"],
        "cat":["book","paperback"]}]
  }
}
{code}

h3. Update Commands

The JSON update handler accepts all of the update commands that the XML 
update handler supports, through a straightforward mapping. Multiple 
commands may be contained in one message:

{code:borderStyle=solid|borderColor=#666666}
{
"add": {
  "doc": {
    "id": "DOC1",
    "my_boosted_field": {        /* use a map with boost/value for a boosted 
field */
      "boost": 2.3,
      "value": "test"
    },
    "my_multivalued_field": [ "aaa", "bbb" ]   /* use an array for a 
multi-valued field */
  }
},
"add": {
  "commitWithin": 5000,          /* commit this document within 5 seconds */
  "overwrite": false,            /* don't check for existing documents with 
the same uniqueKey */
  "boost": 3.45,                 /* a document boost */
  "doc": {
    "f1": "v1",
    "f1": "v2"
  }
},

"commit": {},
"optimize": { "waitFlush":false, "waitSearcher":false },

"delete": { "id":"ID" },         /* delete by ID */
"delete": { "query":"QUERY" }    /* delete by query */
}
{code}
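Because the message above legitimately repeats names such as "add" and "delete", a client that parses such messages must preserve duplicate keys. As an illustration, Python's default {{json.loads}} keeps only the last duplicate, but an {{object_pairs_hook}} exposes every command in order:

```python
import json

message = '{"delete": {"id": "ID1"}, "delete": {"query": "subject:sport"}}'

# Default decoding collapses duplicates: only the last "delete" survives.
collapsed = json.loads(message)

# object_pairs_hook receives every (name, value) pair in order,
# so both delete commands remain visible.
commands = json.loads(message, object_pairs_hook=lambda pairs: pairs)
```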

{note}
Comments are not allowed in JSON, but duplicate names are.
{note}

As with other update handlers, parameters such as {{commit}}, 
{{commitWithin}}, {{optimize}}, and {{overwrite}} may be specified in the 
URL instead of in the body of the message.

The JSON update format allows for a simple delete-by-id. The value of a 
{{delete}} can be an array which contains a list of zero or more specific 
document id's (not a range) to be deleted. For example:

{code:borderStyle=solid|borderColor=#666666}
"delete":"myid"
{code}

{code:borderStyle=solid|borderColor=#666666}
"delete":["id1","id2"]
{code}

You can also specify a {{\_version\_}} value with each delete command, for 
example: "delete": \{"id":50, "\_version\_":12345\}. The version may also be 
specified in the body of the update request.

For more information about the JSON Update Request Handler, see 
[https://wiki.apache.org/solr/UpdateJSON].

{topofpage}

h2. Updating Only Part of a Document

Solr supports several modifiers that atomically update values of a document.

|| Modifier || Usage ||
| set | Set or replace a particular value, or remove the value if null is 
specified as the new value. |
| add | Adds an additional value to a list. |
| inc | Increments a numeric value by a specific amount. |

{note} All original source fields must be stored for field modifiers to work 
correctly, which is the Solr default.
{note}

For example:

{code:borderStyle=solid|borderColor=#666666}
{"id":"mydoc", "f1":{"set":10}, "f2":{"add":20}}
{code}

This example results in field {{f1}} being set to "10", and field {{f2}} 
having an additional value of "20" added. All other existing fields from the 
original document remain unchanged.
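The effect of the modifiers can be sketched as a small function applied to the stored document (an illustrative Python model, not Solr's implementation):

```python
def apply_atomic_update(stored, update):
    """Toy model of the set/add/inc modifiers (illustration only)."""
    doc = dict(stored)
    for field, value in update.items():
        if not isinstance(value, dict):
            continue                      # plain values (e.g. the id) pass through
        if "set" in value:
            if value["set"] is None:
                doc.pop(field, None)      # setting null removes the value
            else:
                doc[field] = value["set"]
        elif "add" in value:
            doc[field] = doc.get(field, []) + [value["add"]]
        elif "inc" in value:
            doc[field] = doc.get(field, 0) + value["inc"]
    return doc

stored = {"id": "mydoc", "f1": 1, "f2": [5], "views": 7}
updated = apply_atomic_update(
    stored, {"id": "mydoc", "f1": {"set": 10}, "f2": {"add": 20},
             "views": {"inc": 3}})
```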
{topofpage}

h2. Using SimplePostTool

This is a simple command line tool for POSTing raw data to a Solr port. 
Data can be read from files specified as command line arguments, as raw 
command line argument strings, or via {{STDIN}}. Options controlled by 
System Properties include the Solr URL to post to, the {{Content-Type}} of 
the data, whether a commit or optimize should be executed, and whether the 
response should be written to {{STDOUT}}. If {{auto=yes}}, the tool will 
try to guess the type and set {{Content-Type}} and the URL automatically. 
When posting rich documents, the file name will be propagated as 
{{resource.name}} and also used as {{literal.id}}. You may override these 
or any other request parameter through the {{\-Dparams}} property.

Supported System Properties and their defaults:

|| Parameter || Values || Default ||
| \-Ddata | files, args, stdin | default=files |
| \-Dtype | <content-type> | default=application/xml |
| \-Durl | <solr-update-url> | default=[http://localhost:8983/solr/update] |
| \-Dauto | yes, no | default=no |
| \-Drecursive | yes, no | default=no |
| \-Dfiletypes | <type>\[,<type>,..\] | default=xml, json, csv, pdf, doc, 
docx, ppt, pptx, xls, xlsx, odt, odp, ods, rtf, htm, html |
| \-Dparams | "<key>=<value>\[&<key>=<value>...\]" | values must be 
URL-encoded |
| \-Dcommit | yes, no | default=yes |
| \-Doptimize | yes, no | default=no |
| \-Dout | yes,no | default=no |

Examples:

{code:borderStyle=solid|borderColor=#666666}
  java -jar post.jar *.xml
  java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -jar post.jar < hd.xml
  java -Dtype=text/csv -jar post.jar *.csv
  java -Dtype=application/json -jar post.jar *.json
  java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a
    -Dtype=application/pdf -jar post.jar a.pdf
  java -Dauto=yes -jar post.jar a.pdf
  java -Dauto=yes -Drecursive=yes -jar post.jar afolder
  java -Dauto=yes -Dfiletypes=ppt,html -jar post.jar afolder

{code}

In the above example:

| *\-Dauto=yes* | Will guess file type from file name suffix, and set type 
and url accordingly. It also sets the ID and file name automatically. |
| *\-Drecursive=yes* | Will recurse into sub-folders and index all files. |
| *\-Dfiletypes* | Specifies the file types to consider when indexing 
folders. |
| *\-Dparams* | HTTP GET params to add to the request, so you don't need to 
write the whole URL again. |

h2. Indexing Using SolrJ

Use of the SolrJ client library is covered in the section on [solr:Using 
SolrJ].

{scrollbar}




Re: [CONF] Apache Solr Reference Guide > Uploading Data with Index Handlers

Posted by Steve Rowe <sa...@gmail.com>.
Hoss set up a CWIKI account named lucene_pmc_notification_role to watch the ref guide and email commits@l.a.o when content changes.  See https://issues.apache.org/jira/browse/SOLR-4887 for some details.

Lucene PMC members have access to the credentials for this CWIKI role - I logged in as that account and looked at the email notification config.

In the currently installed version of Confluence at the ASF (v3.5.17), the option to include diffs instead of the full content is grayed out for text format emails.  We would have to first select HTML format emails in order to get diffs instead of full contents.  Not sure which is worse.  

Maybe Confluence v5.1, which ASF Infra plans on upgrading to eventually, has better options for this?

Steve

On Jul 5, 2013, at 3:42 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> The ASRG commit emails seem to be sending the whole confluence wiki page rather than just a diff like the old Solr wiki. Is that a tunable preference for confluence?
> 
> Thanks.
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Grant Ingersoll (Confluence)
> Sent: Friday, July 05, 2013 2:54 PM
> To: commits@lucene.apache.org
> Subject: [CONF] Apache Solr Reference Guide > Uploading Data with Index Handlers
> 
> Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
> Page: Uploading Data with Index Handlers (https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers)
> 
> 
> Edited by Grant Ingersoll:
> ---------------------------------------------------------------------
> {section}
> {column:width=75%}
> Index Handlers are Update Handlers designed to add, delete and update documents to the index. Solr includes several of these to allow indexing documents in XML, CSV and JSON.
> 
> The example URLs given here reflect the handler configuration in the supplied {{solrconfig.xml}}. If the name associated with the handler is changed then the URLs will need to be modified. It is quite possible to access the same handler using more than one name, which can be useful if you wish to specify different sets of default options.
> 
> New {{UpdateProcessors}} now default to the {{uniqueKey}} field if it is the appropriate type for configured fields. The processors automatically add fields with new UUIDs and Timestamps to {{SolrInputDocuments}}. These work similarly to the <field default="..."/> option in {{schema.xml}}, but are applied in the {{UpdateProcessorChain}}. They may be used prior to other {{UpdateProcessors}}, or to generate a {{uniqueKey}} field value when using the {{DistributedUpdateProcessor}} (i.e., SolrCloud), {{TimestampUpdateProcessorFactory}}, {{UUIDUpdateProcessorFactory}}, and {{DefaultValueUpdateProcessorFactory}}.
> {column}
> 
> {column:width=25%}
> {panel}
> Index Handlers covered in this section:
> {toc:minLevel=2|maxLevel=2}
> {panel}
> {column}
> {section}
> 
> h2. Combined UpdateRequestHandlers
> 
> For the separate XML, CSV, JSON, and javabin update request handlers explained below, Solr provides a single {{RequestHandler}}, and chooses the appropriate {{ContentStreamLoader}} based on the the {{Content-Type}} header, entered as the {{qt}} (query type) parameter matching the name of registered handlers. The "standard" request handler is the default and will be used if {{qt}} is not specified in the request.
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <requestHandler name="standard" />
> <requestHandler name="custom" />
> {code}
> 
> h3. Configuring Shard Handlers for Distributed Searches
> 
> Inside the RequestHandler, you can configure and specify the shard handler used for distributed search. You can also plug in custom shard handlers as well.
> 
> Configuring the standard handler, set up the configuration as in this example:
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <requestHandler name="standard" default="true">
>   <!-- other params go here -->
>    <shardHandlerFactory>
>       <int name="socketTimeOut">1000</int>
>       <int name="connTimeOut">5000</int>
>     </shardHandler>
> </requestHandler>
> {code}
> 
> The parameters that can be specified are as follows:
> 
> || Parameter || Default || Explanation ||
> | socketTimeout | default: 0 (use OS default) | The amount of time in ms that a socket is allowed to wait |
> | connTimeout | default: 0 (use OS default) | The amount of time in ms that is accepted for binding / connection a socket |
> | maxConnectionsPerHost | default: 20 | The maximum number of connections that is made to each  individual shard in a distributed search |
> | corePoolSize | default: 0 | The retained lowest limit on the number of threads used in coordinating distributed search |
> | maximumPoolSize | default: Integer.MAX_VALUE | The maximum number of threads used for coordinating distributed search |
> | maxThreadIdleTime | default: 5 seconds | The amount of time to wait for before threads are  scaled back in response to a reduction in load |
> | sizeOfQueue | default: \-1 | If specified, the thread pool will use a backing queue instead of a direct handoff buffer.  High throughput systems will want to configure this to be a  direct hand off (with \-1). Systems that desire better latency will want to configure a reasonable size of queue to handle variations in requests. |
> | fairnessPolicy | default: false | Chooses the JVM specifics dealing with fair policy  queuing. If enabled, distributed searches will be handled in a First in - First out method at a cost to throughput. If disabled, throughput will be favored over latency. |
> 
> {topofpage}
> h2. XMLUpdateRequestHandler for XML-formatted Data
> 
> h3. Configuration
> 
> The default configuration file has the update request handler configured by default.
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
> {code}
> 
> h3. Adding Documents
> 
> Documents are added to the index by sending an XML message to the update handler.
> 
> The XML schema recognized by the update handler is very straightforward:
> 
> * The {{<add>}} element introduces one or more documents to be added.
> * The {{<doc>}} element introduces the fields making up a document.
> * The {{<field>}} element presents the content for a specific field.
> 
> For example:
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <add>
> <doc>
>  <field name="authors">Patrick Eagar</field>
>  <field name="subject">Sports</field>
>  <field name="dd">796.35</field>
>  <field name="numpages">128</field>
>  <field name="desc"></field>
>  <field name="price">12.40</field>
>  <field name="title" boost="2.0">Summer of the all-rounder: Test and championship cricket in England 1982</field>
>  <field name="isbn">0002166313</field>
>  <field name="yearpub">1982</field>
>  <field name="publisher">Collins</field>
> </doc>
> <doc boost="2.5">
> ...
> </doc>
> </add>
> {code}
> 
> If the document schema defines a unique key, then an {{/update}} operation silently replaces a document in the index with the same unique key, unless the {{<add>}} element sets the {{allowDups}} attribute to {{true}}. If no unique key has been defined, indexing performance is somewhat faster, as no search has to be made for an existing document.
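The replace-on-duplicate behavior can be modeled in miniature. This is a toy sketch only, not Solr code: the dict stands in for the index and {{isbn}} plays the role of the uniqueKey field.

```python
# Toy model of /update semantics with a uniqueKey field (here "isbn").
# Illustration only -- not how Solr stores or replaces documents internally.

def add(index, doc, allow_dups=False):
    """Add doc to index; replace any existing doc with the same key
    unless allow_dups is set (mirroring allowDups="true")."""
    key = doc["isbn"]
    if allow_dups:
        index.setdefault(key, []).append(doc)
    else:
        index[key] = [doc]

index = {}
add(index, {"isbn": "0002166313", "title": "first edition"})
add(index, {"isbn": "0002166313", "title": "second edition"})
# With a unique key, the second add silently replaced the first:
assert index["0002166313"] == [{"isbn": "0002166313", "title": "second edition"}]
```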
> 
> Each element has certain optional attributes which may be specified.
> 
> || Command || Command Description || Optional Parameter || Parameter Description ||
> | <add> | Introduces one or more documents to be added to the index. | commitWithin=_number_ | Add the document within the specified number of milliseconds |
> | <doc> | Introduces the definition of a specific document. | boost=_float_ | Default is 1.0. Sets a boost value for the document. To learn more about boosting, see [Searching]. |
> | <field> | Defines a field within a document. | boost=_float_ | Default is 1.0. Sets a boost value for the field. |
> 
> {note}
> Other optional parameters for {{<add>}}, including {{allowDups}}, {{overwritePending}}, and {{overwriteCommitted}}, are now deprecated. However, you can specify {{overwrite=false}} for XML updates to avoid overwriting.
> {note}
> 
> h3. Commit and Optimize Operations
> 
> The {{<commit>}} operation writes all documents loaded since the last commit to one or more segment files on the disk. Before a commit has been issued, newly indexed content is not visible to searches. The commit operation opens a new searcher, and triggers any event listeners that have been configured.
> 
> Commits may be issued explicitly with a {{<commit/>}} message, and can also be triggered from {{<autocommit>}} parameters in {{solrconfig.xml}}.
> 
> The {{<optimize>}} operation requests Solr to merge internal data structures in order to improve search performance.  For a large index, optimization will take some time to complete, but by merging many small segment files into a larger one, search performance will improve. If you are using Solr's replication mechanism to distribute searches across many systems, be aware that after an optimize, a complete index will need to be transferred. In contrast, post-commit transfers are usually much smaller.
> 
> The {{<commit>}} and {{<optimize>}} elements accept these optional attributes:
> 
> || Optional Attribute || Description ||
> | maxSegments | Default is 1. Optimizes the index to include no more than this number of segments. |
> | waitFlush | Default is true. Blocks until index changes are flushed to disk. |
> | waitSearcher | Default is true. Blocks until a new searcher is opened and registered as the main query searcher, making the changes visible. |
> | expungeDeletes | Default is false. Merges segments and removes deleted documents. |
> 
> Here are examples of {{<commit>}} and {{<optimize>}} using optional attributes:
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <commit waitFlush="false" waitSearcher="false"/>
> <commit waitFlush="false" waitSearcher="false" expungeDeletes="true"/>
> <optimize waitFlush="false" waitSearcher="false"/>
> {code}
> 
> h3. Delete Operations
> 
> Documents can be deleted from the index in two ways. "Delete by ID" deletes the document with the specified ID, and can be used only if a UniqueID field has been defined in the schema. "Delete by Query" deletes all documents matching a specified query, although {{commitWithin}} is ignored for a Delete by Query. A single delete message can contain multiple delete operations.
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <delete>
> <id>0002166313</id>
> <id>0031745983</id>
> <query>subject:sport</query>
> <query>publisher:penguin</query>
> </delete>
> {code}
> 
> h3. Rollback Operations
> 
> The rollback command rolls back all adds and deletes made to the index since the last commit. It neither calls any event listeners nor creates a new searcher. Its syntax is simple: {{<rollback/>}}.
> 
> h3. Using {{curl}} to Perform Updates with the Update Request Handler
> 
> You can use the {{curl}} utility to perform any of the above commands, using its {{\--data-binary}} option to append the XML message to the {{curl}} command, generating an HTTP POST request. For example:
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '
> <add>
> <doc>
> <field name="authors">Patrick Eagar</field>
> <field name="subject">Sports</field>
> <field name="dd">796.35</field>
> <field name="isbn">0002166313</field>
> <field name="yearpub">1982</field>
> <field name="publisher">Collins</field>
> </doc>
> </add>'
> {code}
> 
> For posting XML messages contained in a file, you can use the alternative form:
> 
> {code:borderStyle=solid|borderColor=#666666}
> curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary @myfile.xml
> {code}
> 
> Short requests can also be sent using an HTTP GET request, URL-encoding the request, as in the following. Note the escaping of "<" and ">":
> 
> {code:borderStyle=solid|borderColor=#666666}
> curl http://localhost:8983/solr/update?stream.body=%3Ccommit/%3E
> {code}
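The URL-encoding step above can be reproduced with Python's standard library. This is a sketch of the encoding only; the host, port, and path are the defaults used in the surrounding examples.

```python
from urllib.parse import quote

# Percent-encode the XML message so it can travel in a query string.
body = quote("<commit/>")  # "<" -> %3C, ">" -> %3E, "/" is left as-is
url = "http://localhost:8983/solr/update?stream.body=" + body
print(url)
```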
> 
> Responses from Solr take the form shown here:
> 
> {code:borderStyle=solid|borderColor=#666666}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">127</int>
> </lst>
> </response>
> {code}
> 
> The status field will be non-zero in case of failure. The servlet container will generate an appropriate HTML-formatted message in the case of an error at the HTTP layer.
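Checking the status programmatically amounts to parsing that XML. A minimal sketch with the standard library, using the response text copied from the example above:

```python
import xml.etree.ElementTree as ET

response_xml = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">127</int>
</lst>
</response>"""

root = ET.fromstring(response_xml)
# Locate the <int name="status"> element inside the responseHeader list.
status = int(root.find("./lst[@name='responseHeader']/int[@name='status']").text)
assert status == 0  # a non-zero value would indicate a failure
```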
> 
> h3. A Simple Cross-Platform Posting Tool
> 
> For demo purposes, the file {{$SOLR/example/exampledocs/post.jar}} includes a cross-platform Java tool for POST-ing XML documents. Open a window and run:
> 
> {code:borderStyle=solid|borderColor=#666666}
> java -jar post.jar <list of files with messages>
> {code}
> 
> By default, this will contact the server at {{localhost:8983}}. The "-help" option outputs the following information on its usage:
> 
> {code:borderStyle=solid|borderColor=#666666}
> SimplePostTool: version 1.2
> {code}
> 
> This is a simple command line tool for POSTing raw XML to a Solr port. XML data can be read from files specified as command line arguments, as raw command line argument strings, or via STDIN.
> 
> Examples:
> 
> {code:borderStyle=solid|borderColor=#666666}
> java -Ddata=files -jar post.jar *.xml
> java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
> java -Ddata=stdin -jar post.jar < hd.xml
> {code}
> 
> Other options controlled by System Properties include the Solr URL to POST to, and whether a commit should be executed. These are the defaults for all System Properties:
> 
> {code:borderStyle=solid|borderColor=#666666}
> -Ddata=files
> -Durl=http://localhost:8983/solr/update
> -Dcommit=yes
> {code}
> 
> For more information about the XML Update Request Handler, see [https://wiki.apache.org/solr/UpdateXmlMessages].
> 
> {topofpage}
> h2. XSLTRequestHandler to Transform XML Content
> 
> h3. Configuration
> 
> The default configuration file has the update request handler configured by default, although the "lazy load" flag is set.
> 
> The XSLTRequestHandler allows you to index any XML data with the [XML {{<tr>}} command|http://xmlstar.sourceforge.net/doc/UG/ch04s02.html]. You must have an XSLT stylesheet in the solr/conf/xslt directory that can transform the incoming data to the expected {{<add><doc/></add>}} format.
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <requestHandler name="/update/xslt" startup="lazy" class="solr.XsltUpdateRequestHandler"/>
> {code}
> 
> Here is an example XSLT stylesheet:
> 
> {code:borderStyle=solid|borderColor=#666666}
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
> <xsl:template match="/">
>   <add>
>     <xsl:apply-templates select="/random/document"/>
>   </add>
> </xsl:template>
> 
> <xsl:template match="document">
> 
>   <doc boost="5.5">
>     <xsl:apply-templates select="*"/>
>   </doc>
> </xsl:template>
> 
> <xsl:template match="node">
>   <field name="{@name}">
>     <xsl:if test="@enhance!=''">
>       <xsl:attribute name="boost"><xsl:value-of select="@enhance"/></xsl:attribute>
>     </xsl:if>
>     <xsl:value-of select="@value"/>
>   </field>
> </xsl:template>
> 
> </xsl:stylesheet>
> {code}
> 
> Attaching the stylesheet "updateXml.xsl" transforms a search result to Solr's {{UpdateXml}} syntax. One example use is to copy a Solr 1.3 index (which does not have the CSV response writer) into a format which can be indexed into another Solr server (provided that all fields are stored):
> 
> {code}
> http://localhost:8983/solr/select?q=*:*&wt=xslt&tr=updateXml.xsl&rows=1000
> {code}
> 
> You can also use the stylesheet in {{XsltUpdateRequestHandler}} to transform an index when updating:
> 
> {code}
> curl "http://localhost:8983/solr/update/xslt?commit=true&tr=updateXml.xsl" -H "Content-Type: text/xml" --data-binary @myfile.xml
> {code}
> 
> {topofpage}
> 
> h2. CSVRequestHandler for CSV Content
> 
> h3. Configuration
> 
> The default configuration file has the update request handler configured by default, although the "lazy load" flag is set.
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />
> {code}
> 
> h3. Parameters
> 
> The CSV handler allows many parameters to be specified in the URL, either globally ({{parameter=value}}) or on a per-field basis ({{f.<fieldname>.<parameter>=<value>}}).
> 
> The table below describes the parameters for the update handler.
> 
> || Parameter || Usage || Global (g) or Per Field (f) || Example ||
> | separator | Character used as field separator; default is "," | g,(f: see split) | separator=% |
> | trim | If true, remove leading and trailing whitespace from values. Default=false. | g,f | f.isbn.trim=true \\
> trim=false |
> | header | Set to true if first line of input contains field names. These will be used if the *field_name* parameter is absent. | g | |
> | field_name | Comma separated list of field names to use when adding documents. | g | field_name=isbn,price,title |
> | literal.<field_name> | Comma separated list of field names to use when processing literal values. | g | literal.color=red,blue,black |
> | skip | Comma separated list of field names to skip. | g | skip=uninteresting,shoesize |
> | skipLines | Number of lines to discard in the input stream before the CSV data starts, including the header, if present. Default=0. | g | skipLines=5 |
> | encapsulator | The character optionally used to surround values to preserve characters such as the CSV separator or whitespace. This standard CSV format handles the encapsulator itself appearing in an encapsulated value by doubling the encapsulator. | g,(f: see split) | encapsulator=" |
> | escape | The character used for escaping CSV separators or other reserved characters. If an escape is specified, the encapsulator is not used unless also explicitly specified since most formats use either encapsulation or escaping, not both | g | escape=\ \\ |
> | keepEmpty | Keep and index zero length (empty) fields. Default=false. | g,f | f.price.keepEmpty=true |
> | map | Map one value to another. Format is value:replacement (which can be empty.) | g,f | map=left:right \\
> f.subject.map=history:bunk |
> | split | If true, split a field into multiple values by a separate parser. | f | |
> | overwrite | If true (the default), check for and overwrite duplicate documents, based on the uniqueKey field declared in the Solr schema. If you know the documents you are indexing do not contain any duplicates then you may see a considerable speed up setting this to false. | g | |
> | commit | Issues a commit after the data has been ingested. | g | |
> | commitWithin | Add the document within the specified number of milliseconds. | g | commitWithin=10000 |
> | rowid | Map the rowid (line number) to a field specified by the value of the parameter, for instance if your CSV doesn't have a unique key and you want to use the row id as such. | g | rowid=id |
> | rowidOffset | Add the given offset (as an int) to the rowid before adding it to the document. Default is 0. | g | rowidOffset=10 |
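The encapsulator-doubling convention described in the table is ordinary CSV quoting; Python's csv module follows the same rule, so it serves as an illustration (an analogy only, not Solr's parser):

```python
import csv
import io

# A value containing both the separator and a doubled encapsulator ("").
line = 'title,desc\n"Summer, 1982","He said ""out!"" twice"\n'
rows = list(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

# The doubled quote collapses back to a single quote inside the value:
assert rows[1] == ["Summer, 1982", 'He said "out!" twice']
```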
> 
> For more information on the CSV Update Request Handler, see [https://wiki.apache.org/solr/UpdateCSV].
> 
> {topofpage}
> 
> h2. Using the JSONRequestHandler for JSON Content
> 
> JSON formatted update requests may be sent to Solr using the {{/solr/update/json}} URL. All of the normal methods for uploading content are supported.
> 
> h3. Configuration
> 
> The default configuration file has the update request handler configured by default, although the "lazy load" flag is set.
> 
> {code:lang=xml|borderStyle=solid|borderColor=#666666}
> <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" startup="lazy" />
> {code}
> 
> h3. Examples
> 
> There is a sample JSON file at {{example/exampledocs/books.json}} that you can use to add documents to the Solr example server.
> 
> {code:borderStyle=solid|borderColor=#666666}
> cd example/exampledocs
> curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @books.json -H 'Content-type:application/json'
> {code}
> 
> Adding {{commit=true}} to the URL makes the documents immediately searchable.
> 
> You should now be able to query for the newly added documents:
> 
> {{[http://localhost:8983/solr/select?q=title:monsters&wt=json&indent=true]}} returns:
> 
> {code:borderStyle=solid|borderColor=#666666}
> {
> "responseHeader":{
>   "status":0,
>   "QTime":2,
>   "params":{
>     "indent":"true",
>     "wt":"json",
>     "q":"title:monsters"}},
> "response":{"numFound":1,"start":0,"docs":[
>     {
>       "id":"978-1423103349",
>       "author":"Rick Riordan",
>       "series_t":"Percy Jackson and the Olympians",
>       "sequence_i":2,
>       "genre_s":"fantasy",
>       "inStock":true,
>       "price":6.49,
>       "pages_i":304,
>       "title":[
>         "The Sea of Monsters"],
>       "cat":["book","paperback"]}]
> }
> }
> {code}
> 
> h3. Update Commands
> 
> The JSON update handler accepts all of the update commands that the XML update handler supports, through a straightforward mapping. Multiple commands may be contained in one message:
> 
> {code:borderStyle=solid|borderColor=#666666}
> {
> "add": {
> "doc": {
>   "id": "DOC1",
>   "my_boosted_field": {        /* use a map with boost/value for a boosted field */
>     "boost": 2.3,
>     "value": "test"
>   },
>   "my_multivalued_field": [ "aaa", "bbb" ]   /* use an array for a multi-valued field */
> }
> },
> "add": {
> "commitWithin": 5000,          /* commit this document within 5 seconds */
> "overwrite": false,            /* don't check for existing documents with the same uniqueKey */
> "boost": 3.45,                 /* a document boost */
> "doc": {
>   "f1": "v1",
>   "f1": "v2"
> }
> },
> 
> "commit": {},
> "optimize": { "waitFlush":false, "waitSearcher":false },
> 
> "delete": { "id":"ID" },         /* delete by ID */
> "delete": { "query":"QUERY" }    /* delete by query */
> }
> {code}
> 
> {note}
> Comments are not allowed in JSON, but duplicate names are.
> {note}
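The duplicate-name behavior can be observed with Python's json module. This is a sketch for illustration: a plain parse keeps only the last duplicate, while the {{object_pairs_hook}} parameter exposes every repeated command in order, which is how a stream-oriented handler can honor them.

```python
import json

body = '{"delete": {"id": "ID1"}, "delete": {"id": "ID2"}}'

# A plain loads() silently keeps only the last duplicate key...
assert json.loads(body) == {"delete": {"id": "ID2"}}

# ...but object_pairs_hook preserves every (name, value) pair in order.
pairs = json.loads(body, object_pairs_hook=lambda p: p)
assert pairs == [("delete", [("id", "ID1")]), ("delete", [("id", "ID2")])]
```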
> 
> As with other update handlers, parameters such as {{commit}}, {{commitWithin}}, {{optimize}}, and {{overwrite}} may be specified in the URL instead of in the body of the message.
> 
> The JSON update format allows for a simple delete-by-id. The value of a {{delete}} can be an array which contains a list of zero or more specific document IDs (not a range) to be deleted. For example:
> 
> {code:borderStyle=solid|borderColor=#666666}
> "delete":"myid"
> {code}
> 
> {code:borderStyle=solid|borderColor=#666666}
> "delete":["id1","id2"]
> {code}
> 
> You can also specify {{\_version\_}} with each delete, for example: {{"delete": \{"id": 50, "\_version\_": 12345\}}}. The version of deletes can be specified in the body of the update request as well.
> 
> For more information about the JSON Update Request Handler, see [https://wiki.apache.org/solr/UpdateJSON].
> 
> {topofpage}
> 
> h2. Updating Only Part of a Document
> 
> Solr supports several modifiers that atomically update values of a document.
> 
> || Modifier || Usage ||
> | set | Set or replace a particular value, or remove the value if null is specified as the new value. |
> | add | Adds an additional value to a list. |
> | inc | Increments a numeric value by a specific amount. |
> 
> {note}
> All original source fields must be stored for field modifiers to work correctly, which is the Solr default.
> {note}
> 
> For example:
> 
> {code:borderStyle=solid|borderColor=#666666}
> {"id":"mydoc", "f1":{"set":10}, "f2":{"add":20}}
> {code}
> 
> This example results in field {{f1}} being set to "10", and field {{f2}} having an additional value of "20" added. All other existing fields from the original document remain unchanged.
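The effect of the modifiers can be modeled on a stored document. The sketch below is a toy rendering of the semantics described above, not Solr's implementation:

```python
def apply_atomic_update(stored, update):
    """Apply set/add/inc modifiers to a stored document, in the spirit of
    Solr's partial updates. Plain (non-dict) values replace the field."""
    doc = dict(stored)
    for field, op in update.items():
        if not isinstance(op, dict):
            doc[field] = op
            continue
        if "set" in op:
            if op["set"] is None:
                doc.pop(field, None)        # set to null removes the value
            else:
                doc[field] = op["set"]      # set replaces the value
        elif "add" in op:
            values = doc.get(field, [])
            if not isinstance(values, list):
                values = [values]
            doc[field] = values + [op["add"]]  # add appends to a list
        elif "inc" in op:
            doc[field] = doc.get(field, 0) + op["inc"]  # inc adjusts a number
    return doc

stored = {"id": "mydoc", "f1": 5, "f2": [19]}
updated = apply_atomic_update(stored, {"f1": {"set": 10}, "f2": {"add": 20}})
assert updated == {"id": "mydoc", "f1": 10, "f2": [19, 20]}
```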
> {topofpage}
> 
> h2. Using SimplePostTool
> 
> This is a simple command line tool for POSTing raw data to a Solr port. Data can be read from files specified as command line arguments, as raw command line argument strings, or via {{STDIN}}. Options controlled by System Properties include the Solr URL to post to, the {{Content-Type}} of the data, whether a commit or optimize should be executed, and whether the response should be written to {{STDOUT}}. If {{auto=yes}}, the tool will try to guess the type and set {{Content-Type}} and the URL automatically. When posting rich documents, the file name will be propagated as {{resource.name}} and also used as {{literal.id}}. You may override these or any other request parameter through the {{\-Dparams}} property.
> 
> Supported System Properties and their defaults:
> 
> | *Parameter* | *Values* | *Default* |
> | \-Ddata | files, args, stdin | default=files |
> | \-Dtype | <content-type> | default=application/xml |
> | \-Durl | <solr-update-url> | default=[http://localhost:8983/solr/update] |
> | \-Dauto | yes, no | default=no |
> | \-Drecursive | yes, no | default=no |
> | \-Dfiletypes | <type>\[,<type>,..\] | default=xml, json, csv, pdf, doc, docx, ppt, pptx, xls, xlsx, odt, odp, ods, rtf, htm, html |
> | \-Dparams | "<key>=<value>\[&<key>=<value>...\]" | values must be URL-encoded |
> | \-Dcommit | yes, no | default=yes |
> | \-Doptimize | yes, no | default=no |
> | \-Dout | yes,no | default=no |
> 
> Examples:
> 
> {code:borderStyle=solid|borderColor=#666666}
> java -jar post.jar *.xml
> java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
> java -Ddata=stdin -jar post.jar < hd.xml
> java -Dtype=text/csv -jar post.jar *.csv
> java -Dtype=application/json -jar post.jar *.json
> java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
> java -Dauto=yes -jar post.jar a.pdf
> java -Dauto=yes -Drecursive=yes -jar post.jar afolder
> java -Dauto=yes -Dfiletypes=ppt,html -jar post.jar afolder
> {code}
> 
> In the above example:
> 
> | *\-Dauto=yes* | Will guess file type from file name suffix, and set type and url accordingly. It also sets the ID and file name automatically. |
> | *\-Drecursive=yes* | Will recurse into sub-folders and index all files. |
> | *\-Dfiletypes* | Specifies the file types to consider when indexing folders. |
> | *\-Dparams* | HTTP GET params to add to the request, so you don't need to write the whole URL again. |
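The auto mode's suffix-based guessing can be sketched like this. The mapping below is illustrative only, covering a few of the default file types; SimplePostTool's actual table is longer.

```python
# Illustrative suffix -> Content-Type table (a subset of the defaults).
CONTENT_TYPES = {
    "xml": "application/xml",
    "json": "application/json",
    "csv": "text/csv",
    "pdf": "application/pdf",
    "html": "text/html",
}

def guess_content_type(filename, default="application/xml"):
    """Mimic -Dauto=yes: pick a Content-Type from the file name suffix,
    falling back to the -Dtype default when the suffix is unknown."""
    suffix = filename.rsplit(".", 1)[-1].lower()
    return CONTENT_TYPES.get(suffix, default)

assert guess_content_type("a.pdf") == "application/pdf"
assert guess_content_type("books.JSON") == "application/json"
assert guess_content_type("notes.unknown") == "application/xml"
```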
> 
> h2. Indexing Using SolrJ
> 
> Use of the SolrJ client library is covered in the section on [solr:Using SolrJ].
> 
> {scrollbar}
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org