Posted to commits@lucene.apache.org by ct...@apache.org on 2017/05/08 20:21:50 UTC

lucene-solr:jira/solr-10290: SOLR-10296: conversion, table cleanup

Repository: lucene-solr
Updated Branches:
  refs/heads/jira/solr-10290 02c2d45c1 -> 09f53281c


SOLR-10296: conversion, table cleanup


Project: http://git-wip-us.apache.org/repos/asf/lucene-solr/repo
Commit: http://git-wip-us.apache.org/repos/asf/lucene-solr/commit/09f53281
Tree: http://git-wip-us.apache.org/repos/asf/lucene-solr/tree/09f53281
Diff: http://git-wip-us.apache.org/repos/asf/lucene-solr/diff/09f53281

Branch: refs/heads/jira/solr-10290
Commit: 09f53281cc6e7a3117a3d05adf282dfdc649237e
Parents: 02c2d45
Author: Cassandra Targett <ct...@apache.org>
Authored: Mon May 8 15:19:30 2017 -0500
Committer: Cassandra Targett <ct...@apache.org>
Committed: Mon May 8 15:19:30 2017 -0500

----------------------------------------------------------------------
 .../solr-ref-guide/src/charfilterfactories.adoc |   8 +-
 solr/solr-ref-guide/src/de-duplication.adoc     |   8 +-
 .../detecting-languages-during-indexing.adoc    |   4 +-
 solr/solr-ref-guide/src/language-analysis.adoc  |   2 +-
 .../src/updating-parts-of-documents.adoc        |  10 +-
 .../src/uploading-data-with-index-handlers.adoc |  24 ++--
 ...g-data-with-solr-cell-using-apache-tika.adoc |  19 ++--
 ...store-data-with-the-data-import-handler.adoc | 110 ++++++++++++-------
 8 files changed, 112 insertions(+), 73 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/charfilterfactories.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/charfilterfactories.adoc b/solr/solr-ref-guide/src/charfilterfactories.adoc
index 90fe81e..20ff949 100644
--- a/solr/solr-ref-guide/src/charfilterfactories.adoc
+++ b/solr/solr-ref-guide/src/charfilterfactories.adoc
@@ -33,7 +33,9 @@ Mapping file syntax:
 * The source string must contain at least one character, but the target string may be empty.
 * The following character escape sequences are recognized within source and target strings:
 +
-[cols=",,,",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
++
+[cols="20,30,20,30",options="header"]
 |===
 |Escape Sequence |Resulting Character (http://www.ecma-international.org/publications/standards/Ecma-048.htm[ECMA-48] alias) |Unicode Character |Example Mapping Line
 |`\\` |`\` |U+005C |`"\\" => "/"`
@@ -145,7 +147,9 @@ You can configure this filter in `schema.xml` like this:
 
 The table below presents examples of regex-based pattern replacement:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,20,10,20,30",options="header"]
 |===
 |Input |Pattern |Replacement |Output |Description
 |see-ing looking |`(\w+)(ing)` |`$1` |see-ing look |Removes "ing" from the end of word.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/de-duplication.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/de-duplication.adoc b/solr/solr-ref-guide/src/de-duplication.adoc
index fee4408..60d3b587 100644
--- a/solr/solr-ref-guide/src/de-duplication.adoc
+++ b/solr/solr-ref-guide/src/de-duplication.adoc
@@ -6,7 +6,9 @@ If duplicate, or near-duplicate documents are a concern in your index, de-duplic
 
 Preventing duplicate or near duplicate documents from entering an index or tagging documents with a signature/fingerprint for duplicate field collapsing can be efficiently achieved with a low collision or fuzzy hash algorithm. Solr natively supports de-duplication techniques of this type via the `Signature` class and allows for the easy addition of new hash/signature implementations. A Signature can be implemented several ways:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Method |Description
 |MD5Signature |128-bit hash used for exact duplicate detection.
@@ -50,9 +52,9 @@ The `SignatureUpdateProcessorFactory` has to be registered in `solrconfig.xml` a
 
 The `SignatureUpdateProcessorFactory` takes several properties:
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="20,30,50",options="header"]
 |===
 |Parameter |Default |Description
 |signatureClass |`org.apache.solr.update.processor.Lookup3Signature` a|

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc b/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc
index 19dce39..44cd456 100644
--- a/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc
+++ b/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc
@@ -55,7 +55,9 @@ Here is an example of a minimal LangDetect `langid` configuration in `solrconfig
 
 As previously mentioned, both implementations of the `langid` UpdateRequestProcessor take the same parameters.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,10,10,10,50",options="header"]
 |===
 |Parameter |Type |Default |Required |Description
 |langid |Boolean |true |no |Enables and disables language detection.

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/language-analysis.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/language-analysis.adoc b/solr/solr-ref-guide/src/language-analysis.adoc
index 2ce20ea..34fbe28 100644
--- a/solr/solr-ref-guide/src/language-analysis.adoc
+++ b/solr/solr-ref-guide/src/language-analysis.adoc
@@ -1130,7 +1130,7 @@ The second pass is to pick up -dom and -het endings. Consider this example:
 
 [width="100%",options="header",]
 |===
-|*One pass* | |*Two passes* |
+2+^|*One pass* 2+^|*Two passes*
 |*Before* |*After* |*Before* |*After*
 |forlegen |forleg |forlegen |forleg
 |forlegenhet |forlegen |forlegenhet |forleg

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/updating-parts-of-documents.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc
index bc134ba..ef0441b 100644
--- a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc
+++ b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc
@@ -19,9 +19,9 @@ Solr supports several modifiers that atomically update values of a document. Thi
 
 To use atomic updates, add a modifier to the field that needs to be updated. The content can be updated, added to, or incrementally increased if a number.
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="30,70",options="header"]
 |===
 |Modifier |Usage
 |set a|
@@ -114,21 +114,19 @@ An atomic update operation is performed using this approach only when the fields
 
 To use in-place updates, add a modifier to the field that needs to be updated. The content can be updated or incrementally increased.
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="30,70",options="header"]
 |===
 |Modifier |Usage
 |set a|
 Set or replace the field value(s) with the specified value(s).
 
 May be specified as a single value.
-
 |inc a|
 Increments a numeric value by a specific amount.
 
 Must be specified as a single numeric value.
-
 |===
 
 [[UpdatingPartsofDocuments-Example.1]]

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc b/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc
index 4727745..882ea5a 100644
--- a/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc
+++ b/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc
@@ -58,11 +58,13 @@ For example:
 
 The add command supports some optional attributes which may be specified.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Optional Parameter |Parameter Description
-|commitWithin=__number__ |Add the document within the specified number of milliseconds
-|overwrite=__boolean__ |Default is true. Indicates if the unique key constraints should be checked to overwrite previous versions of the same document (see below)
+|commitWithin=_number_ |Add the document within the specified number of milliseconds
+|overwrite=_boolean_ |Default is true. Indicates if the unique key constraints should be checked to overwrite previous versions of the same document (see below)
 |===
 
 If the document schema defines a unique key, then by default an `/update` operation to add a document will overwrite (ie: replace) any document in the index with the same unique key. If no unique key has been defined, indexing performance is somewhat faster, as no check has to be made for an existing documents to replace.
@@ -83,7 +85,9 @@ The `<optimize>` operation requests Solr to merge internal data structures in or
 
 The `<commit>` and `<optimize>` elements accept these optional attributes:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Optional Attribute |Description
 |waitSearcher |Default is true. Blocks until a new searcher is opened and registered as the main query searcher, making the changes visible.
@@ -298,7 +302,7 @@ curl 'http://localhost:8983/solr/techproducts/update?commit=true' --data-binary
 
 In general, the JSON update syntax supports all of the update commands that the XML update handler supports, through a straightforward mapping. Multiple commands, adding and deleting documents, may be contained in one message:
 
-[source,bash]
+[source,text]
 ----
 curl -X POST -H 'Content-Type: application/json' 'http://localhost:8983/solr/my_collection/update' --data-binary '
 {
@@ -404,11 +408,13 @@ curl 'http://localhost:8983/solr/my_collection/update?commit=true' --data-binary
 [[UploadingDatawithIndexHandlers-CSVUpdateParameters]]
 === CSV Update Parameters
 
-The CSV handler allows the specification of many parameters in the URL in the form: `f.__parameter__.__optional_fieldname__=__value__` .
+The CSV handler allows the specification of many parameters in the URL in the form: `f._parameter_._optional_fieldname_=_value_` .
 
 The table below describes the parameters for the update handler.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="20,40,20,20",options="header"]
 |===
 |Parameter |Usage |Global (g) or Per Field (f) |Example
 |separator |Character used as field separator; default is "," |g,(f: see split) |separator=%09
@@ -469,7 +475,7 @@ Solr indexes nested documents in blocks as a way to model documents containing o
 
 Nested documents may be indexed via either the XML or JSON data syntax (or using <<using-solrj.adoc#using-solrj,SolrJ)>> - but regardless of syntax, you must include a field that identifies the parent document as a parent; it can be any field that suits this purpose, and it will be used as input for the <<other-parsers.adoc#OtherParsers-BlockJoinQueryParsers,block join query parsers>>.
 
-To support nested documents, the schema must include an indexed/non-stored field ___root__ _ . The value of that field is populated automatically and is the same for all documents in the block, regardless of the inheritance depth.
+To support nested documents, the schema must include an indexed/non-stored field `\_root_`. The value of that field is populated automatically and is the same for all documents in the block, regardless of the inheritance depth.
 
 [[UploadingDatawithIndexHandlers-XMLExamples]]
 === XML Examples
@@ -538,7 +544,5 @@ This example is equivalent to the XML example above, note the special `\_childDo
 .Note
 [NOTE]
 ====
-
 One limitation of indexing nested documents is that the whole block of parent-children documents must be updated together whenever any changes are required. In other words, even if a single child document or the parent document is changed, the whole block of parent-child documents must be indexed together.
-
 ====

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc b/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc
index 9337a22..fc678d6 100644
--- a/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc
+++ b/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc
@@ -60,14 +60,14 @@ The URL above calls the Extracting Request Handler, uploads the file `solr-word.
 
 * The argument `myfile=@tutorial.html` needs a valid path, which can be absolute or relative.
 
-You can also use `bin/post` to send a PDF file into Solr (without the params, the literal.id parameter would be set to the absolute path to the file):
+You can also use `bin/post` to send a PDF file into Solr (without the params, the `literal.id` parameter would be set to the absolute path to the file):
 
 [source,bash]
 ----
 bin/post -c techproducts example/exampledocs/solr-word.pdf -params "literal.id=a"
 ----
 
-Now you should be able to execute a query and find that document. You can make a request like `http://localhost:8983/solr/techproducts/select?q=pdf` .
+Now you should be able to execute a query and find that document. You can make a request like `\http://localhost:8983/solr/techproducts/select?q=pdf` .
 
 You may notice that although the content of the sample document has been indexed and stored, there are not a lot of metadata fields associated with this document. This is because unknown fields are ignored according to the default parameters configured for the `/update/extract` handler in `solrconfig.xml`, and this behavior can be easily changed or overridden. For example, to store and see all metadata and content, execute the following:
 
@@ -78,14 +78,16 @@ bin/post -c techproducts example/exampledocs/solr-word.pdf -params "literal.id=d
 
 In this command, the `uprefix=attr_` parameter causes all generated fields that aren't defined in the schema to be prefixed with `attr_`, which is a dynamic field that is stored and indexed.
 
-This command allows you to query the document using an attribute, as in: `http://localhost:8983/solr/techproducts/select?q=attr_meta:microsoft`.
+This command allows you to query the document using an attribute, as in: `\http://localhost:8983/solr/techproducts/select?q=attr_meta:microsoft`.
 
 [[UploadingDatawithSolrCellusingApacheTika-InputParameters]]
 == Input Parameters
 
 The table below describes the parameters accepted by the Extracting Request Handler.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Parameter |Description
 |capture |Captures XHTML elements with the specified name for a supplementary addition to the Solr document. This parameter can be useful for copying chunks of the XHTML into a separate field. For instance, it could be used to grab paragraphs (`<p>`) and index them into a separate field. Note that content is still also captured into the overall "content" field.
@@ -223,7 +225,9 @@ As mentioned before, Tika produces metadata about the document. Metadata describ
 
 In addition to Tika's metadata, Solr adds the following metadata (defined in `ExtractingMetadataConstants`):
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Solr Metadata |Description
 |stream_name |The name of the Content Stream as uploaded to Solr. Depending on how the file is uploaded, this may or may not be set
@@ -340,8 +344,3 @@ This operation streams the file `my-file.pdf` into the Solr index for `my_collec
 The sample code above calls the extract command, but you can easily substitute other commands that are supported by Solr Cell. The key class to use is the `ContentStreamUpdateRequest`, which makes sure the ContentStreams are set properly. SolrJ takes care of the rest.
 
 Note that the `ContentStreamUpdateRequest` is not just specific to Solr Cell. You can send CSV to the CSV Update handler and to any other Request Handler that works with Content Streams for updates.
-
-[[UploadingDatawithSolrCellusingApacheTika-RelatedTopics]]
-== Related Topics
-
-* http://wiki.apache.org/solr/ExtractingRequestHandler[ExtractingRequestHandler]

http://git-wip-us.apache.org/repos/asf/lucene-solr/blob/09f53281/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
----------------------------------------------------------------------
diff --git a/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc b/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
index b15974e..0fd66ab 100644
--- a/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
+++ b/solr/solr-ref-guide/src/uploading-structured-data-store-data-with-the-data-import-handler.adoc
@@ -19,7 +19,9 @@ For more information about the Data Import Handler, see https://wiki.apache.org/
 
 Descriptions of the Data Import Handler use several familiar terms, such as entity and processor, in specific ways, as explained in the table below.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Term |Definition
 |Datasource |As its name suggests, a datasource defines the location of the data of interest. For a database, it's a DSN. For an HTTP datasource, it's the base URL.
@@ -149,14 +151,14 @@ Then, these parameters can be passed to the full-import command or defined in th
 
 DIH commands are sent to Solr via an HTTP request. The following operations are supported.
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="30,70",options="header"]
 |===
 |Command |Description
 |`abort` |Aborts an ongoing operation. The URL is `\http://<host>:<port>/solr/<collection_name>/dataimport?command=abort`.
 |`delta-import` |For incremental imports and change detection. The command is of the form `\http://<host>:<port>/solr/<collection_name>/dataimport?command=delta-import`.  It supports the same clean, commit, optimize and debug parameters as full-import command. Only the SqlEntityProcessor supports delta imports.
-|`full-import` |A Full Import operation can be started with a URL of the form `\http://<host>:<port>/solr/<collection_name>/dataimport?command=full-import`. The command returns immediately. The operation will be started in a new thread and the _status_ attribute in the response should be shown as __busy__. The operation may take some time depending on the size of dataset. Queries to Solr are not blocked during full-imports. When a full-import command is executed, it stores the start time of the operation in a file located at `conf/dataimport.properties`. This stored timestamp is used when a delta-import operation is executed. For a list of parameters that can be passed to this command, see below.
+|`full-import` |A Full Import operation can be started with a URL of the form `\http://<host>:<port>/solr/<collection_name>/dataimport?command=full-import`. The command returns immediately. The operation will be started in a new thread and the _status_ attribute in the response should be shown as _busy_. The operation may take some time depending on the size of dataset. Queries to Solr are not blocked during full-imports. When a full-import command is executed, it stores the start time of the operation in a file located at `conf/dataimport.properties`. This stored timestamp is used when a delta-import operation is executed. For a list of parameters that can be passed to this command, see below.
 |`reload-config` a|
 If the configuration file has been changed and you wish to reload it without restarting Solr, run the command
 
@@ -171,7 +173,9 @@ If the configuration file has been changed and you wish to reload it without res
 
 The `full-import` command accepts the following parameters:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Parameter |Description
 |clean |Default is true. Tells whether to clean up the index before the indexing is started.
@@ -195,7 +199,9 @@ The `propertyWriter` element defines the date format and locale for use with del
 
 The parameters available are:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Parameter |Description
 |dateFormat |A java.text.SimpleDateFormat to use when converting the date to text. The default is "yyyy-MM-dd HH:mm:ss".
@@ -229,7 +235,7 @@ This can be used where a database field contains XML which you wish to process u
 [source,xml]
 ----
 <dataSource name="a1" driver="org.hsqldb.jdbcDriver" ...  />
-<dataSource name="a2" type=FieldReaderDataSource" />
+<dataSource name="a2" type="FieldReaderDataSource" />
 <document>
 
   <!-- processor for database -->
@@ -259,7 +265,9 @@ This can be used like an <<UploadingStructuredDataStoreDatawiththeDataImportHand
 
 This data source accepts these optional attributes.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Optional Attribute |Description
 |basePath |The base path relative to which the value is evaluated if it is not absolute.
@@ -271,23 +279,19 @@ This data source accepts these optional attributes.
 
 This is the default datasource. It's used with the <<UploadingStructuredDataStoreDatawiththeDataImportHandler-TheSQLEntityProcessor,SqlEntityProcessor>>. See the example in the <<UploadingStructuredDataStoreDatawiththeDataImportHandler-FieldReaderDataSource,FieldReaderDataSource>> section for details on configuration. `JdbcDatasource` supports at least the following attributes: .
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="30,70",options="header"]
 |===
 |Attributes |Description
 |driver, url, user, password, encryptKeyFile |Usual jdbc connection properties
-a|
-....
-batchSize
-....
-
+|`batchSize`
  a|
 Passed to Statement#setFetchSize, default value 500.
 
 For MySQL driver, which doesn't honor fetchSize and pulls whole resultSet, which often lead to OutOfMemoryError.
 
-In this case, set batchSize=-1 that pass setFetchSize(Integer.MIN_VALUE), and switch result set to pull row by row
+In this case, set `batchSize=-1` that pass setFetchSize(Integer.MIN_VALUE), and switch result set to pull row by row
 
 |===
 
@@ -310,7 +314,9 @@ This data source is often used with XPathEntityProcessor to fetch content from a
 
 The URLDataSource type accepts these optional parameters:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Optional Parameter |Description
 |baseURL |Specifies a new baseURL for pathnames. You can use this to specify host/port changes between Dev/QA/Prod environments. Using this attribute isolates the changes to be made to the `solrconfig.xml`
@@ -326,7 +332,9 @@ Entity processors extract data, transform it, and add it to a Solr index. Exampl
 
 Each processor has its own set of attributes, described in its own section below. In addition, there are non-specific attributes common to all entities which may be specified.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Use
 |dataSource |The name of a data source. If there are multiple data sources defined, use this attribute with the name of the data source for this entity.
@@ -365,7 +373,9 @@ The SqlEntityProcessor is the default processor. The associated <<UploadingStruc
 
 The entity attributes specific to this processor are shown in the table below.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Use
 |query |Required. The SQL query used to select rows.
@@ -382,7 +392,9 @@ This processor is used when indexing XML formatted data. The data source is typi
 
 The entity attributes unique to this processor are shown below.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Use
 |Processor |Required. Must be set to "XpathEntityProcessor".
@@ -395,9 +407,9 @@ The entity attributes unique to this processor are shown below.
 
 Each field element in the entity can have the following attributes as well as the default ones.
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="30,70",options="header"]
 |===
 |Attribute |Use
 |xpath |Required. The XPath expression which will extract the content from the record for this field. Only a subset of Xpath syntax is supported.
@@ -470,9 +482,9 @@ Here is an example from the "```mail```" collection of the `dih` example (`examp
 
 The entity attributes unique to the MailEntityProcessor are shown below.
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="30,70",options="header"]
 |===
 |Attribute |Use
 |processor |Required. Must be set to "MailEntityProcessor".
@@ -504,7 +516,7 @@ After running a full import, the MailEntityProcessor keeps track of the timestam
 [[UploadingStructuredDataStoreDatawiththeDataImportHandler-GMailExtensions]]
 ==== GMail Extensions
 
-When connecting to a GMail account, you can improve the efficiency of the MailEntityProcessor by setting the protocol to *gimap* or **gimaps**. This allows the processor to send the fetchMailsSince filter to the GMail server to have the date filter applied on the server, which means the processor only receives new messages from the server. However, GMail only supports date granularity, so the server-side filter may return previously seen messages if run more than once a day.
+When connecting to a GMail account, you can improve the efficiency of the MailEntityProcessor by setting the protocol to *gimap* or *gimaps*. This allows the processor to send the fetchMailsSince filter to the GMail server to have the date filter applied on the server, which means the processor only receives new messages from the server. However, GMail only supports date granularity, so the server-side filter may return previously seen messages if run more than once a day.
 
 [[UploadingStructuredDataStoreDatawiththeDataImportHandler-TheTikaEntityProcessor]]
 === The TikaEntityProcessor
@@ -530,9 +542,9 @@ Here is an example from the "```tika```" collection of the `dih` example (`examp
 
 The parameters for this processor are described in the table below:
 
-// TODO: This table has cells that won't work with PDF: https://github.com/ctargett/refguide-asciidoc-poc/issues/13
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
 
-[width="100%",options="header",]
+[cols="30,70",options="header"]
 |===
 |Attribute |Use
 |dataSource a|
@@ -543,11 +555,11 @@ This parameter defines the data source and an optional name which can be referre
 * BinFileDataSource: used for content on the local filesystem.
 
 |url |The path to the source file(s), as a file path or a traditional internet URL. This parameter is required.
-|htmlMapper |Allows control of how Tika parses HTML. The "default" mapper strips much of the HTML from documents while the "identity" mapper passes all HTML as-is with no modifications. If this parameter is defined, it must be either *default* or **identity**; if it is absent, "default" is assumed.
-|format |The output format. The options are **text**, **xml**, *html* or **none**. The default is "text" if not defined. The format "none" can be used if metadata only should be indexed and not the body of the documents.
+|htmlMapper |Allows control of how Tika parses HTML. The "default" mapper strips much of the HTML from documents while the "identity" mapper passes all HTML as-is with no modifications. If this parameter is defined, it must be either *default* or *identity*; if it is absent, "default" is assumed.
+|format |The output format. The options are *text*, *xml*, *html* or *none*. The default is "text" if not defined. The format "none" can be used if metadata only should be indexed and not the body of the documents.
 |parser |The default parser is `org.apache.tika.parser.AutoDetectParser`. If a custom or other parser should be used, it should be entered as a fully-qualified name of the class and path.
 |fields |The list of fields from the input documents and how they should be mapped to Solr fields. If the attribute `meta` is defined as "true", the field will be obtained from the metadata of the document and not parsed from the body of the main text.
-|extractEmbedded |Instructs the TikaEntityProcessor to extract embedded documents or attachments when **true**. If false, embedded documents and attachments will be ignored.
+|extractEmbedded |Instructs the TikaEntityProcessor to extract embedded documents or attachments when *true*. If false, embedded documents and attachments will be ignored.
 |onError |By default, the TikaEntityProcessor will stop processing documents if it finds one that generates an error. If you set `onError` to "skip", the TikaEntityProcessor will instead skip documents that fail processing and log a message that the document was skipped.
 |===
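
For illustration, here is a sketch of a TikaEntityProcessor entity that uses several of these attributes. The file path and field names are hypothetical:

[source,xml]
----
<entity name="tika-example"
        processor="TikaEntityProcessor"
        url="example-docs/sample.pdf"
        format="text"
        htmlMapper="default"
        extractEmbedded="false"
        onError="skip">
  <!-- "Author" comes from document metadata; "text" is the parsed body -->
  <field column="Author" name="author" meta="true"/>
  <field column="text" name="text"/>
</entity>
----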
 
@@ -558,7 +570,9 @@ This processor is basically a wrapper, and is designed to generate a set of file
 
 The attributes specific to this processor are described in the table below:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Use
 |fileName |Required. A regular expression pattern to identify files to be included.
@@ -607,7 +621,9 @@ This EntityProcessor reads all content from the data source on a line by line ba
 
 The lines read can be filtered by two regular expressions specified with the `acceptLineRegex` and `omitLineRegex` attributes. The table below describes the LineEntityProcessor's attributes:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Description
 |url |A required attribute that specifies the location of the input file in a way that is compatible with the configured data source. If this value is relative and you are using `FileDataSource` or `URLDataSource`, it is assumed to be relative to `baseLoc`.
@@ -654,7 +670,9 @@ Ensure that the dataSource is of type `DataSource<Reader>` (`FileDataSource`, `U
 
 Uses a Solr instance as a data source; see https://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor. In addition, SolrEntityProcessor supports the following parameters:
 
-[cols=",",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |cursorMark="true" |Specify this to enable a cursor for efficient result set scrolling.
 |sort="id asc" |When using a cursor, the sort parameter usually must reference the uniqueKey field. See <<pagination-of-results.adoc#pagination-of-results,Pagination of Results>> for details.
@@ -676,7 +694,9 @@ The Data Import Handler contains several built-in transformers. You can also wri
 
 Solr includes the following built-in transformers:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="40,60",options="header"]
 |===
 |Transformer Name |Use
 |<<UploadingStructuredDataStoreDatawiththeDataImportHandler-ClobTransformer,ClobTransformer>> |Used to create a String out of a Clob type in database.
@@ -706,7 +726,9 @@ You can use the ClobTransformer to create a string out of a CLOB in a database.
 
 The ClobTransformer accepts these attributes:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Description
 |clob |Boolean value to signal if ClobTransformer should process this field or not. If this attribute is omitted, then the corresponding field is not transformed.
@@ -722,7 +744,9 @@ DateFormatTransformer applies only on the fields with an attribute `dateTimeForm
 
 This transformer recognizes the following attributes:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Description
 |dateTimeFormat |The format used for parsing this field. This must comply with the syntax of the http://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html[Java SimpleDateFormat] class.
@@ -780,10 +804,12 @@ NumberFormatTransformer will be applied only to fields with an attribute `format
 
 This transformer recognizes the following attributes:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Description
-|formatStyle |The format used for parsing this field. The value of the attribute must be one of (`number|percent|integer|currency`). This uses the semantics of the Java NumberFormat class.
+|formatStyle |The format used for parsing this field. The value of the attribute must be one of (`number\|percent\|integer\|currency`). This uses the semantics of the Java NumberFormat class.
 |sourceColName |The column on which the NumberFormat is to be applied. If this attribute is absent, the source column and the target column are the same.
 |locale |The locale to be used for parsing the strings. If not defined, the ROOT locale is used. It must be specified as language-country (https://tools.ietf.org/html/bcp47[BCP 47 language tag]), for example, `en-US`.
 |===
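
As an illustrative sketch, an entity using NumberFormatTransformer might look like the following. The table and column names are hypothetical:

[source,xml]
----
<entity name="product" transformer="NumberFormatTransformer"
        query="SELECT id, price FROM products">
  <!-- parse a "1,000.50"-style string into a number using the en-US locale -->
  <field column="price" formatStyle="number" locale="en-US"/>
</entity>
----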
@@ -808,7 +834,9 @@ The regex transformer helps in extracting or manipulating values from fields (fr
 
 The table below describes the attributes recognized by the regex transformer.
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Attribute |Description
 |regex |The regular expression that is used to match against the column or sourceColName's value(s). If replaceWith is absent, each regex _group_ is taken as a value and a list of values is returned.
@@ -900,7 +928,9 @@ You can use the template transformer to construct or modify a field value, perha
 
 You can pass special commands to the DIH by adding any of the variables listed below to any row returned by any component:
 
-[width="100%",options="header",]
+// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
+
+[cols="30,70",options="header"]
 |===
 |Variable |Description
 |$skipDoc |Skip the current document; that is, do not add it to Solr. The value can be the string `true\|false`.