You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/02/04 22:53:18 UTC

[GitHub] [lucene-solr] epugh opened a new pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

epugh opened a new pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306


   
   # Description
   
   Reduce security exposure for solr by moving the xslt to scripting.
   
   # Solution
   
   Migrating code.
   
   # Tests
   
   running existing tests.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [X ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [ X] I have created a Jira issue and added the issue ID to my pull request title.
   - [ X] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [X ] I have developed this patch against the `master` branch.
   - [ X] I have run `./gradlew check`.
   - [ X] I have added tests for my changes.
   - [ X] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571549045



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/XSLTResponseWriter.java
##########
@@ -14,7 +14,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.solr.scripting.response;
+package org.apache.solr.scripting;

Review comment:
       why not the xslt package?

##########
File path: solr/contrib/scripting/src/test/org/apache/solr/scripting/handler/XSLTOutputWriterTest.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.handler;
+
+import org.apache.solr.SolrTestCaseJ4;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/** Tests the ability to configure multiple query output writers, and select those
+ * at query time.  This is specific to the XSLT writer, which isn't part of the core.
+ * 
+ * See the related unit test OutputWriterTest.
+ *
+ */
+public class XSLTOutputWriterTest extends SolrTestCaseJ4 {

Review comment:
       Your recent edits here look good BTW.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler edited a comment on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler edited a comment on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775330463


   I have to again look at the code, to figure out the best interaction that is still performant and clean to implement. So we don't convert too often between different XML API representations.
   
   I think the converter protected method should take a Source impl and return a new Source impl, in base class it just returns the parameter unmodified.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
muse-dev[bot] commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571448617



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/xslt/XSLTLoader.java
##########
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.xslt;
+
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.stream.FactoryConfigurationError;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamConstants;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.io.IOUtils;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.XMLErrorLogger;
+import org.apache.solr.core.SolrConfig;
+import org.apache.solr.handler.RequestHandlerUtils;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.ContentStreamLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.RollbackUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+import static org.apache.solr.common.params.CommonParams.ID;
+import static org.apache.solr.common.params.CommonParams.NAME;
+
+
+public class XSLTLoader extends ContentStreamLoader {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final AtomicBoolean WARNED_ABOUT_INDEX_TIME_BOOSTS = new AtomicBoolean();
+  static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
+  
+  public static final String CONTEXT_TRANSFORMER_KEY = "xsltupdater.transformer";
+
+  private static final String XSLT_CACHE_PARAM = "xsltCacheLifetimeSeconds"; 
+
+  public static final int XSLT_CACHE_DEFAULT = 60;
+  
+  int xsltCacheLifetimeSeconds;
+  XMLInputFactory inputFactory;
+  SAXParserFactory saxFactory;
+
+  @Override
+  public XSLTLoader init(SolrParams args) {
+    // Init StAX parser:
+    inputFactory = XMLInputFactory.newInstance();
+    EmptyEntityResolver.configureXMLInputFactory(inputFactory);
+    inputFactory.setXMLReporter(xmllog);
+    try {
+      // The java 1.6 bundled stax parser (sjsxp) does not currently have a thread-safe
+      // XMLInputFactory, as that implementation tries to cache and reuse the
+      // XMLStreamReader.  Setting the parser-specific "reuse-instance" property to false
+      // prevents this.
+      // All other known open-source stax parsers (and the bea ref impl)
+      // have thread-safe factories.
+      inputFactory.setProperty("reuse-instance", Boolean.FALSE);
+    } catch (IllegalArgumentException ex) {
+      // Other implementations will likely throw this exception since "reuse-instance"
+      // isimplementation specific.
+      log.debug("Unable to set the 'reuse-instance' property for the input chain: {}", inputFactory);
+    }
+    
+    // Init SAX parser (for XSL):
+    saxFactory = SAXParserFactory.newInstance();
+    saxFactory.setNamespaceAware(true); // XSL needs this!
+    EmptyEntityResolver.configureSAXParserFactory(saxFactory);
+    
+    xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+    if(args != null) {
+      xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM,XSLT_CACHE_DEFAULT);
+      log.debug("xsltCacheLifetimeSeconds={}", xsltCacheLifetimeSeconds);
+    }
+    return this;
+  }
+
+  @Override
+  public String getDefaultWT() {
+    return "xml";
+  }
+
+  @Override
+  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
+    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
+    
+    InputStream is = null;
+    XMLStreamReader parser = null;
+
+    String tr = req.getParams().get(XSLTParams.TR,null);
+    if(tr!=null) {
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+          throw new SolrException(ErrorCode.UNAUTHORIZED, "The configset for this collection was uploaded without any authentication in place,"
+                  + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                  + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = getTransformer(tr,req);
+      final DOMResult result = new DOMResult();
+      
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try {
+        is = stream.getStream();
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(charset);
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch(TransformerException te) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, te.getMessage(), te);
+      } finally {
+        IOUtils.closeQuietly(is);
+      }
+      // second step: feed the intermediate DOM tree into StAX parser:
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       *XXE_XMLSTREAMREADER:*  The XML parsing is vulnerable to XML External Entity attacks [(details)](https://find-sec-bugs.github.io/bugs.htm#XXE_XMLSTREAMREADER)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r575665009



##########
File path: solr/contrib/scripting/src/test/org/apache/solr/scripting/xslt/XSLTUpdateRequestHandlerTest.java
##########
@@ -39,11 +38,17 @@
 import org.junit.BeforeClass;
 import org.junit.Test;
 
-public class XsltUpdateRequestHandlerTest extends SolrTestCaseJ4 {
-  
+/**
+ * <p>
+ * This tests the XSLTUpdateRequestHandler ability to work with XSLT stylesheet and xml content.
+ * </p>
+*/
+public class XSLTUpdateRequestHandlerTest extends SolrTestCaseJ4 {

Review comment:
       I am down for making this change...  XSLT -> Xslt for the files touched in this PR, and then maybe a seperate JIRA for XML --> Xml  in the other classes?  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
muse-dev[bot] commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r575631373



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/xslt/XSLTUpdateRequestHandler.java
##########
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.scripting.xslt;
+
+import static org.apache.solr.scripting.xslt.XSLTConstants.*;
+
+import java.util.Map;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.XMLLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+/**
+ * Send XML formatted documents to Solr, transforming them from the original XML
+ * format to the Solr XML format using an XSLT stylesheet via the 'tr' parameter.
+ */
+public class XSLTUpdateRequestHandler extends UpdateRequestHandler {
+
+  @Override
+  public void init(@SuppressWarnings({"rawtypes"})NamedList args) {
+    super.init(args);
+    setAssumeContentType("application/xml");
+
+    SolrParams p = null;
+    if (args != null) {
+      p = args.toSolrParams();
+    }
+    final XsltXMLLoader loader = new XsltXMLLoader().init(p);
+    loaders = Map.of("application/xml", loader, "text/xml", loader);
+  }
+
+  @VisibleForTesting
+  static class XsltXMLLoader extends XMLLoader {
+
+    int xsltCacheLifetimeSeconds;
+
+    @Override
+    public XsltXMLLoader init(SolrParams args) {
+      super.init(args);
+
+      xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+      if (args != null) {
+        xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM, XSLT_CACHE_DEFAULT);
+      }
+      return this;
+    }
+
+    @Override
+    public void load(
+        SolrQueryRequest req,
+        SolrQueryResponse rsp,
+        ContentStream stream,
+        UpdateRequestProcessor processor)
+        throws Exception {
+
+      String tr = req.getParams().get(TR, null);
+      if (tr == null) {
+        super.load(req, rsp, stream, processor); // no XSLT; do standard processing
+        return;
+      }
+
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+        throw new SolrException(
+            SolrException.ErrorCode.UNAUTHORIZED,
+            "The configset for this collection was uploaded without any authentication in place,"
+                + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = TransformerProvider.getTransformer(req, tr, xsltCacheLifetimeSeconds);
+      final DOMResult result = new DOMResult();
+
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always
+      // produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try (var is = stream.getStream()) {
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(ContentStreamBase.getCharsetFromContentType(stream.getContentType()));
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch (TransformerException e) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.toString(), e);
+      }
+
+      // second step: feed the intermediate DOM tree into StAX parser:
+      XMLStreamReader parser = null; // does not implement AutoCloseable!
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       *XXE_XMLSTREAMREADER:*  The XML parsing is vulnerable to XML External Entity attacks [(details)](https://find-sec-bugs.github.io/bugs.htm#XXE_XMLSTREAMREADER)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571548842



##########
File path: solr/contrib/scripting/src/test/org/apache/solr/scripting/handler/XSLTOutputWriterTest.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.handler;
+
+import org.apache.solr.SolrTestCaseJ4;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/** Tests the ability to configure multiple query output writers, and select those
+ * at query time.  This is specific to the XSLT writer, which isn't part of the core.
+ * 
+ * See the related unit test OutputWriterTest.
+ *
+ */
+public class XSLTOutputWriterTest extends SolrTestCaseJ4 {

Review comment:
       Apparently there's an Eclipse plugin listed here: https://github.com/google/google-java-format
   The Lucene side issue is here: https://issues.apache.org/jira/browse/LUCENE-9564
   But you'll see in the comments were held off on Solr for a later date.
   
   BTW if I'm mistaken and you moved an existing source file, then I recommend doing no formatting changes as it may fool git rename detection.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r574271987



##########
File path: solr/docker/Docker-FAQ.md
##########
@@ -150,7 +150,7 @@ rather than the "mycores" directory.
 Can I run ZooKeeper and Solr clusters under Docker?
 ---------------------------------------------------
 
-At the network level the ZooKeeper nodes need to be able to talk to eachother,

Review comment:
       Feel free to just commit this typo stuff by itself directly (no PR needed)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh merged pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh merged pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571439997



##########
File path: solr/core/src/java/org/apache/solr/handler/UpdateRequestHandler.java
##########
@@ -134,12 +135,23 @@ protected void setAssumeContentType(String ct) {
   }
   private Map<String ,ContentStreamLoader> pathVsLoaders = new HashMap<>();
   protected Map<String,ContentStreamLoader> createDefaultLoaders(@SuppressWarnings({"rawtypes"})NamedList args) {
+
     SolrParams p = null;
     if(args!=null) {
       p = args.toSolrParams();
     }
     Map<String,ContentStreamLoader> registry = new HashMap<>();
-    registry.put("application/xml", new XMLLoader().init(p) );
+
+    // Try to load the scripting contrib modules XSLTLoader, and if not, fall back to the XMLLoader.
+    // Enable the XSLTLoader by including the Scripting contrib module.
+    try {
+      final Class<?> clazz = Class.forName( "org.apache.solr.scripting.util.xslt.XSLTLoader" );

Review comment:
       This was the part where I didn't like what I did...   And was putting *something* up as a starter..   I really appreciate your and @uschindler input on this.   Definitily lets find another approach!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571475376



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/xslt/XSLTLoader.java
##########
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.xslt;
+
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.stream.FactoryConfigurationError;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamConstants;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.io.IOUtils;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.XMLErrorLogger;
+import org.apache.solr.core.SolrConfig;
+import org.apache.solr.handler.RequestHandlerUtils;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.ContentStreamLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.RollbackUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+import static org.apache.solr.common.params.CommonParams.ID;
+import static org.apache.solr.common.params.CommonParams.NAME;
+
+
+public class XSLTLoader extends ContentStreamLoader {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final AtomicBoolean WARNED_ABOUT_INDEX_TIME_BOOSTS = new AtomicBoolean();
+  static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
+  
+  public static final String CONTEXT_TRANSFORMER_KEY = "xsltupdater.transformer";
+
+  private static final String XSLT_CACHE_PARAM = "xsltCacheLifetimeSeconds"; 
+
+  public static final int XSLT_CACHE_DEFAULT = 60;
+  
+  int xsltCacheLifetimeSeconds;
+  XMLInputFactory inputFactory;
+  SAXParserFactory saxFactory;
+
+  @Override
+  public XSLTLoader init(SolrParams args) {
+    // Init StAX parser:
+    inputFactory = XMLInputFactory.newInstance();
+    EmptyEntityResolver.configureXMLInputFactory(inputFactory);
+    inputFactory.setXMLReporter(xmllog);
+    try {
+      // The java 1.6 bundled stax parser (sjsxp) does not currently have a thread-safe
+      // XMLInputFactory, as that implementation tries to cache and reuse the
+      // XMLStreamReader.  Setting the parser-specific "reuse-instance" property to false
+      // prevents this.
+      // All other known open-source stax parsers (and the bea ref impl)
+      // have thread-safe factories.
+      inputFactory.setProperty("reuse-instance", Boolean.FALSE);
+    } catch (IllegalArgumentException ex) {
+      // Other implementations will likely throw this exception since "reuse-instance"
+      // isimplementation specific.
+      log.debug("Unable to set the 'reuse-instance' property for the input chain: {}", inputFactory);
+    }
+    
+    // Init SAX parser (for XSL):
+    saxFactory = SAXParserFactory.newInstance();
+    saxFactory.setNamespaceAware(true); // XSL needs this!
+    EmptyEntityResolver.configureSAXParserFactory(saxFactory);
+    
+    xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+    if(args != null) {
+      xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM,XSLT_CACHE_DEFAULT);
+      log.debug("xsltCacheLifetimeSeconds={}", xsltCacheLifetimeSeconds);
+    }
+    return this;
+  }
+
+  @Override
+  public String getDefaultWT() {
+    return "xml";
+  }
+
+  @Override
+  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
+    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
+    
+    InputStream is = null;
+    XMLStreamReader parser = null;
+
+    String tr = req.getParams().get(XSLTParams.TR,null);
+    if(tr!=null) {
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+          throw new SolrException(ErrorCode.UNAUTHORIZED, "The configset for this collection was uploaded without any authentication in place,"
+                  + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                  + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = getTransformer(tr,req);
+      final DOMResult result = new DOMResult();
+      
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try {
+        is = stream.getStream();
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(charset);
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch(TransformerException te) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, te.getMessage(), te);
+      } finally {
+        IOUtils.closeQuietly(is);
+      }
+      // second step: feed the intermediate DOM tree into StAX parser:
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       ignore bug




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571438270



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/response/package-info.java
##########
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Support for transforming via XSL stylesheets search responses.

Review comment:
       I think I was captured by the idea of mimicking what was in `/core/` too much..   I like.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775285291


   I think there is a bit of back and forth..  I added a boolean to `XMLLoader` constructor to decide if the tr parameter was acceptible.  I was going to have it throw an exception from the `UpdateRequestHandler` if a `tr` was passed in via solr params, but allow it if it was from the `XSLTUpdateRequestHandler`...    
   
   I'm open to any approach.   I wonder if the reason to keep the `TransformerProvider.java` in core is for people who write their own Solr plugins that use it, and not have to require the scripting module?   
   
   I'm going to push up a set of changes where all the paths work and tests pass momentarily.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778641991


   Sorry to hear that news @uschindler ....
   
   I did a pass through the Ref Guide to document how to use the XSLT stuff, and more explicitly called out the sample `.xsl` files that we have.  I can add more if needed.  
   
   So, if we aren't changing the name of the Java class in this PR, then I think we just need to update the Upgrade to Solr 9 page, and we're good to go?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler edited a comment on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler edited a comment on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775326263


   Yes, I tend to think of an interception point.
   
   By default in core the interception point just returns the original Source instance. In the subclass it applies the transformation and returns a new Source instance.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775365880


   @uschindler if you want to push up an example/make the change the way you are thinking, I'm 100% happy to have that!  Your Java skillz are way beyond mine, and while I sort of understnad what you say, it may take you 20 minutes to get the change you want, and then I'll learn from you ;-)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r575675609



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/xslt/XSLTUpdateRequestHandler.java
##########
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.scripting.xslt;
+
+import static org.apache.solr.scripting.xslt.XSLTConstants.*;
+
+import java.util.Map;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.XMLLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+/**
+ * Send XML formatted documents to Solr, transforming them from the original XML
+ * format to the Solr XML format using an XSLT stylesheet via the 'tr' parameter.
+ */
+public class XSLTUpdateRequestHandler extends UpdateRequestHandler {
+
+  @Override
+  public void init(@SuppressWarnings({"rawtypes"})NamedList args) {
+    super.init(args);
+    setAssumeContentType("application/xml");
+
+    SolrParams p = null;
+    if (args != null) {
+      p = args.toSolrParams();
+    }
+    final XsltXMLLoader loader = new XsltXMLLoader().init(p);
+    loaders = Map.of("application/xml", loader, "text/xml", loader);
+  }
+
+  @VisibleForTesting
+  static class XsltXMLLoader extends XMLLoader {
+
+    int xsltCacheLifetimeSeconds;
+
+    @Override
+    public XsltXMLLoader init(SolrParams args) {
+      super.init(args);
+
+      xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+      if (args != null) {
+        xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM, XSLT_CACHE_DEFAULT);
+      }
+      return this;
+    }
+
+    @Override
+    public void load(
+        SolrQueryRequest req,
+        SolrQueryResponse rsp,
+        ContentStream stream,
+        UpdateRequestProcessor processor)
+        throws Exception {
+
+      String tr = req.getParams().get(TR, null);
+      if (tr == null) {
+        super.load(req, rsp, stream, processor); // no XSLT; do standard processing
+        return;
+      }
+
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+        throw new SolrException(
+            SolrException.ErrorCode.UNAUTHORIZED,
+            "The configset for this collection was uploaded without any authentication in place,"
+                + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = TransformerProvider.getTransformer(req, tr, xsltCacheLifetimeSeconds);
+      final DOMResult result = new DOMResult();
+
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always
+      // produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try (var is = stream.getStream()) {
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(ContentStreamBase.getCharsetFromContentType(stream.getContentType()));
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch (TransformerException e) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.toString(), e);
+      }
+
+      // second step: feed the intermediate DOM tree into StAX parser:
+      XMLStreamReader parser = null; // does not implement AutoCloseable!
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       not a bug




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571375991



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/util/xslt/XSLTLoader.java
##########
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.util.xslt;
+
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.stream.FactoryConfigurationError;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamConstants;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.io.IOUtils;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.XMLErrorLogger;
+import org.apache.solr.core.SolrConfig;
+import org.apache.solr.handler.RequestHandlerUtils;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.ContentStreamLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.RollbackUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+import static org.apache.solr.common.params.CommonParams.ID;
+import static org.apache.solr.common.params.CommonParams.NAME;
+
+
+public class XSLTLoader extends ContentStreamLoader {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final AtomicBoolean WARNED_ABOUT_INDEX_TIME_BOOSTS = new AtomicBoolean();
+  static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
+  
+  public static final String CONTEXT_TRANSFORMER_KEY = "xsltupdater.transformer";
+
+  private static final String XSLT_CACHE_PARAM = "xsltCacheLifetimeSeconds"; 
+
+  public static final int XSLT_CACHE_DEFAULT = 60;
+  
+  int xsltCacheLifetimeSeconds;
+  XMLInputFactory inputFactory;
+  SAXParserFactory saxFactory;
+
+  @Override
+  public XSLTLoader init(SolrParams args) {
+    // Init StAX parser:
+    inputFactory = XMLInputFactory.newInstance();
+    EmptyEntityResolver.configureXMLInputFactory(inputFactory);
+    inputFactory.setXMLReporter(xmllog);
+    try {
+      // The java 1.6 bundled stax parser (sjsxp) does not currently have a thread-safe
+      // XMLInputFactory, as that implementation tries to cache and reuse the
+      // XMLStreamReader.  Setting the parser-specific "reuse-instance" property to false
+      // prevents this.
+      // All other known open-source stax parsers (and the bea ref impl)
+      // have thread-safe factories.
+      inputFactory.setProperty("reuse-instance", Boolean.FALSE);
+    } catch (IllegalArgumentException ex) {
+      // Other implementations will likely throw this exception since "reuse-instance"
+      // isimplementation specific.
+      log.debug("Unable to set the 'reuse-instance' property for the input chain: {}", inputFactory);
+    }
+    
+    // Init SAX parser (for XSL):
+    saxFactory = SAXParserFactory.newInstance();
+    saxFactory.setNamespaceAware(true); // XSL needs this!
+    EmptyEntityResolver.configureSAXParserFactory(saxFactory);
+    
+    xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+    if(args != null) {
+      xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM,XSLT_CACHE_DEFAULT);
+      log.debug("xsltCacheLifetimeSeconds={}", xsltCacheLifetimeSeconds);
+    }
+    return this;
+  }
+
+  @Override
+  public String getDefaultWT() {
+    return "xml";
+  }
+
+  @Override
+  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
+    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
+    
+    InputStream is = null;
+    XMLStreamReader parser = null;
+
+    String tr = req.getParams().get(XSLTParams.TR,null);
+    if(tr!=null) {
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+          throw new SolrException(ErrorCode.UNAUTHORIZED, "The configset for this collection was uploaded without any authentication in place,"
+                  + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                  + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = getTransformer(tr,req);
+      final DOMResult result = new DOMResult();
+      
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try {
+        is = stream.getStream();
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(charset);
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch(TransformerException te) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, te.getMessage(), te);
+      } finally {
+        IOUtils.closeQuietly(is);
+      }
+      // second step: feed the intermediate DOM tree into StAX parser:
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       Muse is right.  But here, I think it's okay... we are taking a script (XSLT is a script), so all bets are off; include external entities if you want?  WDYT @uschindler ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-773656218


   I still have a broken test, and need to update docs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [GitHub] [lucene-solr] uschindler commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by Michael Sokolov <ms...@gmail.com>.
So sorry to hear that Uwe; take your time to grieve - that's a big one, I think

-Mike

On Sat, Feb 13, 2021 at 9:57 AM GitBox <gi...@apache.org> wrote:
>
>
> uschindler commented on pull request #2306:
> URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778630343
>
>
>    > @uschindler if you want to push up an example/make the change the way you are thinking, I'm 100% happy to have that! Your Java skillz are way beyond mine, and while I sort of understnad what you say, it may take you 20 minutes to get the change you want, and then I'll learn from you ;-)
>
>    Sorry, my father died on Thursday. So excuse my ignorance! To me @dsmiley changes look fine. I will review later as some "long term specialist" on XML processing.
>
>
> ----------------------------------------------------------------
> This is an automated message from the Apache Git Service.
> To respond to the message, please log on to GitHub and use the
> URL above to go to the specific comment.
>
> For queries about this service, please contact Infrastructure at:
> users@infra.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
> For additional commands, e-mail: issues-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778630343


   > @uschindler if you want to push up an example/make the change the way you are thinking, I'm 100% happy to have that! Your Java skillz are way beyond mine, and while I sort of understnad what you say, it may take you 20 minutes to get the change you want, and then I'll learn from you ;-)
   
   Sorry, my father died on Thursday. So excuse my ignorance! To me @dsmiley changes look fine. I will review later as some "long term specialist" 😉 on XML processing.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r575675351



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/xslt/XSLTUpdateRequestHandler.java
##########
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.scripting.xslt;
+
+import static org.apache.solr.scripting.xslt.XSLTConstants.*;
+
+import java.util.Map;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.XMLLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+/**
+ * Send XML formatted documents to Solr, transforming them from the original XML
+ * format to the Solr XML format using an XSLT stylesheet via the 'tr' parameter.
+ */
+public class XSLTUpdateRequestHandler extends UpdateRequestHandler {
+
+  @Override
+  public void init(@SuppressWarnings({"rawtypes"})NamedList args) {
+    super.init(args);
+    setAssumeContentType("application/xml");
+
+    SolrParams p = null;
+    if (args != null) {
+      p = args.toSolrParams();
+    }
+    final XsltXMLLoader loader = new XsltXMLLoader().init(p);
+    loaders = Map.of("application/xml", loader, "text/xml", loader);
+  }
+
+  @VisibleForTesting
+  static class XsltXMLLoader extends XMLLoader {
+
+    int xsltCacheLifetimeSeconds;
+
+    @Override
+    public XsltXMLLoader init(SolrParams args) {
+      super.init(args);
+
+      xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+      if (args != null) {
+        xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM, XSLT_CACHE_DEFAULT);
+      }
+      return this;
+    }
+
+    @Override
+    public void load(
+        SolrQueryRequest req,
+        SolrQueryResponse rsp,
+        ContentStream stream,
+        UpdateRequestProcessor processor)
+        throws Exception {
+
+      String tr = req.getParams().get(TR, null);
+      if (tr == null) {
+        super.load(req, rsp, stream, processor); // no XSLT; do standard processing
+        return;
+      }
+
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+        throw new SolrException(
+            SolrException.ErrorCode.UNAUTHORIZED,
+            "The configset for this collection was uploaded without any authentication in place,"
+                + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = TransformerProvider.getTransformer(req, tr, xsltCacheLifetimeSeconds);
+      final DOMResult result = new DOMResult();
+
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always
+      // produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try (var is = stream.getStream()) {
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(ContentStreamBase.getCharsetFromContentType(stream.getContentType()));
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch (TransformerException e) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.toString(), e);
+      }
+
+      // second step: feed the intermediate DOM tree into StAX parser:
+      XMLStreamReader parser = null; // does not implement AutoCloseable!
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       It's ok.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778632312


   > Sorry, my father died on Thursday.
   
   Wow; so sorry to hear!  I fear the same for my own Dad; I can't take his existence for granted so I'll be visiting him tomorrow.
   
   > I will review later as some "long term specialist" 😉 on XML processing.
   
   You're not the only one; I'm a long-time XML fan :-)  It seems the new kids know little about such things :-/  "What is this weird syntax... " as he/she goes off to then write a dozen lines that an XPath does in one.
   
   I kept the all-caps XSLT despite my preferences because All of the XML related APIs we use are using all-caps for XML and thus it looked off to see it differently.  But then... the user doesn't know that and sees XML in the config file so...   Hmm.
   
   Maybe the XSLT response writer should demand "tr" and thus be more consistent with the XSLT response writer?  After all, you're choosing to send the response there so why would you send it there if it wasn't? I guess it gives you choice and the ability to be more similar to what we have today.  Shrug.
   
   We need some docs to show how to reference the XSLT update handler, or at the very least an upgrade note about this.
   
   I hate the crappy caching; it's inferior to a SolrCache but alas... I hold myself back from feature-creep.  This code was written forever-ago so I can appreciate it has excuses.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] mikemccand commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
mikemccand commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-779866635


   > Sorry, my father died on Thursday.
   
   So sorry to hear this @uschindler.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] TomMD commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
TomMD commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571474014



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/xslt/XSLTLoader.java
##########
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.xslt;
+
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.stream.FactoryConfigurationError;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamConstants;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.io.IOUtils;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.XMLErrorLogger;
+import org.apache.solr.core.SolrConfig;
+import org.apache.solr.handler.RequestHandlerUtils;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.ContentStreamLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.RollbackUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+import static org.apache.solr.common.params.CommonParams.ID;
+import static org.apache.solr.common.params.CommonParams.NAME;
+
+
+public class XSLTLoader extends ContentStreamLoader {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final AtomicBoolean WARNED_ABOUT_INDEX_TIME_BOOSTS = new AtomicBoolean();
+  static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
+  
+  public static final String CONTEXT_TRANSFORMER_KEY = "xsltupdater.transformer";
+
+  private static final String XSLT_CACHE_PARAM = "xsltCacheLifetimeSeconds"; 
+
+  public static final int XSLT_CACHE_DEFAULT = 60;
+  
+  int xsltCacheLifetimeSeconds;
+  XMLInputFactory inputFactory;
+  SAXParserFactory saxFactory;
+
+  @Override
+  public XSLTLoader init(SolrParams args) {
+    // Init StAX parser:
+    inputFactory = XMLInputFactory.newInstance();
+    EmptyEntityResolver.configureXMLInputFactory(inputFactory);
+    inputFactory.setXMLReporter(xmllog);
+    try {
+      // The java 1.6 bundled stax parser (sjsxp) does not currently have a thread-safe
+      // XMLInputFactory, as that implementation tries to cache and reuse the
+      // XMLStreamReader.  Setting the parser-specific "reuse-instance" property to false
+      // prevents this.
+      // All other known open-source stax parsers (and the bea ref impl)
+      // have thread-safe factories.
+      inputFactory.setProperty("reuse-instance", Boolean.FALSE);
+    } catch (IllegalArgumentException ex) {
+      // Other implementations will likely throw this exception since "reuse-instance"
+      // isimplementation specific.
+      log.debug("Unable to set the 'reuse-instance' property for the input chain: {}", inputFactory);
+    }
+    
+    // Init SAX parser (for XSL):
+    saxFactory = SAXParserFactory.newInstance();
+    saxFactory.setNamespaceAware(true); // XSL needs this!
+    EmptyEntityResolver.configureSAXParserFactory(saxFactory);
+    
+    xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+    if(args != null) {
+      xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM,XSLT_CACHE_DEFAULT);
+      log.debug("xsltCacheLifetimeSeconds={}", xsltCacheLifetimeSeconds);
+    }
+    return this;
+  }
+
+  @Override
+  public String getDefaultWT() {
+    return "xml";
+  }
+
+  @Override
+  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
+    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
+    
+    InputStream is = null;
+    XMLStreamReader parser = null;
+
+    String tr = req.getParams().get(XSLTParams.TR,null);
+    if(tr!=null) {
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+          throw new SolrException(ErrorCode.UNAUTHORIZED, "The configset for this collection was uploaded without any authentication in place,"
+                  + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                  + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = getTransformer(tr,req);
+      final DOMResult result = new DOMResult();
+      
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try {
+        is = stream.getStream();
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(charset);
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch(TransformerException te) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, te.getMessage(), te);
+      } finally {
+        IOUtils.closeQuietly(is);
+      }
+      // second step: feed the intermediate DOM tree into StAX parser:
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       Oof, yeah @uschindler we feel your pain and are thinking about how to solve the problem.
   
   The bot is a bit like a dog with a bone. Once it finds and issue you can be sure any commit that "moves" the issue in a way that changes the identifier (ex changing the function name) will make it appear again as a "new" issue.  We are thinking on how to make better stable names so the above dismissal would also cover this case.
   
   A new `ignore bug` command is available if you comment with only `ignore bug` then muse will consider the issue resolved for the purpose of the below status bar and we'll use this data in the ML.  At this time I don't think github has an API to automatically resolve the comment but that is on our minds too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-776276696


   Okay, I've done a bunch of tweaking (banging my head?) against the ref guide docs, and they are working, and I think all the tests are passing.
   
   I don't like that the SolrJ tests depend on the sample_techproducts_configs directory, but I think adding some `startup=lazy` settings for the XSLT classes means that the links in the ref guide will work, and the solrj tests don't blow up on the missing xslt classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
muse-dev[bot] commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571294563



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/util/xslt/XSLTLoader.java
##########
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.util.xslt;
+
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.stream.FactoryConfigurationError;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamConstants;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.io.IOUtils;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.XMLErrorLogger;
+import org.apache.solr.core.SolrConfig;
+import org.apache.solr.handler.RequestHandlerUtils;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.ContentStreamLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.RollbackUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+import static org.apache.solr.common.params.CommonParams.ID;
+import static org.apache.solr.common.params.CommonParams.NAME;
+
+
+public class XSLTLoader extends ContentStreamLoader {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final AtomicBoolean WARNED_ABOUT_INDEX_TIME_BOOSTS = new AtomicBoolean();
+  static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
+  
+  public static final String CONTEXT_TRANSFORMER_KEY = "xsltupdater.transformer";
+
+  private static final String XSLT_CACHE_PARAM = "xsltCacheLifetimeSeconds"; 
+
+  public static final int XSLT_CACHE_DEFAULT = 60;
+  
+  int xsltCacheLifetimeSeconds;
+  XMLInputFactory inputFactory;
+  SAXParserFactory saxFactory;
+
+  @Override
+  public XSLTLoader init(SolrParams args) {
+    // Init StAX parser:
+    inputFactory = XMLInputFactory.newInstance();
+    EmptyEntityResolver.configureXMLInputFactory(inputFactory);
+    inputFactory.setXMLReporter(xmllog);
+    try {
+      // The java 1.6 bundled stax parser (sjsxp) does not currently have a thread-safe
+      // XMLInputFactory, as that implementation tries to cache and reuse the
+      // XMLStreamReader.  Setting the parser-specific "reuse-instance" property to false
+      // prevents this.
+      // All other known open-source stax parsers (and the bea ref impl)
+      // have thread-safe factories.
+      inputFactory.setProperty("reuse-instance", Boolean.FALSE);
+    } catch (IllegalArgumentException ex) {
+      // Other implementations will likely throw this exception since "reuse-instance"
+      // isimplementation specific.
+      log.debug("Unable to set the 'reuse-instance' property for the input chain: {}", inputFactory);
+    }
+    
+    // Init SAX parser (for XSL):
+    saxFactory = SAXParserFactory.newInstance();
+    saxFactory.setNamespaceAware(true); // XSL needs this!
+    EmptyEntityResolver.configureSAXParserFactory(saxFactory);
+    
+    xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+    if(args != null) {
+      xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM,XSLT_CACHE_DEFAULT);
+      log.debug("xsltCacheLifetimeSeconds={}", xsltCacheLifetimeSeconds);
+    }
+    return this;
+  }
+
+  @Override
+  public String getDefaultWT() {
+    return "xml";
+  }
+
+  @Override
+  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
+    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
+    
+    InputStream is = null;
+    XMLStreamReader parser = null;
+
+    String tr = req.getParams().get(XSLTParams.TR,null);
+    if(tr!=null) {
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+          throw new SolrException(ErrorCode.UNAUTHORIZED, "The configset for this collection was uploaded without any authentication in place,"
+                  + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                  + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = getTransformer(tr,req);
+      final DOMResult result = new DOMResult();
+      
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try {
+        is = stream.getStream();
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(charset);
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch(TransformerException te) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, te.getMessage(), te);
+      } finally {
+        IOUtils.closeQuietly(is);
+      }
+      // second step: feed the intermediate DOM tree into StAX parser:
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       *XXE_XMLSTREAMREADER:*  The XML parsing is vulnerable to XML External Entity attacks [(details)](https://find-sec-bugs.github.io/bugs.htm#XXE_XMLSTREAMREADER)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r574273233



##########
File path: solr/contrib/scripting/src/test/org/apache/solr/scripting/xslt/XSLTUpdateRequestHandlerTest.java
##########
@@ -39,11 +38,17 @@
 import org.junit.BeforeClass;
 import org.junit.Test;
 
-public class XsltUpdateRequestHandlerTest extends SolrTestCaseJ4 {
-  
+/**
+ * <p>
+ * This tests the XSLTUpdateRequestHandler ability to work with XSLT stylesheet and xml content.
+ * </p>
+*/
+public class XSLTUpdateRequestHandlerTest extends SolrTestCaseJ4 {

Review comment:
       BTW, I the most common naming convention is for acronyms to use camel case instead of all-caps.  This has been a long trend over many years.  I know why you changed it here -- to match the class it's testing, and so it's a net improvement over what existed.  But arguably it's the class it's testing that should be renamed; not this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775330463


   I have to again look at the code, to figure out the best interaction that is still Performanz and clean to implement. So we don't convert too often between different XML API representations.
   
   I think the converter protected method should take a Source impl and return a new Source impl, in base class it just returns the parameter unmodified.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775366119


   I am testing out the idea of exporting in Solr XML, and then reimporting using a XSLT, which is something metnioned int he docs..  to see how well this works.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-773656218


   I still have a broken test, and need to update docs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775272067


   @epugh I see in your latest change that XMLLoader continues to know about XSLT stylesheets.  Couldn't/shouldn't that go in the XsltRequestHandler that you added?  I suppose you're getting somewhat conflicting feedback between me and @uschindler . My preference is that all XSLT stuff is in this new contrib.  Uwe had suggested some trivial toggle/setting to the existing code.  I think his latest feedback is basically the idea of adding an intercepting spot (in XMLLoader) so that it needn't have the XSLT code there but it could be added in by the contrib, say by subclassing XMLLoader.  @uschindler does this make sense to you?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571438249



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/util/xslt/TransformerProvider.java
##########
@@ -14,7 +14,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.solr.util.xslt;

Review comment:
       Makes sense to me.  I'll make this change.   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778648072


   > So, if we aren't changing the name of the Java class in this PR, then I think we just need to update the Upgrade to Solr 9 page, and we're good to go?
   
   +1


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571368100



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/response/package-info.java
##########
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Support for transforming via XSL stylesheets search responses.

Review comment:
       The package name suggests nothing of XSLT yet we document it as only for XSLT?  Let's just put all XSLT stuff in the same package? (with or without a .response; I don't think that aspect is important)

##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/util/xslt/TransformerProvider.java
##########
@@ -14,7 +14,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.solr.util.xslt;

Review comment:
       Let's ditch this "util" intermediate package; okay?  I can see maybe it had a role to play before, but not in scripting contrib.

##########
File path: solr/contrib/scripting/src/test/org/apache/solr/scripting/handler/XSLTOutputWriterTest.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.handler;
+
+import org.apache.solr.SolrTestCaseJ4;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/** Tests the ability to configure multiple query output writers, and select those
+ * at query time.  This is specific to the XSLT writer, which isn't part of the core.
+ * 
+ * See the related unit test OutputWriterTest.
+ *
+ */
+public class XSLTOutputWriterTest extends SolrTestCaseJ4 {

Review comment:
       The formatting in this file is a bit off (indentation varies etc.).  Lucene recently adopted the Google Java Format.  I use an IntelliJ plugin for this.  Can you please reformat this file with that?  Even the built-in formatter is probably fine.

##########
File path: solr/core/src/java/org/apache/solr/handler/UpdateRequestHandler.java
##########
@@ -134,12 +135,23 @@ protected void setAssumeContentType(String ct) {
   }
   private Map<String ,ContentStreamLoader> pathVsLoaders = new HashMap<>();
   protected Map<String,ContentStreamLoader> createDefaultLoaders(@SuppressWarnings({"rawtypes"})NamedList args) {
+
     SolrParams p = null;
     if(args!=null) {
       p = args.toSolrParams();
     }
     Map<String,ContentStreamLoader> registry = new HashMap<>();
-    registry.put("application/xml", new XMLLoader().init(p) );
+
+    // Try to load the scripting contrib modules XSLTLoader, and if not, fall back to the XMLLoader.
+    // Enable the XSLTLoader by including the Scripting contrib module.
+    try {
+      final Class<?> clazz = Class.forName( "org.apache.solr.scripting.util.xslt.XSLTLoader" );

Review comment:
       If the scripting contrib is added as a package, Solr core.jar will not be able to load it in this way, I think.

##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/util/xslt/XSLTParams.java
##########
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.util.xslt;
+
+
+/**
+ * Parameters used across the XSLT related classes.
+ */
+public interface XSLTParams {

Review comment:
       I see why you added this here but I don't think there's a role for XSLTParams.  Just move this "tr" constant to the plugin that defines it for its internal use only.  CommonParams was in SolrJ deliberately; This XSLTParams you added is not (and should not be).

##########
File path: solr/contrib/scripting/src/test/org/apache/solr/scripting/handler/XsltUpdateRequestHandlerTest.java
##########
@@ -69,8 +77,8 @@ public void testUpdate() throws Exception
       "</random>";
 
     Map<String,String> args = new HashMap<>();
-    args.put(CommonParams.TR, "xsl-update-handler-test.xsl");
-      
+    args.put(XSLTParams.TR, "xsl-update-handler-test.xsl");

Review comment:
       In tests, you can just refer to the parameter directly: "tr".  Yonik advised this practice once on the reasoning that the test catches API changes you might not have meant to make.  Also, it's shorter :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-778617013


   Thanks for getting your hands dirty @dsmiley  and making these changes, they LGTM.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571471618



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/xslt/XSLTLoader.java
##########
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.xslt;
+
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.stream.FactoryConfigurationError;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamConstants;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.io.IOUtils;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.XMLErrorLogger;
+import org.apache.solr.core.SolrConfig;
+import org.apache.solr.handler.RequestHandlerUtils;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.ContentStreamLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.RollbackUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+import static org.apache.solr.common.params.CommonParams.ID;
+import static org.apache.solr.common.params.CommonParams.NAME;
+
+
+public class XSLTLoader extends ContentStreamLoader {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  private static final AtomicBoolean WARNED_ABOUT_INDEX_TIME_BOOSTS = new AtomicBoolean();
+  static final XMLErrorLogger xmllog = new XMLErrorLogger(log);
+  
+  public static final String CONTEXT_TRANSFORMER_KEY = "xsltupdater.transformer";
+
+  private static final String XSLT_CACHE_PARAM = "xsltCacheLifetimeSeconds"; 
+
+  public static final int XSLT_CACHE_DEFAULT = 60;
+  
+  int xsltCacheLifetimeSeconds;
+  XMLInputFactory inputFactory;
+  SAXParserFactory saxFactory;
+
+  @Override
+  public XSLTLoader init(SolrParams args) {
+    // Init StAX parser:
+    inputFactory = XMLInputFactory.newInstance();
+    EmptyEntityResolver.configureXMLInputFactory(inputFactory);
+    inputFactory.setXMLReporter(xmllog);
+    try {
+      // The java 1.6 bundled stax parser (sjsxp) does not currently have a thread-safe
+      // XMLInputFactory, as that implementation tries to cache and reuse the
+      // XMLStreamReader.  Setting the parser-specific "reuse-instance" property to false
+      // prevents this.
+      // All other known open-source stax parsers (and the bea ref impl)
+      // have thread-safe factories.
+      inputFactory.setProperty("reuse-instance", Boolean.FALSE);
+    } catch (IllegalArgumentException ex) {
+      // Other implementations will likely throw this exception since "reuse-instance"
+      // isimplementation specific.
+      log.debug("Unable to set the 'reuse-instance' property for the input chain: {}", inputFactory);
+    }
+    
+    // Init SAX parser (for XSL):
+    saxFactory = SAXParserFactory.newInstance();
+    saxFactory.setNamespaceAware(true); // XSL needs this!
+    EmptyEntityResolver.configureSAXParserFactory(saxFactory);
+    
+    xsltCacheLifetimeSeconds = XSLT_CACHE_DEFAULT;
+    if(args != null) {
+      xsltCacheLifetimeSeconds = args.getInt(XSLT_CACHE_PARAM,XSLT_CACHE_DEFAULT);
+      log.debug("xsltCacheLifetimeSeconds={}", xsltCacheLifetimeSeconds);
+    }
+    return this;
+  }
+
+  @Override
+  public String getDefaultWT() {
+    return "xml";
+  }
+
+  @Override
+  public void load(SolrQueryRequest req, SolrQueryResponse rsp, ContentStream stream, UpdateRequestProcessor processor) throws Exception {
+    final String charset = ContentStreamBase.getCharsetFromContentType(stream.getContentType());
+    
+    InputStream is = null;
+    XMLStreamReader parser = null;
+
+    String tr = req.getParams().get(XSLTParams.TR,null);
+    if(tr!=null) {
+      if (req.getCore().getCoreDescriptor().isConfigSetTrusted() == false) {
+          throw new SolrException(ErrorCode.UNAUTHORIZED, "The configset for this collection was uploaded without any authentication in place,"
+                  + " and this operation is not available for collections with untrusted configsets. To use this feature, re-upload the configset"
+                  + " after enabling authentication and authorization.");
+      }
+
+      final Transformer t = getTransformer(tr,req);
+      final DOMResult result = new DOMResult();
+      
+      // first step: read XML and build DOM using Transformer (this is no overhead, as XSL always produces
+      // an internal result DOM tree, we just access it directly as input for StAX):
+      try {
+        is = stream.getStream();
+        final InputSource isrc = new InputSource(is);
+        isrc.setEncoding(charset);
+        final XMLReader xmlr = saxFactory.newSAXParser().getXMLReader();
+        xmlr.setErrorHandler(xmllog);
+        xmlr.setEntityResolver(EmptyEntityResolver.SAX_INSTANCE);
+        final SAXSource source = new SAXSource(xmlr, isrc);
+        t.transform(source, result);
+      } catch(TransformerException te) {
+        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, te.getMessage(), te);
+      } finally {
+        IOUtils.closeQuietly(is);
+      }
+      // second step: feed the intermediate DOM tree into StAX parser:
+      try {
+        parser = inputFactory.createXMLStreamReader(new DOMSource(result.getNode()));

Review comment:
       Oh man f*ck up @muse-dev! This is not an issue here because this does not parse XML. It uses a Dom tree!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] epugh commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
epugh commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571439373



##########
File path: solr/contrib/scripting/src/test/org/apache/solr/scripting/handler/XSLTOutputWriterTest.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.handler;
+
+import org.apache.solr.SolrTestCaseJ4;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+/** Tests the ability to configure multiple query output writers, and select those
+ * at query time.  This is specific to the XSLT writer, which isn't part of the core.
+ * 
+ * See the related unit test OutputWriterTest.
+ *
+ */
+public class XSLTOutputWriterTest extends SolrTestCaseJ4 {

Review comment:
       I found this blog post: http://www.practicesofmastery.com/post/eclipse-google-java-style-guide/ (since I use Eclipse)...  Do we have a JIRA about moving towards this?   Should I look to figure out how to add the Google Java Format to the default Eclipse setup?  Or at least a message when you run `./gradlew eclipse` that recommends adding this formatter to your setup?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#discussion_r571376095



##########
File path: solr/contrib/scripting/src/java/org/apache/solr/scripting/util/xslt/XSLTLoader.java
##########
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.scripting.util.xslt;
+
+import javax.xml.parsers.SAXParserFactory;
+import javax.xml.stream.FactoryConfigurationError;
+import javax.xml.stream.XMLInputFactory;
+import javax.xml.stream.XMLStreamConstants;
+import javax.xml.stream.XMLStreamException;
+import javax.xml.stream.XMLStreamReader;
+import javax.xml.transform.Transformer;
+import javax.xml.transform.TransformerException;
+import javax.xml.transform.dom.DOMResult;
+import javax.xml.transform.dom.DOMSource;
+import javax.xml.transform.sax.SAXSource;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+import com.google.common.collect.Lists;
+import org.apache.commons.io.IOUtils;
+import org.apache.solr.common.EmptyEntityResolver;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.ModifiableSolrParams;
+import org.apache.solr.common.params.ShardParams;
+import org.apache.solr.common.params.SolrParams;
+import org.apache.solr.common.params.UpdateParams;
+import org.apache.solr.common.util.ContentStream;
+import org.apache.solr.common.util.ContentStreamBase;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.XMLErrorLogger;
+import org.apache.solr.core.SolrConfig;
+import org.apache.solr.handler.RequestHandlerUtils;
+import org.apache.solr.handler.UpdateRequestHandler;
+import org.apache.solr.handler.loader.ContentStreamLoader;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.update.AddUpdateCommand;
+import org.apache.solr.update.CommitUpdateCommand;
+import org.apache.solr.update.DeleteUpdateCommand;
+import org.apache.solr.update.RollbackUpdateCommand;
+import org.apache.solr.update.processor.UpdateRequestProcessor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.InputSource;
+import org.xml.sax.XMLReader;
+
+import static org.apache.solr.common.params.CommonParams.ID;
+import static org.apache.solr.common.params.CommonParams.NAME;
+
+
+public class XSLTLoader extends ContentStreamLoader {

Review comment:
       Looks like you substantially copied XMLLoader? Sorry; let's find another approach.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] uschindler commented on pull request #2306: SOLR-15121: Move XSLT (tr param) to scripting contrib

Posted by GitBox <gi...@apache.org>.
uschindler commented on pull request #2306:
URL: https://github.com/apache/lucene-solr/pull/2306#issuecomment-775326263


   Yes, I tend to think of an interception point.
   
   By default in core the interception point just returns the original DOM tree (Document class). In the subclass it applies the transformation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org